Monthly Archives: April 2014

HOTEL alfa sierra kilo echo lima×2

From earlier posts, you know that I’ve been learning Haskell. Now it’s one thing to memorize some syntax rules, and be able to discuss intelligently the semantics and ramifications of some particular language feature, even typing a line or two at an interactive prompt. It’s another to face an empty page in a text editor with the intent of writing a program.

For whatever reason, it occurred to me to do something with the spoken (phonetic) alphabet.  It’s good to pick something you easily know how to do, as to concentrate on the mechanical details of the tools instead of the problem.

The Main Loop

To that end, I started with looking up how to fetch command-line arguments (simple output was already covered from hello world) and spent a long time pondering just how to structure the loop over the arguments.

Pretty basic, right?  With pure functions, given something to do on one input value (say, foo) and an ordinary list of input values, you can process all of them using any of several ways of saying “map”.

map foo values
foo <$> values
mapF foo values
mapM foo values
forM values foo
ff

The problem is that both the thing I want to do on each value and the function that obtains the list of values is “in” the IO monad. So instead of one token to put these two things together (whether a word like map or a infix operator like <$>), I need two separate tokens for monad composition and the mapping, and had further trouble combining the two things.

Separate steps for the monad and the map isn’t too unreasonable, but is just a little bit. There are different levels of wrapping, like having different levels of indirection. f $ g is one way to write simple function application, and the entire thing g is the argument of one call to f; f <$> g uses a different symbol to mean apply f to what’s inside g treated as a container, so f is called multiple times with values taken from inside g.  There are any number of different wrapping layers and different things can be wrapped in different numbers of layers, so having unique marks for each possibility is prohibitive.  There are ways to bump things up in general, so f g is plain function application and liftM f g tells f to look inside its argument.

So, given that the values I have are double wrapped and I’m targeting the middle layer (neither the innermost nor the outermost) it seems reasonable that an extra mark of some kind is needed to specify, in addition to a mark that says “put these two things together”, making two in all.

The other combining problem is harder to explain:  Given one function (foo) that produced something that you wanted to feed to another function (bar), simply combining them is a no-brainer, you would think.  In a C-like language, bar(foo()) is pretty obvious.  In Haskell you don’t even use the parenthesis, so it’s just bar foo.

You could use an intermediate variable name, like

s= foo();
bar (s);

but you don’t have to.  In fact, that’s one of the things wrong with using so-called OUT parameters, is that it ruins the expressiveness and you can’t simply chain them together.  And this kind of chaining is very much embraced by functional style and Haskell in particular.

If foo is actually getArgs, and bar is a function that takes a list of strings (don’t even worry about the mapping at this point—just write a function that explicitly takes a list), you can’t do it!

main = do
    s <- getArgs
    mapM_ body s

badmain = mapM_ body getArgs

Writing that without the named intermediate value and do-block, the “obvious” main = mapM_ body getArgs doesn’t work.

When I was pondering this as I was working on it, I concluded that the only thing that can be done to getArgs is the bind operation (>>=).  Now, maybe that’s not quite true if common helpers exist that themselves use bind, such as the liftM mentioned earlier.  Ideally I’d somehow mark the argument that is wrapped too much for the function, as the first argument (the body) is fine and there may be variations like liftM2 but not one that skips lifting some arguments and then lifts others.  Would ap or <*> mixed with $ work here?  That’s something to try next time.

Meanwhile, my first concern was with writing the for-each construct in a clear manner, without separating body into its own named function since it isn’t that complicated.  I wondered what the usual common idiom might be?

No matter how you arrange the pieces though, I needed to name the parameter for the individual value/iteration being processed.  In common imperative “structured” languages, naming the loop control variable is a basic part of the syntax of a for-each looping construct.  Using a higher-order function instead, writing a lambda expression for the body was a bit clunky.  In any case, the layout started giving me trouble, so I gave up on that and just made body a separate named function.

What I ended up with is (starts with, anyway)

main =
    getArgs >>= mapM_ body

and with the body being a single word, the arrangement of the components (looping construct, list of inputs, body) is not nearly so important.

 Look-up Table

Haskell’s go-to data structure is a list.  There is no simple way to make a O(1) random-access array.  I mean, there are in fact such arrays as libraries, but to populate it you need a list anyway.  Such an array would help random-access speed, but only makes creating the table more complicated.

For key-value lookup, the basic feature is a function called lookup.  It operates on a list of (key,value) pairs, and I really didn’t want to write out the keys a through z and pairs, but rather just list the values (the alfa bravo keywords) in order.  The zip function took care of that.

Since it is most definitely not taking any advantage of contiguous range of keys, I decided to use that as a feature and add keywords for some odd punctuation marks and such.

phon_table= nato_table ++ digit_table ++ 
    [('.',"Period"), ('-',"Dash"), ('#',"TicTacToe"), ('_',"Underscore"), (' ',"Space"), ('@',"Snail") ]

nato_table= zip ['a'..'z'] ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
                "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec",
                "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "xray", "yankee", "zulu" ]

digit_table=  -- FAA style 
    zip ['0'..'9'] ["Zero","One","Two","Three","Four","Five","Six","Seven","Eight","Niner"]

Thinking ahead, variations can be accommodated by building the master list out of different pieces.  For example, use ITU instead of FAA code words for the digits, and add application-specific words.

Function Decomposition

Functions need to be written in small pieces.  In real-world code (using common imperitive languages) I’ve seen people write what I call meandering functions, which is a lack of decomposition: one defined unit (procedure or function in the high-level language) does stuff, then does more stuff, then goes on to work on something else, etc.  Often they are much much too long, running hundreds of lines of source code.

In Haskell, pure functions don’t have multiple statements.  That makes it hard to meander, or, without being adept at functional programming yet, even writing more than one line!

I also see lots of code that is not meandering in that the function does “one thing”, but yet is not decomposed well in that the thing it’s doing is obscured (even overwhelmed) with lower-level details that should be unimportant to “the thing” being expressed in this function.  Many programmers have a hard time with that, apparently not recognizing or even understanding what that is.

Those same programmers would more naturally break things up more in Haskell, since it is awkward to stuff more detail inside the argument of a function.  Writing a list of statements, it’s all to easy to stop what you’re really doing and issue lower-level detail work, and then put in more higher level work.  When everything is an argument to a single function, it’s more in-your-face that you are interrupting your expression with details that should be elsewhere.

Haskell facilitates writing very small functions with features such as writing multiple branches as separate top-level functions, and nested functions which close over the variables in their parent scope.

So is it “smaller is better”, period?  How much logic should go into one function as subexpressions rather than separate named functions?  Since the stuff in a where block have the same meaning as being in-line, thanks to the parent variables still being available, naming many individual pieces of a function seems to be the way to go.  What to not break up would be logical constructs that are read as a unit, and idiomatic uses that are recognized as one phrase but would be disguised if broken up.

When I first started to formulate some code, I ran into trouble with multiple nestings of local definitions, or trying to combine them in ways that either were not supported or couldn’t work in the layout rules.

A harder thing to know how to get just right is when to make local nested functions vs top-level functions.  My instinct is to nest everything and not expose internal details, since there isn’t any level of organization and name hiding smaller than the module.

Once I got past the initial main-loop panic, most of my time was spent fiddling with just how to decompose the functions.  As an anchor point, it made sense that the principle function solving this problem would be a named function and the driver (getting the args and displaying the results) would be a sample usage of that function.  In a real project, that is what I’d be interested in, and what I have here as main would just be test code, as the function would be called by other parts of the program.

One interesting detail is that my function phon processes one character, producing one code word as a result.  It doesn’t provide a function for processing an entire string, which is something you’d probably add in typical programming languages.

Given the function phon, you can call it on a character using ordinary function application, but it’s just as easy to process an entire string so there is no need to provide another function for that!

codeword = phon ch -- mundane call
codeword = phon $ ch -- means the same thing
codewords = phon <$> s -- process whole string of input
codewords = map phon s -- same thing

It’s also especially important in Haskell to keep the physical realization of the output separate from the computation. In most common programming languages, it would be just as easy to have the principle function print the result, and multiple calls would print successive items. In Haskell it is not just as easy — The function phon in this case cannot call putStr or anything like that.

In this toy program, where phon is called right from main and nothing else is happening, writing phon as an IO function would not seem awkward.  But it’s still a deliberate act, and you’d certainly notice when making a recursive call (or trying to).

Of course, that’s good in general.  In my toy program printing the results was part of the specification, but in real projects such a function will be consumed by other code, so the result needs to be in a form that’s still internalized and can be handled however the caller wants.  Unfortunately, it also makes it difficult to add debugging trace statements to existing code, but that’s another story.

Here is the complete program, as I left it.

module Main (main) where

import System.Environment (getArgs)
import Control.Applicative
import Data.Char


main =
    getArgs >>= mapM_ body
    where
        body s = do 
            putStrLn $ announce s 
            putStr $ format $ phon <$> s
        announce s = "string \"" ++ s ++ "\" spoken as:"

format =
    concatMap decorate
    where decorate w =  "\t" ++ w ++ "\n"


phon :: Char -> String
phon c | isUpper c = toUpper <$> (phon $ toLower c)
phon c = 
    maybe (quoted c) (id) result
    where result= lookup c phon_table
          quoted c= ['"',c,'"']
    
    
phon_table= nato_table ++ digit_table ++ 
    [('.',"Period"), ('-',"Dash"), ('#',"TicTacToe"), ('_',"Underscore"), (' ',"Space"), ('@',"Snail") ]

nato_table= zip ['a'..'z'] ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
                "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec",
                "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "xray", "yankee", "zulu" ]
                
digit_table=  -- FAA style 
    zip ['0'..'9'] ["Zero","One","Two","Three","Four","Five","Six","Seven","Eight","Niner"]

I welcome comments from experienced Haskell programmers on these issues, and overall style and idioms in general.

A Spriral Approach

I’ve heard of spiral approaches to learning, meaning that material is covered without much detail and making successive passes with more detail.

My recent experience makes me think of a slightly different spiral:

You understand, then you learn more, and then you are more confused than ever.

zeta

Maybe that’s why I like this particular representation of the Riemann ζ function?

−7/4 and Interactive Play

I saw this video concerning iterative function sequences

and thought that would be a good example to play around with in Haskell.  Haskell has arbitrary precision integers, and also has rational numbers, standard.  (I hesitate to say “built in” because, like many languages, the true built-in core is minimal with layers of library code on top of that.  But it was available out of the box.)

It also has an iterate function. What’s interesting is that the result is naturally presented as an infinite list. Other languages with abstract iteration or generators will provide some kind of interface to make your custom thing handleable in a common way. Perl even has a tie feature to make your custom thing seem exactly like the built-in collections, which pushes the interface contract down to include native datatypes, rather than being layered on top of them.

But in Haskell, the generator is the native list. Rather than make some custom generating code support the iteration mechanism, the same old list holds what appears to be the complete result, as if done as an earlier step.

In some purely imperative language, you might ordinarily do things like:

  1. Select the common, standard, container type with the appropriate properties (e.g. linear, sequential).
  2. Run a computation loop and fill the container with the results.
  3. Pass the container to another procedure, which can use the contents for other work (e.g. print them to a file).

Suppose that was unsuitable for some reason and you wanted to delay the computations of step 2, running the producer in parallel with the consumer, or not bothering until/unless a value was actually used by step 3, or not knowing how many items will really be needed.  You would have to explicitly program in a more complex sequence of commands to achieve that.

Instead of a plain common collection (e.g. a C++ vector) you’d need something else.  You need to arrange producer/consumer threads with a work queue between them, or callbacks, or something.  Step 2 would be preparing the whatsit, and step 3 would still be easy enough using standard iterators to abstract the actual details of the collection/stream/whatever.

Now look at it in Haskell.  Steps 1 is not much to think about as the list is king.  Step 2 is:

lz = iterate (λx→2*x+1) 0

Later, other code can get values from this list, e.g.

take 10 lz

produces a result of

[0,1,3,7,15,31,63,127,255,511]

The value lz is just another list of Integer, as far as the language is concerned. But clearly it could not have been filled in step 2 because it is as long as I need it to be, and never specified a size when filling it. It is actually an infinite list!

This certainly looks like the simple approach of: create the values, then use them. However, it actually doesn’t do any work when lz was defined, and only computes values when subsequent code reads them. Here I asked for the first 10 elements, and the list then went ahead and realized those values. If I ask for the value at index 30 (lz!!30) it needs to run the computation all the way to that element before telling me it’s 1,073,741,823.

This illustrates that the program here is not imperative. I didn’t program in “do this step, do the next step, etc.” Rather, I just specified that I wanted the 30th item of the list, and it did the work of figuring out just how to do that based on the definitions. It’s more like a spreadsheet: cells are not computed in order, but based on what they depend on. Haskell uses a top-down approach: when a specific value is really needed, it looks at the formula for it. Gathering the values needed for that function will set off other chains of downward dependent calculations.

Interactive Play

Using the GHCi interactive shell, it’s easy to play with little programs like this.  A more difficult problem featured in the video is

zz = abs <$> numerator <$> iterate (λx→x^2−7%4) 0

Trying a line of code, then recalling it to adjust and submit again, reminded me of a time long ago when microcomputers ran BASIC (or assembly language) and were attached to the household television.

Interactive play is the way to really learn something, and with a more complex build environment or larger programs you still try and do that.  It’s the combination of a direct Read-eval-print-loop and the ability to write interesting things in a single line of text that makes this experience especially efficient.

Want to try it yourself without installing anything?  Try Haskell! is a browser-based experience that invites you to “type these out and see what happens”  (psst: copy/paste is faster if you really can’t wait to find out).

Back to the Sequence

The actual sequence generated by iteration is rational numbers, but we want to ignore the denominator.  Then, I saw that the sign of the number was preserved with numerator, so the abs stripped that off.

The result starts off as [0,7,21,7,114639] and then the numbers get very large. Index 7 is 595096023596888261906480183623645954687, and it quickly gets overwhelming and impractical to display.

  • n=4 has 6 digits (is exactly 114639)
  • n=7 has 39 digits
  • n=28 has 80,807,125 digits
  • n=29 has 161,614,249 digits
  • n=30 has 323,228,496 digits
  • n=31 has 646,456,993 digits

That’s the largest I could compute with this program.  The Windows version compiles to a 32-bit program, but I used editbin to set the 3G flag, so it had 3G of user address space instead of 2.  That allowed it to computer n=30.  Running on a Mac, which is 64-bit, only increased that by one more before getting Segmentation fault.  I suspect there are things in the implementation that top out at 4G, even when more memory is available.