From earlier posts, you know that I’ve been learning Haskell. Now it’s one thing to memorize some syntax rules, and be able to discuss intelligently the semantics and ramifications of some particular language feature, even typing a line or two at an interactive prompt. It’s another to face an empty page in a text editor with the intent of writing a program.
For whatever reason, it occurred to me to do something with the spoken (phonetic) alphabet. It’s good to pick something you easily know how to do, as to concentrate on the mechanical details of the tools instead of the problem.
The Main Loop
To that end, I started with looking up how to fetch command-line arguments (simple output was already covered from hello world) and spent a long time pondering just how to structure the loop over the arguments.
Pretty basic, right? With pure functions, given something to do on one input value (say, foo) and an ordinary list of input values, you can process all of them using any of several ways of saying “map”.
map foo values foo <$> values mapF foo values mapM foo values forM values foo ff
The problem is that both the thing I want to do on each value and the function that obtains the list of values is “in” the IO monad. So instead of one token to put these two things together (whether a word like
map or a infix operator like
<$>), I need two separate tokens for monad composition and the mapping, and had further trouble combining the two things.
Separate steps for the monad and the map isn’t too unreasonable, but is just a little bit. There are different levels of wrapping, like having different levels of indirection.
f $ g is one way to write simple function application, and the entire thing
g is the argument of one call to
f <$> g uses a different symbol to mean apply
f to what’s inside
g treated as a container, so
f is called multiple times with values taken from inside
g. There are any number of different wrapping layers and different things can be wrapped in different numbers of layers, so having unique marks for each possibility is prohibitive. There are ways to bump things up in general, so
f g is plain function application and
liftM f g tells f to look inside its argument.
So, given that the values I have are double wrapped and I’m targeting the middle layer (neither the innermost nor the outermost) it seems reasonable that an extra mark of some kind is needed to specify, in addition to a mark that says “put these two things together”, making two in all.
The other combining problem is harder to explain: Given one function (
foo) that produced something that you wanted to feed to another function (
bar), simply combining them is a no-brainer, you would think. In a C-like language,
bar(foo()) is pretty obvious. In Haskell you don’t even use the parenthesis, so it’s just
You could use an intermediate variable name, like
s= foo(); bar (s);
but you don’t have to. In fact, that’s one of the things wrong with using so-called OUT parameters, is that it ruins the expressiveness and you can’t simply chain them together. And this kind of chaining is very much embraced by functional style and Haskell in particular.
foo is actually
bar is a function that takes a list of strings (don’t even worry about the mapping at this point—just write a function that explicitly takes a list), you can’t do it!
main = do s <- getArgs mapM_ body s badmain = mapM_ body getArgs
Writing that without the named intermediate value and do-block, the “obvious”
main = mapM_ body getArgs doesn’t work.
When I was pondering this as I was working on it, I concluded that the only thing that can be done to
getArgs is the bind operation (
>>=). Now, maybe that’s not quite true if common helpers exist that themselves use bind, such as the
liftM mentioned earlier. Ideally I’d somehow mark the argument that is wrapped too much for the function, as the first argument (the body) is fine and there may be variations like
liftM2 but not one that skips lifting some arguments and then lifts others. Would
<*> mixed with
$ work here? That’s something to try next time.
Meanwhile, my first concern was with writing the for-each construct in a clear manner, without separating body into its own named function since it isn’t that complicated. I wondered what the usual common idiom might be?
No matter how you arrange the pieces though, I needed to name the parameter for the individual value/iteration being processed. In common imperative “structured” languages, naming the loop control variable is a basic part of the syntax of a for-each looping construct. Using a higher-order function instead, writing a lambda expression for the body was a bit clunky. In any case, the layout started giving me trouble, so I gave up on that and just made body a separate named function.
What I ended up with is (starts with, anyway)
main = getArgs >>= mapM_ body
and with the body being a single word, the arrangement of the components (looping construct, list of inputs, body) is not nearly so important.
Haskell’s go-to data structure is a list. There is no simple way to make a O(1) random-access array. I mean, there are in fact such arrays as libraries, but to populate it you need a list anyway. Such an array would help random-access speed, but only makes creating the table more complicated.
For key-value lookup, the basic feature is a function called lookup. It operates on a list of
(key,value) pairs, and I really didn’t want to write out the keys a through z and pairs, but rather just list the values (the alfa bravo keywords) in order. The
zip function took care of that.
Since it is most definitely not taking any advantage of contiguous range of keys, I decided to use that as a feature and add keywords for some odd punctuation marks and such.
phon_table= nato_table ++ digit_table ++ [('.',"Period"), ('-',"Dash"), ('#',"TicTacToe"), ('_',"Underscore"), (' ',"Space"), ('@',"Snail") ] nato_table= zip ['a'..'z'] ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel", "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "xray", "yankee", "zulu" ] digit_table= -- FAA style zip ['0'..'9'] ["Zero","One","Two","Three","Four","Five","Six","Seven","Eight","Niner"]
Thinking ahead, variations can be accommodated by building the master list out of different pieces. For example, use ITU instead of FAA code words for the digits, and add application-specific words.
Functions need to be written in small pieces. In real-world code (using common imperitive languages) I’ve seen people write what I call meandering functions, which is a lack of decomposition: one defined unit (procedure or function in the high-level language) does stuff, then does more stuff, then goes on to work on something else, etc. Often they are much much too long, running hundreds of lines of source code.
In Haskell, pure functions don’t have multiple statements. That makes it hard to meander, or, without being adept at functional programming yet, even writing more than one line!
I also see lots of code that is not meandering in that the function does “one thing”, but yet is not decomposed well in that the thing it’s doing is obscured (even overwhelmed) with lower-level details that should be unimportant to “the thing” being expressed in this function. Many programmers have a hard time with that, apparently not recognizing or even understanding what that is.
Those same programmers would more naturally break things up more in Haskell, since it is awkward to stuff more detail inside the argument of a function. Writing a list of statements, it’s all to easy to stop what you’re really doing and issue lower-level detail work, and then put in more higher level work. When everything is an argument to a single function, it’s more in-your-face that you are interrupting your expression with details that should be elsewhere.
Haskell facilitates writing very small functions with features such as writing multiple branches as separate top-level functions, and nested functions which close over the variables in their parent scope.
So is it “smaller is better”, period? How much logic should go into one function as subexpressions rather than separate named functions? Since the stuff in a where block have the same meaning as being in-line, thanks to the parent variables still being available, naming many individual pieces of a function seems to be the way to go. What to not break up would be logical constructs that are read as a unit, and idiomatic uses that are recognized as one phrase but would be disguised if broken up.
When I first started to formulate some code, I ran into trouble with multiple nestings of local definitions, or trying to combine them in ways that either were not supported or couldn’t work in the layout rules.
A harder thing to know how to get just right is when to make local nested functions vs top-level functions. My instinct is to nest everything and not expose internal details, since there isn’t any level of organization and name hiding smaller than the module.
Once I got past the initial main-loop panic, most of my time was spent fiddling with just how to decompose the functions. As an anchor point, it made sense that the principle function solving this problem would be a named function and the driver (getting the args and displaying the results) would be a sample usage of that function. In a real project, that is what I’d be interested in, and what I have here as main would just be test code, as the function would be called by other parts of the program.
One interesting detail is that my function
phon processes one character, producing one code word as a result. It doesn’t provide a function for processing an entire string, which is something you’d probably add in typical programming languages.
Given the function
phon, you can call it on a character using ordinary function application, but it’s just as easy to process an entire string so there is no need to provide another function for that!
codeword = phon ch -- mundane call codeword = phon $ ch -- means the same thing codewords = phon <$> s -- process whole string of input codewords = map phon s -- same thing
It’s also especially important in Haskell to keep the physical realization of the output separate from the computation. In most common programming languages, it would be just as easy to have the principle function print the result, and multiple calls would print successive items. In Haskell it is not just as easy — The function
phon in this case cannot call
putStr or anything like that.
In this toy program, where
phon is called right from
main and nothing else is happening, writing
phon as an IO function would not seem awkward. But it’s still a deliberate act, and you’d certainly notice when making a recursive call (or trying to).
Of course, that’s good in general. In my toy program printing the results was part of the specification, but in real projects such a function will be consumed by other code, so the result needs to be in a form that’s still internalized and can be handled however the caller wants. Unfortunately, it also makes it difficult to add debugging trace statements to existing code, but that’s another story.
Here is the complete program, as I left it.
module Main (main) where import System.Environment (getArgs) import Control.Applicative import Data.Char main = getArgs >>= mapM_ body where body s = do putStrLn $ announce s putStr $ format $ phon <$> s announce s = "string \"" ++ s ++ "\" spoken as:" format = concatMap decorate where decorate w = "\t" ++ w ++ "\n" phon :: Char -> String phon c | isUpper c = toUpper <$> (phon $ toLower c) phon c = maybe (quoted c) (id) result where result= lookup c phon_table quoted c= ['"',c,'"'] phon_table= nato_table ++ digit_table ++ [('.',"Period"), ('-',"Dash"), ('#',"TicTacToe"), ('_',"Underscore"), (' ',"Space"), ('@',"Snail") ] nato_table= zip ['a'..'z'] ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel", "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "xray", "yankee", "zulu" ] digit_table= -- FAA style zip ['0'..'9'] ["Zero","One","Two","Three","Four","Five","Six","Seven","Eight","Niner"]
I welcome comments from experienced Haskell programmers on these issues, and overall style and idioms in general.