Monthly Archives: March 2014

Foods for Thought

Here is a little problem that is isomorphic to something in real life that I was discussing with a co-worker.  Given a list of items (of a simple enumeration type), and knowing implicitly that each item has an associated priority, return the best item of the list.  Further refinements:  an empty list produces a default 0-priority item, as would a list that contains only negative-priority items.  Oh, and there might be unexpected items present whose priority is not known to the code.

To give some concrete examples, let me define the item type Snack as the set { Nothing, Gorp, Doughnut, Broccoli, Cheese_puffs, Apple, Cookie }, and a definition of the priorities, you can give the sequence (Gorp, Cheese_puffs, Apple) and the function returns Cheese_puffs.  Given the empty sequence () it returns Nothing.  Given the sequence containing only (Broccoli), it still returns Nothing.

Since I’ve been reading about and watching videos on the subject of functional programming, I recognize this an example of a foldl operation.  In particular, see C9 Lectures: Dr. Erik Meijer – Functional Programming Fundamentals Chapter 7 of 13.

Even if we’re not using higher-order functions, it still qualifies as a design pattern even if the language doesn’t do all the work for us.  In C++ pseudo-code, the solution ought to look like this:

result= initialvalue;
for (item : collection) result= f(result,item);

The logic of interest ought to be a succinct expression, something like result= std::max(result,item, comp); where comp is a comparison function that knows to compare the corresponding priorities rather than the numbers themselves.

There is a max_element algorithm in std (and in boost::range) that will go through a sequence and apply the max comparison to each element, as in the previous listing.  But, it doesn’t like empty lists.  It’s hard to beat a foreach loop for ease and clarity!  In C++11, the range-based for loop is the easy way to go through a collection.  You put other statements before, after, and inside it to control the details.  It’s not a function as a first-class object, but that’s still an easy way to specify such things.

I’ve been a fan of the “range” idea for a long time, and think it’s a nice way to make STL more friendly.  Boost has range (whole container) versions of std algorithms, but where is the humble map algorithm?  In Boost::range, you don’t use a range algorithm for this at all, but rather the concept of map is filled via range adaptors, with a generic map higher-order function called transformed.

Applying ideas from functional programming, the function (lets call it g) that takes a Snack value and returns the corresponding priority can return a pair, containing the original argument as well as the result.  That is, it decorates the value but does not destroy it.  Something like this:

pair<Snack,int> x = g(Nothing);
for (item : input | boost::transformed(g))  x= max(x,item,compare_second);
result= x.first;

compare_second should do the normal operator< on the .second, but it doesn’t appear as a predefined standard function.  It could be done using a lambda expression, but (in C++11) you would have to declare all the types and parameter arguments.

But remember the earlier musing on the utility of for loops — there’s nothing wrong with using them.  Instead of putting details inside a lambda expression, don’t bother.  max and compare_second are so simple to express naturally that it’s not worth any fuss to compose canned implementations of them.  (Now if we were working with parts that had more complexity, it would be another story)

pair<Snack,int> x = g(Nothing);
for (auto item : input | boost::transformed(g))   {
    if (item.second >= x.second)  x= item;
}
result= x.first;

Note that this naturally handles empty lists and lists containing only undesired items, and if g defaults to 0 for items not known at compile time, this handles all the desired cases without any special-case code.

Giving up on using a single function expression as the body of the loop, there is no need to have g return pairs and then extract first and second.  But, it still has an advantage.  If you kept two distinct variables instead:

Snack x = Nothing;
int pri= 0;
for (auto item : input)   {
    int newpri= g(item);
    if (newpri >= pri)  {
        x= item;
        pri= newpri;
    }
}
result= x;

It turns out longer!  The two variables x and pri, kept separately, need to be updated as a unit.  In more complex code, that is not nice as mistakes can be made.  We still want both the item from the input list and the associated priority, so yet another variable is created.

It is more natural in C++ to have g just return the result, though.  So let’s take the best features of both and split the difference:

pair<Snack,int> x  (Nothing, g(Nothing));
for (auto item : input)   {
    pair<Snack,int> temp (item, g(item));
    if (temp.second >= x.second)  x= temp;
}
result= x.first;

Now remembering the associated priority with the best-so-far assumes that g has some overhead.  E.g. it’s a configuration file read a run-time and stored in a fancy data structure where items with priority defined are sparsely represented.

But if that’s not the case, and there’s no good reason not to make redundant calls to g, you don’t have to remember it.

Snack x = Nothing;
for (auto item : input)   {
    if (g(item) >= g(x))  x= item;
}
result= x;

It’s just so annoying to me that g is called for again for the same value it just used on the previous iteration!  If the priorities are known at compile time, you might suppose the compiler to inline and optimize out the calls to g.

I will continue this later with an exploration of how it works in a functional language.

Haskell Exploration 2, or “who needs loops?”

The previous post used a one-liner to solve the problem, using a list comprehension. Trying different limits meant editing the number in the middle of the line and entering again. That’s really the only thing you can easily do in the GHCi interactive environment—any functions or more complex code needs to be in an actual source file. At least GHCi can load your source file and then let you interact with the functions defined in it.

So, here’s a function which takes the max in the problem statement. To completely generalize, it also takes the list of factors as a parameter. After all, the original code was starting to look repetitive on the handling of the two hard-coded factors.

 

divisors ∷ [Int]IntInt
divisors factors max =
sum [x | x ← [1‥max], factor factors x]
 
factor factors n =
any (λx → n `mod` x ≡ 0) factors
-- swap out these alternatives by leaving exactly one un-commented
-- or [b | x ← factors, let b= n `mod` x ≡ 0]
-- or [True | x ← factors, n `mod` x ≡ 0]

 

As well as using recursion, the list comprehension syntax offers a way to write what would be a loop in imperative languages.  The generator portion of the construct, x ← [1‥max], is like a foreach loop, though it is not limited to using one generator.  The things after the vertical bar can include generators, constraint tests, and named values to use in other things.  However, it doesn’t seem directly suited to using the factors list directly as a generator with other tests — the built-in meaning is an AND between constraints, not an OR.  So, push that down to another function, top-down style.  That is, sum all the numbers up to max where the number passes the test of function factor.

I knew that there are logical functions that apply across all elements in a list, but didn’t recall how they were named.  Looking through the library, I found any first, which is even more suitable.  It reads more clearly to indicate the intent, rather than having to know that a multi-input OR can be used in that manner.  But, it needs the test packaged as a separate function.  Here I did that in-line as a lambda.

Since the point of this exercise is to work with list comprehensions, I tried it using the or function, too.  Mapping the list of factors to a list of True/False is done within the list comprehension syntax, doesn’t need extra parens, but isn’t really clearer: it trades the lambda for a let.

But, there is no reason to populate all the False entries.  The natural use of the filtering clause is to reject elements that don’t meet the criteria.  So, accept the ones that would give True, rather than mapping to True.  A form similar to this would be useful if I wanted to list the ones that fit, rather than just finding out if any fit.

Interestingly, the first form can be written without using a lambda expression.  Nice normal function composition doesn’t work out because the argument is deep in the center rather than last, but section syntax lets you use either the left or right argument in an operator.  The section syntax uses parenthesis, and you end up needing parens around the whole thing because the infix ∘ has a lower precedence than function application.

factor factors n = any ((0≡)∘(n `mod`)) factors

First attempt at Haskell

Haskell is what’s called a functional programming language.  Compared to BASIC, Pascal, batch-file, C, C++, Java, JavaScript, C♯, Ada, FORTRAN, assembly language, and any any other procedural (whether structured or OO) language, Haskell is very different.

And that’s really the point:  I don’t need to pick up on a slightly different syntax and somewhat different environment for representing the same kinds of constructs I’m used to.  There might be a small number of new and interesting concepts to inspire you, but for the most part they are “me too” on the current engineering paradigms and if anything leaning back to simplicity and leaving out features that could separate the top coders from the mundane (since the latter will need to maintain the code later).

But brushing up on techniques and ways of thinking that are truly different helps improve your engineering skills in general.

For example, a number of years ago I re-implemented a piece of code for laying out images on a page.  The old code had been extended, fixed, patched up, and “maintained” to the point where it was impossible to understand what was going on.  Cleaning it off would be a temporary measure if new requirements and changes continued as they always have.  So, I approached it using concepts from functional programming, even though written in (an early version) of C♯.  Defining the constraints and properties of the desired layout, rather than a spaghetti flowchart of patched procedures that manipulate all the state variables, means at the very least that it could be extended without breaking it.  The logic was unfathomable because of state dependance—what happened before affects what some block of code will do to it next.  So I had no state variables at all.  The layout rectangle of an item was expressed as a pure function, and the various bottom level functions could easily be reviewed for correctness.

Functional programming has been making inroads into mainstream programming, with otherwise procedural and OO languages acquiring features friendly to techniques gained from FP.

In particular, template metaprogramming in C++ (which I’ve seen described as “a slow descent into madness”) exists because the template system is a Turing-complete, albeit crudely primitive, functional programming language.

Meanwhile, I’ve read that monads have a strange property: anyone who comes to understand it loses any ability to explain it to others.  That reminds me of the science fiction novel Babel-17.  In fact as in the story, language influences thought and perception, which is what I was getting at earlier in this essay.  Being a writer on programming topics, I thought I’d take that as a challenge.  Maybe I’ll write a truly good explanation of monads; or maybe it will end up joining the hundreds of others that are are either indecipherable or lack proper deep meaning.  (See also monad tutorial fallacy)

chart

A lot of what I see of beginners’ Haskell examples remind me of Prolog.

Anyway, I just tried crafting my first Haskell snippet, other than things like 2+2 or copying lines from the book.  Project Euler, problem 1,

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

Here is the session from the interactive environment:

Prelude> [x | x <- [1..10], x `mod` 3 == 0]
[3,6,9]
Prelude> [x | x <- [1..10], x `mod` 3 == 0 || x `mod` 5 == 0]
[3,5,6,9,10]
Prelude> sum [x | x <- [1..10], x `mod` 3 == 0 || x `mod` 5 == 0]
33
Prelude> sum [x | x <- [1..9], x `mod` 3 == 0 || x `mod` 5 == 0]
23
Prelude> sum [x | x <- [1..999], x `mod` 3 == 0 || x `mod` 5 == 0]
233168
Prelude>  [x | x <- [1..999], x `mod` 3 == 0 || x `mod` 5 == 0]
[3,5,6,9,10,12,15,18,20,21,24,25,27,30,33,35,36,39,40,42,45,48,50,51,54,55,57,60,63,65,66,69,70,72,75,78,80,81,84,85,87,90,93,95,96,99,100,102,105,108,110,111,114,115,117,120,123,125,126,129,130,132,135,138,140,141,144,145,147,150,153,155,156,159,160,162,165,168,170,171,174,175,177,180,183,185,186,189,190,192,195,198,200,201,204,205,207,210,213,215,216,219,220,222,225,228,230,231,234,235,237,240,243,245,246,249,250,252,255,258,260,261,264,265,267,270,273,275,276,279,280,282,285,288,290,291,294,295,297,300,303,305,306,309,310,312,315,318,320,321,324,325,327,330,333,335,336,339,340,342,345,348,350,351,354,355,357,360,363,365,366,369,370,372,375,378,380,381,384,385,387,390,393,395,396,399,400,402,405,408,410,411,414,415,417,420,423,425,426,429,430,432,435,438,440,441,444,445,447,450,453,455,456,459,460,462,465,468,470,471,474,475,477,480,483,485,486,489,490,492,495,498,500,501,504,505,507,510,513,515,516,519,520,522,525,528,530,531,534,535,537,540,543,545,546,549,550,552,555,558,560,561,564,565,567,570,573,575,576,579,580,582,585,588,590,591,594,595,597,600,603,605,606,609,610,612,615,618,620,621,624,625,627,630,633,635,636,639,640,642,645,648,650,651,654,655,657,660,663,665,666,669,670,672,675,678,680,681,684,685,687,690,693,695,696,699,700,702,705,708,710,711,714,715,717,720,723,725,726,729,730,732,735,738,740,741,744,745,747,750,753,755,756,759,760,762,765,768,770,771,774,775,777,780,783,785,786,789,790,792,795,798,800,801,804,805,807,810,813,815,816,819,820,822,825,828,830,831,834,835,837,840,843,845,846,849,850,852,855,858,860,861,864,865,867,870,873,875,876,879,880,882,885,888,890,891,894,895,897,900,903,905,906,909,910,912,915,918,920,921,924,925,927,930,933,935,936,939,940,942,945,948,950,951,954,955,957,960,963,965,966,969,970,972,975,978,980,981,984,985,987,990,993,995,996,999]
Prelude>

After expressing the shorter example, first to get the list of terms to see if that part is right thus far, and then summing them, I changed the bound to the larger value and got an answer of 233168. Just to see what it was, I then backed off the final sum to get the list of terms.

So far, so good.