Foods for Thought

Here is a little problem that is isomorphic to something in real life that I was discussing with a co-worker.  Given a list of items (of a simple enumeration type), and knowing implicitly that each item has an associated priority, return the best item of the list.  Further refinements:  an empty list produces a default 0-priority item, as would a list that contains only negative-priority items.  Oh, and there might be unexpected items present whose priority is not known to the code.

To give some concrete examples, let me define the item type Snack as the set { Nothing, Gorp, Doughnut, Broccoli, Cheese_puffs, Apple, Cookie }, and a definition of the priorities, you can give the sequence (Gorp, Cheese_puffs, Apple) and the function returns Cheese_puffs.  Given the empty sequence () it returns Nothing.  Given the sequence containing only (Broccoli), it still returns Nothing.

Since I’ve been reading about and watching videos on the subject of functional programming, I recognize this an example of a foldl operation.  In particular, see C9 Lectures: Dr. Erik Meijer – Functional Programming Fundamentals Chapter 7 of 13.

Even if we’re not using higher-order functions, it still qualifies as a design pattern even if the language doesn’t do all the work for us.  In C++ pseudo-code, the solution ought to look like this:

result= initialvalue;
for (item : collection) result= f(result,item);

The logic of interest ought to be a succinct expression, something like result= std::max(result,item, comp); where comp is a comparison function that knows to compare the corresponding priorities rather than the numbers themselves.

There is a max_element algorithm in std (and in boost::range) that will go through a sequence and apply the max comparison to each element, as in the previous listing.  But, it doesn’t like empty lists.  It’s hard to beat a foreach loop for ease and clarity!  In C++11, the range-based for loop is the easy way to go through a collection.  You put other statements before, after, and inside it to control the details.  It’s not a function as a first-class object, but that’s still an easy way to specify such things.

I’ve been a fan of the “range” idea for a long time, and think it’s a nice way to make STL more friendly.  Boost has range (whole container) versions of std algorithms, but where is the humble map algorithm?  In Boost::range, you don’t use a range algorithm for this at all, but rather the concept of map is filled via range adaptors, with a generic map higher-order function called transformed.

Applying ideas from functional programming, the function (lets call it g) that takes a Snack value and returns the corresponding priority can return a pair, containing the original argument as well as the result.  That is, it decorates the value but does not destroy it.  Something like this:

pair<Snack,int> x = g(Nothing);
for (item : input | boost::transformed(g))  x= max(x,item,compare_second);
result= x.first;

compare_second should do the normal operator< on the .second, but it doesn’t appear as a predefined standard function.  It could be done using a lambda expression, but (in C++11) you would have to declare all the types and parameter arguments.

But remember the earlier musing on the utility of for loops — there’s nothing wrong with using them.  Instead of putting details inside a lambda expression, don’t bother.  max and compare_second are so simple to express naturally that it’s not worth any fuss to compose canned implementations of them.  (Now if we were working with parts that had more complexity, it would be another story)

pair<Snack,int> x = g(Nothing);
for (auto item : input | boost::transformed(g))   {
    if (item.second >= x.second)  x= item;
}
result= x.first;

Note that this naturally handles empty lists and lists containing only undesired items, and if g defaults to 0 for items not known at compile time, this handles all the desired cases without any special-case code.

Giving up on using a single function expression as the body of the loop, there is no need to have g return pairs and then extract first and second.  But, it still has an advantage.  If you kept two distinct variables instead:

Snack x = Nothing;
int pri= 0;
for (auto item : input)   {
    int newpri= g(item);
    if (newpri >= pri)  {
        x= item;
        pri= newpri;
    }
}
result= x;

It turns out longer!  The two variables x and pri, kept separately, need to be updated as a unit.  In more complex code, that is not nice as mistakes can be made.  We still want both the item from the input list and the associated priority, so yet another variable is created.

It is more natural in C++ to have g just return the result, though.  So let’s take the best features of both and split the difference:

pair<Snack,int> x  (Nothing, g(Nothing));
for (auto item : input)   {
    pair<Snack,int> temp (item, g(item));
    if (temp.second >= x.second)  x= temp;
}
result= x.first;

Now remembering the associated priority with the best-so-far assumes that g has some overhead.  E.g. it’s a configuration file read a run-time and stored in a fancy data structure where items with priority defined are sparsely represented.

But if that’s not the case, and there’s no good reason not to make redundant calls to g, you don’t have to remember it.

Snack x = Nothing;
for (auto item : input)   {
    if (g(item) >= g(x))  x= item;
}
result= x;

It’s just so annoying to me that g is called for again for the same value it just used on the previous iteration!  If the priorities are known at compile time, you might suppose the compiler to inline and optimize out the calls to g.

I will continue this later with an exploration of how it works in a functional language.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>