Category Archives: Computer Enthusiasm

The real answer to the FizzBuzz interview question

A popular interview question is coding FizzBuzz.  Now I’ve had similar experiences to what is shown at Coding Horror — I had a description for a trivial piece of code that would offer insight into the depth of engineering skill, with my careful posing of the requirements.


I had a sheet with this written down, so I would always say it in the same carefully-crafted way.  I lost that some years ago, but here is the problem from memory:

Write a class whose constructor takes two integral arguments, and has members that return their sum and their difference.

Trivial enough?  Not even a “problem” with logic to be considered; just a routine microtask that anyone with any fluency can express without having to think about it.

What I expected was to see how well the applicant processed the statement of requirements: making assumptions, or asking for clarification.  I figure an experienced engineer would ask what was meant by integral, or use a typedef to abstract the decision of which integer type to use and specify it at only one point.  More fluent coders would write it as a template.
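As a rough illustration of the template answer mentioned above, here is a minimal sketch.  The names sum_diff, sum(), and difference() are my own picks for this example (the problem statement does not name anything), and overflow is deliberately ignored.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names; the problem statement only asks for "sum" and "difference".
template <typename Int>
class sum_diff {
    static_assert(std::is_integral_v<Int>, "sum_diff requires an integral type");
    Int a_;
    Int b_;
public:
    constexpr sum_diff(Int a, Int b) : a_(a), b_(b) {}
    constexpr Int sum() const { return a_ + b_; }
    constexpr Int difference() const { return a_ - b_; }
};

// The integer type is chosen at exactly one point: the template argument.
inline void sum_diff_example() {
    const sum_diff<int> sd(7, 3);
    assert(sd.sum() == 10);
    assert(sd.difference() == 4);
}
```

The static_assert is one way to make the word "integral" in the requirements visible in the code; a typedef in a non-template class would serve the same purpose.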

I was very surprised to find that most applicants who got past the screening process thus far had trouble writing a C++ class at all!  The best insight I had was from someone who explained that he usually uses the IDE and doesn’t know the details of syntax on how to write a class definition.  Because of the reliance on what Visual Studio calls wizards, we dubbed these people Wizards’ Apprentices with a reference to the segment from Disney’s Fantasia.  The story is simple: the young apprentice tries to use his master’s power but cannot control it; what we would recognize as a buggy program.

What are you showing them?

Now suppose that you are fluent in C++ (or whichever language you are using) and would have no trouble jotting down a piece of code that does what you want.

I see many solutions to FizzBuzz, as well as many other problems, that are written in a way that is completely unlike what we see in “real” code.  Someone who can effortlessly produce this might be in “scripting mode” which is used for single-use small programs — written quickly, sloppy, and highly specialized.  It does not illustrate good software engineering practices, and the interviewer can’t tell if the applicant always (only) writes like that, or is capable of working on projects like they have: large bodies of code, need for maintainability, testability; follows good practices, etc.

So, I thought about FizzBuzz as if it were a small utility feature that was ordered for use in a large project.


Certainly, a short piece of code can be written using up-to-date style and best practices as applicable for software development at the scale that is involved in real projects at work.

But I also thought about how much “good engineering” I can illustrate without making the result not-so-trivial.  Here is what I came up with:

  1. Separate the logic from I/O.
    Most solutions to simple problems like this mix console I/O with the logic, which is completely unlike real programs in multiple ways.  It should not directly solicit input from the user as if he was a text file, or print output in the middle of the work; the “solution” should feature a function that takes parameters and produces results to return.
    As with real projects, this can be called by a unit-testing main program as well as find a home in the application that needs that feature.
  2. One thing that bugs me about the typical implementation of FizzBuzz is the explicit repetition of code, checking each factor and often the combination as well.  Don’t duplicate code that varies only by a data value!  Instead, store the values and loop over them.
  3. Be mindful of future enhancements.
    Code always gets more complex over time.  It is an art forged by long experience to balance future maintenance with extra work now.  Don’t program in features that are not needed — rather, anticipate what will be changed or added later and consider that in the overall design, and allow for such things when it does not add complexity to the task at hand.  (I can probably write a whole essay on this topic.  Message me if you have examples or general thoughts.)


I was very successful in that point 2 makes the code simpler and shorter.  Point 3 takes the form of putting the configuration at the very top of the file, and showing how it can be extended; and the extensibility is free due to the architecture noted in point 2.

There are the needed #includes at the top of the file, and then the interesting part starts off with:

constexpr std::pair<int, const char*> fbchart[] {
    { 3, "Fizz" }, { 5, "Buzz" },  // standard FizzBuzz
    { 7, "Boom" }  // easily generalized and extended by adding to this chart!
};

When I wrote this, Visual Studio’s C++17 compiler was still in pre-release, and part of the exercise for me was to use new features and see where style needs to be updated.

“constexpr is the new static const.”  Naturally, this tabular data will be declared as constexpr.

Now, why did I use const char* rather than an object type?  This was a deliberate choice.  First of all, the use of this table (described later) is flexible in what it can take, so I don’t require a std::string here even though that’s what will be used in the constructed object.  There is no need to store a std::string object here, which would make another copy of the lexical string literals, and string does not have a constexpr constructor.

The usual worries about primitive C-style strings do not apply since this is constant data.  After all, do you worry about writing cout<<"hello"; and demand that you store that in a string object before passing it to the function?

A hot-off-the-press alternative would be std::string_view.  But there is simply no need here.  I chose not to use gsl::zstring, since it would be the only use of the Guidelines Support Library in the program, and there is no question that C-strings are being used since they are right there, and only right there.  This is not a function parameter that needs documenting.

Likewise for the use of the plain array, rather than std::array.  Arrays are not as poor as they used to be: with the preference for free functions begin, end, size, etc. what does the object wrapper do that the plain doesn’t?  Only the regular value semantics of passing and assigning — and I’m not doing that.

string fbencode (int n)
{
    string retval;
    for (auto& [div, codeword] : fbchart) {
        if ((n%div)==0)  retval += codeword;
    }
    if (retval.empty())  retval= std::to_string (n);
    return retval;
}

Here is the meat:  a simple stateless function that accepts the number as a parameter and produces the results to return as a string.

The loop has a one-line body, and it is automatically iterated over the data table shown earlier.  That is why adding another item just works.  This code is smaller and simpler than the cascade of tests you normally see.
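Because the logic is separated from the I/O, it can be checked without touching the console.  Here is a minimal sketch of such a check; the table and function are repeated from above so the block stands alone, and test_fbencode is a hypothetical name rather than part of my original file.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Same table and function as in the post, repeated so this sketch is self-contained.
constexpr std::pair<int, const char*> fbchart[] {
    { 3, "Fizz" }, { 5, "Buzz" },
    { 7, "Boom" }
};

std::string fbencode(int n) {
    std::string retval;
    for (auto& [div, codeword] : fbchart) {
        if ((n % div) == 0) retval += codeword;
    }
    if (retval.empty()) retval = std::to_string(n);
    return retval;
}

// Hypothetical test driver: exercises the logic with no console I/O at all.
void test_fbencode() {
    assert(fbencode(1) == "1");
    assert(fbencode(3) == "Fizz");
    assert(fbencode(5) == "Buzz");
    assert(fbencode(15) == "FizzBuzz");
    assert(fbencode(21) == "FizzBoom");       // 3 * 7
    assert(fbencode(105) == "FizzBuzzBoom");  // 3 * 5 * 7
}
```

The last two cases come for free from the extra row in the chart, which is the point of driving the loop from data.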

// and to drive it: 
int main() {
    using boost::irange;
    for (auto n : irange(1,101)) {
        cout << n << ": " << fbencode(n) << '\n';
    }
}

In the main, I avoided a legacy for-loop by using Boost.Range 2.0. That is tried and true for production work. I’ve not tried to get the latest Visual Studio compiler drop to swallow Range-v3 yet.


Not only does this show fluency in C++ with up-to-date skills, it is well crafted and provides a number of discussion points.  As summarized along with the code samples, I can explain why I did things a certain way, alternatives considered, what I might do under different circumstances, etc.

It is good to understand the constructs used and choose them purposefully, not apply things in a cargo-cult or voodoo programming situation.  On either end of the interview process, understand that simple coding problems can give far more insight than simply finding out whether the applicant can “write code” at all.

Error Handling Concepts (again)

In Ancient Times, I was privileged to be among those who deliberated upon a fundamental question in C++ programming philosophy: “Whither exception handling?”  Exception handling was still in the conceptual stages then, and the air was charged with excitement.

Twenty-two years ago, I wrote an article for Windows Tech Journal recounting my experience a few years earlier in comparing error handling mechanisms of exceptions vs. error code returns.  In particular, I discovered that using both was an impedance mismatch, with code always wrapping the other kind to convert it:  catch the exception and return an error code; or test for an error code and throw an exception.  This blather is a road leading towards a pure exception handling approach.

I’ve scanned in the magazine pages and it can be seen on my web server.

Now, the C++ community is coming full circle it seems.  Everything old is new again.

In the 2017 standard we have std::optional which is the most basic “sum type”. In a talk Andrei Alexandrescu started with the example of the variant and asked the audience “Why is it called a ‘sum type’?” My own answer: because you don’t know its type but it returns something!

Seriously, in this presentation he also said “Remember when I told you the worst thing that happened to humankind was? I see the std::optional.” He introduces the expected class (still being argued about in committee as I write this), as the right solution giving the best of both worlds:  It can return a result or an error code, and the caller can take the step of checking the error code or just access the result without checking, in which case it throws an exception if it’s actually an error state.
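To make the "best of both worlds" behavior concrete, here is a minimal sketch of a type that holds either a result or an error code, and throws if the result is accessed while it holds an error.  This is not the interface under discussion in the committee; result_or_error, digit_error, and parse_digit are names invented for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <variant>

// Hypothetical stand-in for the proposed "expected"; not the committee's interface.
template <typename T, typename E>
class result_or_error {
    std::variant<T, E> state_;
    explicit result_or_error(std::variant<T, E> s) : state_(std::move(s)) {}
public:
    static result_or_error success(T v) {
        return result_or_error(std::variant<T, E>(std::in_place_index<0>, std::move(v)));
    }
    static result_or_error failure(E e) {
        return result_or_error(std::variant<T, E>(std::in_place_index<1>, std::move(e)));
    }
    explicit operator bool() const { return state_.index() == 0; }
    const E& error() const { return std::get<1>(state_); }
    // Access without a prior check: throws if the object holds an error.
    const T& value() const {
        if (state_.index() != 0) throw std::runtime_error("result accessed in error state");
        return std::get<0>(state_);
    }
};

enum class digit_error { not_a_digit };  // hypothetical error code

result_or_error<int, digit_error> parse_digit(char c) {
    using R = result_or_error<int, digit_error>;
    if (c < '0' || c > '9') return R::failure(digit_error::not_a_digit);
    return R::success(c - '0');
}

inline void parse_digit_example() {
    auto good = parse_digit('7');
    assert(good && good.value() == 7);   // the caller may check first...
    auto bad = parse_digit('x');
    assert(!bad && bad.error() == digit_error::not_a_digit);
    bool threw = false;
    try { (void)bad.value(); }           // ...or skip the check and get an exception
    catch (const std::runtime_error&) { threw = true; }
    assert(threw);
}
```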

Meanwhile, Niall Douglas rips expected.  Based on the peer review results of Boost.Outcome, he notes for consideration that many of the findings apply to the current expected proposal.

There’s no substitute for experience

This is part of a series of posts I’m writing about using toy projects and other exploration to get a hands-on feel for the new C++17 features, library, and style, as well as the behavior and real limitations of the compiler.

I coded up a very simple recursive descent parser, since it’s been noted that the new std::string_view is especially suited for parsers.  (There is a lot of sentiment about string_view being problematic and evil, but that’s another story.)

Now a recursive descent parser is one of the few places, it is generally acknowledged, where throwing an exception as a form of control flow is genuinely a good decision.  But, this simple grammar doesn’t have enough depth to need anything like that.

The interpreter (it does the work as it parses, as opposed to building a tree representation to be evaluated later) throws exceptions in case of errors.  The user of the class will know this as the way it gives errors on bad input or run-time conditions.

The parsing step itself, though, makes heavy use of std::optional.  As is the nature of such a parser, a production (grammar rule) might be called where this thing may or may not exist: optional parts in the syntax and alternatives and lists all lead to logic that says, “read one of these; did you get one?  No? OK, skip it (or try reading one of those instead).”

Other callers need to read something, period.  In that case, the caller needs to check for an error (just as it did when it was optional) and throw an exception.  This code is what gave me déjà vu:

skip_ws (input);
auto LHS= read_identifier(input);
if (!LHS)  raise_error(input, 4);
read_required ('=', input);
auto terms_result= read_terms(input);
if (!terms_result)  raise_error (input, 5);
if (!input.empty())  raise_error (input, 3);
set_value (*LHS, terms_result);
return *LHS;

The function read_required throws an exception itself if it can’t do what was asked.  read_terms and read_identifier, like most functions modeling grammar productions, return an optional-wrapped value.
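For readers who want to see the two styles side by side, here is a minimal sketch of what such productions might look like.  The function names match the snippet above, but the bodies, the by-reference std::string_view cursor, and the use of std::runtime_error are assumptions made for this illustration, not my actual parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical body: returns the identifier and advances the cursor, or returns nothing.
std::optional<std::string_view> read_identifier(std::string_view& input) {
    std::size_t len = 0;
    while (len < input.size() && std::isalpha(static_cast<unsigned char>(input[len]))) ++len;
    if (len == 0) return std::nullopt;
    const std::string_view id = input.substr(0, len);
    input.remove_prefix(len);
    return id;
}

// Hypothetical body: consumes the character or throws; it never reports failure by return.
void read_required(char expected, std::string_view& input) {
    if (input.empty() || input.front() != expected)
        throw std::runtime_error("required character missing");
    input.remove_prefix(1);
}

inline void productions_example() {
    std::string_view input = "total=42";
    const auto lhs = read_identifier(input);
    assert(lhs && *lhs == "total");
    read_required('=', input);
    assert(input == "42");
    assert(!read_identifier(input));   // nothing there, and the cursor is untouched
    assert(input == "42");
}
```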

Call a function, then check the return value and throw.  This is done repeatedly in these functions.  That is exactly the kind of mismatch I saw all those years ago.

From the nature of the optional returns, it is the caller who decides on the error code.  In a more complex grammar, I can easily see wanting to propagate that or modify it as it passes back up the call tree.  But in the case of optional, there is no error code coming up — just the lack of a result.

In testing this with different syntax error test cases, I found places where I was not checking the return code.  This can coredump because dereferencing the optional does not do any checking on whether it contains a value.  (On the other hand, there is a value() member function that does check.)  I guess I’m so used to writing such that functions do what they are asked (or don’t return at all) that writing in a style where every call is followed up by an explicit test is challenging as well as ugly and obfuscating.
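The difference between the two ways of getting at the value can be shown in a few lines.  This is a small sketch using only the standard std::optional interface; the function name is made up for the example.

```cpp
#include <cassert>
#include <optional>

// Returns true if checked access on an empty optional throws, as the standard specifies.
inline bool value_throws_when_empty() {
    const std::optional<int> nothing;   // no value stored
    try {
        (void)nothing.value();          // checked access
    } catch (const std::bad_optional_access&) {
        return true;
    }
    return false;
    // By contrast, *nothing would be undefined behavior here: operator* performs no check.
}
```

So a missed test followed by value() fails loudly, while a missed test followed by operator* may crash or silently misbehave.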

It’s back to the assessment I made when promoting exceptions in the first place:  look at this block of code.  What does it do?  Its main purpose is a bunch of if statements and errors.  Where is the real logic I came here for?  The testing is drowning out the real work.

Would using expected/outcome be better for this?

If the called function loaded the result with an error code rather than just the lack of result, and the attempted dereference would throw an exception, then I would not have to check after every call but still could when I wanted to.

But… it is the caller that knows what error code to assign.  Furthermore, what about the specifics of the exception and its payload?  If productions returned an error code (just an enumeration), the exception still needs to be the parser_error type and include the cursor position.

It’s like the (deeply nested) called function needs to interact with the caller to formulate the proper error.  Catching and modifying and re-throwing is something that will not go over well with modern implementations, as throwing is very slow.

I’m reminded of the exception class I wrote and used long before C++ was a standard: I included a feature to simulate dynamic scoped values-of-interest.

Parsing is one case where we really care about details across several levels of function call, in order to get meaningful feedback on the error.  In more normal circumstances, you tend to rely on logging.  Even so, having a speculative log — include this line if an error is logged; flush it upon success — is very helpful and cuts back on the spew of irrelevant information logged.

But the same technique could be used to generate a more meaningful exception, adding information as the stack is unwound.

Besides catch to deal with a problem after it occurs, and destructors to automatically clean up resources, we need some way to prepare for a possible error.  This would be primarily for reporting, not for recovery actions.  That’s probably why it has been neglected as a topic — destructors are perfectly good at cleanup and using them means we don’t have to explicitly prepare.
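One way this kind of preparation might look in C++17 is a small guard object that records a note only if its scope is left because of an exception.  This is a sketch under my own assumptions, not the class I wrote back then: error_context and error_notes are hypothetical names, and a real version would have to deal with allocation failure inside the destructor.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-thread collection of notes gathered while the stack unwinds.
inline std::vector<std::string>& error_notes() {
    thread_local std::vector<std::string> notes;
    return notes;
}

// Declares context up front; the note is kept only if an exception passes through.
class error_context {
    std::string note_;
    int exceptions_on_entry_ = std::uncaught_exceptions();
public:
    explicit error_context(std::string note) : note_(std::move(note)) {}
    error_context(const error_context&) = delete;
    error_context& operator=(const error_context&) = delete;
    ~error_context() {
        if (std::uncaught_exceptions() > exceptions_on_entry_)
            error_notes().push_back(std::move(note_));
    }
};

inline void parse_term_sketch() {
    error_context ctx("while parsing a term");
    throw std::runtime_error("unexpected character");
}

inline void parse_statement_sketch() {
    error_context ctx("while parsing a statement");
    parse_term_sketch();
}

inline void error_context_example() {
    error_notes().clear();
    try { parse_statement_sketch(); }
    catch (const std::runtime_error&) {}
    assert(error_notes().size() == 2);
    assert(error_notes()[0] == "while parsing a term");       // innermost scope first
    assert(error_notes()[1] == "while parsing a statement");
}
```

On the success path the destructors see no new exception in flight and record nothing, so the preparation costs little and needs no explicit cleanup.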


Writing C++ in 1992

In the previous post, you may have noticed a few odd things about the specifics and the design.  For example, switches were described as TRUE and FALSE.

And here is a small utility header in the project:

// a few simple things used by the rest of the code.

typedef int bool;
typedef unsigned char byte;
#define TRUE 1
#define FALSE 0

There was no built-in bool type or true/false keywords at this time! You might find that hard to fathom, but as I write this in 2017 the existence of std::byte is brand new and almost no code uses it yet.

Of course, when I learned C there was no such thing as the void type for making void*; names of structure members had to be globally unique, and if you used the wrong member name for a variable there was no error: you just got the fixed offset represented by that member name, applied to the wrong type of variable. And function declarations did not have arguments declared, and the compiler did not check what you passed when calling a function anyway.

Some of the names were scoped as being in cmdl, but that’s only the flag enumerations defined inside the class (enumerators scoped to the enumeration name itself didn’t come until C++11). The various classes used though are defined globally. Why? Because there was no such thing as namespaces.

The cmdl_int type is written specifically for the int type. Why not a template? Because there was no such thing as templates. And BTW, int for me was 16 bits, running in “real mode” 8086 code.

The makefile shows a symbol that’s defined if I’m compiling under Borland C++ 3, which supports the 2.1 version of the C++ specification.

#ifdef VERSION21
cmdl::errval cmdl::error= OK;
#else
errval cmdl::error= OK;
#endif

Originally, the type (to the left of the name being defined) was implicitly known to be in the same scope. This was changed in version 2.1, which requires the type to be qualified as well.

I also spotted this little gem:

for (int loop= 0; loop < len; loop++)
    string[loop]= commandline[loop];
// _fmemcpy() not available in TC++1.01 (a.k.a. "second edition"). Bummer
string[len]= '\0';

Turbo C++ 1.0 was released in May 1990, and TC++ 1.01 was released February 28, 1991. Borland C++ 3.0 was released in 1991. That should indicate the true vintage of this code.

Wikipedia chronicles that C++2.0 was released in 1989.  As it so happens, I was a reviewer of the spec and documentation before it was finished, and got my name in the Annotated C++ Reference Manual.  Version 2.0 added, of note, multiple inheritance, abstract classes, static member functions, const member functions, and placement new.

Version 2.1, noted above as this code was used when 2.0 and 2.1 compilers were both in use, added partial nesting of classes. So that explains why none of the other types were nested inside cmdl — you could not do such a thing!

Composable Command Line Parser in C++ — in 1992 !

I just watched Phil Nash’s presentation “A Composable Command Line Parser” and that made me reflect on the subject.

Once upon a time, I developed an easily-used command line parser in C++ with review and feedback from the community (Fellow TeamB members, and regulars on CompuServe’s DDJ and CLM forums including other authors).  Note that this pre-dated the world wide web and mass access to the Internet.

I ended up using it a lot, for most every testing, benchmark, or demo program that I created.

Philosophically, declaring global variables that reflect the command line parameters is essentially declaring the parameters being passed to the program.  It was designed to be composable so that a library could come with its own (possibly obscure) arguments to control it, and the program would just automatically respond to them.  For example, a logging component might have options for specifying the output location, verbosity, archiving behavior, etc.  Any program that used that logger would have those options available.

Here is the original documentation, last saved December 22, 1992.

I’ll continue in the next post with some observations about the C++ language circa 1992.

C++ Library for Easy Command-Line Parsing
by John M. Dlugosz

I’ve always felt that the argv[] array was difficult to use.  Not bad, just primitive.  If all you have are a couple arguments, it is not too hard.  But you still have to check for the correct count and convert each value to the proper type.

If your program has various flags and switches, things can get much more difficult.  How many programs have you written and suffered through the argument processing?  In how many programs have you wished you had a better way?  In my case, I’ve written many simple programs that could benefit from command line arguments, but found it more trouble than it was worth.  So I was stuck with a simpler, less flexible program.  For test code and such, I would even change a value and recompile, instead of adding a nice command line processing.

Now, I do have a simple way.  It has revolutionized the way I write small programs.  Rich command line argument processing, sign-on messages, and help on usage are now trivial.

Here is an example.  Consider a program that takes a -v switch for verbose mode.  Using this library, this is accomplished by including the definition

cmdl_flag v ('v', "requests verbose mode");

to make the program recognize the flag, and code such as

if (v()) {  // do this in verbose mode
}

to respond to the state of this flag.  There is no messy string manipulation, error checking, or anything.  The library automatically handles -v or /v forms, disabling a switch with -v-, cascading switches such as -vbx, and other features.

Notice the definition of v above takes two constructor arguments. The second argument is a string that provides usage information.  The library will automatically generate the usage message, collecting the messages from all the parameters in the program.


The basic idea is to model command-line parameters as program arguments. That is, they should be analogous to arguments passed to a function. In a function call, each value passed is bound to a name in the called function.  By analogy, a program argument is a name which gets bound to something which can be specified on the command line.  To provide for command line input, you declare those arguments you want to receive, along with their types.

The cmdl library has a type for each type of command line parameter:  flags, integers, strings (more can be added).

The constructor is given the name of the parameter, as used on the command line.  It can also be given a help string, and flags.  Here are some examples:

typedef cmdl_flag flag;
flag v ('v', "requests verbose mode");
flag s ('s', "specifies alternate algorithm");
flag T ('T', "prevents the foobar from clearing (debugging)"  ,cmdl::once);
cmdl_string pos1 ((char*)0, "first positional parameter", cmdl::required);
cmdl_string pos2 ((char*)0, "second positional parameter");
cmdl_string pos3 ((char*)0, "third positional parameter");
cmdl_int count ('c', "iteration count");
cmdl_help helper;

This shows the following types:

  • Type cmdl_flag is a simple switch.  Using that flag makes the parameter TRUE, if absent it is FALSE.  You can also turn off the switch by using the name with a trailing - sign.  (The library takes care of cascading switches, too.)
  • Type cmdl_string allows input of an arbitrary string.  The syntax is somewhat flexible, with the argument separated from the keyword by a space or an =, and the string can be in quotes.
  • Type cmdl_int allows input of an integer.  The input is checked for valid syntax.
  • Type cmdl_help provides for an automatically generated help screen if the command line is empty, or with the -? switch.

Except for the special cmdl_help class, the constructors take two or three arguments.  The first is the name of the command-line parameter. This can be given as a single char or as a string. If passed (char*)0, there will be no name and it is taken to be a positional parameter, explained later.

The second constructor argument is the usage help string.

The optional third argument to the constructors is a bank of flags. once indicates that the argument can only appear once in the command line. Ordinarily, repeating it will override the previous mention. The required flag means that it is an error to omit the parameter. There are others, detailed in the code listing.

A flag worth particular attention is keyword.  If present, then the command-line parameter name will not use the switchchar (- or /) to indicate that this is a parameter.  If a keyword is found anyplace outside of a quoted string it will be used as an instance of the parameter.

Using class cmdl in a program

The program that contains these definitions will kick off everything by calling cmdl::parseit();.

This works because the constructor for each command-line argument class links the objects together into a linked list.  The command-line argument objects should be global, or defined in main before calling parseit().  In any case, no command-line object should ever go out of scope before parseit() is called.

Because the objects link themselves up, the complete collection of defined command line parameters is known.  parseit() will parse the command line, and compare what it finds with the list of possible arguments. It takes care of usage errors and such, so the program aborts if the command line is invalid.  No error checking is required by the main program.
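The self-registration idea can be sketched in a few lines of modern C++.  This is not the CMDL source: flag_sketch, its members, and the simplified parse() (single-letter switches only, no cascading, no usage output) are assumptions made purely to illustrate how constructors can build the list that the parser later walks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, much-reduced stand-in for a flag type that registers itself.
class flag_sketch {
    static inline flag_sketch* head_ = nullptr;  // start of the list of all flags
    flag_sketch* next_;
    char name_;
    const char* help_;
    bool value_ = false;
public:
    flag_sketch(char name, const char* help)
        : next_(head_), name_(name), help_(help) { head_ = this; }  // link self in

    bool operator()() const { return value_; }   // succinct access to the value
    const char* help() const { return help_; }

    // Accepts -x, /x, and -x- (switch off); returns false on anything else.
    static bool parse(int argc, const char* const* argv) {
        for (int i = 1; i < argc; ++i) {
            const char* arg = argv[i];
            const std::size_t len = std::strlen(arg);
            const bool switch_form = (len == 2 || (len == 3 && arg[2] == '-'))
                                     && (arg[0] == '-' || arg[0] == '/');
            if (!switch_form) return false;
            flag_sketch* match = nullptr;
            for (flag_sketch* p = head_; p; p = p->next_)
                if (p->name_ == arg[1]) { match = p; break; }
            if (!match) return false;
            match->value_ = (len == 2);
        }
        return true;
    }
};

inline void flag_example() {
    static flag_sketch v('v', "requests verbose mode");
    static flag_sketch s('s', "specifies alternate algorithm");
    const char* argv[] = { "prog", "-v", "/s-" };
    assert(flag_sketch::parse(3, argv));
    assert(v());
    assert(!s());
}
```

Note that parse() never needs a central table: any translation unit or library that defines a flag object contributes to the list, which is what makes the scheme composable.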

Each command-line-parameter object contains an operator() which provides a succinct way to get the value of that parameter.  There is a default value in case it was not specified on the command line.  If you would rather check for its presence, use the hasvalue() member.

Before calling parseit(), you can use the static member signon() to note a string used during the usage help message.

See the listing of TEST3.CPP and other files for usage of the examples described above.

Kinds of argument names: char, string, and positional

The first argument to the constructor of the command-line parameter objects is the name of the parameter that will be used on the command line. The constructor has two forms.  It can take a char, used for a single letter switch.  Or it can take a string (char*), for arbitrary names.

In addition, the string form responds to a special name of NULL.  Passing in (char*)0 for the name makes it a positional parameter.  The parser will not assign it based on a name.  Instead, it is used for unnamed parameters.

If a parameter does not start with a - or /, and it does not match the name of a keyword parameter (those that don’t use the -), it is taken to be a positional parameter.  It is assigned to the first unused positional parameter you defined.  This lets you mix switches with non-named parameters such as filenames.

Note that positional parameters can be flagged as required.

The Use of C++

A few C++ language concepts may need explaining.

Note the syntax of the flags in the third constructor argument.

cmdl::required | cmdl::keyword

The names here are enumeration constants.  They are created with an enum definition (see CMDL.H, line 50).  The names are defined within the class, and are in the scope of the class.  They are not global, and don’t pollute the global namespace.  So, you have no conflict with a name keyword used elsewhere in the program, for example.  The downside of this is that you qualify the name with its classname, as shown.  Note that in C, you probably would have seen CMDL_KEYWORD instead — the name would contain its “family” identifier as part of itself.  So it really is not additional typing to use class-scoped names like these.

The enumeration constants are given explicit values as powers of 2, so they behave as flags which can be combined with | or +.  The function parameters taking the flags are defined as unsigned, not as an enum type (in fact, the enum type has no name.  It just defines the constants). This is necessary because the result of | or + is an int, not an enum type.
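Here is a minimal sketch of that arrangement.  The class name param_sketch, the has() member, and the particular numeric values are illustrative assumptions; the actual values are in CMDL.H.

```cpp
#include <cassert>

class param_sketch {
public:
    // Unnamed enum: it only defines the constants, scoped to the class.
    enum { once = 1, required = 2, keyword = 4 };   // illustrative powers of 2

    // The parameter is unsigned, not an enum type, because | yields an integer.
    explicit param_sketch(unsigned flags = 0) : flags_(flags) {}
    bool has(unsigned flag) const { return (flags_ & flag) != 0; }
private:
    unsigned flags_;
};

inline void flags_example() {
    const param_sketch p(param_sketch::required | param_sketch::keyword);
    assert(p.has(param_sketch::required));
    assert(p.has(param_sketch::keyword));
    assert(!p.has(param_sketch::once));
}
```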

The class contains two definitions of enum names for flags that share the same flags variables.  But some are public and some are protected.

Another interesting feature is the use of operator().  See CMDL.H lines 84, 97, and 109.  The operator is defined with the name operator() which is then followed by the parameter list.  Here it has no parameters, so you see two sets of ()’s in a row.  The operator is invoked by following the object name with the parameter list, as shown in the test programs.

The positional parameter ability requires you to pass (char*)0 instead of just 0 because 0 is ambiguous — 0 can be a char '\0' or a null pointer.
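The ambiguity is easy to reproduce with any pair of overloads taking a char and a pointer.  In this sketch, name_sketch is a hypothetical stand-in for the two constructor forms; the nullptr line shows the spelling available since C++11, which did not exist when the library was written.

```cpp
#include <cassert>

// Hypothetical stand-in for the two ways of naming a parameter.
struct name_sketch {
    bool positional;
    explicit name_sketch(char) : positional(false) {}                  // single-letter switch
    explicit name_sketch(const char* s) : positional(s == nullptr) {}  // string name, or null
};

inline void name_example() {
    // name_sketch ambiguous(0);   // does not compile: 0 converts equally well to char and to a pointer
    assert(name_sketch((char*)0).positional);   // the 1992 spelling
    assert(name_sketch(nullptr).positional);    // C++11 and later
    assert(!name_sketch('v').positional);
    assert(!name_sketch("verbose").positional);
}
```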

The Parser

The core parser code breaks up the command line into tokens and looks up names of parameters.  The value of parameters is sent to the matched object for conversion to the proper type.  The virtual scan() function does this final part.  An earlier version of this library had a seemingly more flexible system that allowed significant customization in the specific parameter type’s code.  However, it proved too clumsy and was never really used.  This points out a good design philosophy:  Make a thing just flexible enough.  If it is too configurable, it can become as difficult, or more so, to use as writing code each time; which is exactly what the library is supposed to avoid.

The parser uses a class cmdlscan for low level character manipulation and tokenizing.  It is planned to give this more power in the future, for better error reporting.  Some of the implementation details are implemented as they are for that reason.  The need for a parser class was indicated because several related values, including the string and its current scan position, were always being passed together.  When things like this happen, think about combining them into an object.

Error and Report Output

I did not want the code to simply use cerr and cout for output. This may be used in programs that have their own idea of I/O, including programs that run in graphics mode.  For maximum flexibility, all output is separated.  The final results are funneled through a pair of functions called cmdl::output(), both defined in OUTPUT.CPP.  If linking in CMDL.LIB, you can supply your own versions of these two functions to handle output your way, without having to recompile the cmdl library.

On MIPS and speed

Last time, I found that a Haswell core executes 8.57 instructions per clock cycle, which is odd because there is a hard limit of 4 instructions per clock.

If the benchmark is calibrated by setting the performance of the VAX 11/780 and IBM System/370 model 158-3 as equal to 1 MIPS, I can think of a couple reasons:

First, those machines were marketed and claimed to be 1 MIPS, and the real performance of the Dhrystone benchmark is considerably less than that theoretical value.

Second, the number of “instructions” in a high level language procedure might translate to more than one machine opcode each.  Modern compilers are more efficient and generate far better code.  In fact, it’s gotten to the point of being hard to compile such a benchmark program as-intended as the compiler will eliminate it as being pre-computed or useless!

Finally, maybe the modern CPU takes fewer machine instructions to do the same amount of work.  That’s certainly true, but does the benchmark code use the larger registers?  Looking at an implementation, I notice a couple things along these lines.  Control statements can produce fused macro-ops combining the test and branch, which picks up another simultaneous instruction.  Procedure calls might be “free” as that simply starts prefetching from the new location and the unconditional jump itself is not seen as an instruction at all.  Even if the compiler is not optimizing out work, the CPU itself might be!  Assignment statements are also part of the mix, and if these variables are in registers then a MOV can be free, taken care of in the register mappings in the pipeline and never actually moving anything.  Finally, string manipulation might take place in bigger gulps, not one byte at a time.

The OCD Method

I brought up the fully-optimized version of the program in the debugger, and counted the instructions.  On the “short” path where it does the early-out, it does as few as 85 instructions and sometimes over a hundred.  To err on the side of caution (fewer actual instructions done in the given time) let’s use a weighted average of 90.  BTW, the early-out part at the top of the loop is only 21 instructions; the bulk of the expense of one iteration is permuting the array.

On the “long” path, the full code to decode the three strings with the current values is 110 instructions, and then it takes 3 to do the final comparison.  So let’s say 200 instructions in this case.

Without counting the rare times where it does a skip by reversing a few elements, this is about 135,693 full iterations at 200 and 219,360 short iterations at 90, or a total of 46,881,000 instructions.
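The total is just a weighted sum of the two path counts, which can be checked at compile time.  The variable names here are made up for the illustration; the numbers are the ones quoted above.

```cpp
constexpr long long full_iterations  = 135'693;  // "long" path, about 200 instructions each
constexpr long long short_iterations = 219'360;  // early-out path, about 90 instructions each

constexpr long long total_instructions =
    full_iterations * 200 + short_iterations * 90;   // 27,138,600 + 19,742,400

static_assert(total_instructions == 46'881'000, "matches the total quoted above");
```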

Doing 47 million instructions in 3.4 milliseconds is about 13.8 billion instructions per second.  At 3.6 GHz this is 3.8 instructions per clock, where now “instruction” means an X64 opcode.

How fast was Knuth’s computer?

In my previous post, I wondered what the power of Knuth’s computers was at the time TAOCP was being written.  Someone suggested the IBM S/360 series as an exemplar.  That turned out to be a good idea specifically, since I’ve written programs for the S/370 in assembly language, so I’m familiar with it.  Models 30, 40, and 50 were released in April 1965.  On the pricier side were models 65 and 75.  Here is a scanned “System Summary” describing the various models in detail.  So, I suppose somewhere between 100 and 900 kilo instructions per second.  A larger machine would probably be servicing multiple users.  Fifty years later, my Xeon E3-1276 is supposedly around 133 billion instructions per second.

Interestingly, the S/360 takes many (like 13 on average) clock cycles to perform one instruction.  Meanwhile each core of the Xeon performs 8½ instructions in one clock cycle.  I suppose the clock speed of the S/360 is the cycle time for the internal microcode.

But what’s an instruction?  On the S/360, I would not need the decode function at all, but would just sum the digits directly using unpacked decimal.

int decode (std::initializer_list<cellT> lst)
{
	int total= 0;
	for (auto digit : lst)
		total= total*10+digit;
	return total;
}

The modern CPU knows only binary arithmetic on various word sizes. So converting from a decimal digit-per-byte requires 4 iterations on two operands doing a multiply and add: at least 16 distinct instructions (if the loop is unrolled), plus the actual add once that’s all done.

Interestingly, the x64 code generated by the compiler doesn’t actually issue a multiply instruction in this loop. In fact, the entire expression does not use the regular ALU! There is neither a MUL nor an ADD instruction there. Instead, it exploits the address generator to do stuff that has nothing to do with actual pointer addresses. The complicated addressing modes of the CISC processor means that a separate address generator unit has a variety of things it can compute, yet it is far more limited than a completely general ALU. So, it is much simpler and thus faster.

In particular, Scaled Index mode looks like this: [ebx + ecx*S + constant] Register ebx is the base, and ecx is used as an index here. The index can be used directly, or scaled by 2, 4, or 8. If the same register is used in both positions, you can multiply it by five! The LEA instruction is Load Effective Address, and gives the processed address without fetching what it resolves to like a MOV would. So, if we have total in EAX and the digit in EBX,

	lea eax, [eax + eax*4]
	lea eax, [ebx + eax*2]

The first instruction multiplies by five. The second instruction not only multiplies by two, but also adds in the digit as the base of the addressing mode.
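
The same two-step arithmetic can be written out in C++ to check that it really computes total*10+digit. This is only a sketch: the function name times10_plus is made up for illustration, and the LEA forms in the comments are what the description implies, not a dump of the actual compiler output.

```cpp
#include <cassert>

// Mirrors the two address-generator steps described above.
// times10_plus is a hypothetical name used only for this illustration.
int times10_plus(int total, int digit)
{
    assert(digit >= 0 && digit <= 9);
    int t = total + total * 4;   // step 1: base + index*4 = total*5       (lea eax, [eax + eax*4])
    return digit + t * 2;        // step 2: base + index*2 = digit + total*10  (lea eax, [ebx + eax*2])
}
```

Neither step needs a multiply or an add instruction as such; both are just the base-plus-scaled-index form that the addressing hardware already computes.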

I also found it interesting how the S/360 line anticipated what we have today:  one compatible instruction set, but pricey implementations have more pipelining and faster clocks; also they keep adding more layers of cache memory.  The “processor storage” housed with the CPU is analogous to the L2 cache.  Adding external memory modules gives more storage but slower: 8 microsecond access time.  If you add pairs of modules you can go dual-channel and double the throughput.  Finally, later high-end models added extra-high-speed memory to keep up with the CPU, and that is analogous to our L1 cache.

Back to the quantitative comparisons:  The modern machine has 4 independent cores, but my program only used one.  If a brute force problem required a significant amount of time, it could be split up into 4 tasks.  At full utilization, 133 billion vs 133 thousand, more or less.  That’s a factor of about one million.  With the single thread, a quarter of that.  30 ms on one core would be about 8½ hours on a S/360-50 using it exclusively for this job.

Knuth’s suggestion of 10! can be scaled up by a million.  That’s between 15! and 16!.  Now in terms of exponential growth, realize that an institutional computer like that cost about 1000 times more than a personal desktop computer today.  The gain in computing power per dollar (not adjusting for inflation) is indeed about one billion.

For floating-point calculations, the difference in power over 50 years is a few orders of magnitude higher.  A $300 GPU card can do 4 teraflops?  That means it would be a world-class supercomputer as recently as 2005!


Permutation Puzzle: “send+more=money”


I saw this puzzle on Bartosz Milewski’s blog, with an entry on using monads in C++.  I’d like to hold it up as an example of a completely different lesson to C++ programmers:  A central idea I want to teach is know your libraries.

I recall it was in the early ’90’s, when the STL for C++ was submitted as a proposal to include in the upcoming standard.  I noticed there were algorithms called next_permutation and prev_permutation, and wondered how they work—how do you order them and re-arrange your collection to the next such, without keeping an auxiliary state?  Then I wondered what I would ever use such a thing for.  Well, nearly 25 years later, I found out!

You should look through the list of algorithms every once in a while just to see what’s there.  Otherwise you might only know about the common ones that you use.  Consider the analogy with a tool (say, a special bent screwdriver only used to attach doorknobs) that you know is in the very back of the drawer, though you may need to rummage around to locate it.  Remembering you have that tool makes for a quick job.  Having to rig up something from common parts instead (to continue the analogy, use locking pliers to grab a short screwdriver bit from the side) is not as good, and certainly more work.

So… 8 nested for loops followed by 7 if statements containing 28 conditions?  Get real!  If you have a line that reads });});});});});});});}); then the code review will be a sight indeed.

Solution in C++ w/standard library

Here’s the meat of my solution:

using cellT = int8_t;

cellT A[10] {0,1,2,3,4,5,6,7,8,9};

void solve1()
{
    do {
	auto [ig1,ig2, s,e,n,d,m,o,r,y] {A};
	int send= decode({s,e,n,d});
	int more= decode ({m,o,r,e});
	int money= decode ({m,o,n,e,y});
	if(send+more==money) {
	    // save the solution; it is pushed onto a vector and printed by the caller
	}
    } while (std::next_permutation(std::begin(A), std::end(A)));
}

You’ll notice that besides the uniform initialization syntax introduced in C++11, this uses something you might not have seen before (if you’re reading this in 2017).  Hot off the press in C++17 is structured bindings.

	auto [ig1,ig2, s,e,n,d,m,o,r,y] {A};

This defines 10 variables and assigns all the elements of the array to them.  The first two are ignored so I used scratch names, and the others are simply the names of the letters in the puzzle.

One thing I have to point out from Milewski’s listing is the call to turn a word into a numeric value.  He writes:

int send = asNumber(vector{s, e, n, d});

This creates a std::vector on every use.  Let me elaborate: it allocates a block of memory from the heap (vectors can’t use a small-vector optimization).  Then after the call returns it is deallocated.  Then the next two lines do the same thing.  And that happens on every iteration.

The constructor for std::vector takes this handy literal list.  Now in C++ these containers are not special language features, but are ordinary libraries.  It should be clear that anything they do — cool syntax or unusual ability — you can do yourself on your own code!  My version of the same construct does not create a vector, doesn’t require more words to make the call, and most importantly does not have any overhead.

int send = decode({s,e,n,d});

And here is the function that takes such an initializer list:

int decode (std::initializer_list<cellT> lst)
{
	int total= 0;
	for (auto digit : lst)
		total= total*10+digit;
	return total;
}

The solving function doesn’t print the results because I want to time just the solving logic.  So the solutions are pushed onto a vector, and the caller prints them after stopping the clock.  In a real program, this might be an object (not globals) and the results available in the object afterwards, or as the return value from the solving function.  In another post I’ll make it lazy.
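
Putting the pieces together, a self-contained version might look like the sketch below. The Solution struct, the name solve_all, and returning the vector (rather than using globals) are assumptions made for illustration; the listing above only shows the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <vector>

using cellT = std::int8_t;

// Hypothetical record type; the post does not show how solutions are stored.
struct Solution { int send, more, money; };

// Turns a list of decimal digits into the number they spell.
int decode(std::initializer_list<cellT> lst)
{
    int total = 0;
    for (auto digit : lst) {
        assert(digit >= 0 && digit <= 9);
        total = total * 10 + digit;
    }
    return total;
}

// Tries all 10! arrangements of the digits and returns every one that adds up.
std::vector<Solution> solve_all()
{
    cellT A[10] {0,1,2,3,4,5,6,7,8,9};   // sorted, so next_permutation visits every arrangement
    std::vector<Solution> found;
    do {
        auto [ig1, ig2, s, e, n, d, m, o, r, y] {A};   // C++17 structured bindings (copies A)
        (void)ig1; (void)ig2;                           // the two digits the puzzle does not use
        int send  = decode({s, e, n, d});
        int more  = decode({m, o, r, e});
        int money = decode({m, o, n, e, y});
        if (send + more == money)
            found.push_back({send, more, money});
    } while (std::next_permutation(std::begin(A), std::end(A)));
    return found;
}
```

A caller would start the clock, call solve_all(), stop the clock, and only then print the entries. Each distinct assignment shows up twice, because the two unused digits can appear in either order.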

Make it faster

This simple function found 50 solutions, one of which doesn’t have a leading zero.  It ran in 39.6 milliseconds, trying all 3,628,800 permutations.  That’s 92 million iterations per second.
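
Timing only the solving step can be sketched with std::chrono. This is not the harness actually used for the numbers above; time_ms and per_second are made-up helper names.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Converts an iteration count and an elapsed time in milliseconds to a rate per second.
double per_second(double iterations, double ms)
{
    assert(ms > 0.0);
    return iterations / (ms / 1000.0);
}
```

For example, per_second(3628800, 39.6) comes out a little under 92 million, which is the figure quoted above.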

The value of 10 factorial is an interesting number here.  Donald Knuth, in The Art of Computer Programming, wrote that this is about the size that separates things that may be done by brute force from things for which simply trying all possibilities is impractical.  Volume 3 was published in 1973.  I hazard to guess that computers now are about 2^30 (or about a billion) times the power of the machines that were around when he wrote that.  A billion times 40 milliseconds is about 460 days.  So, I revise that to more like ten million times the speed, if I postulate that he could have run this program to completion in a matter of days.

Anyway, to make it faster, I need to skip over permutations that are “the same” as one I already rejected.  The order of the two ignored digits doesn’t change the solution, so if I decide that one order is canonical and skip over the whole lot whenever the other is detected, that would cut the number of iterations in half.

So how do you skip over states in the next_permutation algorithm?  I looked it up: a StackOverflow post described it well and also pointed to a Wikipedia page on it.  The states are generated in lexicographical order, so just before a given digit changes, everything to its right is in reverse sorted order.  It then “rolls over”: that digit advances by one and everything to its right is now in forward sorted order, the lowest value of that substring.

So, when I identify a digit value that I know will not be a solution no matter what the other digits are, I can skip to when that digit is advanced again by reversing everything to the right of it.

void skipfrom (int pos)
{
    std::reverse (std::begin(A)+pos, std::end(A));
}

    do {
	auto [ig1,ig2, s,m,e,d,y,n,o,r] {A};
	if(ig1 > ig2) {

Indeed, it still found 50 solutions but the iteration_count showed half the number: only 1.8 million times through the loop.  However, the time only went down to 26ms — about two thirds the time, not half.
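
The effect of the skip is easier to see on a toy case. The sketch below walks the permutations of a short array and jumps past every arrangement that starts with a banned value; count_visited, Counts, and banned are names invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Counts { int visited; int iterations; };   // hypothetical result type

// Walks the permutations of {0..n-1} in lexicographic order, but jumps past
// every arrangement whose first element equals `banned`.
Counts count_visited(int n, int banned)
{
    assert(n >= 1);
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);              // 0,1,2,... : the first permutation
    Counts c {0, 0};
    do {
        ++c.iterations;
        if (v[0] == banned) {
            // The first time a prefix shows up, its tail is in ascending order.
            // Reversing the tail makes it the last arrangement for that prefix,
            // so the next_permutation call below moves on to the next prefix.
            std::reverse(v.begin() + 1, v.end());
            continue;                              // in a do-while this jumps to the condition
        }
        ++c.visited;
    } while (std::next_permutation(v.begin(), v.end()));
    return c;
}
```

With four elements and one banned leading value, the loop body runs 19 times instead of 24: the six banned arrangements cost a single iteration.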

We also don’t want solutions with a leading zero, so filter those out too.  Notice in the listing above I changed the order of declaring the digit variables.  It doesn’t matter to the solution algorithm, but putting these farther to the left means I can skip over more.

	if(s==0) {
	if(m==0) {

That didn’t save much though: 1.45 million iterations in 22 ms.

Another big constraint can be found on the right side of the puzzle.  I would think that parsing the arrangement of digits into ints would be slow, seeing as that involves multiplying by 10 and adding each digit.  Looking at the rightmost (units) digit only, the puzzle has d+e=y with a possible carry.  Test that before parsing the int values, and furthermore skip ahead until one of those three digits changes again.  To that end, the declaration order has d, e, and y next after the previous items we wanted on the left.  This leaves only 3 letters to the right, so each time the units digits don’t work it can skip 6 iterations.

I added a counter to that, and see that it happened 219,360 times.  The loop only executed 355,053 times now, taking a mere 4.7 ms.
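
Here is a sketch of how the three filters might fit together in one loop. It is a reconstruction from the description, not the original listing: the skip positions passed to skipfrom, the Solution struct, and the name solve_pruned are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <vector>

using cellT = std::int8_t;

struct Solution { int send, more, money; };   // hypothetical record type

int decode(std::initializer_list<cellT> lst)
{
    int total = 0;
    for (auto digit : lst)
        total = total * 10 + digit;
    return total;
}

std::vector<Solution> solve_pruned()
{
    cellT A[10] {0,1,2,3,4,5,6,7,8,9};
    std::vector<Solution> found;

    // Reversing A[pos..9] turns the ascending tail into the last arrangement
    // for the current prefix, so the next next_permutation call advances A[pos-1].
    auto skipfrom = [&A](int pos) {
        assert(pos >= 1 && pos <= 10);
        std::reverse(std::begin(A) + pos, std::end(A));
    };

    do {
        // Positions: 0 ig1, 1 ig2, 2 s, 3 m, 4 e, 5 d, 6 y, 7 n, 8 o, 9 r
        auto [ig1, ig2, s, m, e, d, y, n, o, r] {A};
        if (ig1 > ig2)         { skipfrom(2); continue; }  // non-canonical order of the unused digits
        if (s == 0)            { skipfrom(3); continue; }  // leading zero in SEND
        if (m == 0)            { skipfrom(4); continue; }  // leading zero in MORE and MONEY
        if ((d + e) % 10 != y) { skipfrom(7); continue; }  // units column fails: jump past all 6 orders of n,o,r
        int send  = decode({s, e, n, d});
        int more  = decode({m, o, r, e});
        int money = decode({m, o, n, e, y});
        if (send + more == money)
            found.push_back({send, more, money});
    } while (std::next_permutation(std::begin(A), std::end(A)));
    return found;
}
```

With all three filters in place only the classic answer, 9567 + 1085 = 10652, is left.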

Faster yet?!

Notice in the listing that I declared a type cellT for the array of digits and anything that holds a digit.  My thought was that keeping the array small would save in parameter passing to decode.  Keeping the size of the data small is generally good for memory caching, but it probably doesn’t matter here because I don’t have enough data to exceed the L1 cache anyway.

But, I can change the type in one place in the code and it affects everything in the program.  I changed it from int8_t to plain int (int is 32 bits here), and…  the time was down to 3.4 ms!

64-bit was the same.  16-bit values were the slowest at 5.2 ms.  So, the CPU is inefficient at loading 8-bit values and worse with 16.  I don’t know if the actual load is slower, or the compiler generated somewhat different code rather than just using MOVSX instead of MOV.  This was a difference of a quarter of the execution time over the entire algorithm, and loading values is only a part of what it does, with values kept in registers once loaded.

The Joy of Reading (21st century style)

I decided to read a science fiction novel.  Not a unique occurrence, as I have thousands of them in boxes and storage and shelves.  And that’s the problem, really.  Hard drive space is already increasing faster than my reading pace, so I could store books electronically and add to them indefinitely and keep the same physical volume.

Now I’m already well practiced with using my original Nook reader.  After downloading the file, I would next import it into calibre, not just to keep track of it but to massage the data.  But first it tells me that my version is woefully out of date and I go download and install the latest, which offers many improvements and rearranges the controls.

The book’s file is a bit strange, it seems, as it was not importing right.  Easiest thing is just to load it in Sigil instead (though I suppose I could figure out the import options or use new features to edit the files and not need Sigil anymore for its use in later steps) and save it again.

A first look at the formatted ebook is fair; I’ve seen much worse from some publishers.  Why can’t they do as well as, say, Project Gutenberg, and just put the text in a file?  But I digress.  I fire up calibre’s “heuristic” processing to clean up all the junk, and use its formatting features to optimize the file for my device’s liking and my reading preferences.  Ah, but that’s set for Nook.  Well, a new software reader probably doesn’t have as many peculiar issues as an old dedicated reader, so I probably don’t need that anymore.  Generic output profile to start with, but still specify traditional print-style rendering where paragraphs are indented on the first line as opposed to having double vertical space between them.  Margins and other stuff should be taken care of by the reader software.

Normally this is where I then load the resulting file into Sigil and see if there are any bizarre features that can be fixed with a simple global search-and-replace on the HTML source, if that is still necessary.  At the least I’ll manually retouch the css file to delete stuff that ought to be unspecified so the reader doesn’t feel it’s being bossed around, and get rid of the text-align: justify line since that doesn’t work as well on the old low-resolution e-paper display.  It looks better if the horizontal spacing is optimized for letterform appearance and not also trying to get a specific length too.

On the Nook, I then plugged in the USB cable (which was charging anyway) and had calibre export to it.  But how do I read it on the Android tablet?  USB filesystem hasn’t worked for a few years now and it’s futile to try.  It doesn’t have SMB file networking built in, but there are apps for that.  I know I’ve tried a fancy file manager that includes network access, and it doesn’t work.  I use the network plug-in for the well-regarded Midnight Commander port, and it doesn’t work.  I tried a few more programs, and nothing could get past the names of the file shares, if it got that far at all.  Must be some “security” thing?

Next I try a couple features in calibre.  One is wireless device access, and I’m not sure what that does, but a couple readers and stand-alone programs allow the Android device to use it, it seems.  Well, I can’t get anything to do anything with that.  The other feature is better:  a web server interface.  It tells me the local IP address and port, so I make that into a URL and feed it to Firefox.  Success!  It lets me browse the book collection on the Android tablet, and download files via HTTP.  So, now I have the book file on the tablet.

Next question:  which reader software?  A Google search turns up a few reviews.  Mostly they don’t address the features I’m looking for, or any real features pertaining to the core function of reading stuff presented on the screen.  I don’t care which stores they are integrated with, or how pretty the book chooser screen looks and all the skeuomorphisms present.  A shame that “able to load files on local storage” is a feature that needs to be checked for!  The supplied Google Play Read for example, has its collection of things you bought from them, and no way to point to an actual file.

I end up trying two, and spend the rest of the afternoon figuring out how to make it dance with the song I sing for it.  I’m glad to say that I had success in setting font appearance and size, getting the line spacing to look right, having it show margins rather than printing all the way to the edge of the screen, and so on.

The page is looking quite presentable.  I do mean “looks”, since I haven’t actually read the first page yet.  That’s a chore for next weekend.  It does seem like a lot of effort for a book I’m not going to like anyway, but that’s why I wanted to save five bucks for a remaindered copy plus shipping.

Desires of a Backup System by Bob Hyatt

You invited me to write more on this topic. I believe I will do so, but in parts, rather than in a 20 page university paper. I will start with an overview of the ‘desired’ home server backup, vs what seems to be available.

For most homes and small businesses, the primary desires / goals are (in my view):

A. secure backups. In this case I add, secure from failure for at least 5 years!

B. Easy to do backups. I do NOT mean automatically scheduled backups. They automatically back up things you are NOT interested in. Or that will be useless to you if there is a problem. For example, do you really want to back up all the Operating System files, settings, etc.: things that change a lot and will occupy a lot of space on your backup drives? I do NOT.

C. Allow you to determine what is important to You!
Not every photo or video is important to you. Not every PDF is valuable to you. Not every MP3 or FLAC is valuable to you. So you need a way to differentiate between valued and chaff. And for the ‘tool’ to remember. OR for you to keep the data you want to keep separately from the chaff. Then have the ‘tool’ look only at the desired data.

D. Allow you to add storage, as needed, easily. Adding a new hard drive should not be ‘a hope and a prayer’ kind of task.

E. If I should desire to change from one backup product or tools to another, the existing should NOT act as a barrier or gatekeeper. (as in keeping your data as prisoner).

F. The running of the backup device or server should NOT wear out the disks, ever! The issue with the video I described in the earlier ‘letter’ here was that the one always busy was attempting to make the ‘distribution’ of data on the drives “perfect”.

If you are a corporation or a government agency, such wearing out of the disks would be considered ‘cost of doing business’. And they have the budget for this task. A home user or a small office does NOT have a budget for this willful destruction. Further, “Perfect distribution” is NOT what a home user or small office is looking for. They are looking for a safe place to put their data, and the exact location is not important.

Home or small business users will seldom read the backups. Perhaps, not ‘seldom’. But the reality is you will do plenty of writes to update the backup. In some cases you might do more reads (if you share the data with media players, other computers on your network, etc).

One more difference among the products most often sold in this arena: large organizations are also looking for quick access to the data. That is, the home or small business uses this kind of device for ‘backup or archival’ purposes, whereas large organizations use the data put onto such a device as production data and expect lots of access and lots of updates.

SAN and / or NAS systems in the large organizations are for processing production data, not for backups. So you begin the issues with a fundamental difference in the use of these devices. This is partly why the components tend to be expensive.

The good news for the home or small business is the time tested components used by the large organizations, while expensive, are likely to last longer than your need.

If I get some feedback on this introduction, I will introduce ‘task vs tool’ in the next ‘letter’ I write.

Designing the home NAS

In an earlier installment, I pointed out that popular branded solutions are surprisingly expensive for low-performing hardware.  Reviews indicate that they have rather poor performance.  So for my comparison, I’ll use the Synology DiskStation DS1513+, which reportedly has good performance, more than what a single-link gigabit Ethernet connection can handle.

It has quite a few things in common with the home-made solution:  Multiple Ethernet ports that can be used for redundancy or increased throughput, the ability to host related servers as “apps”, and not user-friendly enough for novices to do the initial setup.

While I was doing this, the Synology DiskStation could be found for $830.  It contains a dual-core Atom D2700 running at 2.13 GHz and 2GB of DDR3 RAM.

Now, there are two ways to approach this.  Clearly a competent file server can run on low-end x86_64 processors with a small (by today’s desktop standards) amount of RAM.  The original FreeNAS was commonly used with hand-me-down hardware.

But, times have changed.  The new FreeNAS, a rewrite by iXsystems, was designed with more modern concerns in mind:  RAM is much cheaper now and the system can be more capable and easier to write if it doesn’t have to cope with low-RAM installations.  In addition, the safety of ZFS against mysterious data corruption relies on the RAM not having mysterious corruption too, and should be used with ECC RAM.  Then come dire warnings about Windows file shares (CIFS) being single threaded and thus needing a fast CPU (as opposed to multiple slower cores), and features such as encryption demanding ever more CPU performance.  Oh, and the Realtek NIC used on many consumer motherboards is not good for FreeNAS; it needs an Intel NIC.

In short, I’m looking at a server-grade system, not a typical desktop or “gamer” enthusiast system.  What you don’t need is fancy overclock support, sound, lots of slots and multi-video-card support, etc. so a low-end server board is actually about the same price as a “fancy” desktop motherboard.

In particular, the Supermicro brand comes highly recommended.  I could have gotten an X9-series server motherboard and put a Xeon E3 v2 CPU on it.  But why stop there?  I spent more to go with the newer X10-series board and a Xeon E3 v3 “Haswell” CPU.  The X10SL7-F in fact contains an 8-channel SAS controller as well as the usual 6 SATA channels, sprouting a whopping 14 SATA connectors on the motherboard.  It also features IPMI 2.0 on its own dedicated network port, which is a wonderful feature and I’ll have more to say about it later.

So without further ado, here is the breakdown of my build:

Parts List

Item Description	Price
ICY DOCK MB153SP-B 3 in 2 SATA Internal Backplane Raid Cage Module	$63.99
Intel Xeon E3-1245V3 Haswell 3.4GHz LGA 1150 84W Quad-Core Server Processor	$289.99
SUPERMICRO MBD-X10SL7-F-O uATX Server Motherboard	$239.99
SeaSonic SSR-360GP 360W ATX12V v2.31 80 PLUS GOLD Certified Active PFC Power Supply New 4th Gen CPU Certified Haswell Ready	$59.99
Fractal Design Define R4 Black Pearl w/ USB 3.0 ATX Mid Tower Silent PC Computer Case	$99.99
ZALMAN CNPS5X Performa 92mm FSB (Fluid Shield Bearing) Powerful Cooling Performance CPU Cooler	$19.99
2 × 8GB PC3-12800 DDR3-1600MHz ECC Unbuffered CL11 HYNIX Memory	$178.92
Total without drives	$952.86
WD Red WD40EFRX 4TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5" NAS Internal Hard Drive -Bulk	3 × $189.99 = $569.97
Seagate ST4000DM000 Desktop 4TB 64MB Cache	2 × $155.49 = $310.98
Total for Build	$1833.81

The raw power seriously outclasses the DiskStation, and is only $120 more.  With the X9/v2 option, it would have actually been less.

Oort build - inside

Above is the result, Oort, with the side open.  You can see the stack of 8 drive trays, and the large heat sink over the CPU.


Here is a front view.  The grill along the edges allows air intake from the front.   The blank front face is imposing and mysterious… I wonder if I can get some artwork over it?


And finally, with the front panel open.  There is foam sound-dampening on all the case surfaces including the inside of this door.  The ICY-Dock hot-swap bays are now accessible.  I plan to use these for backing up and mounting off-site volumes while they are resident.  The main drives require side access, which is simply a matter of removing two thumb screws.

Now back to the details.  The X10 (rather than X9) series mainboard allows the use of the newer Haswell processors, which run cooler and save power.  The onboard SAS saves what would be a hundred dollar PCI card, and is much easier as well since it provides common SATA-compatible connectors.  And finally, this motherboard has the wonderful IPMI 2.0 with full KVM-over-LAN.

For the CPU, I looked at the chart in Wikipedia, along with the prices and availability at NewEgg.  I chose the lowest (cheapest) Xeon E3 that had onboard graphics and hyperthreading.  Why do I need onboard graphics if the system doesn’t have a monitor?  I think that the monitor-over-LAN feature still requires an actual VGA; it doesn’t emulate one, but just captures the output.  There is a more primitive remote management feature that allows for a TTY-style console (also over LAN), but I don’t think that helps with initial BIOS screen stuff.  Also, with the standard built-in GPU I can use it for computation other than drawing graphics.  Maybe it will accelerate other software I run on the box at some point.

I’m keeping the box in a closet which, besides building up heat from the machines, gets afternoon sun on the outside wall.  The closet is warm in the summer.  My experience with the stock cooler that comes with the CPU is that it’s loud or even inadequate.  Looking through NewEgg, I searched for this style with low noise and a good price.  I normally like this style in part because it takes a standard square fan which can be updated and replaced, but the Zalman is known for quiet fans too.  I mounted it, not with the thermal grease that it came with, but with Phobya HeGrease, carefully applied and spread.

The RAM was not available at NewEgg.  Apparently ECC memory that is not also Buffered/Registered is uncommon.  Buffering is used to facilitate having many more memory sticks on a board, which is not the case for this server-but-desktop board.  I found it at a specialty RAM site, which has a wide selection.  To be on the safe side, I looked at the brands that Supermicro had tested on this board, and took the cheaper of the two.  16GiB uses two of the four memory slots, so it can be doubled in the future.

I use Seasonic power supplies, and that’s another story.  I looked for “Haswell support”, which enables a new improved stand-by mode.

Now for the case:  Some mentions on the FreeNAS web forum led me to Fractal Design.  I followed up by reading reviews and the manufacturer’s web site.  There are a couple models that are so similar that I wonder what the difference is!  Since there is no direct explanation, it takes reading the specs very carefully and comparing the dimensions to spot the real differences.  This R4 features an internal stack of 8 HDD trays (with the anti-vibration mounting) plus two half-height 5¼″ external bays.  If you include two SSDs stuck elsewhere, that is 13 drives total, which is nicely close to the motherboard’s support of 14.

I chose an option with two external bays so I could fit a 3-disk hot-swap backplane.  Here I went with the name-brand ICY-Dock and a with-tray design, because I had trouble with trayless on Mercury.  So using the front-loading drive bay requires the use of two mounting screws, which is not very handy as it turns out.

Worse, the “2 half height bays” is a little exaggerated.  It’s more like 1.95 half height bays, as a large bulge protrudes into the area where the bottom bay should be.  I had to remove the bottom piece of sheet metal from the ICY-Dock in order to squeeze it in; this also got rid of the normal mounting ears.  I’ll make a bracket some day (a perfect job for a 3D printer), but it fits tightly and is not heavily used, so I left it without screws for the time being.

Other than that, assembling was easy and straightforward.  Testing proved interesting and adventuresome, and I’ll tell you about that later.