Monthly Archives: April 2018

Writing C++ in 1992

In the previous post, you may have noticed a few odd things about the specifics and the design.  For example, switches were described as TRUE and FALSE.

And here is a small utility header in the project:

// a few simple things used by the rest of the code.

typedef int bool;
typedef unsigned char byte;
#define TRUE 1
#define FALSE 0

There was no built-in bool type or true/false keywords at this time! You might find that hard to fathom, but as I write this in 2017 the existence of std::byte is brand new and almost no code uses it yet.

Of course, when I learned C there was no such thing as the void type for making void*. Names of structure members had to be globally unique, and if you used the wrong member name for a variable there was no error: you just got the fixed offset represented by that member name, applied to the wrong type of variable. Function declarations did not declare their arguments, and the compiler did not check what you passed when calling a function anyway.

Some of the names were scoped as being in cmdl, but that’s only the flag enumerations defined inside the class (enumerators scoped to the enumeration name itself didn’t come until C++11). The various classes used, though, are defined globally. Why? Because there was no such thing as namespaces.

The cmdl_int type is written specifically for the int type. Why not a template? Because there was no such thing as templates. And BTW, int for me was 16 bits, running in “real mode” 8086 code.

The makefile shows a symbol that’s defined if I’m compiling under Borland C++ 3, which supports the 2.1 version of the C++ specification.

#ifdef VERSION21
cmdl::errval cmdl::error= OK;
#else
errval cmdl::error= OK;
#endif

Originally, the type (to the left of the name being defined) was implicitly known to be in the same scope as the member being defined. This was changed in version 2.1, which requires the type to be qualified explicitly.
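For comparison, here is a minimal sketch of the same kind of definition in present-day C++, which still follows the 2.1 rule. The class body is a hypothetical stand-in written for this illustration, not the original CMDL.H, and the second enumerator is invented.

```cpp
#include <cassert>

// Hypothetical stand-in for the original class; only the parts needed here.
class cmdl {
public:
    enum errval { OK, BadSwitch };   // BadSwitch is an invented enumerator
    static errval error;
};

// The type to the left of the name is read before the compiler knows that a
// member of cmdl is being defined, so it must be qualified. Once the name
// cmdl::error has been seen, the initializer is looked up in class scope,
// which is why plain OK still works.
cmdl::errval cmdl::error = OK;

void example_static_member() {
    assert(cmdl::error == cmdl::OK);
}
```

Dropping the cmdl:: qualifier from the type in that definition makes a current compiler reject it, which matches the VERSION21 branch above.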

I also spotted this little gem:

for (int loop= 0; loop < len; loop++)
    string[loop]= commandline[loop];
    // _fmemcpy() not available in TC++1.01 (a.k.a. "second edition"). Bummer
string[len]= '\0';

Turbo C++ 1.0 was released in May 1990, and TC++ 1.01 was released February 28, 1991. Borland C++ 3.0 was released in 1991. That should indicate the true vintage of this code.

Wikipedia chronicles that C++2.0 was released in 1989.  As it so happens, I was a reviewer of the spec and documentation before it was finished, and got my name in the Annotated C++ Reference Manual.  This added, of note, multiple inheritance, abstract classes, static member functions, const member functions, and placement new.

Version 2.1 (relevant here because this code was used while 2.0 and 2.1 compilers were both in use) added partial nesting of classes. So that explains why none of the other types were nested inside cmdl: you could not do such a thing!

Composable Command Line Parser in C++ — in 1992 !

I just watched Phil Nash’s presentation “A Composable Command Line Parser” and that made me reflect on the subject.

Once upon a time, I developed an easily-used command line parser in C++ with review and feedback from the community (fellow TeamB members, and regulars on CompuServe’s DDJ and CLM forums, including other authors).  Note that this pre-dated the world wide web and mass access to the Internet.

I ended up using it a lot, for most every testing, benchmark, or demo program that I created.

Philosophically, declaring global variables that reflect the command line parameters is essentially declaring the parameters being passed to the program.  It was designed to be composable so that a library could come with its own (possibly obscure) arguments to control it, and the program would just automatically respond to them.  For example, a logging component might have options for specifying the output location, verbosity, archiving behavior, etc.  Any program that used that logger would have those options available.

Here is the original documentation, last saved December 22, 1992.

I’ll continue in the next post with some observations about the C++ language circa 1992.


C++ Library for Easy Command-Line Parsing
by John M. Dlugosz

I’ve always felt that the argv[] array was difficult to use.  Not bad, just primitive.  If all you have are a couple arguments, it is not too hard.  But you still have to check for the correct count and convert each value to the proper type.

If your program has various flags and switches, things can get much more difficult.  How many programs have you written and suffered through the argument processing?  In how many programs have you wished you had a better way?  In my case, I’ve written many simple programs that could benefit from command line arguments, but found it more trouble than it was worth.  So I was stuck with a simpler, less flexible program.  For test code and such, I would even change a value and recompile, instead of adding nice command line processing.

Now, I do have a simple way.  It has revolutionized the way I write small programs.  Rich command line argument processing, sign-on messages, and help on usage are now trivial.

Here is an example.  Consider a program that takes a -v switch for verbose mode.  Using this library, this is accomplished by including the definition

cmdl_flag v ('v', "requests verbose mode");

to make the program recognize the flag, and code such as

if (v()) {  //do this in verbose mode
//whatever...
}

to respond to the state of this flag.  There is no messy string manipulation, error checking, or anything.  The library automatically handles -v or /v forms, disabling a switch with -v-, cascading switches such as -vbx, and other features.

Notice the definition of v above takes two constructor arguments. The second argument is a string that provides usage information.  The library will automatically generate the usage message, collecting the messages from all the parameters in the program.
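To make the switch forms concrete, here is a minimal sketch in present-day C++ of how a single token such as -v, /v, -v- or -vbx could be applied to a set of known flags. The helper apply_switch_token and the std::map table are assumptions made for this illustration; the library's own parser is organized differently and was written without the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper, not the library's parser: applies one switch token
// such as "-v", "/v", "-v-" or "-vbx" to a table of known flags.
// Returns false if the token is not a switch or names an unknown flag.
bool apply_switch_token(const std::string& token, std::map<char, bool>& flags) {
    if (token.size() < 2 || (token[0] != '-' && token[0] != '/'))
        return false;
    for (std::size_t i = 1; i < token.size(); ++i) {
        std::map<char, bool>::iterator it = flags.find(token[i]);
        if (it == flags.end())
            return false;                   // unknown flag letter
        bool value = true;
        if (i + 1 < token.size() && token[i + 1] == '-') {
            value = false;                  // a trailing '-' turns the flag off
            ++i;                            // consume the '-'
        }
        it->second = value;
    }
    return true;
}

void example_switches() {
    std::map<char, bool> flags;
    flags['v'] = false;
    flags['b'] = true;
    flags['x'] = false;
    // One cascaded token: v on, b off, x on.
    assert(apply_switch_token("-vb-x", flags));
    assert(flags['v'] && !flags['b'] && flags['x']);
}
```

This sketch leaves earlier letters applied when a later letter in the same token is unknown; a real parser would more likely report the error and stop.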

Concepts

The basic idea is to model command-line parameters as program arguments. That is, they should be analogous to arguments passed to a function. In a function call, each value passed is bound to a name in the called function.  By analogy, a program argument is a name which gets bound to something which can be specified on the command line.  To provide for command line input, you declare those arguments you want to receive, along with their types.

The cmdl library has a type for each type of command line parameter:  flags, integers, strings (more can be added).

The constructor is given the name of the parameter, as used on the command line.  It can also be given a help string, and flags.  Here are some examples:

typedef cmdl_flag flag;
flag v ('v', "requests verbose mode");
flag s ('s', "specifies alternate algorithm");
flag T ('T', "prevents the foobar from clearing (debugging)", cmdl::once);
cmdl_string pos1 ((char*)0, "first positional parameter", cmdl::required);
cmdl_string pos2 ((char*)0, "second positional parameter");
cmdl_string pos3 ((char*)0, "third positional parameter");
cmdl_int count ('c', "iteration count");
cmdl_help helper;

This shows the following types:

  • Type cmdl_flag is a simple switch.  Using that flag makes the parameter TRUE; if it is absent, the parameter is FALSE.  You can also turn off the switch by using the name with a trailing - sign.  (The library takes care of cascading switches, too.)
  • Type cmdl_string allows input of an arbitrary string.  The syntax is somewhat flexible, with the argument separated from the keyword by a space or an =, and the string can be in quotes.
  • Type cmdl_int allows input of an integer.  The input is checked for valid syntax.
  • Type cmdl_help provides for an automatically generated help screen if the command line is empty, or with the -? switch.

Except for the special cmdl_help class, the constructors take two or three arguments.  The first is the name of the command-line parameter. This can be given as a single char or as a string. If passed (char*)0, there will be no name and it is taken to be a positional parameter, explained later.

The second constructor argument is the usage help string.

The optional third argument to the constructors is a bank of flags. once indicates that the argument can only appear once in the command line. Ordinarily, repeating it will override the previous mention. The required flag means that it is an error to omit the parameter. There are others, detailed in the code listing.

A flag worth particular attention is keyword.  If present, then the command-line parameter name will not use the switchchar (- or /) to indicate that this is a parameter.  If a keyword is found anyplace outside of a quoted string it will be used as an instance of the parameter.

Using class cmdl in a program

The program that contains these definitions will kick off everything by calling cmdl::parseit();.

This works because the constructor for each command-line argument class links the objects together into a linked list.  The command-line argument objects should be global, or defined in main before calling parseit().  In any case, no command-line object should ever go out of scope before parseit() is called.

Because the objects link themselves up, the complete collection of defined command line parameters is known.  parseit() will parse the command line, and compare what it finds with the list of possible arguments. It takes care of usage errors and such, so the program aborts if the command line is invalid.  No error checking is required by the main program.
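The self-registration idea can be sketched in a few lines of present-day C++. Everything here (the class param, its members, and the simplistic exact-match parse) is a hypothetical miniature for illustration and does not reproduce the names or behavior of CMDL.H.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical miniature of the idea; names do not match the original code.
class param {
public:
    explicit param(const char* name)
        : name_(name), present_(false), next_(head_) {
        head_ = this;                   // link this object into the list
    }
    bool present() const { return present_; }

    // Marks every registered parameter whose name appears in args.
    // Returns the number of arguments that matched no parameter.
    static int parse(int count, const char* const* args) {
        int unknown = 0;
        for (int i = 0; i < count; ++i) {
            param* p = head_;
            while (p && std::strcmp(p->name_, args[i]) != 0)
                p = p->next_;
            if (p) p->present_ = true;
            else ++unknown;
        }
        return unknown;
    }
private:
    const char* name_;
    bool present_;
    param* next_;
    static param* head_;
};

// A null pointer is a constant initializer, so head_ is set before any
// global param object is constructed.
param* param::head_ = nullptr;

// Definitions like these can sit in any source file or library.
param verbose("-v");
param logfile("-log");

void example_registration() {
    const char* args[] = {"-v", "-bogus"};
    assert(param::parse(2, args) == 1);     // "-bogus" is not registered
    assert(verbose.present());
    assert(!logfile.present());
}
```

The sketch has no destructor that unlinks an object, so a param that goes out of scope before parse() runs would leave a dangling pointer in the list. That is the reason for the rule about object lifetime stated above.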

Each command-line-parameter object contains an operator() which provides a succinct way to get the value of that parameter.  There is a default value in case it was not specified on the command line.  If you would rather check for its presence, use the hasvalue() member.

Before calling parseit(), you can use the static member signon() to note a string used during the usage help message.

See the listing of TEST3.CPP and other files for usage of the examples described above.

Kinds of argument names: char, string, and positional

The first argument to the constructor of the command-line parameter objects is the name of the parameter that will be used on the command line. The constructor has two forms.  It can take a char, used for a single letter switch.  Or it can take a string (char*), for arbitrary names.

In addition, the string form responds to a special name of NULL.  Passing in (char*)0 for the name makes it a positional parameter.  The parser will not assign it based on a name.  Instead, it is used for unnamed parameters.

If a parameter does not start with a - or /, and it does not match the name of a keyword parameter (those that don’t use the -), it is taken to be a positional parameter.  It is assigned to the first unused positional parameter you defined.  This lets you mix switches with non-named parameters such as filenames.
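The assignment rule for positional parameters can be sketched as follows, in present-day C++. The function assign_positionals and its vector of slots are assumptions made for this illustration, and keyword parameters are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: hands each argument that is not a switch to the next
// unused positional slot. Returns false if there are more unnamed arguments
// than slots.
bool assign_positionals(const std::vector<std::string>& args,
                        std::vector<std::string>& slots) {
    std::size_t next = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (!arg.empty() && (arg[0] == '-' || arg[0] == '/'))
            continue;                   // a named switch, handled elsewhere
        if (next == slots.size())
            return false;               // no positional parameter left
        slots[next++] = arg;
    }
    return true;
}

void example_positionals() {
    std::vector<std::string> args;
    args.push_back("in.txt");
    args.push_back("-v");
    args.push_back("out.txt");
    std::vector<std::string> slots(3);  // plays the role of pos1, pos2, pos3
    assert(assign_positionals(args, slots));
    assert(slots[0] == "in.txt");
    assert(slots[1] == "out.txt");
    assert(slots[2].empty());           // third positional was not supplied
}
```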

Note that positional parameters can be flagged as required.

The Use of C++

A few C++ language concepts may need explaining.

Note the syntax of the flags in the third constructor argument.

cmdl::required | cmdl::keyword

The names here are enumeration constants.  They are created with an enum definition (see CMDL.H, line 50).  The names are defined within the class, and are in the scope of the class.  They are not global, and don’t pollute the global namespace.  So, you have no conflict with a name keyword used elsewhere in the program, for example.  The downside of this is that you qualify the name with its classname, as shown.  Note that in C, you probably would have seen CMDL_KEYWORD instead — the name would contain its “family” identifier as part of itself.  So it really is not additional typing to use class-scoped names like these.

The enumeration constants are given explicit values as powers of 2, so they behave as flags which can be combined with | or +.  The function parameters that take the flags are defined as unsigned, not as an enum type (in fact, the enum type has no name; it just defines the constants). This is necessary because the result of | or + is an int, not an enum type.
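A minimal sketch of this pattern in present-day C++ follows. The class name option, the member has(), and the particular values are assumptions made for this illustration; the real enumerators and their values are in CMDL.H.

```cpp
#include <cassert>

// Hypothetical stand-in for a parameter class with class-scoped flag names.
class option {
public:
    enum { once = 1, required = 2, keyword = 4 };   // unnamed enum, powers of 2
    explicit option(unsigned flags = 0) : flags_(flags) {}
    bool has(unsigned flag) const { return (flags_ & flag) != 0; }
private:
    unsigned flags_;
};

void example_flags() {
    // required | keyword has type int, which converts to unsigned implicitly.
    // It would not convert implicitly to an enumeration type.
    option p(option::required | option::keyword);
    assert(p.has(option::required));
    assert(p.has(option::keyword));
    assert(!p.has(option::once));
}
```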

The class contains two definitions of enum names for flags that share the same flags variables.  But some are public and some are protected.

Another interesting feature is the use of operator().  See CMDL.H lines 84, 97, and 109.  The operator is defined with the name operator() which is then followed by the parameter list.  Here it has no parameters, so you see two sets of ()’s in a row.  The operator is invoked by following the object name with the parameter list, as shown in the test programs.
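As a small illustration of that syntax, here is a hypothetical flag-like class in present-day C++. It is not the declaration from CMDL.H, only the same shape.

```cpp
#include <cassert>

// Hypothetical miniature of a flag parameter; not the original class.
class flag_sketch {
public:
    explicit flag_sketch(bool initial) : value_(initial) {}
    // "operator()" is the function's name; the second () is its
    // (empty) parameter list.
    bool operator()() const { return value_; }
private:
    bool value_;
};

void example_call_operator() {
    flag_sketch v(true);
    assert(v());            // reads like a function call, returns the value
    flag_sketch quiet(false);
    assert(!quiet());
}
```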

The positional parameter ability requires you to pass (char*)0 instead of just 0 because 0 is ambiguous — 0 can be a char '\0' or a null pointer.
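The ambiguity is easy to reproduce with any pair of overloads taking a char and a pointer. The struct below is a hypothetical stand-in with the same two constructor forms, written in present-day C++.

```cpp
#include <cassert>

// Hypothetical stand-in with a char form and a string form.
struct name_kind {
    enum kind { by_char, by_string, positional };
    kind which;
    name_kind(char) : which(by_char) {}
    name_kind(const char* s) : which(s ? by_string : positional) {}
};

void example_name_kinds() {
    name_kind a('v');
    name_kind b("verbose");
    name_kind c((char*)0);      // the cast selects the pointer overload
    // name_kind d(0);          // does not compile: the literal 0 converts
    //                          // equally well to char and to a null pointer
    assert(a.which == name_kind::by_char);
    assert(b.which == name_kind::by_string);
    assert(c.which == name_kind::positional);
}
```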

The Parser

The core parser code breaks up the command line into tokens and looks up names of parameters.  The value of parameters is sent to the matched object for conversion to the proper type.  The virtual scan() function does this final part.  An earlier version of this library had a seemingly more flexible system that allowed significant customization in the specific parameter type’s code.  However, it proved too clumsy and was never really used.  This points out a good design philosophy:  Make a thing just flexible enough.  If it is too configurable, it can become as difficult to use as writing the code each time, or more so, which is exactly what the library is supposed to avoid.

The parser uses a class cmdlscan for low level character manipulation and tokenizing.  It is planned to give this more power in the future, for better error reporting.  Some of the implementation details are implemented as they are for that reason.  The need for a parser class was indicated because several related values, including the string and its current scan position, were always being passed together.  When things like this happen, think about combining them into an object.
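The point about bundling the string with its scan position can be sketched like this in present-day C++. The class scanner and its whitespace-only tokenizing are assumptions made for this illustration; the real cmdlscan handles quotes, switch characters and more.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch in the spirit of cmdlscan; the real class differs.
// The text and the current position travel together as one object.
class scanner {
public:
    explicit scanner(const std::string& text) : text_(text), pos_(0) {}

    // Skips blanks, then returns the next run of non-blank characters.
    // Returns an empty string at the end of the input.
    std::string next_token() {
        while (pos_ < text_.size() && is_blank(text_[pos_])) ++pos_;
        std::size_t start = pos_;
        while (pos_ < text_.size() && !is_blank(text_[pos_])) ++pos_;
        return text_.substr(start, pos_ - start);
    }
    std::size_t position() const { return pos_; }
private:
    static bool is_blank(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    std::string text_;
    std::size_t pos_;
};

void example_scanner() {
    scanner s("  -v  count=3 ");
    assert(s.next_token() == "-v");
    assert(s.next_token() == "count=3");
    assert(s.next_token().empty());     // nothing left but trailing blanks
}
```

Because the position is kept inside the object, an error message could later report where in the line the problem was found without any extra arguments being passed around.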

Error and Report Output

I did not want the code to simply use cerr and cout for output. This may be used in programs that have their own idea of I/O, including programs that run in graphics mode.  For maximum flexibility, all output is separated.  The final results are funneled through a pair of functions called cmdl::output(), both defined in OUTPUT.CPP.  If linking in CMDL.LIB, you can supply your own versions of these two functions to handle output your way, without having to recompile the cmdl library.