Author Archives: JDługosz

Classic “Aggravation” board game

The Memoir

When I was a kid, my family of 4 played a game called Aggravation.  According to the history recounted on Wikipedia, this was the original version and is no longer made.  I recall it was a piece of thick material similar to what I’d buy today as high-density fiberboard, with holes in it.  This was placed on top of the bottom of the box which was printed with colored patches that would show through the holes.  The box had short sides so the marbles could be placed within and the lid put on, for storage.  (Here is the same board from 1962, but the marbles were different and I think the lid was slightly different too.)  I think it was obtained in 1964 by my parents.

Photos from Julie of the LeolasAttic store on Etsy.

Vintage 1962 Aggravation board courtesy of LeolasAttic

Later, I got a 6-player version for a birthday present.  This had a folding board like many board games used, but was this single board only—there was no backing, so the holes went through and you could see the table through the holes.  Instead, colors were drawn around the holes to indicate different zones, and this gave it a completely different character.

Now I see the 6-player set for sale as “Rare! Original vintage deluxe party edition antique” circa 1972 for $130.  I believe I’m missing one of the colors of marbles and have substituted ordinary cats-eyes for that color.  Even so, I hope it turns up in my parents’ garage in the same good condition!

I must say, it lived up to its name.  Even the die, rolled by a child barely old enough to play, might fly across the table and knock into marbles in play, fall off the table, or (most irritating to some) be dropped in place so gently that it was not sufficiently randomized.

I don’t remember personally, but I’m told that my maternal grandfather called the center the “knockout hole” and also enjoyed the game with family, when it was first released.

This is another family game that I’d like to revive, particularly with my mother-in-law visiting.

It would be easy enough to make a copy of the classic version, with a couple pieces of hardboard.  But not easy enough, right this moment.  It would be even simpler to just print out the board on paper.  After all, the historical predecessor that this is a variation of was normally made out of cloth, and many other modern versions are flat boards that don’t need holes.  It’s just a matter of using more typical game tokens rather than marbles.  These can be borrowed from another game, or, to preserve more of the original flavor, I can use glass stones from a Pente set.  (I see my “Vintage Pente Game (1977)” being sold for $125 now!)

The Project

My idea is to make it modular.  Not only does this make each module fit easily on a standard-sized sheet of printer paper, but it can then be customized for the specific situation.  Any number of players can be accommodated by using 3, 4, 5, 6 (or more?) modules, and there will be no dead zones or asymmetry from using a board that accommodates more players than are in play.  The colors can also be arranged in any order, to suit seating preference.

So, each module is the section of track that forms a “W” shape, with the center row being one player’s home.  The modules will overlap at the first/last spots, which form the inside corners.  A spot for the knockout hole needs to be positioned in the center, separately.

The track module is 6 spots tall and 5 wide.  So, it should be easy to lay out on a grid using Adobe Illustrator.

New Variations

Now, Aggravation is a specific variation owned by Hasbro.  The larger family of Pachisi (पचीसी) games is ancient, though, and there are many westernized commercial versions of the idea.  So, if I come up with my own unique variation, I can publish it as my own game that needs a suitable name.

I think the modular nature is key to making a new variation.  Adding and removing modules can be a dynamic effect of game play, not just a configuration made before starting the game.  I found a similar game called “marbles (or pegs) and Jokers” that has a modular track.  But it doesn’t have the knock-out hole or the shortcut track of the 6-player board.  And that’s the best part!  So my variation will feature more self-intersecting tracks and shortcuts.

I have a general idea of adding/removing sections that provides for a loop at each corner, and then a flying bridge shortcut between these loops.  A player can spend chips (use poker chips) to perform the actions of adding, removing, or moving track modules.

Now here comes the clever part: whenever starting a new token out of the base, the player gets a chip.  This means that attacking a player — knocking out his tokens repeatedly so he has to start over — also makes him stronger in terms of these board-changing powers.

Another variation would be to use different types of dice, with 4, 8, 12, or 20 sides, as used in role-playing games.  Simply using a different die, perhaps scaling the board to match the range of motion from one throw, isn’t much of a change.  It would be interesting to use different dice throughout the game, giving a speed boost for example by rolling a d12, or making it easier to snuggle up the tokens in the “home” by using a d4.  I don’t yet have any good ideas on when to allow these choices.

…work in progress…

Files

I’ll post PDF and Illustrator files when ready.

 

Yahtzee in Chinese — scorecard adaptor

My mother-in-law is visiting from China, and one of the games my family has always played while I was growing up is Yahtzee.  Although now I have official published score cards, I recall as a small child that we originally had home-made sheets produced by my Great Aunt Harriett.  They probably date from about the time the game was first introduced and popularized: I read in Wikipedia that innovations made by Yahtzee® over the traditional forms include the upper-section bonus.  The sheets I remember did have an upper-section bonus, but had the Big/Little straight idea as shown for Yacht, and I recall it had a Pair.  So, it had to have been influenced by the E.S.Lowe product some time after 1956, and I know my parents were playing it in the 1960’s.

I’ve played Yahtzee since I was old enough to understand it, sometimes in large family gatherings with parents and grandparents.  It was always a favorite of my Mom’s.

So naturally I thought it would be great to play during the holiday season with my mother-in-law’s visit.  The catch is that she doesn’t speak English.

Yahtzee-adaptor

I had an idea to make, not a translated score sheet to use in place of our English sheets, but an adaptor.  Originally, I thought to make a stiff card, printed on letter-size paper, that the sheet would attach to using paperclips.  So, it would contain translations for the score information (the first two columns of the printed sheet) that exactly line up with the rows of the score sheet, to the left of the sheet; and general notes and instructions that could point exactly to the rows it referred to.

So, the main design work involved exactly duplicating the grid spacing and position on the sheet.  That did not seem as simple as it should be in Illustrator, so I posted a question on StackExchange.  I quickly got comments that Illustrator doesn’t do that easily but InDesign has a table tool.

While playing (initially using translations on a separate sheet), I noticed that it was proving difficult to use columns beyond the first two games.  So I modified the design to take care of this also:  I reversed the attachment idea, and now have the card attached on top of the score sheet.  The new legend appears on the right, and can be shifted over for each new game.  This turned out to be a good design in general as now the explanation can continue to the right of the name, as wide as desired.

The question then became how to attach the papers when using the rightmost columns on the score sheet: there is no paper beyond that to clip under the card.  I solved this by having the card both in front of and behind the score sheet at the same time: with a careful cutout and fold, the letter-size page can provide a “tail” that supports the entire width of the score sheet in any position, from behind.

As planned, the adaptor is lined up such that the papers can be aligned on their vertical edges, and then two paperclips will hold them perfectly.  I do suggest using smaller clips than I have in the photos: less than one inch long, and they’ll naturally cover only the title area above the part you write on.  The photo above shows the adaptor card positioned to the right of the Game 5 column (nearly at the right border), and the score sheet is clipped to the folded-back strip along the entire top edge and holds securely.  I printed on 32# HP Premium printer paper.

A final improvement concerned the scoring.  I noticed some confusion in remembering that some rows used “total of all dice” and others used just the matching dice.  So I color-coded the different scoring schemes in the “score description” column, as well as color matching the row where the upper total is copied down below.  And as long as I was shading things, I shaded the “name” column to indicate which rows represented turns to be scored, as opposed to totals and bonuses.

Here is the  PDF File of the Simplified Chinese Yahtzee Scorecard Adaptor.  Be sure to print as “actual size” and simply allow the margins to be cut off as needed.  This is free to use with attribution, under the CC-BY 4.0 license, so here is the InDesign file.  If you make one for a different language (just change the text in my layout) or other variations, let me know and I’ll link to it or include it here.

Updating Urth

Isaac Asimov wrote several stories starring Wendell Urth, a leading extra-terrologist with a fear of all mechanical forms of transport.  In February 2011 I thought I’d make my own character after him, with some explicit homage to the Asimov stories.

The character I invented is Professor Charles Xavier Urth, set in present day.  In his office he has a signed magazine cover framed on the wall, from 1955, inscribed “to the real Doctor Urth ⊕”.  The implication is that Dr. Robert Urth, Charles’s grandfather, was someone Asimov knew from his circles of interest in mysteries, Sherlock Holmes, and Nero Wolfe.

I pulled out a plot idea I’ve had in the back of my mind for an even longer time, where a glitch or paradox or some kind of problem with quantum mechanics and/or time travel experiments causes the universe to simply seal up the offending region behind an impenetrable wall: like an oyster secreting a pearl around an irritant.

I purposefully begin the story in a manner typical of Asimov’s stories, opening with the main character’s name.  Furthermore, it opens on a non-action type of activity.

Although told in 3rd person, I want to make the way in which the events are perceived and related follow the perceptions of the character currently in focus.  For example, when the efficient secretary is interacting with the agents they are referred to by name, but when the professor is dealing with them they are simply “the first agent” etc. since he didn’t notice them as individuals or remember their names.

With these things in mind, I pointed the character at the situation to see what happened.  Of course, nobody knows the details of the anomaly at the beginning, so I don’t have to either!

Of course, my Professor immediately sets off to do something uncharacteristic, flying off to visit the anomaly.  I wanted a more explicit link with the mysteries of Nero Wolfe, where Professor Urth doesn’t travel.  Revisiting the story, I think he would do more telecommuting with a multi-window screen of video conferences to various grad students and in-field assistants.  After all, my own work meetings can have 16 windows on my 70″ living room screen with participants all over the world.

Another note about the character’s name: Professor X (Charles Xavier) has a logo that is a circled cross, like the mathematical symbol ⊗. Urth’s logo looks like a circled plus: ⊕, the astronomical symbol for Earth (this symbol is used in Asimov’s story The Key).

The idea of a mindquake comes from GEB.

Short Story beginning/draft — “The Pearl”

Copyright 2011 by John M. Długosz

Professor Charles Xavier Urth was doing what he did best. His first task that morning upon arriving at his campus office was to read some scientific papers, magazine articles, and other clippings that were prepared for his attention. About two thirds of the way through the first paper, he had an idea. Reflecting in thought, one thing suggested another, and so on for a chain of half a dozen topics. Finally, the sixth thing—which was totally unrelated to the subject of the paper—apparently bore a relationship to some other topic that he had not previously realized.

A “mind quake” ensued, with symbols in his brain rearranging themselves in light of this new revelation. Normal people do such reorganization normally during a certain phase of sleep, but never on such a scale or for such an extended period of time. His brain activity was very much like that of a normal person in deep sleep in some ways, but also like a normal person having a simple partial seizure.

Professor Urth sat at his desk, seemingly relaxed, his eyes half closed, the print-out in front of him forgotten, for two hours. He still had not moved when the visitors arrived.

Nancy checked on Professor Urth every fifteen minutes when he was in such a state. When she came back into the outer office, she found the two gentlemen standing by her desk.

By way of introduction, one of the visitors took out a wallet-sized case and unfolded it to produce a very official-looking badge and a laminated photo ID. He spoke his name as she read it: “I am Special Agent Sebastian Floyd, from the F.B.I.” He pronounced it so you could hear the periods. “My partner is Special Agent Christopher Friday.” Friday nodded in her direction at the mention of his name.

“We are hoping The Professor,” (you could hear the capital letters) “can help us with an investigation. It is a matter of some importance.”

“Of course,” replied Nancy. “Professor Urth will be available within the hour, and if you care to wait here, I’ll make sure he sees you immediately.”

Floyd was not used to waiting. “It is most urgent. Can we see him right away?” He was not sure but was under the impression that Urth was in his office. In his work, busy was not the same as absent, and he was used to interrupting people and making his Official Business take precedence.

“The Professor Must Not Be Disturbed at the moment.” Nancy made sure Floyd could hear her capitals. Her face wore the expression of a stern school mom. Then she softened her expression and explained, “If you want his help, you most certainly want his willing cooperation. And undoubtedly you want to make use of his talents. So, you need to accommodate his ways. His talent has needs of its own.”

She sat down at her desk, and motioned to the row of chairs along the wall by the door. There were seven guest chairs in the outer office: how many students and suppliants of various importance had waited here for an audience, and for how long? The agents had been briefed, and were prepared to wait however long it took. But it didn’t hurt to check first, if only out of habit.

The agents sat down, and Friday, the junior agent, took out a tablet and started reading some documents. Research materials and ideas for this case, presumed Floyd. Floyd was more pure detective than technician, and preferred to wait with his own thoughts for a while.

Professor Urth’s fits never lasted more than three hours. Through long practice, he had trained himself to limit them and apply them to useful purposes. If he showed any unusual signs, or failed to come around in time, Nancy would take action. So, he continued, in comfort and safety, to juggle symbols in his mind. Later, when asked about a particular problem in group theory or certain practical applications of it which include self-assembly of nanostructures and unification of Dark Energy to the rest of known physics, Urth would realize that a particular long-standing mathematical problem could in fact be solved and give instructions on doing so. If anyone ever asked.

While the visitors sat, Nancy checked on Urth twice more. The second time, something about his demeanor prompted her. She tapped his hand and called softly, “Professor…” Not expecting an immediate reaction, she went to the small refrigerator tucked under a table along one wall. She took out a juice pouch, shook it for good measure, and, ignoring the thin straw and all instructions printed on the pouch, used scissors to clip a corner and poured most of it into an elegant blue Chinese teacup. She parked the filled cup out of reach and tapped Urth’s hand again. She finished the remainder of the juice in the pouch herself while she waited.

As the Professor emerged from his fit, Nancy placed the drink in his hand, purposefully folding her hands over his larger one to mold his fingers to the cup. When his eyes showed that he was lucid, once again sharing this mundane world with other people, she briefed him. “You have visitors. Two F.B.I. agents wish to see you. Shall I send them in now?”

Special Agent Floyd repeated his show with his credentials, this time in the inner office. Urth showed no interest in looking at them, or even in remembering the man’s name. Furthermore, Urth waved off the agent’s introductory spiel about how he came to be referred to Urth and (presumably) that there was some great problem that needed to be solved. The Professor trusted that if the agent got this far and got in to see him, it must be an issue for him to at least hear out. He didn’t care who the agents knew that got them referred here; they were here now.

“Tell me about the problem.”

“Well,” said the first agent, “there is a ‘physical anomaly’ at the Organon Research facility. We, by which I mean everybody, are at a complete loss.”

The second agent handed Urth his pad. The display showed a room containing most of a large sphere. It appeared to be about three meters in radius; if it was a complete sphere, the lower half must be mostly in the room below, with portions in the room above and adjacent. A small amount of space was visible through the door along the curving wall of a giant pearl.

“This just appeared out of nowhere,” the second agent explained. “It is a perfect sphere and cuts through the walls, floor, and ceiling, but is mostly in this lab. A witness in the next room reports that the curved surface simply appeared along his lab’s wall, with no sound or disturbance. It was, as far as anyone can tell, just there.”

Professor Charles Xavier Urth was shaken. More than anyone, he knew that the universe works according to a set of rules. Not only did he know those rules to a great level of detail, but he intuitively grasped interrelationships and deep underlying principles at work. A macroscopic object appearing out of nowhere violated conservation laws that were a direct result of the deepest principles underpinning the universe. If it was merely hoaxed, it would be a serious puzzle to solve. But if it was genuine, it would be a prize indeed.

The realization that it meant travel to an unfamiliar place only slightly dented his excitement. “OK, let’s go.”

The photographs did not do justice to the anomaly. Seen in person, the surface of the sphere was, to a first approximation, a metallic gray that was diffuse rather than mirror-like. However, it had a luster and finish that looked simply strange. It was not like anything else, but not in any obvious way. It was not iridescent because it didn’t show color fringes, and yet it seemed to shimmer. It had enough shading to make it appear properly round, but less so than any real object, like it was drinking up some of the light rather than scattering it. Sighting along the edge, it seemed to bend light slightly. When not looking at an edge, it was hard to tell exactly where the surface was.

“Is it safe to touch?”

Illustrating his answer, Davis Holden placed his hand against the weird surface. “Everyone who first saw it touched it before realizing how remarkable it truly was. Nobody has had any complaints.”

Professor Urth gingerly touched it with his fingertips. “It’s cold,” he observed.

Holden explained, “It’s the same temperature as the air, but it conducts heat very well, so it feels cold to the touch. Our experiments indicate that it might be a superconductor of heat, in fact. The research into the anomaly is next door.”

Where the wall intersected the sphere it formed a circle about two meters in diameter. It bulged into the room, but left the bulk of the lab space available for normal use. The original contents of the lab had been removed to make room for the instrumentation and people tending the instruments.

“Just about everyone dropped what they were working on and came to study this, and brought whatever kind of tools they had.”

Someone gave Urth a fresh printout detailing the physical properties that had been determined. Optically, it scattered all light with 100% reflection, but not in a normal manner to form specular highlights; rather, it seemed exactly backwards, preventing light from reflecting near the angle of incidence, so it looked gray and strangely flat. Light grazing the limb was bent sharply away from the expected reflection angle. In the infrared, it radiated as expected for its temperature. That it was the same temperature as the surroundings was verified with contact thermocouples. That experiment produced a secondary result: tape would not stick to it.

It was a possible superconductor of heat, but a very good electrical insulator. Magnetic fields did not pass through it. Neither did X-rays, but it had not yet been determined if they reflected in the same manner as the laser light. An alpha particle experiment was in progress; the immediate observation was that at least some particles bounced off.

The few chemicals available showed no reaction. Nothing managed to chip or even mar the perfect surface.

Urth added to the list mentally from his own observations. It was moving with the Earth, being held by the cutouts it produced in the walls and floor. If it were too massive, it would have crashed through the building. “Is anyone trying to determine its mass?” The researchers all shook their heads. “I think it might be important.” This he said to Holden, who would make sure someone worked on it.

He continued with his next subject. “I want to know more about Dr. Saunders’ work.” Urth knew from the briefing he read on the plane that the lab space that was the center of the anomaly was formerly occupied by the project of Doctor Jarod Saunders. It seems to have been formerly occupied by Dr. Saunders himself, since he could not be located, his car was parked in the lot, and there was every reason to suppose that he was working in the lab at the time of the incident.

Holden explained, “It was some kind of quantum computer.”

Urth stopped him. “I read the report that the F.B.I. people gave me. I need to know details! What was unique about it? What equipment was in the room? What was the precise nature of the experiment he was conducting? Where are his notes?”

Dr. Saunders, as it turns out, kept meticulous notes. They were located in the computer that was used to interface with the experimental system, which was currently located well inside the anomaly.

GCD from “From Mathematics to Generic Programming”

For my birthday my sister got me a copy of From Mathematics to Generic Programming by Alexander Stepanov and Daniel Rose.  I had noticed it on Amazon.com and flagged it on my wish list, then forgotten about it, so it was a nice surprise and yet something that she knew I would be interested in.

The bulk of it is about pure math.  It covers the history of math with a small number of continued examples showing how progress was made when domains were generalized.  Euclid described an algorithm for the Greatest Common Measure, which was geometric in nature.  Much later, zero was invented, and a couple hundred years after that it was realized that a line segment could have zero length; the Greatest Common Divisor is the same algorithm using division and remainders on whole numbers.  Then came algebraic structures, and the same algorithm can be applied to Gaussian integers, polynomials, and beyond.  Noether then studied the question of what, in the most abstract sense, the algorithm can work on, and described the Euclidean domain (or ring).

Returning to the conventional approach of using integers, the function they present is:

//gcd from page 59
template <typename N>  // N is some kind of integer
static N gcd(N a, N b)
{
    using std::swap;
    while (b != N(0)) {
        a= a % b;
        swap (a,b);
    }
    return a;
}

Pretty mundane in this day and age.  But, earlier microcomputers did not have an instruction for division.  I recall figuring out how to implement multiplication on an 8-bit computer in the early 1980’s.  Even now, I can call up a modulo operation with a single symbol and know it maps to a single machine opcode… but that instruction is not very efficient.  I was curious as to just how heavy it is, and found a report giving instruction timing details measured for various architectures.  On the Intel Haswell, I saw it was indeed something of an anomaly in terms of how many micro-ops, how many compute ports it sends them to, and how long the latency is.  Apparently it is a microcoded routine that works out the division just like you would code in the old days; just built-in and already perfectly optimized.

Stein’s Algorithm

Later, the book goes on to describe Stein’s algorithm.  Written for a computer in 1961, it uses bit-shift and subtract and other very primitive operations which are especially simple on binary encoded integers, with no need for division.

template <typename N>
static N gcd(N m, N n)
{
 
//	if (m < N(0))  m= -m;
//	if (n < N(0))  n= -n;
    if (m == N(0))  return n;
    if (n == N(0))  return m;
 
    // m>0 && n>0
 
    int d_m= 0;
    while (even(m)) { m>>=1; ++d_m; }
 
    int d_n= 0;
    while (even(n)) { n >>=1;  ++d_n; }
 
    // both are now odd
    while (m != n) {
        if (n>m) std::swap (n, m);
        m -= n;
        do m >>= 1; while (even(m));
    }
 
    // m == n
 
    return m << std::min (d_m, d_n);
}

How does Stein’s algorithm fare today?  Sure it avoids the big fat modulo operation, but it has a lot of steps.

Here is a summary of my results.  Values are in nanoseconds for one call of the function, having run a trial of 16 million calls.  I’m using Microsoft Visual C++ 2015, and running on an Intel Xeon E3-1276 v3 @ 3.60GHz.
[Table image: timing results, in nanoseconds per call, for each variant]

The first four columns are the code from the book, compiled as 32-bit and 64-bit.  In 32-bit builds, the modulo operation on 64-bit values is not a single instruction and the built-in microcode is not accessible.  Stein’s algorithm is a big win in this case.

For 64-bit code, the modulo operation is significantly slower for signed values than for unsigned (though all test values were unsigned).  Stein’s algorithm’s time is between those two, so it’s faster when using signed 64-bit integers.

Because the % operation is so much slower on signed values than unsigned, if there was a need to handle signed arguments, a wrapper function could take the absolute value of each argument and then call a version that used unsigned values.  Remember that any time you are doing division or modulo!  Unsigned types are faster.
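Here is a minimal sketch of that wrapper idea (my illustration, not code from the book; gcd_via_unsigned is a name I made up, and it simply calls the generic gcd above instantiated on the unsigned type):

#include <type_traits>

template <typename N>   // N is a signed integer type
static N gcd_via_unsigned (N a, N b)
{
    typedef typename std::make_unsigned<N>::type U;
    // Take magnitudes through the unsigned type, so even the most negative value doesn't overflow.
    U ua= a < N(0) ? U(0) - U(a) : U(a);
    U ub= b < N(0) ? U(0) - U(b) : U(b);
    return N(gcd (ua, ub));   // instantiates gcd<U>, which uses the faster unsigned %
}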

I then called the 64-bit integer form with values that would all fit in 32 bits.  Unlike multiplication, division can’t easily quit early when the numbers are smaller.  But, fewer iterations are needed through the loop, so the function is faster.  Likewise, the Stein algorithm will loop less in every stage.  The built-in division regains its lead, as it speeds up more.  That is, the Stein algorithm’s complexity is a steeper curve considering the length of the input values in bits.

template <typename N>
bool odd (N n)  { return bool (n&1); }
 
template <typename N>
bool even (N n)  { return !odd(n); }
 
⋮
 
int d_m= 0;
while (even(m)) { m>>=1; ++d_m; }
 
int d_n= 0;
while (even(n)) { n >>=1;  ++d_n; }

The Stein’s algorithm code is written to avoid division, but what it is using is a lot of test-and-branch instructions.  The above fragment removes all the trailing zero bits and remembers how many there were.

Today’s Bottleneck Issues

Each operation is very simple: test the least-significant bit, shift right one bit, increment an int.  But the branch is a killer!

With a modern superscalar highly-pipelined processing core, a conditional branch breaks the pipeline.  The x86 processors use branch prediction and speculative execution to mitigate this, but here the branch decision is essentially random on every iteration.  It guesses wrong half the time.

Meanwhile, the instruction set does feature primitive opcodes to determine how many consecutive zero bits are at one end or the other of a register.  I can access that via a compiler intrinsic, and eliminate the problematic loops.

template <typename N>
static N gcd(N m, N n)
{
    if (m == N(0))  return n;
    if (n == N(0))  return m;
 
    unsigned long index;
    _BitScanForward64(&index, m);
    int d_m= index;
    m >>= d_m;
 
    _BitScanForward64(&index, n);
    int d_n= index;
    n >>= d_n;
 
    // both are now odd
    while (m != n) {
        smaller_first (n, m);
        m -= n;
        _BitScanForward64(&index,m);
        m >>= index;
    }
 
    // m == n
 
    return m << std::min (d_m, d_n);
}

There is still one loop remaining.  This main while loop ought to be predicted to loop and only fail the prediction when it’s ready to exit.

The column in the results table labeled “BitScan” gives the results after making this change.  All of a sudden, Stein’s algorithm beats the built-in div code in every case!  In the case of small values passed as 64-bit signed integers, it is 3× the speed!  Interestingly, the running time seems to be determined by the size of the values, without regard to the register size used: 32-bit values run at the same speed whether the code is compiled to use 32-bit integers or 64-bit integers.  That makes sense, since the various primitive operations (unlike multiplication and division) are constant speed and don’t have any advantage to using a smaller word size.
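As an aside (not part of the original benchmark), the bit-scan step doesn’t have to be MSVC-specific: the same count-trailing-zeros operation exists on other toolchains, so a small wrapper like this sketch keeps the loop-free version portable:

#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

inline int trailing_zeros (uint64_t x)   // precondition: x != 0
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward64 (&index, x);
    return int(index);
#else
    return __builtin_ctzll (x);   // GCC/Clang builtin; C++20 also offers std::countr_zero
#endif
}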

But I still have one more trick left. if (n>m) std::swap (n, m); is a branch point, and it will go one way or the other many times as it loops.  That is, this is another “bad” branch.

A long time ago, branching was far worse than it is on today’s CPUs.  I recall writing high-performance graphics code when any branch was a penalty.  I learned to absorb the conditional logic into the math, and also learned how to generate and manipulate masks in various clever ways.  Here is my quick attempt at a purely computational compare-and-swap:

template <typename N>
void smaller_first (N& n, N& m, std::false_type)
{
    using std::swap;
    if (n>m) swap (n, m);
}
 
template <typename N>
void smaller_first (N& n, N& m, std::true_type)
{
//        cout << "smaller_first input n=" << n << ", m=" << m;
    typedef typename std::make_signed<N>::type sN;
    const auto bitcount= 8*sizeof(N);
    const N mask= (sN(n)-sN(m))>>(bitcount-1);  // all 1's if m is larger, all 0's if m is smaller
//        cout << ",  mask=" << mask;
    const N smaller= (m&~mask)|(n&mask);
//        cout << ",  smaller=" << smaller;
    const N larger= (m&mask)|(n&~mask);
//        cout << ", larger=" << larger << endl;
    n= smaller;
    m= larger;
}
 
template <typename N>
void smaller_first (N& n, N& m)
{
    typedef typename std::is_signed<N>::type tag;
    return smaller_first (n, m, tag());
}

It only works on signed types, so I made the template choose the regular way for unsigned types and the non-branching trick for signed types.  (Actually, as written here it would work for unsigned types if the values were constrained more.)

It works by first doing a subtraction: if n-m produces a negative result, that means the highest bit is set in 2’s complement representation.  So if m is larger, that high bit will be set, and if m is smaller it will be clear.

Then the arithmetic right shift replicates this one bit to fill the entire word size: now I have all 1s if m is larger and all 0s if m is smaller.  In either case, the one’s-complement (~) operator will reverse that.

Each input value is then ANDed with a mask: one mask will always produce zero regardless of the input, and the other mask will have no effect.  Then both of them are ORed together, which results in the one that wasn’t all zero.

This is done twice, once for the larger and once for the smaller, by reversing the mask choice.
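Here is a worked example of those steps with concrete numbers (my illustration, using int8_t so the bit patterns are short; like the real code, it relies on arithmetic right shift of a negative value):

#include <cstdint>
#include <iostream>

int main ()
{
    int8_t n= 5, m= 9;                          // m is larger, so the mask should come out all 1s
    int8_t mask= int8_t((n - m) >> 7);          // n-m = -4; the sign bit is replicated: 0xFF
    int8_t smaller= (m & ~mask) | (n &  mask);  // ~mask = 0x00 zeroes m;  mask = 0xFF passes n  -> 5
    int8_t larger=  (m &  mask) | (n & ~mask);  //  mask passes m;        ~mask zeroes n        -> 9
    std::cout << int(smaller) << ' ' << int(larger) << '\n';   // prints: 5 9
}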

This would appear to use a lot of registers:  the two input values, the output values (which can’t destroy the input values before it is done), the 1 mask and the 0 mask.  But after computing the masks, the rest of the steps are not dependent on each previous step so they can all be done at the same time in various execution units, or at least scheduled optimally.

The result indicates that all this math and register usage is cheaper than branching: 44 becomes 39 or 37, 84 becomes 75.  This is about an 11% speed up in the overall algorithm.

Additional experiments indicate that the final test (doing the min at the very end) doesn’t benefit from using this method.  It is only performed once, at the end of the function.  Likewise the tests at the beginning for an argument of zero don’t have a noticeable effect (they are almost always “not taken” and when they are the routine is fast because it doesn’t have to do the main part at all).

What about cmov?

This post on Stackoverflow reminded me that the x86/x64 has long since addressed this with the cmov, or conditional move, instruction.  It makes sense that an if statement containing a simple comparison controlling only simple assignment statements could cause cmov instructions to be generated.  However, my attempt to write it that way showed that no cmov was used and conditional branching was still being done.  Yet further on in the function, a cmov is seen, and it comes from a use of std::min.  Looking at the headers, I see std::min is written as a ternary expression in the assignment, not an if statement.

Presumably the std templates are carefully crafted to trigger the kind of code that results in optimization, or the optimizer is written to handle the code generated by the standard headers, or cycles of feedback between the two.

So writing:

template <typename N>
void smaller_first (N& n, N& m)
{
    const N old_n {n};
    const N old_m {m};
    n= std::min(old_n,old_m);
    m= std::max(old_n,old_m);
}

does, on this compiler version, cause cmov instructions and no branching; although it does unnecessarily do the comparison twice.  Other than the lack of branching, the machine code appears rather poor to me.  But here’s the real performance:

[Table image: timing results including the min/max (cmov) variant]

The min/max code wins hands down on 16-bit values. It does poorer than the jumping code in all other cases, so the funny arithmetic is still the winner in the signed values where it is supported.

Here is the 16-bit case, which is fast:

cmp         ax,r8w  
movzx       ecx,r8w  
cmovl       r8w,ax  
cmp         cx,ax  
cmovl       cx,ax  
movzx       eax,cx  
 
sub         ax,r8w

The values for m and n start out in AX and R8W, and are reversed if necessary so the subsequent use (the subtraction) uses those same registers.

The generated code uses one additional register, ECX, to hold the original value of R8W (zero-extended to 32 bits) in case the subsequent line copies AX into R8W.  Then the test is performed again between AX (still unchanged) and CX (holding the original R8W), and based on that result may copy AX into CX.  Then it moves CX (which may be the value from the first copy or the conditional second copy) into EAX, again zero-filling it to 32 bits.

I suppose there is a performance penalty to move into 16-bit register fields, so the (non-conditional) moves use zero extension moves to 32-bit fields.  Yet it uses the 16-bit portion in other places, and never reads larger portions even if they are known to be zero or simply ignored later.

This code could be simplified by removing the second CMP and then reversing the condition of the second CMOV.  That second CMOV may be free though, since it can be done simultaneously with an adjacent instruction, and removing it would not free up resources, because no other instructions are waiting on its result.

You would think this identical code would work for 32 bits or 64 bits, just by using the full width of the same registers.  But no, here’s what it generates for 32-bit values:

mov         ecx,r8d  
mov         dword ptr [rsp+0F0h],ecx  
mov         dword ptr [rsp+0E8h],edx  
lea         rax,[rsp+0E8h]  
cmp         edx,r8d  
lea         r8,[rsp+0F0h]  
cmovge      rax,r8  
mov         r8d,dword ptr [rax]  
lea         rax,[rsp+0E8h]  
lea         r13,[rsp+0F0h]  
cmp         ecx,edx  
cmovge      rax,r13  
mov         edx,dword ptr [rax]  
 
sub         edx,r8d

Besides using EDX and R8D for the two values, it uses ECX for a copy as with the first case, but also uses RAX, R8, and R13 as 64-bit pointers, and stores the values in stack-based memory locations as well, [RSP+0F0h] and [RSP+0E8h]!

The values are stored in memory, with pointers to those addresses in two new registers.  The comparison is done with the values (still) in registers, but the pointers are what are conditionally copied, and then the resulting pointer is de-referenced to get the matching value.  Besides chewing up 3 more registers it requires 3 memory accesses!

The 64-bit case works the same way, with the value-carrying registers and memory allocations widened.

The “Blood Moon” is a dull ember only just visible in a hazy sky

My photos of the lunar eclipse did not turn out well when the moon was reaching totality: basically, underexposed because the moon was (nearly) gone! I recall from earlier shots that 1/125 second was about as slow as would work, due to motion blur from atmospheric effects and the moon’s motion. So I left it at 1/125 with maximum aperture (f/5.6), and increased the ISO as the moon disappeared.

However, I integrated 12 exposures taken as a burst, giving essentially 12/125 or about 0.1 second. Even though the exposures were made within the space of 2 seconds, each one showed the image in a different position, which illustrates why a longer exposure is blurry. By chopping it into separate short exposures I was able to manually align the separate images.

Lunar Eclipse “Lantern”

This simply adds the pixel sample values together. Dedicated software, such as used with astronomical instruments, would do better at removing the random noise as part of the process. I did noise reduction on the combined exposure.

Yes, the sky really is purple.  There was a visible haze here, and later clouds were visible over the moon.  I calibrated the white balance on an exposure of the normal full moon taken just after the eclipse ended, setting that to neutral grey.  The same profile was applied here, so the red tone is visible and accurate.

The last bit of direct light was just touching the limb, and that is pure white and overexposed in this image.  By eye, the area between the white tip and the red blush did appear more turquoise (blue/green), but that’s a perceptual illusion due to the fact that it’s simply less red than the neighboring region.  These colors did not show up in the true-color photo.  I suspect that the dark colors next to a full-sunlight bright spot affects the eye differently than the camera sensor.

Also notice how the upper-right blends into the same shade as the surrounding sky.  That’s how dark it appears: only just an ember separating itself from the empty haze.

The picture loses something in the JPEG compression, and the sRGB color space is disappointing after viewing it in Photoshop in the full gamut available to my monitor.  But you get the general idea.

Fixing a Windows 7 Laptop

A friend of the family had a laptop that would crash during the boot process.  It appears that a file loaded and run during booting must be corrupted.

The laptop came with Windows pre-installed.  He did not have a physical disc or a readable recovery disc.  The built-in Repair feature announces without elaboration that it can’t help.

The laptop uses an SSD, which may have been “fixed” simply by scanning it, as bad sectors are mapped out automatically.  My own recovery discs refused to perform the repair.

So, with his permission, I re-install Windows after deleting the entire partition and resetting the partition table.  I have all the files copied off in case there is anything needed.

This ought to be simple, but being Windows, there is a synthetic problem: licensing.

First, I need his “Number”.  It’s not on a sticker on the bottom of the machine.  It’s from Dell, so I don’t know why it wasn’t noted there.  I peruse the old disk data to find the information, and most of what I find on obtaining this obfuscated information is reposted from the same original article that refers to a range of bytes in a registry key, but in my case those bytes are all zeros!

What I finally found that did work is this page.  I suppose the details have changed with SP1, or are different for a volume-installed original like Dell used.  In any case, I got a number that looks like the right kind of thing.

Second, I no longer keep a “Universal” MSDN subscription, and my own Windows is the Ultimate edition.  His was Professional.  All versions are on the DVD, if only you can select it!  Deleting one little text file turns it back into a Universal installer, prompting me which version to install.  But it’s on a DVD, and a boot DVD at that.  (Making a boot flash drive would be the same issue, with additional chance of incompatibility, plus I wanted to give him a copy to keep anyway.)

I found an excellent tutorial on re-burning the Windows install DVD so that it still boots and works correctly.  It used free software for Windows, which was OK.  I did the saving and poking around on the bad disk under Linux in case it was caused by viruses, but I switched to Windows for burning the DVD.  I was able to make a DVD with his personal info on the label, to keep for future use.

Now, the installer did not work!

After copying files and rebooting the first time, it complains that it can’t configure “this hardware”.  I look up the error text in Google and find some documentation about Intel drivers, or newer Advanced Format disks.  It’s addressed in SP1, but this installer is plain. Slipstreaming service packs onto a new install disc is something I have not done in a while…

Following up on some things, I discover the BIOS is set for “RAID mode”.  I was looking for a compatibility mode setting for Advanced Format drives or somesuch, and saw that it was simply set to a useless setting.  The laptop only has one disk, so what’s the purpose of RAID mode in a pre-built product?  I suppose it may have features beyond the baseline AHCI mode other than doing RAID things?  Anyway, this might explain why the Repair feature would not work either.  It probably is like the (pre-SP1) installer, and doesn’t know about the drive with the controller set in this mode.

Eventually, it installs and proceeds.  However, it doesn’t like the product ID code I gave it.  I’m hoping it’s just not expecting that Dell manufactured version to be installed in the normal way (or not with the regular product disc), but with the number in hand he can “activate” Windows after explaining that he had to replace the hard drive.  If other credentials are needed, I have the old registry hives.  It continued just fine without an ID, and I expect it will work for some period of time like 30 days.

Next, the Dell laptop drivers!  What a pain.  Without even a network driver, I try to get stuff from another computer via their web site.  The first thing I downloaded was a big CAB file that (1) didn’t seem to extract sensibly, having multiple files of the same name, and (2) had no instructions.  I found the right network card driver as an EXE installer and copied it over on a flash drive.

So the rest should be automatic right?  Wrong.  Windows Update doesn’t know about the Dell drivers.  The Dell site complains that the web browser is both IE and old, and suggests Firefox, Chrome, or a newer version of IE.  Well, I want drivers first, so that’s too bad.  It does have a hardware detect, but the automatic download and install everything is broken.  For something like downloading essential drivers, they ought to have an austere web page that works without the latest browser technology to display pretty pop-up menus or try to sell additional products or entertain you with videos while you are busy being annoyed.

Now another big complaint:  they are pushing an “app” that keeps drivers up to date, but only after downloading and installing do you find out it’s a commercial program that costs money to make work.  To do what Windows Update is supposed to be doing.

The list of possibly useful files on Dell’s page, once given the model number, is awkward and brittle but does allow downloading things one by one.  I got the video driver installed before returning the laptop.

 

Moon Shot, take Ⅱ

The quarter moon offers much more contrast than a full moon, and this time I used a telephoto lens: Canon EF-S 55–250mm f/4–5.6 lens on its longest position.  I used the Canon 70D on a tripod, ISO 100, 1/160 at f/5.6 ×3 exposures.

Quarter Moon

It’s difficult to get focused, and about 1/3 of the bursts were out of focus even though I used the touch screen “Live View” to indicate what to focus on.

The lens has optical stabilization, but I suspect it’s turned off automatically if the steadiness of a tripod is detected, as that is a feature listed on some lenses. Unfortunately faster exposure times were subject to blurring anyway. I didn’t bother to use a remote shutter release, but that might help. I don’t know if this is (usual) vibration of the camera lens, or caused by atmospheric turbulence making the image move around. f/5.6 is the fastest aperture at the 250mm end of the zoom, which is also the size where diffraction starts to be noticeable.

Exposure-wise, 1/40 second still (only barely) has no clipping, so would be the optimum for ETTR to capture the most detail and least noise. 1/125 is the good-looking natural exposure, and at 1/640 it shows visible noise.

I improved the contrast by using the “highlight” slider as well as the overall “exposure” slider in Adobe Camera Raw. The limb and the terminator still had vastly different lighting, and bringing out the rays of Kepler better would darken the terminator and bring the shadow farther in, making more of a crescent than a quarter. So I used “curves”, leaving the darks alone and changing the contrast of the mid-to-bright.

The color is as-shot, with no white balance adjustment or B&W conversions.  The underexposed pictures came out brown, and the bright ones are very neutral.

I merged three exposures as follows:

  • Cancel all adjustments made in Lightroom while evaluating the exposures.  Noise reduction is turned all the way off, but sharpening is left as proper for the basic “developing”.
  • Transfer the burst to Photoshop as layers
  • Double the resolution
  • Auto-align the layers, specifying “reposition only”.  Leaving it on auto can make PS complain that it can’t find anything.
  • Use the stacking mode Median or Mean, whichever works better.  For longer bursts the Median gets rid of outliers, but for low-noise images Mean is more accurate.
  • Rasterize the stack.

The doubling of the resolution allows for alignment to half a pixel in the original images, and combining exposures this way actually increases resolution as well as removes noise.  With non-noisy images, the increase in clarity is due to the “dithering” of different exposures.

After combining the original exposures, then adjust the exposure, apply noise reduction, and so on.  This is easily done using Adobe Camera Raw as a (smart) filter in Photoshop.

Now sharpening is also done in the Adobe Camera Raw filter, at the same time as the other adjustments.  Because the resolution was doubled and the image is not pixel-sharp to begin with, a large radius value is warranted.  I set it to 3, which allows more aggressive processing without introducing fake detail from noise.

Shoot the Moon

A few days ago was the extra full moon in a calendar month.  This is sometimes called a blue moon, as a weaker form of the traditional term. July 2 and July 31 2015 were both full moons. Two full moons in the same calendar month occurred twice in 2012 and will occur twice again in 2018.

I took some pictures using my somewhat-new Sony α-6000 with the Sony 55–210 f/4.5–6.3 zoom lens.  This features image stabilization, and I hoped the auto-focus could lock on it.  I avoided the most extreme aperture setting, so I set f/5.0 and 1/250 second at the native (best quality) ISO 100.  The shots were hand-held.  Unfortunately, I seem to have left the lens set to the short end of the range, not the long end, so this was essentially taken with a “normal” (not “telephoto”) focal length.

I also grabbed my Canon 70D, which was mounted on a tripod.  I attached the EF-S 55–250mm f/4–5.6 lens and tried that too.  Again, I backed off from the most extreme aperture setting and used f/4.5, which is brighter than the Sony.  Also again, I used the shortest (not the longest) focal length, so I would have been better off using a high-quality “normal” prime lens.

In both cases, in full manual mode, I set a variety of shutter speeds and shot bursts with each.  The in-camera metering and histogram is not useful because the black background overwhelms the data and the moon is actually very small in the frame.

Blue Moon #1

Blue Moon #2

I did notice that the underexposure warning overlay was standing off from the disk of the moon, and in Lightroom it was even clearer. The sky around the moon was not black but hazy. Presumably the atmosphere scatters light, enough to show in the exposure, even if it appears very clear by eye.

The histogram showed two distinct distributions, with the haze a few stops below the bright portion. I wondered if the dark shades of the haze were only in the dark sky, or if the dark areas on the moon also included these values. That would determine whether I truncated these darks and to what extent. So I used a gradient map filter in Photoshop, setting different colors for the darkest darks, the next darks, and others. By moving the slider of the gradient editing tool I could set the split so that the background was “darkest” but any more would start showing the other color; then see if the dark shades on the moon were tagged as “darkest” or the other key color.

I found that a tiny amount in the Sea of Crises was in the same range as the background.  The Levels tool controls in the screenshot inset show what I decided on.  The “haze” is the lobe to the left, and it’s significantly below the main subject’s detail.  Only shadow areas that essentially go to black — don’t show any detail anyway — extend farther left than where the right lobe rises visibly from the zero value.

Even with the contrast and exposure levels set nicely, it’s hard to make out detail in the colorless image.  The earlier use of gradient mapping to find details inspired me to use the same tool again, for artistic purposes.  I used midnight blue for the darks and a warm-light color for the lights, but set yellow for the brightest values and adjusted the slider so the split between them was showing color contrast on the rays coming from Tycho Crater.

Exposures

In both photos presented above, I stacked a burst of 3 shots and took the median of each pixel, to reduce noise.  Since this was ISO 100 with brightness in normal midtone range, I don’t think it did anything useful for the main subject, though it did improve the pixelization of the “haze” background (which is truncated from the final image anyway).  Before aligning the exposures I doubled the image size, as this will help with sub-pixel alignment differences.  The disk of the moon is only 250 to 275 pixels even after doubling: as I said, very small in the frame or about 3% of the height of the exposure.

Since the range of values (see histogram inset above) is a small portion of the exposure’s latitude, there is a wide range of useful exposures possible.  I chose 1/250 f/5 for the Sony and 1/400 f/4.5 for the Canon, and applied identical adjustments to each, so you can see they are the same EV.  A brighter exposure on the Canon, which was not overflowing the high end of the available capture range, was not nearly as good even though you would expect it to be less noisy and more nuanced.  Oddly, the size of the moon is different from exposure to exposure, so the zoom lens was not held fixed but moved.  Could the markedly greater chromatic aberration and lower contrast (even after adjusting for the difference in exposure) be due to the performance of the lens at different zoom settings?

Of the shots I took, the Canon (second picture above) is the clearest and sharpest.  If the lenses are equally good, the lower resolution and longer focal length of the Canon would be better, even though the two cancel out to give the same number of pixels.   A larger pixel pitch and larger projected image to match, covering the same number of pixels, should give better quality.  However I think any differences are dominated by lens performance.  It’s hard to tell exactly if it’s really “better” though because the best exposure in the Canon was actually a little smaller than the Sony, so it would look sharper due to that.

Conclusions

  1. shoot lots of exposures, even more than you would think you need.
  2. vary the settings, even beyond what you think is the right range and also change things you don’t think would matter.
  3. know your lenses.  Not only how the sharpness varies with f-stop, but are there any sweet spots or sour spots in the zoom?
  4. know how to operate everything by feel in the dark.  That includes reviewing images and zooming to inspect what you just shot with the camera’s screen.

This is also a striking example of how a good camera trounces what a cellphone can do.  People might be used to “selfies” and social media shots that are of very poor technical quality, and not really notice when a picture is better in that respect.  But when it comes to pictures that could not be taken at all with a smartphone camera, it’s a difference between getting the shot and having nothing useful at all.

It’s amazing that a small fist-sized compact camera can get hand-held pictures with the kind of close-up you have using binoculars.