
Meditations on a Loop

Synopsis 4 contains an example under “The do-once loop”:

@primes = (do (do $_ if .prime) for 1..100);

Depending on your current level of Perl savvy, this either looks bizarre or deceptively simple. This essay will break it down to its most elementary language features. As with dissecting a frog, a lot of interesting things will be pulled out and pinned to the mounting board along the way.

In all fairness, Perl 6 is supposed to be richer than it looks. With its similarities to natural language, you want idioms and different cases to mean things. Perl 5 is full of special cases and magic rules to support that. Perl 6 makes an effort to have it work out the way you wanted without special rules. The regular rules are re-factored and reprocessed, with a bunch of mysterious internal organs for support, to work out correctly to mean what you want even though the uses may appear different.

That means you might ignore these “internal organs” most of the time. They are optimized away and the statement “just” means what you thought you said. But we’re going to dig into these details as a means of understanding it on a deeper level.

The parentheses are suggestive in terms of understanding the statement in layers. So start with the innermost part:

Fragment 1

$_ if .prime

From Perl 5, you may be used to the statement modifier form of if. So $_ has an if modifier. You are used to using such modifiers for statements with side effects, such as print "entering the gates\n" if $verbose;. That will either call print or not call it. Our fragment says that the result of this subexpression is $_ if the condition holds. That raises the question: what does it express otherwise? Turn this around into the normal control-flow structure and you see the problem: in if (blah blah) { return $_ }, where’s the else?

In Perl 5, similar things are explained in context: some constructs return undef in scalar context or an empty list in list context. So such a fragment would not make sense in isolation, as we find here. It would have to be understood as part of a larger construct. So, in Perl 5, you can’t easily break it down into small pieces.

In Perl 6, the concept of “context” a la Perl 5 is being phased out. First of all, it is incompatible with features where type information needs to flow in the opposite direction, such as overloading and multi-method dispatch. You can’t generalize the concept of having the surrounding code tell the subexpression what type it ought to return. Instead, like in most other languages, the subexpression determines the type it happens to be and sends this outward for the enclosing expression to deal with.

Given that, another way had to be found to accomplish such feats. The improved version, besides being more mainstream, has a further advantage: the piece can be understood in isolation. That makes the underlying language primitives more orthogonal and more reusable.

So what is the meaning of this tiny piece of viscera? I’ll answer that by showing the implicit other branch. But first, a digression. Note that return is used for a function, not an inner subexpression. If you wrote a return in the middle of an if statement, it indeed returns from the enclosing function. So how do you write that for a block to return up to the wrapping expression? In Perl 5, you have no choice but to just use the implicit value as the last thing in the block executed (more or less). But to be explicit about it, in Perl 6 you can say leave.

if .prime
    { leave $_ }
else
    { leave Nil }

S04<77> still read “If the final statement is a conditional which does not execute any branch, the return value is undef in item context and () in list context.” That shows the history, and gives a clue to understanding the new formulation that does not rely on context. The newer wording, here and throughout the Synopses, is “…the return value is Nil.” So Nil equals undef in item context and () in list context. It’s just an ordinary-looking value, and the mention of context is gone, but it still works out the same. How?

The value Nil is an object. If you contemplate it as-is, it tests as undef. In Perl 6, there is not a single reserved undef value like a null pointer in other languages. Rather, anything can say it’s undef, by returning true when asked whether it is undef. “Hey, Dog instance, are you undef? Woof Yes!”

Meanwhile, Nil acts like a list. If asked for the list interface, it returns one. And upon using Nil as a list, the caller will discover that said list has zero elements.

In Perl 6, supporting the list interface, to speak casually, means that the object does Positional. Positional is a role, which among other uses will act as an abstract interface, like an abstract base class in C++. The Positional role is associated with the @ sigil. The caller (or surrounding code) doesn’t establish some kind of mysterious “context” that influences what the inner code means. Rather, the caller wants a list, so it takes whatever it gets back and says “Give me your list interface.”

So to summarize, $_ if .prime will produce $_ or Nil depending on the test. And Nil knows how to satisfy different needs for different contexts, so we don’t have to worry in advance about the nature of enclosing expressions.
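As a rough illustration of that idea, here is a toy model written in Python rather than Perl 6. It is only a sketch: the names NilType, is_undef, as_list, value_if_prime and small_prime are invented here, and no implementation is claimed to work this way. The point is that the producer always hands back the same object, and the consumer decides which interface to ask for.

```python
class NilType:
    """Toy stand-in for Nil: undefined as an item, empty as a list."""

    def is_undef(self):
        # Any object may answer this question; Nil answers yes.
        return True

    def as_list(self):
        # Asked for its list interface, Nil offers one with zero elements.
        return []

    def __repr__(self):
        return "Nil"


Nil = NilType()


def small_prime(n):
    # Hypothetical primality test, good enough for a demonstration.
    return n > 1 and all(n % d for d in range(2, n))


def value_if_prime(n, is_prime):
    """Model of `$_ if .prime`: the value itself, or Nil when the test fails."""
    return n if is_prime(n) else Nil


result = value_if_prime(4, small_prime)
print(result.is_undef())  # the consumer treats the result as an item
print(result.as_list())   # or the consumer treats the result as a list
```

Nothing in value_if_prime knows or cares how its result will be used, which is the property the essay is after.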

Now, come back to another question that we ignored until now. What does .prime mean? That’s easy! That’s a method call, using the implied $_ as the invocant. That is, it’s a short way of writing $_.prime. Or even $_.prime() if you are not used to being able to leave off useless parentheses in a function call with no arguments.

Wow, the built-in Int class has a .prime method! Well, no, it doesn’t. Once upon a time, methods that were not found would fall back to ordinary sub calls. That has been removed and this example was not updated. Perl 6 is a moving target!

You can add methods to existing classes, but don’t do that because it confuses other code. But as long as it’s been brought up… Classes in Perl 6 are “open”, meaning you can come along after the class is written and say something like:

augment class Global::Int {
   method prime () { ... }
   }

But if that doesn’t get you sent to the confessional after the code review, you must not be using or writing any modules. You don’t want to change global classes that are used by all code in the program! Unless that’s exactly what you want to do, that is. It might seem harmless to add this method, but what if other modules did the same thing? It would be a patch-fest. It’s better to keep things visible to your own code or imported where needed.

So, prime should be written as a sub, and thus called using the proper syntax: prime($_). Notice that you may not put a space between the sub name and the opening parenthesis of the argument list.
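The essay never says what prime actually contains, so the following is purely an assumption: a plain trial-division test, sketched in Python, of the kind a free-standing sub might implement. Only the name prime comes from the text; the body is hypothetical.

```python
import math


def prime(n):
    """Hypothetical body for the essay's prime sub: trial division."""
    if n < 2:
        return False
    # Only divisors up to the integer square root need checking.
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


print([n for n in range(1, 11) if prime(n)])
```

Because it is an ordinary function taking its argument explicitly, nothing has to be patched into a global integer class to use it.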

So far, so good. Let’s pin that to the board and see what’s next. Look at the immediate use of that:

Fragment 2

(do $_ if .prime)

We know what $_ if .prime produces (in principle). So what does the do in front of it mean? The short answer is that it doesn’t change the value, but you get a compiler syntax error without it. This is because the if… suffix is a statement modifier. The emphasis here is on statement. Fragment 1 is not a subexpression capable of being enclosed in a larger expression. It is a full statement unto itself. Using a statement modifier implies that the stuff before it is a statement. So if it was just an expression, it gets finished and promoted up to a higher grammatical category. Many languages have distinctions between expressions and statements. In C++ it’s easy because expressions carry values and statements don’t. But in Perl, statements have values too, so the difference is more grammatical than conceptual.

What do does is turn a statement back into an expression. The grammar provides that the keyword do is followed by a statement, and the resulting construct is a term (using the term term casually; the actual grammar specification is very complicated) that may be used as a sub-expression. In terms of meaning, it just expresses the value of the statement, unchanged.

The parentheses are used to terminate the statement used by the do. Because of the way statement modifiers are used, they go after what already might be a complete statement unto itself. The grammar says that a statement followed by a statement modifier is a larger statement. Because of that, having whatever if .prime for 1..100 would be a suffix for a statement that already has a suffix.

But that’s exactly what we want here. Stacking the suffixes means the same thing as nesting the equivalent prefixed forms, and the parentheses are unnecessary, just as 2+(3+4) has parentheses that group it the same way it was going to be grouped anyway.
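The equivalence between stacked suffixes and nested prefixed forms is not peculiar to Perl. As an analogy only, here is the same pairing in Python, where a comprehension plays the part of the stacked suffix form. The is_even test is a hypothetical stand-in for the condition, and note that Python writes the loop before the condition while the Perl 6 fragment writes the condition first.

```python
def is_even(n):
    # Hypothetical condition standing in for the essay's test.
    return n % 2 == 0


# Prefixed (block) forms, nested: the loop encloses the conditional.
nested = []
for n in range(1, 11):
    if is_even(n):
        nested.append(n)

# Suffix-like forms, stacked after the value being produced.
stacked = [n for n in range(1, 11) if is_even(n)]

print(nested == stacked)
```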

Once upon a time, stacking statement modifiers was not allowed. That was erring on the side of caution, since the grammar is very complicated. But now the formal grammar is implemented and multiple suffixes are not a problem. Still, you can always add extra parentheses to clarify things or make a complex expression easier to read.

Although unnecessary, the parentheses isolate the inner statement so it ends with the first modifier. In a manner analogous to using parentheses in subexpressions for grouping, they do the same thing even though we are using statement-level constructs. The parentheses make what’s inside a world unto itself for the parser, so the statement stops because the parser ran out of input. Basically, it does what you think it should do.

We are used to parentheses being used as part of the grammar for expressions to group things, in defiance of the natural order of precedence. Grammar is written that way to affect parsing, but the parentheses don’t really do anything. In Perl 6, you are about to go down the rabbit hole.

In Perl 6, the parentheses are an active part of the language, and the syntax creates an object of type Capture. In Perl 5, the parentheses make something a list (sometimes), and the comma operator might behave like in C, or might create a list. The Capture is a major part of how Perl 6 eliminates these context-sensitive rules.

People love to identify “the heart” of this or that. I’m not sure what the heart of Perl 6 would be, but I think we’ve identified the spleen with the Capture. In the human body, most people have no idea what the spleen does. It sits there out of the way doing its thing. Looking it up in a dictionary, the brief biological definition doesn’t make it completely understandable, but some of its functions are easily explained. The obsolete poetic definition also applies to Perl 6 Captures: “the seat of spirit and courage or of such emotions as mirth, ill humor, melancholy, etc.” So the Capture is the spleen of Perl 6.

A Capture is like a list and a hash, holding both positional and named items. Unlike the normal Array and Hash types, it holds aliases to the original things, not just copies. The full glory of Captures can be seen in their use for parameter passing. So what is one of those doing in the middle of my simple expression? I dissected it expecting simple inert connective tissue and pulled out a spleen!

Well, look at it this way. You take all the uses of constructs that have multiple uses in different contexts, including parentheses and commas, spend a couple years refactoring the concepts in an attempt to explain all the uses in a uniform way rather than a list of special cases, and you get “something different”.

The general rule is to replace constructs that have different meanings or behavior in different contexts with a single behavior, which is always to create an object. That object can be used in different ways depending on what the enclosing code does with it, and the different possible behaviors of that object take the place of the original context-sensitivity in the parser. We saw it with Nil in a straightforward way. The construct always does the same thing, but the consumer of that Nil object can make it either bark or meow.

It is the same with parentheses, only much more complex because parentheses are used in a lot of places, and different kinds of things take place inside them.

In general, a pair of parentheses can contain a list of things separated by commas and/or semicolons. The semicolon is a stronger separator than the comma, and is handy for multidimensional lists such as specifying a slice of a multidimensional array. Using \( ... ) is even fancier, and allows named values.

The parentheses mean that we now have a Capture instance which contains the earlier result as a single item.

This idea doesn’t interfere with common parentheses in expressions for grouping because of the behavior of the Capture object. If a literal Capture is used as an item, and that Capture has exactly one positional item in it, then that inner item is used. That is, when the enclosing expression says, “give me your item interface”, it returns the inner item. If, on the other hand, the Capture had more stuff in it, the item interface would return the Capture itself. Look at what this gives us: Ordinary grouping parentheses when used in such a manner, and comma-separated lists when used in that manner. All with one and only one rule for what to do with parentheses as they are being parsed.
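To make the one rule concrete, here is a toy model in Python. It is a sketch under loud assumptions: the class name Capture is borrowed from the text, but the item method and everything about the representation are invented for illustration, and named items are left out entirely.

```python
class Capture:
    """Toy model of a literal Capture holding positional items only."""

    def __init__(self, *positional):
        self.positional = list(positional)

    def item(self):
        # Exactly one positional item: hand back that item (plain grouping).
        if len(self.positional) == 1:
            return self.positional[0]
        # Anything else: the Capture itself is the item.
        return self

    def __repr__(self):
        return "Capture(" + ", ".join(map(repr, self.positional)) + ")"


grouped = Capture(3 + 4)   # models (3 + 4)
listed = Capture(1, 2, 3)  # models (1, 2, 3)

print(2 * grouped.item())  # grouping: the inner value takes part in arithmetic
print(listed.item())       # list: the whole Capture comes back
```

The constructor never looks at how the object will be used; the difference between grouping and list-making shows up only when the consumer asks for the item.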

It is up to the optimizer to recognise this very common idiom of creating a Capture of one item just to take that item out again and throw away the Capture, and cut out all the junk and just use the item. So at run-time they behave like traditional parentheses: they influence the parser but don’t show up as a run-time activity at all.

As it turns out, “Parentheses are parsed on the inside as a semicolon-separated list of statements”, emphasis mine, according to S03. So the parentheses alone will do what do does in the example. The use of do is superfluous and can be removed. The reason it’s in the example is because this behavior of plain parentheses is fairly new to the specification so do was indeed needed until recently, and mainly because this example is found in the section on do! Expect future versions of S04 to contain a different example here. ☺

And the plot thickens, because the plain parentheses Capture syntax is inherently two-dimensional. If you write (1,2,3;4,5,6;7,8,9) you get a Capture containing 3 Captures, each of which contains 3 Ints. So if you don’t have any semicolons, you just have the first group, but it is still a group. Writing (1+1) will produce a Capture containing one item which is itself a Capture containing one item which is an Int equal to 2.

The rules for a literal Capture used as an item actually accommodate that more detailed structure. It realizes that the “one item” is expected to be nested inside another Capture. That is the case which returns that inner item.

Here is a low-level representation of the resulting structure. The brackets indicate a primitive sequencing, not any Perl data structure. Fragment 2 will express something like ⋖ ⋖ Nil ⋗ ⋗ or ⋖ ⋖ 17 ⋗ ⋗ depending on whether $_ is prime.

The real point is that putting exactly one thing in parentheses without any commas or semicolons will result in a Capture with a particular structure, and the “give me the item interface” method will recognise that structure and return the enclosed item alone. Various other contents and usages will result in the proper thing, too.
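The two-level shape can be modelled with nested tuples. This Python sketch is an illustration only: the helper names paren and item are made up, each semicolon-separated group is passed as a separate argument, and real Captures carry far more than this.

```python
def paren(*groups):
    """Model bare parentheses: an outer tuple with one inner tuple per group."""
    return tuple(tuple(group) for group in groups)


def item(capture):
    """Model the item interface: unwrap only a single item in a single group."""
    if len(capture) == 1 and len(capture[0]) == 1:
        return capture[0][0]
    return capture


grid = paren([1, 2, 3], [4, 5, 6], [7, 8, 9])  # models (1,2,3;4,5,6;7,8,9)
single = paren([1 + 1])                        # models (1+1)

print(single)
print(item(single))
print(item(grid) is grid)
```

Here single is two levels deep even though it was written with one pair of parentheses, and item is the only place that knows how to undo that.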

Sorry you asked? Too late now. Onward!

Fragment 3

for 1..100

This is another statement modifier. Tacked on as a suffix to a statement, it means the same thing as writing it as a for loop, except for the extra braces. Perl 6 for loops are like foreach in Perl 5. They are not the 3-part general iteration construct like in C++. Use loop for that. The for syntax takes something that can be iterated and uses that to control the iterations. It will execute the body once for each item in the list, and by default aliases $_ (read/write) to the item.

The syntax 1..100 seems obvious. The meaning is clear, even in cases where it is far from obvious what is going on under the skin. For two Int arguments it is simple enough: The .. operator returns a Range that stores the 1 and the 100 as the bounds, with 1 as the step value.

The code generated for the for loop will start out by asking the thing for an Iterator. The Range object obliges with one that returns each number in turn as you would expect. The low-level details, made explicit, would look something like this (using the Iterator specification from S7<5> early draft):

# for $stuff { block }

my @results is Capture;
my Iterator $it = $stuff.Iterator(:rw); 
while ((my $_ := $it.get()) !=== Nil) { push @results, (block); }
leave @results;

How all that works is another story. Suffice to know that anything that knows how to offer an Iterator in the standard manner will work to control the for loop. That includes, naturally, Range objects and Array objects.
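For readers who want to run something, the same desugaring can be sketched in Python. This is a loose translation, not the specified mechanism: NIL is a sentinel invented here to stand in for Nil, and the class and method names (Range, RangeIterator, iterator, get, for_loop) are assumptions modelled on the pseudo-code above.

```python
NIL = object()  # sentinel standing in for Nil: "no more elements"


class RangeIterator:
    """Hands out lo, lo+1, ..., hi one at a time, then NIL."""

    def __init__(self, lo, hi):
        self.next_value = lo
        self.hi = hi

    def get(self):
        if self.next_value > self.hi:
            return NIL
        value = self.next_value
        self.next_value += 1
        return value


class Range:
    """Stores only the bounds; no list of elements is ever built."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def iterator(self):
        return RangeIterator(self.lo, self.hi)


def for_loop(stuff, block):
    """Model of `for $stuff { block }`: run block per item, collect the results."""
    results = []
    it = stuff.iterator()
    while (topic := it.get()) is not NIL:
        results.append(block(topic))
    return results


print(for_loop(Range(1, 5), lambda n: n * n))
```

Anything offering an iterator method with the same contract would drive for_loop equally well, which is the sense in which the loop does not depend on a list primitive.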

Later versions of Perl 5 implement a special case so this range notation won’t cause the whole array to be generated before the loop is performed. With Perl 6, that is naturally the case, since iteration does not require a list primitive.

This also shows that an Iterator cannot return Nil as part of a collection. It is used as a signal that there are no more elements! However, you’ll find that you can’t put Nil into any kind of list in the first place, since it behaves like or turns into an empty list. That is exactly what Nil is meant to do. If you push a Nil onto a normal Array, the push sees it as an empty list and does nothing. If you add it to a Capture, it will be stripped out when you view the Capture as an ordinary list, which we will encounter later.
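The push behavior is easy to mimic. In this Python sketch, which is an assumption-laden toy and not how any Perl 6 Array is implemented, push asks its argument for a list interface when it has one; the as_list method and the NilType class are invented for the purpose.

```python
class NilType:
    """Toy Nil whose list interface has zero elements."""

    def as_list(self):
        return []

    def __repr__(self):
        return "Nil"


Nil = NilType()


def push(array, value):
    """Model of push: the argument is viewed as a list of things to append."""
    items = value.as_list() if hasattr(value, "as_list") else [value]
    array.extend(items)
    return array


numbers = []
push(numbers, 2)
push(numbers, Nil)  # contributes nothing
push(numbers, 3)
print(numbers)
```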

Finally, notice that the for loop in Perl 6 doesn’t just execute code for its side effects. It produces a value, too. The value is a collection of all the results of each iteration. In general, the body may produce a list as a result, not just a single item. Each result is itself wrapped in a Capture (though if it’s a bare-paren Capture already it’s not wrapped again), and a Capture containing all those result Captures is the final expressed result.

Fragment 4

(do $_ if .prime) for 1..100

We already know what the body of the loop produces. Take a hundred of those, and make them the elements of another Capture, and you have something that starts off like this: ⋖ ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 2 ⋗ ⋗, ⋖ ⋖ 3 ⋗ ⋗, ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 5 ⋗ ⋗, ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 7 ⋗ ⋗, … ⋗
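That nested shape can be built mechanically. The Python sketch below uses tuples for the bracketed sequences and the string "Nil" as a visible placeholder; the helper names prime and body are hypothetical, and the layering follows this essay’s reading of the spec rather than any running implementation.

```python
NIL = "Nil"  # placeholder string standing in for the Nil object


def prime(n):
    # Hypothetical primality test by trial division.
    return n > 1 and all(n % d for d in range(2, n))


def body(n):
    """Fragment 2: the value or Nil, wrapped two levels deep by the parentheses."""
    inner = n if prime(n) else NIL
    return ((inner,),)


# Fragment 4: one wrapped result per iteration, gathered in an outer sequence.
fragment4 = tuple(body(n) for n in range(1, 101))

print(len(fragment4))
print(fragment4[:3])
```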

Fragment 5

do (do $_ if .prime) for 1..100

Next, a do is put in front of that, which as we know just turns the statement back into an expression.

Fragment 6

(do (do $_ if .prime) for 1..100)

Now, another set of parentheses. These wrap the value up in another 2-deep Capture. Their grouping purpose is not being used here, since there is nothing the contents need separating from. So they are superfluous, just there for beauty or nostalgia.

The result of the expression now looks like this:

⋖ ⋖ ⋖ ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 2 ⋗ ⋗, ⋖ ⋖ 3 ⋗ ⋗, ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 5 ⋗ ⋗, ⋖ ⋖ Nil ⋗ ⋗, ⋖ ⋖ 7 ⋗ ⋗, … ⋗ ⋗ ⋗

with the other 93 iterations hidden by the ellipsis. That might be hard to read if you’re not a LISP programmer, but don’t worry about it. The point is that it has lots of wrappings around the values of interest, many of them pointless. (And there may be fewer in a later spec…here’s hoping!)

Fragment 7

@primes = (do (do $_ if .prime) for 1..100)

That result is assigned to @primes. But the result in @primes ends up being ⋖ 2, 3, 5, 7, …⋗ with no sign of 5 levels of nested lists or the Nils. What happened?

The list assignment asks the Capture for its list interface. The Capture object can produce two different list interfaces, one that shows everything as-is and one that flattens. The list assignment asks for the latter. This is still easily explained as “list context”, but the behavior of the code producing the value was not context-sensitive. Rather, the “list context” is asserted by asking the Capture object to reveal its list-like behavior.
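The flattening view can be sketched as a recursive walk. This Python version is an illustration under stated assumptions: NIL is an invented sentinel for Nil, tuples stand in for Captures, and flatten and wrap are hypothetical helpers. Only the first seven iterations are spelled out, matching the structure shown earlier.

```python
NIL = object()  # sentinel standing in for Nil


def flatten(value):
    """Yield the leaf values of nested tuples; Nil contributes nothing."""
    if value is NIL:
        return
    if isinstance(value, tuple):
        for element in value:
            yield from flatten(element)
    else:
        yield value


def wrap(x):
    # The two-level wrapping that a pair of bare parentheses adds.
    return ((x,),)


# First seven iterations only, in the shape shown for Fragment 6.
loop_results = tuple(wrap(x) for x in (NIL, 2, 3, NIL, 5, NIL, 7))
fragment6 = wrap(loop_results)

primes = list(flatten(fragment6))
print(primes)
```

All five levels of nesting and every Nil disappear in one pass, and the code that built fragment6 never had to know that a flat list was wanted.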

The list assignment then copies the contents to the @primes variable. Here, all the values are present in the right-hand-side and they are simply copied, as if by iteration and pushing. But lists can be lazy, or made up of multiple segments containing code to generate elements. So in the general case, the stuff defining the list is copied.

The final thing in the original statement is the semicolon. In some languages where semicolons tend to appear at the end of statements, they are statement terminators and considered part of the statement. In others, such as Pascal, they are separators and go between statements, so one is not needed after the last statement in a block. In Perl 6, they are terminators, but you can still leave them off at the end of the block and some other places. The Perl 6 grammar goes to great lengths to appear more casual and less pedantic. So, the semicolon is not the only way to terminate a statement.

Conclusion

Based on this analysis, the original statement could be written as @primes = do $_ if .prime for 1..100; for the same effect. Much is going on beneath the skin to support a “casual” syntax and context-sensitive behavior, without the limitations inherent in the Perl 5 parser. But, short of defiling the Global::Int class, the “same effect” is a compile-time error, and it should be written:

@primes = do $_ if prime($_) for 1..100;

For those not familiar with other versions of Perl, you can see that this neo-natal language is very expressive and powerful.

Orthodoxy

Technical details checked as of May 2009.

The concept of the Capture is still relatively new, and the Synopses are still changing to shift to the new paradigm. Details of how Captures are structured are still conflicting and incomplete. In fact, this Meditation was done in part to serve as a concrete example of these ideas.

Likewise, the details of exactly when and how the list flattening takes place have undergone reform, and are not clear from the current text.

TIMTOWTDI

@primes = do $_ when prime($_) for 1..100;

@primes = (1..100).grep: { prime($^n) };

(1..100).grep: { prime($^n) } ==> @primes;



Footnotes

I’m hoping this can be simplified. The current specification is slightly contradictory for this exact point, vague in most other places, and not a coherent whole.

I think the current spec doesn’t distinguish between things that can be traversed once in the forward direction from things that are full random-access lists. It needs to, to preserve the efficient for code in the general case, rather than optimizing (only) the simple literal range. A lazy list doesn’t convey that items can be thrown away from the front after being used once.