LValues in the Perl 6 Information Model

This is part of a series which describes the Information Model of Perl 6. It is theoretical rather than an exact description of any real implementation. The details in this installment are not quite spelled out in the Synopses, but are completely consistent with them except where noted.

What is an LValue?

The way I learned it, a long time ago, an lvalue is basically a value referred to in such a way that it may be modified. The name comes from its use on the Left side of an assignment.

To clarify what we are getting at, suppose we have an object of type Dog. Given some primitive object reference to the object (a value), we can certainly call methods on it to modify it.

sub f1(Dog $d)
 {
    $d.set_name("Fido");  # change dog's name
 }

my Dog $spot .= new;
$spot.set_name("Spot");
f1($spot);
say "My Dog's name is $spot.get_name()";  # Fido.

The variable $d here holds a value of type Dog, and a method is called to mutate that object. $d will then continue on with that new name in place. Furthermore, the caller will see the object has changed. But, that is not what we mean here.

Changing fields within $d does not change the fact that it is the same instance. Bearing in mind that assignment in Perl has reference semantics like Java and Smalltalk and unlike the assignment operator of C++, consider f1 trying to change which object $d refers to. The statement $d = $another_dog; would fail (even if we had $another_dog) because the parameter $d is read-only.

Here we have to set aside the casual usage in which $d is "the Dog" and set_name is "changing $d", and instead insist on the more pedantic view that $d is a variable that somehow provides access to a Dog object, but is itself not affected when the Dog object mutates. Assignment to $d is asking to make $d refer to a different object, changing the value of $d itself.

In this case, the compiler complains and says you are not allowed to do that. As we'll see later, this is not as simple as pass-by-value in other languages as this aliases the caller's variable (if applicable) anyway. But don't worry about that for now. The point I'm trying to make is that $d itself is a different thing than what $d holds, even though the syntax goes to lengths to hide that from you.

What do we do with them?

Change the declaration slightly:

sub f1(Dog $d is rw)
 {
    $d= Dog.new;
 }

my Dog $spot .= new;
say "before: ", $spot.WHICH;
f1($spot);
say "after: ", $spot.WHICH;

Now assigning to $d in the function will change the caller's original variable $spot, pointing it to something else. $d now refers to the variable $spot in such a way that it may be modified. Not just the Dog, but the variable itself. That is what we mean by an lvalue.

Here is another example. Consider the return value from a function. In most languages, these are mere values, not lvalues. That is the default in Perl 6.

sub f2 (Str $x)
 {
 state @a = 0 xx 100;  # state: think "static" in C++.  Remembers between calls.
 my $n= 2;  # stub.  real program computes index from parameter
 return @a[$n];
 }

my $y= f2("boat");
say "result is $y";  # it's 0 now.
$y= 5;
my $z= f2("boat");
say "now it's $z";  # did not change.

f2("boat") = 5;  # does not compile.

The function f2 returns a value. That value is then dissociated from the Array it came from. You don't expect assigning to $y after that to change the value back in @a, right? Assigning to the result of the function will give a compile-time error that you simply can't do that: the function returns a value, and is not itself a variable.

But all you have to do is ask. Declare your intent, and the compiler allows it and puts in the proper machinery to make it possible.

sub f2 (Str $x) is rw

Now assigning to the function call f2("boat") = 5; will update the cell in @a[2], and will be seen next time you call f2. But, nothing is different about $y. Assigning the value to $y copied the Int object, but did not capture its lvalueness. That evaporated like a soap bubble. In most languages, that's how it goes: an lvalue return would have to be used directly on the left side of an assignment, and can't be bottled for later use. In C++ you can be explicit about how you accomplish this by returning a pointer to the place you want to later change. Perl does it by using a different operator, :=. As we'll see later, that works behind the scenes in a similar way to what you would manually do in C++.

my $y := f2("boat");
$y = 5; 
say "now it's ", f2("boat");  # it's 5

The LValue is something that's returned from f2, that allows assignment and allows access of the value. In using the value, it is accessed, but the LValue itself is never seen again. We can see the LValue has a more tangible existence in the last example. Using := binds the LValue object itself to the variable $y. It becomes an alias for @a[2] in exactly the same way that rw parameters become aliases for the caller's lvalues. The aliasing is possible simply because the LValue is itself an object.
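The difference between assigning and binding the returned LValue can be sketched in a toy model. The following is a minimal Python sketch, not Perl 6 and not any real implementation: the class Scalar stands in for an Item Container, and the names after_copy, after_bind and y_bound are hypothetical, introduced only for this illustration.

```python
class Scalar:
    """Toy Item Container: holds one value, allows FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


# Models `state @a = 0 xx 100`: every cell is its own container.
a = [Scalar(0) for _ in range(100)]


def f2(x):
    n = 2            # stub index, as in the article's f2
    return a[n]      # models `is rw`: return the cell, not just its value


# my $y = f2("boat"): the value is copied into a fresh container.
y = Scalar(f2("boat").FETCH())
y.STORE(5)
after_copy = a[2].FETCH()

# my $y := f2("boat"): the name is bound to the returned container itself.
y_bound = f2("boat")
y_bound.STORE(5)
after_bind = a[2].FETCH()

print(after_copy, after_bind)   # 0 5
```

In this model, binding is nothing more than keeping hold of the same container object, which is why the later STORE reaches the array cell.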

It is an object that allows assignment to it but other than that just passes things through to the referenced object. Where have we seen that before...? The Scalar, or more generally the Item Container.

LValue is the Item Container

The essential feature of an lvalue (referring to a value in such a way that it may be modified) is exactly what the Item Container interface does. It has a FETCH and a STORE method. FETCH is for reading the value as a normal value, and STORE is what gets called when you use it on the left hand side of an assignment. Furthermore, all the features discussed above, and more, will be explained below by the use of objects that provide this interface.

Since there is basically a 1-to-1 correspondence between the abstract Item Container and the LValue, I would propose that this Role, containing the FETCH and STORE methods and the transparency by which you normally just see the contained value, be called LValue.
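To make the interface concrete, here is a minimal Python sketch of such a role. It is a toy model under stated assumptions, not the Perl 6 implementation: the helpers value_of and assign are hypothetical names standing in for what the compiler would do implicitly when a variable is read or assigned.

```python
class Scalar:
    """Toy model of an Item Container (the proposed LValue role)."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        # Read the contained value as a normal value.
        return self._value

    def STORE(self, value):
        # Called when the container is used on the left of an assignment.
        self._value = value


def value_of(thing):
    # Hypothetical helper: a container is transparent when read.
    return thing.FETCH() if hasattr(thing, "FETCH") else thing


def assign(target, value):
    # Hypothetical helper modelling `target = value`.
    if not hasattr(target, "STORE"):
        raise TypeError("cannot assign to a non-lvalue")
    target.STORE(value_of(value))


y = Scalar(0)
assign(y, 5)
print(value_of(y))     # 5
print(value_of(42))    # 42, a plain value passes straight through
```

Anything offering FETCH and STORE counts as an lvalue in this sketch; a plain value offers neither, so assignment to it is refused.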

Parameter Passing

Since almost everything can be explained in terms of function calls, understanding parameter passing will go a long way. First we'll look at how LValue objects are used in function calling, then returning and other places, and later we'll see where LValue objects come from. For now we'll just use Scalar since we already have them lying around.

Furthermore, we will only look at a parameter name declared using the $ sigil. Other sigils do not rely on any effects of the Information Model other than what is explained here, but they are much messier to explain precisely because there is a bit of DWIM involved in those cases. But not to worry, that will be explained later.

rw parameter

The rw parameter is the easiest to understand, so we'll look at that one first even though it is not the default. In this illustration, we see what happens if f1 is declared as sub f1 (Dog $d is rw). The call is as discussed above: f1($spot).

[Illustration: binding of an rw parameter]

The rw trait on the parameter makes it bind as an lvalue, which means grabbing the lvalue of the caller's argument, not just the (contained) value. In this case, passing $spot, which is bound to Scalar #1, caused the parameter $d to become bound to the same object. Seeing this, of course any assignment to $d will affect $spot.

If you called f1 with an argument that was not an lvalue, such as the return from another function: f1(get_dog()); (assuming get_dog was not declared as returning an lvalue), the compiler will complain because it demands an lvalue from the caller.
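The rw binding rule can be sketched in a toy model. The following Python is a minimal sketch under stated assumptions: Scalar models the Item Container, bind_rw is a hypothetical name for the binding step, and the dog names are made up for the example. The article says the compiler rejects a non-lvalue argument; this model can only report it at run time.

```python
class Scalar:
    """Toy Item Container with FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


class Dog:
    def __init__(self, name):
        self.name = name


def bind_rw(argument):
    """Hypothetical binder for an `is rw` parameter."""
    if not isinstance(argument, Scalar):
        raise TypeError("rw parameter requires an lvalue")
    return argument            # same object: no copy, no wrapper


def f1(argument):
    d = bind_rw(argument)      # $d shares the caller's container
    d.STORE(Dog("Rex"))        # models `$d = Dog.new`


spot = Scalar(Dog("Spot"))
before = spot.FETCH()
f1(spot)
after = spot.FETCH()
print(before is after)         # False
```

Passing a bare Dog("Spot") to f1 in this sketch raises TypeError, which plays the part of the compiler's complaint.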

Regular (readonly) parameter

[Illustration: binding of a readonly parameter to the caller's lvalue]

Now we come back to the default parameter passing. The formal parameter becomes a local variable that is a read-only alias for the caller's value. Above, we see f1 declared as sub f1 (Dog $d) and called with f1($spot).

This passing mode is defined to still be an alias for the caller's lvalue, so it must bind at the lvalue level, like the rw case. Although $d is not allowed to change the contents of Scalar #1, it can still be changed through $spot, and if that happens, $d must see the change.

The depth of the binding is essentially the same as in the rw case. But we must prevent $d from being used in assignment. The way to do that is with a proxy object that sits in front. The proxy passes through a FETCH and all method calls just like any other Item Container, but it refuses to honor a STORE, throwing an exception if you call it.

If the caller's argument is already read-only, then another proxy is not added. For example, if f1 contains $d -> { say $^cheese } [ add link to detailed explanation of that ]

Now recall that this is a theoretical model. Since this is a common thing to do, a good implementation will probably optimize away the proxy object and instead point $d directly to Scalar #1, just like in the rw case. The compiler will then simply know that $d is read-only and detect any use of assignment directly. But if $d is passed as an lvalue to another function whose parameter is ref, or returned from a function, then the read-only nature will need to be tracked at run-time. So such an optimal implementation will still need the full mechanism sometimes. This is discussed at length on another page.

And that's not the end of the story. This diagram only shows what happens if you pass an lvalue that can be bound to. In function calling, that is often not the case. Consider the mathematical function sin for example. Suppose it's declared as sub sin (Num $x) and called as sin(2*π/4).

[Illustration: readonly parameter bound to a non-lvalue argument]

If a non-lvalue is passed, there is obviously no need to track changes to the caller's lvalue: the caller doesn't have an lvalue and it's not coming from anything in the symbol table. And the function is not allowed to assign to its local variable either. So there is no reason for any kind of Item Container at all. As shown in this diagram, the local variable $x simply binds directly to the argument's value.
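The three readonly cases described in this section (an lvalue, an argument that is already read-only, and a non-lvalue) can be put side by side in a toy model. This is a minimal Python sketch, not an actual implementation: ReadOnlyProxy and bind_readonly are hypothetical names, and the pass-through of ordinary method calls is left out for brevity.

```python
import math


class Scalar:
    """Toy Item Container with FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


class ReadOnlyProxy:
    """Sits in front of a container: passes FETCH through, refuses STORE."""

    def __init__(self, target):
        self._target = target

    def FETCH(self):
        return self._target.FETCH()

    def STORE(self, value):
        raise TypeError("cannot assign to a read-only variable")


def bind_readonly(argument):
    """Hypothetical binder for a default (readonly) parameter."""
    if isinstance(argument, ReadOnlyProxy):
        return argument                  # already read-only: no second proxy
    if isinstance(argument, Scalar):
        return ReadOnlyProxy(argument)   # alias the caller's lvalue
    return argument                      # non-lvalue: bind to the value itself


spot = Scalar("Spot")
d = bind_readonly(spot)
spot.STORE("Fido")                       # the caller may still assign
seen = d.FETCH()                         # the alias sees the change

try:
    d.STORE("Rex")
    refused = False
except TypeError:
    refused = True

x = bind_readonly(2 * math.pi / 4)       # plain float, no container at all
print(seen, refused, isinstance(x, float))   # Fido True True
```

The proxy holds a reference to the caller's container rather than a copy of its value, which is what lets $d observe a later change made through $spot.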


ref parameter

Another choice of parameter passing is ref, e.g. sub f1 (Dog $d is ref).

The intention is to work like rw, binding to the lvalue of the caller if one is provided. But, if the caller does not provide an lvalue, don't reject it outright like rw would. Instead, take it anyway.

In general, ref will take whatever you give it, and preserve the essential nature (capabilities and limitations) of whatever that was.

The ref parameter will bind to whatever you passed as the argument, neither complaining nor altering it in any way. If you pass an lvalue, the function gets a rw alias to it. If you pass a non-lvalue, the function gets that bound directly to the variable without any container being added. And if you pass it a read-only proxy, you get an alias for that. [ will href to "passing_examples" when that collection gets fleshed out ]

In any case, the function can determine at run time what it actually got.
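A toy model shows how little the ref binder does, and how the function can inspect what it received. This is a minimal Python sketch under stated assumptions: bind_ref and describe are hypothetical names, and the isinstance checks merely stand in for whatever run-time introspection Perl 6 would offer.

```python
class Scalar:
    """Toy Item Container with FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


class ReadOnlyProxy:
    """Passes FETCH through to its target and refuses STORE."""

    def __init__(self, target):
        self._target = target

    def FETCH(self):
        return self._target.FETCH()

    def STORE(self, value):
        raise TypeError("cannot assign to a read-only variable")


def bind_ref(argument):
    """Hypothetical binder for `is ref`: no check and no wrapper."""
    return argument


def describe(param):
    """Hypothetical run-time inspection of what the function received."""
    if isinstance(param, ReadOnlyProxy):
        return "read-only lvalue"
    if isinstance(param, Scalar):
        return "rw lvalue"
    return "plain value"


spot = Scalar("Spot")
print(describe(bind_ref(spot)))                  # rw lvalue
print(describe(bind_ref(ReadOnlyProxy(spot))))   # read-only lvalue
print(describe(bind_ref("Spot")))                # plain value
```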

copy parameter

The last way to pass a parameter is with copy. E.g. sub f1 (Dog $d is copy).

[Illustration: binding of a copy parameter]

For a parameter named with the $ sigil, this will create the local variable as an lvalue, but it will not bind to the caller's. Rather, it only cares about the (plain) value of the argument, and always makes its own Scalar to hold it. Note that it does not clone the caller's lvalue (if there is one) to make another of the same concrete Item Container class. Rather, it always creates a Scalar.

The behavior is easy to understand as it is exactly like:

sub f1 (Dog $temp)
 {
 my Dog $d = $temp;
 ...
 }

By which I mean the behavior of $d within the function. The example is different because it uses a different name for the parameter, but that is unimportant to the example. Writing it to be exactly the same would be more complicated.

The parameter passing mechanism treats the caller's argument in the same manner as for a default readonly parameter, in that it only cares about the underlying value and forgets about any lvalue that may be wrapping it. But rather than binding the formal parameter directly to the value, it creates a Scalar (object #5 in the illustration) and binds to that, and stores the value in the Scalar.

To the caller, it's exactly the same as a readonly parameter. But to the function, it behaves like a regular local variable declared using my, that starts out with the passed-in value but may be changed. If you are annoyed that you can't assign to the parameter without it being rw, this is what you need.
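The copy rule for a $-sigil parameter can be sketched the same way. This is a minimal Python sketch of a toy model: bind_copy is a hypothetical name for the binding step, and CellLValue is an invented stand-in for some other concrete Item Container class, included only to show that the copy is always a plain Scalar.

```python
class Scalar:
    """Toy Item Container with FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


class CellLValue(Scalar):
    """Invented stand-in for a different concrete Item Container class."""


def bind_copy(argument):
    """Hypothetical binder for `is copy` on a $-sigil parameter."""
    value = argument.FETCH() if hasattr(argument, "FETCH") else argument
    return Scalar(value)       # always a fresh Scalar, never a clone


spot = CellLValue("Spot")
d = bind_copy(spot)
d.STORE("Fido")                # assignment inside the function

print(spot.FETCH(), d.FETCH(), type(d).__name__)   # Spot Fido Scalar
```

The caller's container is read once and never touched again, so from the outside the call is indistinguishable from the readonly case.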

For containers other than Item Containers, you get a somewhat different effect. For parameters declared using the $ sigil it works exactly as just described, caring only whether the actual argument is an LValue or "something else", even if that something else is an Array or other class that is container-like.

But for variables declared using other sigils, it will create a new container of the default type for that sigil, and shallow-copy the contents. This is covered in more detail in the next installment.

In general, for any kind of variable, it behaves like a readonly parameter from the caller's point of view, and copies it into a fresh container using the normal assignment for that container.

To continue

The next section continues by discussing the behavior of parameters declared with sigils other than $.

Or, see the Table of Contents for other subjects.

Orthodoxy

Checked for technical accuracy in May 2009. Initial presentation, awaiting review feedback.

The concepts discussed leading up to the conclusion that an lvalue is closely associated with the Item Container idea are implicit in the Synopses. Calling the role LValue is my suggestion.

ref parameters

The Synopses don't clearly specify what happens if a non-lvalue is passed to a ref parameter. The intent is clear enough: provide rw if possible, but make the call anyway even if it's not.

There are two obvious things that could happen. One is as stated here, where the non-lvalue case shows up as a non-lvalue in the function. This means that assignment will cause an exception at run-time if you try it without looking first. So if the intent is to provide optional altering of the caller's variable, the function needs to look first or use an exception or use the optional method-call form, e.g. $d.?infix:<=>($another_dog);.

The other choice is to create an lvalue if one is not present. That is, do copy instead of readonly if it can't do rw. This means assignment will work but silently not update the caller's variable. The disadvantage is that there is no way to test which case you have. You always have an lvalue in the function, and you don't know whether it is aliased with the caller or not. Without introducing more language features, there is no way to find out.

Since you can use an optional method call, trap an exception, or use VAR to see what you've got, the first way is more conservative of the language features needed to support it. On the other hand, you might argue that the only reason to use ref is when you don't care, and if you do want to detect the difference you would write two different signatures and let MMD choose the rw form if it can and the readonly form otherwise. But that means you must write more functions and arrange to have common code in a third, and add the expense of MMD even if you were not overloading based on type already.
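The trade-off between the two choices can be seen in a toy model. This is a minimal Python sketch under stated assumptions: bind_ref_as_is and bind_ref_or_copy are hypothetical names for the two candidate behaviours, and neither is claimed to be what the Synopses mandate.

```python
class Scalar:
    """Toy Item Container with FETCH and STORE."""

    def __init__(self, value=None):
        self._value = value

    def FETCH(self):
        return self._value

    def STORE(self, value):
        self._value = value


def bind_ref_as_is(argument):
    # First choice: a non-lvalue stays a non-lvalue inside the function.
    return argument


def bind_ref_or_copy(argument):
    # Second choice: manufacture an lvalue when the caller gave none.
    return argument if isinstance(argument, Scalar) else Scalar(argument)


literal = 42
caller_var = Scalar(42)

# First choice: the function can look before it assigns.
p1 = bind_ref_as_is(literal)
can_assign_first = hasattr(p1, "STORE")

# Second choice: assignment always works, aliased or not.
p2 = bind_ref_or_copy(literal)
p3 = bind_ref_or_copy(caller_var)
p2.STORE(7)                                # updates nothing the caller can see
p3.STORE(7)                                # updates the caller's variable
indistinguishable = type(p2) is type(p3)

print(can_assign_first, indistinguishable, caller_var.FETCH())   # False True 7
```

Under the second choice the function sees a Scalar either way, which is the "no way to test which case you have" problem described above.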

copy parameters

Synopsis S06<108> does use the word "clone". It states:

To pass-by-copy, use the is copy trait. An object container will be cloned whether or not the original is mutable, while an (immutable) value will be copied into a suitably mutable container. The parameter may bind to any argument that meets the other typological constraints of the parameter.

But the term was used in a more casual sense, and does not refer to a virtual clone method of the object which produces a copy of the same concrete type. That would simply not make sense. Consider, for example, an lvalue that is a particular type designed by the implementation to reference a cell of a collection. What would it mean to duplicate it? The duplicate would also reference the same cell of the same collection. To make another one "just like it" of that type that nevertheless referred to a unique location is nonsense.