[Perl 6 page]

[Table of Contents] [previous] [first in this series] [next]

Lvalues from collections in the Perl 6 Information Model

This is part of a series which describes the Information Model of Perl 6. It is theoretical rather than an exact description of any real implementation. The details in this installment are not quite spelled out in the Synopses, but completely consistant with them except where noted.

How collections can provide lvalues for individual elements is the last major piece in understanding the Information Model.

This essay uses arrays throughout to avoid thick standards-style language. Realize that the situation is analogous for hashes and class instances and anything else that contains other cells that may be referred to somehow.

Elements of a Collection

Consider the humble statement:

@a[12] = 42;

The result of @a[12] is an lvalue in the original sense of the word. In some languages the subscripting might be something that's just built-in to the compiler, so that whole thing is just a name in the same way as a simple variable. But in Perl 6, it is an expression that follows the same rules as any other function call and subexpression.

If the plain assignment is just too simple to register as being something profoundly interesting, try this example instead:

sub modify ($x is rw)
 {
    $x = 42;
 }

my @a = eager ^20;
modify(@a[12]);

This indeed works as indicated, and the value of one item in the array is changed by the function. Given what we know about parameter passing and assignment from the previous parts of this series, how?

The Primitive Object Reference, the red square shown as one cell of the array, is not an LValue taken by itself. This cell canít be passed as rw to the function in the same way as a simple named variable.

The rest of this installment concerns itself with just how it does work.

More About the Problem

A naÔve approach would be to permanently associate an Item Container with each array element. You can see from the illustration that this would indeed work in this case: Scalar #13 has no trouble aliasing with the $x parameter. It works out just like normal variables declared with the $ sigil.

Figure 1 — Array cells bound to Item Containers

parameter illustration

But, it is not going to work. Besides the obvious inefficiency, it canít be done in all cases. For example, a compact array, such as my int @b; requires that the storage for the int int values be contiguous machine primitive words, just like in C or C++.

A far worse problem is with arrays that are not long enough or donít exist. Consider the above function but called with modify(@a[99]);. The assignment needs to extend the array as the indicated cell doesn't exist. The subscripting itself does not do this, and neither does the binding to $x. If the function read from $x, it would see undef. If it never wrote to $x, the array would be unchanged. Only upon writing to $x does the array grow to accommodate it.

Now try modify(@a[99][5][2]);. Upon writing to that LValue, it somehow needs to autovivify two nested arrays and store them in @a[99], in addition to storing the 42 in the (now existing) cell.

Binding Reference Objects

Suppose you were writing a special collection class of some kind, and needed to provide this functionality. How would you do it? From within the Perl 6 language, all the elements are available.

The object needs to have a method named precircumfix:<[ ]> that returns an LValue by declaring the function as rw. The Synopses shows an example of a method returning a proxy to provide access to member data, with code to run for the STORE case. The precircumfix:<[ ]> function simply needs to return an object that does the LValue role, and implements STORE and FETCH. For nested autovivications, this binding proxy can itself implement precircumfix:<[ ]> that builds up a more complex memory of what needs to be done on a STORE.

Iteration and Mutation

Letís digress a moment and discuss the issue of changing the container while iterating over it.

# Warning: positively evil code ahead 
my @A = 0..20;   # originally, value same as subscript.
for @A -> $x {
   say "before: $x";
   @A[5] :delete; 
   say "after: $x";
   }

In Perl 5, the documentation says ďdonít do that.Ē However, there is quite a bit of code in the implementation that specifically accommodates deleting the current element.

Meanwhile, look at a similar issue with the Standard Template Library in C++. In STL, iterators are used for both pointing into containers in general as well as stepping through them. That itself tells you that the concepts are interrelated. Our Perl 6 lvalues of cells inside containers are conceptually like STL iterators. The Perl 6 Iterators are like STL iterators, too.

In STL, the standard says that any modification to the container will invalidate all iterators associated with it. What really happens will vary with the kind of container: For linked lists, it is not much of a problem. Some containers might reallocate their memory or reorganize themselves, depending on the exact operation and the current state. So the old iterator might still work, or it might trash memory and horribly crash the program.

The reality is that programmers do that anyway. Rather than assuming the worst, they write code with specific containersí in mind, knowing when it is OK or just being lucky (or having hard-to-find bugs). At the very least, it loses generality and assumes implementation details.

One lesson here is that all containers should behave in the same way. We donít want a function that works fine with a standard Array object break when passed some other object that does Positional. Many times the idea of tying an array to a different implementation is supposed to be invisible to the code that then uses the array, so this is even more important than the case in C++ where choosing some container or another is always deliberate.

Second, Perl is not a low-level program-to-the-metal language like C++. We donít have raw machine pointers and should not have anything like stray pointers. So even doing it wrong should not crash the program terribly with random memory overwrites. The effects should be confined to the Perl 6 Information Model.

Third, Perl has different objects for Iterators and for LValues, so they can have different semantics. Each can be designed to be more specialized for its intended role, with slightly different design criteria. So, I propose that the Iterator for an Array can be informed of changes to the Array and adjust itself to some degree (and that is another article). Iterators are generally not created en masse for one collection, and you usually have just the one for the current loop. And Binding Reference Objects don’t need to address this, but need only handle what they are used for: LValues that may be used promptly or saved as a means to alias the location symbolically.

Consistent and Reusable

When you create a custom container, that you want to use the way you would normally use a built-in Array, ideally you would only care about providing the STORE and FETCH methods. Like tieing an array in Perl 5, you want to concern yourself with what makes this custom array-like thing unique: how individual elements are stored. Everything else should just work.

To get back to the main story of describing Binding Reference Objects, we want something that operates consistently regardless of the concrete container, and furthermore is generic enough to work with any container you might happen to create.

What is the most generic form of a binding reference that always works? The answer is simply to remember the original argument to the subscript operation, and apply it to the collection only when it is eventually used to STORE or FETCH.

Thinking of the simplest case of a single integer subscript, @a[$n], this would return an object that remembers @a and the value of $n. Something like this:

class simple_BRO does LValue
 {
	has @!collection;
	has Int $!index;
	# constructor ...
	# other things ...
	method FETCH
	   {
		return @!collection.FETCH($!index);
	   }
	method STORE { ... }
 }

# inside role Positional :
   method postcurcumfix:<[ ]> (Int $n) is rw
    {
	my simple_BRO $result = .new( collection => self, index => $n);
	return $result;
    }

This only shows the simplest case of a single Int subscript, and itís still very incomplete. It needs to be parameterized to propagate the type information of the collection. It needs to have its own postcurcumfix:<[ ]> operator for cascading subscripts.

And the real case isnít anywhere near this simple. The parameter to the subscript can be a list and generate a slice as the result. The array could be multi-dimensional. It needs to hande adverbs such as :delete. The full specification for this would need to be longer than this article, and that's a lot to implement if you only wanted to customize the way elements are stored.

That is why a generic implementation of postcurcumfix:<[ ]> is found in the Positional role itself. It may be several overloaded forms of this operator to handle the most general case and still catch compile-time errors when it can and optimize the easy cases. It will include whatever concrete types it needs for the BROís to be returned, automatically customized for the concrete Positional class it finds itself in. So, if you donít do anything, when you create a class that does Positional, you get all this by default. You only need to supply the basics, such as STORE and FETCH and a few other very simple things.

Back to the Example

Understanding that the subscripting operator returns a Binding Reference as a kind of LValue, look again at the first example:

sub modify ($x is rw)
 {
    $x = 42;
 }

my @a = eager ^20;
modify(@a[12]);

And here is how it looks when postcircumfix<[ ]> returns a Binding Reference Object:

Figure 2 — Binding Reference Object

BRO illustration

Even if we donít know the details of the Array, it still works. The Array might be a compact array, the index might refer to an element that doesnít exist, or it might be a custom class that maps elements to rows in a SQL database. The Binding Reference Object is completely generic and only cares that the underlying collection does Positional (In reality, the Binding Reference Object would be a generic type to propagate the element type of the collection.)

Figure 3 — BRO with opaque collection

BRO illustration

Semantics of the BRO

Remembering the index (or the parameters in general) is the only way to program a binding reference that is fully generic. In this section we explore what this implementation does if we change things between the time the binding reference is created and when it is used.

my @a = 1,2,3;

sub f4 ($x is ref)
 {
    push @a, 4,5,6;
    $x = 42;
 }

f4(@a[5]);

Clearly any kind of memory reallocation or rearrangement on the part of the collection is not an issue. Also, notice that the subscripting operation canít make the determination of whether the element exists, as that might change before the binding reference is used. In the listing to the right, @a[5] does not exist when postcircumfix<[ ]> is called, but does exist when the STORE is eventually invoked on it.

If you look closely, youíll see that this code used is ref for the parameter. Remember, ref takes whatever you give it without complaint, and that includes a binding proxy to elements that donít exist. But is rw behaves differently. The act of passing to a rw parameter will cause the element to be autovivified. cover this in detail when I get back to parameter passing/returning

Even so, the rw parameter might have its collection changed again, so it must still be prepared to grow the array again when STOREd to. Not only can things happen between creation and use, but things can happen between uses as well.

If you insert or remove elements, changing the numbering of some of the elements, the binding reference object will not track the movement. The standard binding proxy only remembers the index numbers, and might not have any way to further track the objects. Note that this is different from the behavior of Iterators, which do track movement. And of course you can implement an LValue however you want for your own collections.










This is still being written. Consider this file a sneak-peek.

To continue

The next page goes into an in-depth study of the Read-Only Proxy objects.

Or, see the Table of Contents for other subjects.

Orthodoxy

The Synopses S06<108> under Lvalue subroutines shows a proxy being returned from a routine declared with the is rw trait applied to the sub itself. Iím presuming this will DWIM, which is to make the LValue object mentioned bound to the return parameter, as opposed to binding an (ordinary) LValue for the object named in the return. The DWIM-ness could take into account the declared return type as well. When it is ambiguous and you want the other way, youíll need to override that somehow, or use the full syntax for the return parameter which avoids this DWIM effect. cover this in detail when I get back to parameter passing/returning.

The LValue role (as Iíve defined it in this series) needs a method to cause autovivification, which can be called without STOREing something. This is in addition to the STORE, FETCH, and other essentials from P5ís tiescalar.

My description of BROís for standard containers follows the decision that they should be uniform in semantics and have a fully general form available. Support for that position is in the article. If different kinds of collections under certain conditions behave differently (e.g. tracking movement), the reasons for that and the precise conditions and behavior need to be described. Thatís not in the Synopses either, and is far less parsimonious than assuming the general case and arguably not needing to describe in detail what is logically compelled from the premise of being ďuniform and generalĒ.