Computing on Indices


I received this inquiry from Daniel Lyons, and I wanted to post my response here, since this is a favorite subject of mine.

I’m learning J. I watched your talk “Living the Loopless Life,” and it occurred to me that, due to my exposure to generators/iterators in Python and functional programming more generally, the idea of doing computation using indexing was rather foreign to me. I thought I would ask about general techniques for someone in my shoes, who is somewhat uncomfortable with the idea of being too familiar with locations in an array. I asked on the APL Farm and they suggested I ask you specifically.

Some that came to mind were:

  • Filtering using Copy (#)
  • Sorting using Grade (/: and \:), which allows sorting one array by the order established in another
  • Simple key/value mappings (lookup index in key array, extract from value array)

Are these indeed fundamental ideas you alluded to about array indexing? If so, are there others that come to mind? If not, can you give examples of the kinds of things you meant?

Thanks for reaching out. This is one of those little "extra bits" of APL that I'm very fond of.

The examples that you gave are good "starting points" for computing on indices, but they aren't really computing on indices just yet. Really, they are just using indices for single-item lookups. The advantage of using indices with these primitives is that you can get a lot of results while still using simple return types and the like, which is fine, but not really where the full power comes in.
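
For reference, here's roughly what those three look like in APL, with made-up data:

ages←31 25 47 ⋄ names←'sam' 'ada' 'lee'
(ages>30)/names    ⍝ filter with Replicate: 'sam' 'lee'
names[⍋ages]       ⍝ sort one array by another: 'ada' 'sam' 'lee'
keys←'abc' ⋄ vals←10 20 30
vals[keys⍳'b']     ⍝ key/value lookup: 20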

I think the starting point for this kind of thing is to think back to one of the things that Knuth has said about his favorite features in C. He loves that you can take a pointer and then compute over it using basic addition and subtraction. This is the start of basic index computation. People are used to working with affine index operations in some languages. This is very common, for instance, when computing indices for stencil grids. These uses, however, don't really make for very interesting cases, because they are just verbose models of particular access patterns, which are much better put behind some kind of operator (adverb/conjunction).
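
For instance, computing the nine neighbor indices of a cell in a row-major grid is a purely affine expression over the flat index (a hypothetical sketch):

w←10 ⋄ i←34                  ⍝ hypothetical grid width and flat cell index
offs←,(w×¯1 0 1)∘.+¯1 0 1    ⍝ affine offsets: ¯11 ¯10 ¯9 ¯1 0 1 9 10 11
i+offs                       ⍝ all nine neighbors: 23 24 25 33 34 35 43 44 45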

However, thinking of things like this does set the stage, especially if we look at C first, because we have this idea that we can talk about contiguously arranged things which are mapped to the natural numbers. We can then say that such things can not only be mapped to the natural numbers, but to arbitrary systems of polynomials (aka multi-dimensional arrays). This is the concept encoded in the encode/decode primitives of APL. This connects single vector indices to multi-dimensional spaces. Great!
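
Concretely, the encode/decode pair converts between the two views of an index:

2 3 4⊤23      ⍝ encode: flat index 23 as coordinates in a 2 3 4 array → 1 2 3
2 3 4⊥1 2 3   ⍝ decode: those coordinates back to the flat index → 23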

But what do we do with this stuff? My present use for these features comes from thinking about nodes and the creation and deletion of nodes in a tree. In other languages, a reference to a single object is just a hidden pointer, on which you can perform equality tests but no other computation. But if we associate all the nodes in our tree with indices that map them contiguously into a known region of memory, then the "pointer" for a node object is the same as its index into that region.

This allows us to now easily associate metadata of any kind with each object simply by making sure that we have the appropriate metadata at the right index in any given array storing said metadata. Thus, given any object X, we can get, for instance, the depth, parent, or data associated with that object by using indexing:

depth[X] 
parent[X]
data[X]

And of course, since we can use a vector of objects Xs, now, depth[Xs] will give us all the depths of all the objects.
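
A tiny made-up illustration (⎕IO←0):

parent←0 0 0 1 1   ⍝ node 0 is its own parent, the root
depth←0 1 1 2 2
Xs←3 4
depth[Xs]          ⍝ 2 2
parent[parent[Xs]] ⍝ grandparents of Xs: 0 0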

This is the setup that then makes things fun. My favorite example of computing over indices right now is the following pass in my compiler:

⍝ Specialize functions to specific formal binding types
rc←(≢p)⍴1 ⋄ isa isd←⊣@p⍨¨↓(⊂3 7)⌷(9⍴2)⊤k
rc←1 1 2 4 8[k[i]]@(i←⍸(t=F)∨isa∧t=T)⊢rc
rc←1 1 1 2 4[k[i]]@(i←⍸(t=T)∧~isa)⊢rc
_←{r[⍵]⊣x×←rc[⍵]}⍣≡r⊣x←rc ⋄ j←(+⍀x)-x ⋄ ro←∊⍳¨x
p t k n r vb lx rc isa isd pos end⌿⍨←⊂x
p r{j[⍺]+⍵}←⊂⌊ro÷rc ⋄ vb[i]←j[vb[i]]+⌊ro[i]÷(x⌿x)[i]÷x[vb[i←⍸vb>0]]
k[i]←0 1 2 4 8[k[i]](⊣+|)ro[i←⍸(t=F)∨(t=T)∧isa]
k[i]←0 1 2 4 8[k[i]](⊣+isd[i]+2×|)ro[i←⍸(t=T)∧~isa]

This is function specialization, which creates a copy of each function body specialized to a specific function signature for different types of arguments. There's a whole lot going on here, which is why it's fun, and it's also fun because it really showcases how you can bring the whole weight of APL to bear on computation over indices to make your code simpler.

If you think about how you might do this in a functional language, you would traverse the tree downwards, doing a "replication" at each point in the recursion where you encountered one of these functions that needs specialization. You would probably have to thread through some data to ensure that you knew how much you were specializing for each node, and you'd need to find a way to reconstruct the tree correctly on the way back up, in the return path of the recursion.

Fundamentally, we need a few things:

  1. How much to replicate each node
  2. How to replicate them
  3. How to make sure all the replicated nodes point to the right things

We are able to do all of this through index computation instead of using recursion, and we are able to simultaneously fix up all of the other pointers that exist all throughout our code in one nice, neat little package. In our case, each node has three major "pointer fields":

  • p: the parent of the node
  • r: the enclosing scope of the node
  • vb: the binding point of the node

Each of these pointer vectors points to other nodes in the tree.

To actually do all this, we first figure out the replication count rc of each node. That is what the first three lines do. We are computing how much each node should replicate based only on its own value. So certain function nodes get replicated multiple times, while all other nodes replicate only once.

Then, we compute an extended replication count x that captures the total number of times a node will actually be replicated, because a node is replicated not only based on itself, but also based on how many times its parents are replicated. If you look at the _← expression, you'll see that we're doing a few optimizations here, which are also thanks to arrays of objects as indices. We use r instead of p to walk up the tree, because we know that only functions actually have non-singleton replication counts, and r is a walk up through exactly those function points. We use the indexing operation to do the mapping, instead of needing some specific array mapping operation with a function abstraction and field getter to do the actual traversal, so that the update at each stage in the tree walk is simply x×←rc[⍵], where ⍵ is our current ancestor up the r tree. This is essentially a product over the rc of the function-type ancestors of each node, which you could write as ×/rc[All my ancestors].
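
To see the fixed point at work, here it is on a made-up five-node tree (⎕IO←0):

r←0 0 0 1 1                  ⍝ enclosing scope of each node; node 0 is the root
rc←1 2 1 1 1                 ⍝ node 1 is a function that replicates twice
_←{r[⍵]⊣x×←rc[⍵]}⍣≡r⊣x←rc
x                            ⍝ 1 2 1 2 2: everything inside node 1 doubles as well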

Now that we have the extended replication count, we are going to replicate all nodes in the tree according to how much they should be replicated, but just doing that will completely break all the pointers (which are just indices), so we will need to be sure that we can rebuild all the pointers at the end. Since we are using ⌿ to do the replication, we know that all the nodes will be replicated contiguously, that is, the replicated nodes will appear in the new array as a "bunch" of the same node repeated one after another. This means that we can compute the starting point for each node's replication group j as the exclusive sum scan j←(+⍀x)-x. Then, we can compute an "offset" or the sub-id of a replicated node indicating which replicated node it is, called ro, as ro←∊⍳¨x. Again, all just a bunch of index and offset calculations based on contiguously arranged data. We can take advantage of all this precisely because our primitives provide mathematically precise information about where elements will exist according to their indices. Or, put another way, our primitives provide predictable indices for elements in their outputs.
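
Continuing the made-up example:

j←(+⍀x)-x   ⍝ 0 1 3 4 6: where each node's group of copies begins
ro←∊⍳¨x     ⍝ 0 0 1 0 0 1 0 1: which copy each new node is within its group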

Once we have that, we can go ahead and do the replication. This will extend the tree with new nodes, but all the pointers will now be busted. Fortunately, for each replication, we can tell exactly where the pointers should point based on j and ro. Using these, the basic idea is given in our recalculation of the parent tree p and "scope tree" r:

p r{j[⍺]+⍵}←⊂⌊ro÷rc
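
On the made-up tree from before, take p←r. After the ⌿ replication, p is 0 0 0 0 1 1 1 1 and rc is 1 2 2 1 1 1 1 1, so:

j[p]+⌊ro÷rc   ⍝ 0 0 0 0 1 2 1 2: each copy points at the matching copy of its parent

Copy 1 of old node 3 (new node 5) now points at copy 1 of old node 1 (new node 2), which is exactly where its parent's second replica landed.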

The p r expression does a direct recomputation on our indices to make sure that the newly replicated nodes point exactly where they should. But we can also be more complex in our index computation, as we need to be for the vb pointer vector, which can be negative:

vb[i]←j[vb[i]]+⌊ro[i]÷(x⌿x)[i]÷x[vb[i←⍸vb>0]]

And then of course we can update the type information tags, which is what we do in the last two lines, again, based on the offset information we calculated.

As you can see, at every point here we are taking advantage of indexing, but in a lot of subtle ways that aren't obvious if you haven't learned to think in APL. One of the most basic points, which is lost in other languages, is that when we create new objects, we know exactly what index they have (where they are) relative to everything else, so their relative positions are known beforehand. In a garbage-collected language, you create new objects without any knowledge of where they will be allocated relative to other objects.

When you get right down to it, this is literally just low-level object management thinking applied through more friendly, high-level language constructs. All object references are just indices into some memory region, but in most languages those references are opaque and unpredictable, and might change under your feet due to garbage collection, which means we can't say anything about them other than whether one reference equals another. By taking all that back, we control "where" our objects are created in known, limited index spaces, which provides a lot of power for reasoning about those references, and we can now usefully compute over them to avoid unnecessary pointer chasing.

Furthermore, notice that we are doing "object allocation," but within known arenas. We aren't dealing in generalized pointers; instead, each index is constrained to exist only relative to specific other known objects of a certain class. This is similar in spirit to the arena allocator technique in C, and we gain some of the same benefits, because we don't have to garbage collect these regions. We simply remove them when we are done, and we know that everything they could have pointed to is also gone. Thus, we gain more control over our object allocations and a better handle on what our regions can point to. It's a more precise way of speaking.

This precise way of speaking about objects as indices also brings some additional benefits that aren't always obvious. For one, the number of objects in a region is likely to be relatively small, which means we likely don't need full 64-bit integers to reference them, and so we are likely to save on memory by computing over objects in a 16-bit or 32-bit address space most of the time, instead of defaulting to a 64-bit address space. This all happens magically under the hood in an APL system like Dyalog with type squeezing.

This small-range addressing gives us another benefit if we want to apply the technique to symbols or string interning. Because we are creating the intern pool or symbol table region ourselves, we don't have to use generalized pointers as symbols (or other tagged objects), which is the typical technique in other languages. This means that our symbols are often very small, and it's not uncommon in database-type applications to have symbol tables where the size of a symbol is 8 bits or less. That's an 8× memory savings over 64-bit symbols in a typical generalized system. Some other programming environments provide these benefits only when you go into a relational database context, and sometimes not even there; here, we get all those benefits all the time, for any problem we want, so the technique has a much lower cost to use, since you don't have to commit to storing data in a relational database system just to get these storage optimizations.
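
A minimal sketch of the idea, with made-up names:

names←'foo' 'bar' 'foo' 'baz' 'bar'   ⍝ made-up symbols
tab←∪names         ⍝ the symbol table region we control: 'foo' 'bar' 'baz'
ids←tab⍳names      ⍝ symbols as small integers: 0 1 0 2 1 (⎕IO←0)
names≡tab[ids]     ⍝ 1: the ids round-trip back to the strings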