Flat Program Organization: An Email Response

A fellow APLer, Tort, started an interesting email discussion with me, and one of his points was good enough that I thought the discussion was worth posting here. I reserve the right to refine and amend my ideas as I go forward, since email threads aren't always as polished as other writing, but I wanted to put this out to a broader audience as well. Here is his question:

> When you put everything at the top level, what keeps you from worrying about what parts people focus on? That is, how do you make it so it doesn’t really matter what people focus on so they get the right idea of what you were thinking for the application?
>
> Say for example, reading the co-dfns code, how could you ensure that someone reading the code would be able to say “I understand how this works” regardless of what part they first decide to stare at? I imagine people would eventually read the entire code, but isn’t segregation of information important so people can focus on high-level steps before nitty-gritty implementation details?

This is a great question, and it is at the heart of what I think is a divide between two camps in the software engineering and computing communities.

Reframing the question a little, everyone agrees that we need to be able to manage complexity, which means being able to *not* think about some things while we think about others. If we have to think about everything all at once, we are severely limited in what we can do. Good systems allow us to scale our thinking beyond keeping "everything" in our heads all the time, or, conversely, they enable us to retain more in our heads at once through good designs that facilitate various means of mental abstraction and chunking.

While this isn't in question, what is in question is how this should be reified in code. There are two related but slightly different questions. First, to what degree should intent and organization be explicitly represented in the text of the source, as opposed to other organizational details such as file system layout, order of presentation, indentation, and so forth? Second, to what degree should information be *hidden* from the reader?

Broadly speaking, the more you want to hide information from the reader, the more you will need to make explicit the organizational details that enforce those restrictions in some way, and the more likely you are to want some name that sits literally in the code as a stand-in for a hidden portion of the code. So, in a sense, there is a real correlation between how much information hiding you think there should be and how much explicit organization is *required* (as opposed to merely nice to have).

The first camp consists of people who tend to believe that stronger, more mechanically or implementation-enforced information hiding and segregation is important. The second camp argues that transparency, and the ability to understand all the layers of the system at any time, are more important.

This relates to how we manage information complexity, because the first camp believes that we primarily manage complexity by hiding information. The second camp instead argues, in APL terms, in favor of "subordinating detail" (cf. Iverson's Turing Award Lecture, "Notation as a Tool of Thought"). The difference is this: information hiding prevents the user from seeing the details unless he specifically seeks them out at some other layer, so the upper and lower layers are completely obscured from one another. Subordinating detail, by contrast, highlights some information without obscuring the details, so you can see both layers at the same time, just with different precedence or weight.

APL as a language falls firmly into the camp of "subordinating detail" rather than hiding information. I take this further and argue very much in favor of implicit organization wherever possible, so that explicit organization (which is really just boilerplate) becomes unnecessary. That is, if code isn't actively contributing something but is only there to keep the system "tidy," then ideally I want to find a way to remove that code and keep the remaining code tidy through some implicit or "invisible" means. This might be a pattern or structure that constrains the arrangement of the code but doesn't actually "show up" in the code explicitly. It might be using variables in some very systematic way. It might be a limitation on the way references are made, and so forth. All of these take something *out* of the code without adding anything back *into* the code in the form of text or syntax.

You asked a really telling question (or series of questions), in that you want readers to be able to look at *any* part of the code and understand what they are reading "in situ." This is what I call the "local readability metric." That is, can I pull up a single method (as is usually the case) and know exactly what it does, why it is there, and what is expected of it, without needing to look at anything else? But you take it a little further by also saying that you want people to be able to see high-level steps before nitty-gritty details. That's a somewhat stronger desire than local readability alone.

I would claim that the answer to your question is NO: I do *not* want to ensure that people can have local readability of my source code at the expense of global system comprehension. That is, if I could add something that would enhance local readability, but it would make the system a little harder to see as a whole, then I would *not* put it in, and I would consider it a *bad* addition to the code.

The most important readability metric for me is whether or not I can deeply understand how the whole system fits together, and importantly, see patterns and insights *across* the boundaries of various components, modules, or functions. This cannot happen if I cannot see into the various modules at the same time. I need to be able to see just how a change in one part of the system will have effects in another part of the system.

This doesn't mean that I design my system so that I can't work on one part without affecting the whole; quite the opposite. But when thinking about the architecture, it is important to me that I can see as much of the detail as possible rather than having it hidden, because the big architectural insights come from seeing the details inside the system, not just the movement of big black-box pieces.

Another way of putting this is that I try to design my systems with as little boilerplate as possible. That means that, as much as possible, I want no part of the system to be "trivial" or meaningless. It should *all* be meaty. Ideally, whatever piece of code you look at, you're not dealing with little unnecessary details; you're always dealing with, as far as I can make it, an architectural-level description of the system. That means that every part of the system has meaning. The cost of that is that every part of the system has meaning.

Thus, in a sense, even the change of a single significant character in the code might constitute an "architectural level" question, because I've eliminated the need to worry about any of the smaller level questions. In practice, it's not quite that good, but I think I get closer than many other projects.

If I'm trying to see how a system interacts with itself, the last thing I want is to segregate it in such a way that I can't reason easily about the whole because each piece is too isolated from the next. There is a time and place for isolation, but the Co-dfns compiler is nowhere near that scale. Indeed, one of the things that I think makes APL interesting is that you can often solve rather larger problems than you would expect without ever accumulating the weight that would force you to reach for heavyweight concepts like information isolation. That's not to say there is never a reason to isolate, but if you can solve a big problem without it, I think that's better. I'm not the only one who thinks this way, either: monoliths are easier to iterate on.


So, to answer your question: I don't want people to dive into an arbitrary part of the code and read it out of context. Designing the code to support that would probably 10x the size of the codebase and make it much, much harder to do anything. Instead, I want people to be able to see easily how the whole thing fits together by examining the entry points into the system, so that they understand the flow of information through it. From there, they can relatively simply go on to understand the specific transformations over that flow that occur throughout the system, along with their associated invariants. However, I try to make almost all of that organization implicit, through ordering, structure, and shape/whitespace, rather than through explicit names and structures in the code itself.

Because I use a linear data-flow model throughout the entire system, the whole system can, in a sense, be read like a book: if you ignore the specific "chapters" or paragraphs, you can simply watch data flow through the whole thing by reading from top to bottom. The function Fix establishes the "book," if you will, as a series of parts:

    _←          TK ⍵⊣⍞←'T'
    _←a n s src←PS _⊣⍞←'P'
    _←          TT _⊣⍞←'C'
    _←        ⍺ GC _⊣⍞←'G'
    _←        ⍺ CC _⊣⍞←'B'
              n NS _⊣⍞←'L'

That has been the documented main entry point of the compiler for almost the entire time the compiler has looked anything remotely like this. Combined with a read of the manual.md document, it provides all the structure you need to understand how the rest of the system works.
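To make the "read it like a book" shape concrete outside the compiler, here is a minimal sketch of the same pattern with three hypothetical stages (Split, Count, Total) and a hypothetical driver Run. None of these names come from Co-dfns; they only mimic the shape of Fix. It assumes Dyalog APL with the default ⎕ML (so that dyadic ⊆ is Partition) and that the multi-line dfn is entered in a script file or the editor.

    ⍝ Hypothetical stages for illustration only (not Co-dfns code)
    Split←{(' '≠⍵)⊆⍵}    ⍝ partition a character vector into words
    Count←{≢¨⍵}          ⍝ length of each word
    Total←{+/⍵}          ⍝ sum of the lengths

    ⍝ Read top to bottom: each line consumes the previous line's _
    ⍝ and ⍞←'x' prints a one-letter progress marker, as in Fix
    Run←{
        _←Split ⍵⊣⍞←'S'
        _←Count _⊣⍞←'C'
          Total _⊣⍞←'T'
    }

Here Run 'to be or not' prints the markers S, C and T as it goes and returns 9. As in Fix, the ⊣⍞←'x' suffix is only a progress trace; removing it leaves the data flow unchanged, and the order of the lines alone carries the structure.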

See also: https://eandt.theiet.org/content/articles/2011/02/profile-turing-lecturer-donald-knuth/