The Problem

I've spent some time with Cleanroom Software Engineering attributed to Harlan Mills. Without a doubt it is my favorite development technique. In a nutshell, CSE gives me the following benefits:

  1. Certified reliability metrics of the code base
  2. Formal verification of code correctness
  3. Certification of code based on actual usage models
  4. A clear and direct line of connection and blame from requirements specification to final executable system
  5. A systematic approach to program construction

In the traditional Cleanroom Software Engineering method, there exist a few well thought out methods that combine synergistically to produce the above results:

  1. Stochastic black-box testing
  2. Formal methods of program development
  3. Sequence-based enumeration for software specification
  4. Iterative development cycles
  5. Direct program derivation from Requirements through a series of refinements
  6. Use of program review and teams to audit code regularly and formally

It's also no secret that I find the APL programming languages one of my favorite programming language around. Unfortunately, in the creation of Co-dfns, Cleanroom Software Engineering has helped me some, and in some big ways, but it has not been able to help me throughout the entire process, which I consider a failure of technology in a big way. It fails for a few reasons:

  1. Difficulty of specifying the compiler using sequence-based enumeration
  2. Difficulty of deriving the nanopass style compiler architecture from Sequence-based Black-box specifications
  3. Difficulty of verifying the implementation against a black-box specification when the implementation is written in vectorized APL style code without branching or recursion

Basically, on two critical paths along the CSE path, namely, Enumeration → Black Box → Implementation, the CSE technology that I have been able to find doesn't cut the mustard. Let me address each of these in turn and postulate on what would be necessary to fix it.

Difficulty in specifying a compiler with sequence-based enumeration

Sequence based enumeration enables me to derive a black box specification for a system through rote enumeration of an input domain systematically. It does so in a way that I can avoid infinite loops in the process and also ensures that the resulting specification is both complete, consistent, and correct. In other words, there are no corner cases that I have missed, and the results are correct, or, I didn't accidently provide a specification that doesn't make sense. I discover inconsistencies and redundancies in the system requirements and I'm able to easily determine correct behavior because I can deal with each case one at a time without worrying about other cases.

Unfortunately, this method of enumeration seems to be difficult to apply to a compiler. I was able to get a very good enumeration of the parser of the compiler, and that worked so well that I was able to uncover bugs and other things in the interpreter upon which I was basing my specification. However, I had to leave critical operational behaviors unspecified here because I could not figure out how to properly specify them.

The first thing I would need to do then, is to fix the specification techniques used here so that they retain their valuable features, but expand their scope to handle the needs of specifying a compiler.

Difficulty in deriving a compiler architecutre from sequence-based black-box specification

In part because of the previous section's difficulties, and the under-specificity of the black-box specification, deriving a compiler from this specification was very difficult. It was even more difficult to derive one of the appropriate architecture. I knew that the architecture I wanted to use was a nanopass architecture, but how do I derive a vectorized compiler pass, much less a complete composition of those compiler passes in such a way that makes sense, given that I'm starting with a sequence enumeration.

Difficulty verifying APL programs against a black-box specification

The stimuli based black-box definition used during specification in the traditional CSE approach does not make it easy to verify this specification against and implementation that uses almost completely bulk, aggregate, data-parallel style operations. While one might determine that a given composition of functions should do the right thing, given the set of intended functions, it's very difficult to determine that the set of intended functions does what one expects, and even more difficult to verify that the composition of these intended functions delivers a program that matches the black box specification.


I don't have any solutions to these problems yet, but I have heard some people talk about what they think are solutions. I've come up with a series of questions for which we can answer yes given any sufficient solution.

  1. Can I take a system requirements written in English and derive an unambiguous black-box specification?
  2. Can I, by construction, assure myself that the black-box specification is complete, consistent, and correct with respect to the system requirements?
  3. Can I systematically derive an algorithm and implementation of the specification which lends itself to good APL style?
  4. Can I rigorously verify that any given implementation so derived correctly implements the specification?
  5. Can I derive stochastic usage models from the specification and sample inputs from the user which are direct and easily verified to be accurate representations of usage based on the specification?
  6. Can I easily generate stochastic test cases from the usage models so derived?
  7. Does the specification approach lend itself to incremental implementation in which black-box testing is both meaningful and possible at each iteration with little to no boilerplate code and little to no stubbing code which needs to be replaced in the future?