A Philosophy on Scheme Modules
Update 2: I should note that because library forms in this proposal are syntax, they are just Scheme code, and as such, the implementation is presumed to provide some default environment, likely to contain the library forms.
Update: I have altered my proposal to require the name position, but allow it to be false, indicating an anonymous module.
Almost every Scheme system worth its salt has some means of encapsulating a set of definitions and expressions and of controlling the visibility of the bindings they create. Recent discussions on r6rs-discuss have brought the classic debate about modules and library/packaging systems to the forefront of the Scheme community yet again.
In this article, I hope to address my own concerns regarding a module system, make some observations about solutions, and propose a direction for the development of a standardized module system. The end result is, I hope, a module system that feels "Schemely": that is, a general, expressive construct that provides the means of satisfying the needs of the entire Scheme community with regard to module systems. This goal may never be fully achieved, but the ideas I present here will, I hope, make it possible to move towards that goal.
A Brief Historical Examination
Before I tackle this issue in full, I would like to examine some of the history of Scheme module systems.
Before module systems were widespread, forms like EVAL-WHEN and LOAD were used to control the order in which files containing Scheme code were loaded. Note that these offered no control over visibility, only over the order of evaluation of Scheme code.
Later, a number of systems for managing Scheme code were developed. I will mention the three most relevant to my discussion here: the Scheme48 module system, the Chez Scheme module system, and the R6RS Library form. These three represent two opposite approaches to Scheme modules as well as the current module standard, which is, of course, a compromise between them.
The Scheme48 package system (which I can only discuss at a high level, since I have not actively used this packaging system for some time) represents the philosophies of separate implementation and interface, separate package declaration from code locations, and the idea that module systems exist more as a metalanguage than as a syntax of the Scheme language. In a sense, these modules represent a static description of the relations between groups of definitions and expressions.
The Chez Scheme module system represents the other end of the spectrum, where a module form is viewed as just another syntactic extension of the language for controlling scope and visibility. There is a basic form for encapsulating code and making only select identifiers visible to the outside world. Additionally, there is an import form which may appear in any place that a definition may.
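As a rough illustration of this style, here is a minimal sketch written in Chez Scheme's module syntax. The names counter, next!, and demo are invented for the example and do not come from any real library.

```scheme
;; Chez Scheme module syntax; counter, next!, and demo are illustrative names.
(module counter (next!)        ; only next! is visible outside the module
  (define count 0)             ; private state, hidden from the outside
  (define (next!)
    (set! count (+ count 1))
    count))

(define (demo)
  (import counter)             ; import used where a definition may appear
  (next!))
```

Each call to (demo) bumps the hidden counter, while count itself cannot be referenced from outside the module.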
The R6RS Library form represents a compromise between the goals of the two systems. It cannot be generated by macros, and the import form is a direct part of the R6RS library form. However, there is no separation of interface from implementation, and the module form itself is tied to the source code unless additional macros are used to separate the source code from the library form.
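For comparison, a small library in standard R6RS syntax might look like the sketch below. The library name (example counter) is made up for illustration. Note how the export and import clauses are fixed positions in the form, and how the body sits directly inside it.

```scheme
;; Standard R6RS library syntax; the name (example counter) is illustrative.
(library (example counter)
  (export next!)
  (import (rnrs base))         ; import is a fixed part of the library form
  (define count 0)             ; not exported, so it may be assigned
  (define (next!)
    (set! count (+ count 1))
    count))
```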
Modules versus Packages
The term "module system" can confuse the discussion somewhat, because people have different ideas of what should constitute a sufficient module system. Generally, there are those who look at module systems as package management systems, which are used as a "distribution format" for source code. This is much like the Scheme48 approach. Others view modules as a building block inside of Scheme, like the Chez Scheme module system.
Originally, the R6RS library form was touted as being a distribution system by some. Unfortunately, it lacks some of the important features that many consider necessary for that purpose. At the same time, it fails to satisfy the needs of the syntactic module crowd, since the library system is entirely top-level and static.
Desirables in a Module System
What important features should be in a module system? Again, the two camps of module philosophies will give different answers, not all of which agree.
The "packaging system" crowd's primary use of modules is to describe interfaces to code, to make discoverable descriptions of that code, and to make it easier to control the loading and evaluating of the various components. Generally speaking, these systems benefit from the following features:
- Static, top-level metalanguages for package descriptions.
- Separation of source code, implementation modules, and interfaces.
- Some way to map libraries to files to make loading of software more automatic.
The syntactic module crowd favors the ability to use modules in a variety of locations to do micro-packaging. This means they may be generated by macros, and may not even have names that map directly to files at all. The forms may also be more closely tied to the rest of the Scheme code, because the code itself is generating modules. The features that tend to be important here are:
- Syntactic, dynamic module forms that can occur inside of code, and not just at the top-level.
- The ability to create anonymous modules.
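To make the idea of macro-generated, anonymous "micro-packaging" concrete, here is a small sketch using Chez Scheme's existing anonymous module form. The macro name define-counter and the identifier tick! are invented for this example.

```scheme
;; Chez Scheme; define-counter and tick! are illustrative names.
(define-syntax define-counter
  (syntax-rules ()
    [(_ next)
     (module (next)            ; anonymous module: no name, one export
       (define count 0)        ; introduced by the macro, invisible to callers
       (define (next)
         (set! count (+ count 1))
         count))]))

(define-counter tick!)         ; expands into a fresh anonymous module
```

The module here has no name that could map to a file. It exists only to hide count behind the single exported procedure.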
Generally speaking, the two crowds don't approve of the module systems created by the other, because, obviously, they have conflicting goals.
If we are to design a module system that will work, I submit that we cannot strictly follow either philosophy on the grounds that it is the right one. Neither view is the single right one, and we should figure out ways to handle both.
Some have suggested (and I have generally agreed) that perhaps having two different systems in Scheme will make it possible to have a packaging system together with syntactic modules, both disjoint from each other and neither requiring the other.
I had thought this was a good idea. After all, it would satisfy the needs of almost everyone. However, doing this in the standard would create many more constructs than necessary, and chances are that the two systems would not both end up in core Scheme. WG1 requires a module system, but if there are to be two of them, I doubt that the community would support placing both in the Core Scheme document.
I began thinking, then, by going back and trying to approach the problem from what I call my "Schemely Philosophy." Generally speaking, this is a philosophy that tries to take away features and discover the most expressive, practically useful construct that makes implementing the other features in a standard unnecessary. Scheme has traditionally succeeded in having a great many general features that allow you to express a great deal of other things without having to require them in the standard.
With this in mind, I propose a different primary goal for a standard module system: generality. It should be possible to create, from this module system, an interface on top of it that will satisfy the needs of either crowd. In other words, while the module system itself need not satisfy the needs of either crowd, it should be possible to build systems on top of it that will satisfy one crowd or the other. This is generality.
Moreover, I believe that this initial module system should be simple. It should not require a great deal to understand the core constructs.
It should also be backwards compatible, as far as is possible, with R6RS libraries. I put forth this requirement because it doesn't make sense to ignore the one standard library system that we have unless there is a compelling reason to do so.
How would such a system look?
The first conclusion to be made about this system is that it would have to be syntactic. It is possible that a syntactic module system can be used as the base to a standard, discoverable library description language, but it is not possible for a static description of module systems to somehow enable syntactic modules. In order for the two to cooperate, the core forms must be syntactic.
In other words, examining the requirements of the two systems, it appears clear to me that the syntactic approach is more general.
Nonetheless, there are faults in Chez Scheme's module system that make it inadequate as a standard.
With the pre-release of Chez Scheme v8.0, the import form in Chez Scheme can take either module names or R6RS library descriptions. This makes it possible to search for the library, but the Chez Scheme module system's naming scheme doesn't permit names that map to files in a useful way. Additionally, Chez Scheme's module system uses a positional export form, so extending the naming convention would result in ambiguous module forms. Clearly, while a syntactic module system would be good, the existing example I cite here will not work.
Let's examine the R6RS library form, however. Syntactically, it uses an export keyword to identify the exports, and this allows the naming convention to be the way it is. Nothing about it is inherently static, and it would be easy to extend the naming of the libraries to handle single identifiers.
So, the module system I propose consists of two forms:
    <library>     := (library [<name>] <exports> . <body>)
    <name>        := #identifier | <r6rs library name>
    <exports>     := (export <export-spec> ...)
    <export-spec> := #identifier | (#identifier #identifier ...)
    <body>        := (#expr|#def #expr|#def ...)
In the above form, I have tried to avoid introducing anything new that does not already exist in current module systems. This is almost like the existing R6RS library form, and is backwards compatible with it. I provide for simpler names, and I allow for the Chez Scheme method of specifying syntactic dependencies in exports (which, in my opinion, should have been there from the beginning). This makes it possible to be more efficient in handling libraries. I have allowed the intermingling of expressions and definitions above, but I am not tied to this, and would be willing to accept only definitions followed by expressions, since this is, in fact, how Chez Scheme's module system does it, and is the current modus operandi in R6RS. I have also made the name of the library optional. This is to allow for anonymous modules, which are essential if modules are to be used effectively in macros and to be sufficiently general for implementing packaging description languages.
The major difference here is not in the form, which is basically the same, but in the fact that this form should be syntactic, in that you can generate it from macros. It should be possible to nest these library forms. I make no arguments about how they should map to files, since this should be up to the implementation.
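A few instances of the proposed library form may help to fix ideas. This is a sketch of the proposed syntax only, and no existing implementation is assumed to accept it. All names are invented, the #f in the name position follows the Update note at the top of this article, and the assumption that an anonymous library's exports become visible in the enclosing scope is borrowed from Chez Scheme's anonymous modules rather than stated by the grammar.

```scheme
;; Proposed syntax only; every name below is illustrative.

;; An R6RS-style list name still works. The import is now an ordinary
;; form in the body rather than a fixed clause.
(library (geometry shapes)
  (export area)
  (import (rnrs base))
  (define (area w h) (* w h)))

;; A single identifier as the name.
(library counter
  (export next!)
  (define count 0)
  (define (next!) (set! count (+ count 1)) count))

;; An export spec listing a syntactic dependency: the macro with-doubled
;; expands into a reference to double, so double is named alongside it.
(library doubling
  (export (with-doubled double))
  (define (double x) (* 2 x))
  (define-syntax with-doubled
    (syntax-rules ()
      [(_ x) (double x)])))

;; An anonymous library nested inside a procedure body.
;; Assumption: its exports are visible in the enclosing body.
(define (make-distance)
  (library #f
    (export distance)
    (define (square x) (* x x))
    (define (distance x1 y1 x2 y2)
      (sqrt (+ (square (- x2 x1)) (square (- y2 y1))))))
  distance)
```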
I have removed, in the above, the import form. This is because this import form should be usable anywhere, and is, in this proposed system, its own form, and not a component of the library syntax. I would like to define two import forms which I believe are generally useful enough to be included.
    <import>      := (import|import-only <import-spec> <import-spec> ...)
    <import-spec> := <R6RS library reference>
                   | #identifier
                   | (only <import-spec> #identifier ...)
                   | (except <import-spec> #identifier ...)
                   | (prefix <import-spec> #identifier ...)
                   | (drop-prefix <import-spec> #identifier)
                   | (rename <import-spec> (#identifier #identifier) ...)
                   | (alias <import-spec> (#identifier #identifier) ...)
The above is a combination of R6RS and Chez Scheme module import forms. Multiple specs may be listed in a single import form, and drop-prefix and alias have been added. The use of import-only means that only those identifiers imported from the import specs listed will be visible in the scope that the import-only form affects. This is useful when you want to generate these module forms.
These forms, import, library, and import-only, can appear in any definition context.
I am also proposing that include, and possibly include/ci, be a part of the standard. This will ease the creation of module and source code separation.
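The import forms might be used along the following lines. This is again a sketch of the proposed syntax, not code for a particular implementation. The module string-utils and its bindings su:trim and su:pad are hypothetical, and the reading of alias (a second name, with the original still visible) is an assumption based on Chez Scheme's form of the same name.

```scheme
;; Proposed syntax only; string-utils and its bindings are hypothetical.

;; Several specs in one form, as in R6RS.
(import (only (rnrs lists) fold-left)
        (prefix (rnrs hashtables) ht:))

;; drop-prefix: assuming string-utils exports su:trim and su:pad,
;; they would be visible here as trim and pad.
(import (drop-prefix string-utils su:))

;; alias: assumed to bind strip as a second name for su:trim
;; while keeping su:trim itself visible.
(import (alias string-utils (su:trim strip)))

;; import-only inside a body: within this let, only * is in scope,
;; so nothing from the surrounding code can leak in.
(define answer
  (let ()
    (import-only (only (rnrs base) *))
    (* 6 7)))
```

Because import-only hides everything else, the body in the last example uses only literals and the one imported procedure.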
<include> := (include|include/ci <file-name-string>)
It should have the effect of expanding into the forms from the specified file. The /ci variant should be a case-insensitive version.
It is possible to do this with macros, so these forms are not strictly necessary, but they are of general interest, and make it much easier to write a sophisticated macro system. Additionally, it is more likely that Scheme implementations will retain useful source position information if the include forms are built in, whereas the current R6RS macro implementations of include lose much of that information.
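For reference, a macro-based include can be sketched with syntax-case roughly as follows. It is named my-include here to avoid clashing with any built-in include, and it follows the well-known textbook pattern rather than any particular implementation's source.

```scheme
;; A portable include written as a syntax-case macro (R6RS-style).
;; my-include is an illustrative name chosen to avoid built-in include.
(define-syntax my-include
  (lambda (stx)
    ;; Read every datum in the file and wrap each one in the
    ;; lexical context of the identifier k.
    (define (read-file fn k)
      (let ([p (open-input-file fn)])
        (let loop ([x (read p)])
          (if (eof-object? x)
              (begin (close-port p) '())
              (cons (datum->syntax k x)
                    (loop (read p)))))))
    (syntax-case stx ()
      [(k filename)
       (let ([fn (syntax->datum #'filename)])
         (with-syntax ([(form ...) (read-file fn #'k)])
           #'(begin form ...)))])))
```

Because read returns plain data, the forms spliced in this way carry no record of the file or line they came from, which is the loss of positioning information that a built-in include can avoid.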
But wait! Foul! Foul!
"This is just syntactic macros, you're just selling out the static package folks," I hear you say! No, actually, I am not. I am suggesting a standard module system that is general enough to be used by both crowds. The syntactic module crowd won't have to develop any new syntax to use this system, and the static crowd will have to do some extra macrology; this is true. Nonetheless, at least it is possible for both crowds to use the same underlying system! Otherwise, this is not possible.
An astute reader will also notice that I am proposing this be the module system for both WG1 and WG2. In fact, I believe that this system is simple enough for both crowds to use, is backwards compatible, and is general. However, said reader will observe that if no procedural macro system is provided by WG1, it will not be possible to create macros of sufficient expressiveness to create the static package description language. Yes, this is true.
The question then becomes, is the module system in WG1 supposed to be such that it satisfies everyone? Should it be a compromise to cater only to a select few, or really, cater to no one in particular, making no one happy with it? In the end, I contend that WG1 should have a module system that is simple and general, and that does not require additional work by WG2 to use it for all module systems. The above system would satisfy these conditions. I don't think it is necessary that a package description language be available in WG1, just that the system specified there facilitates the creation of one at the WG2 level.
The Evaluation of this System: Benefits and Drawbacks
The obvious drawback of this system is that, by default, it ties the library system to code evaluation, which many people consider a bad thing. The R6RS library form does this as well. The syntactic base will also be seen by some as making things difficult for people who care about the "introspectability" or "discoverability" of modules.
Yes, the module system above will require some extra work to make a suitably sophisticated system on top of it that will satisfy the needs of the static description language crowd. It is, however, possible to do this, even to the point that modules defined in this way may be introspected procedurally, to discover their imports and exports, &c. Moreover, once this has been done in portable Scheme, the resulting system will run on all compliant implementations, making it more portable than existing solutions. This makes it possible to have the best of both worlds while maintaining a simple standard. WG2 may even want to develop an implementation of the static language on top of this proposed system.
The system itself has more potential benefits, however, that I believe outweigh the minor inconvenience presented by the above argument. Firstly, it is general enough to handle the entire spectrum of module systems. Secondly, it introduces no new concepts. All the above features exist already in one module system or another, and the majority of the syntax comes directly from the existing library standard.
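As a very small sketch of how such a layer could record information for later introspection, the macro below wraps a module definition and notes its exports in a registry. It uses Chez Scheme's existing module form as a stand-in for the proposed library form so that it can actually be run. The names define-package, package-registry, package-exports, and shapes are all invented, and only exports are tracked here, not imports.

```scheme
;; Chez Scheme, with module standing in for the proposed library form.
;; define-package, package-registry, and package-exports are illustrative.
(define package-registry '())   ; association list: name -> exported identifiers

(define-syntax define-package
  (syntax-rules (export)
    [(_ name (export id ...) body ...)
     (begin
       (module name (id ...) body ...)          ; the real module
       (set! package-registry                   ; the recorded description
             (cons '(name id ...) package-registry)))]))

;; Look up the exports recorded for a package, or #f if it is unknown.
(define (package-exports name)
  (cond [(assq name package-registry) => cdr]
        [else #f]))

(define-package shapes (export area perimeter)
  (define (area w h) (* w h))
  (define (perimeter w h) (* 2 (+ w h))))
```

After this, (package-exports 'shapes) returns the list of exported names, while (import shapes) makes area and perimeter available as usual.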
The above system is also fully backwards compatible with the R6RS library standard, making the transition to the new module system that much easier. Thus, this system holds to tradition, promotes maximum backwards compatibility, and is at the same time a very simple system.
Conclusion
The module system proposed above is simple, general, and backwards compatible, introduces no new features, and provides all of the core features necessary to make an effective module system of almost any desirable shape. The system is simple enough to be incorporated into WG1, and is expansive enough to require no changes for WG2, since it is a full module system. It is, in my opinion, the right approach to making a module system that is maximally applicable, while retaining the simple qualities of a good Scheme solution.
This proposal is obviously only a draft, and I would readily accept feedback on this issue.