Rabbit: A Compiler for Scheme/Chapter 7
7. The Imperative Treatment of Applicative Constructs
Given the characteristics of lexical scoping and tail-recursive invocations, it is possible to assign a peculiarly imperative interpretation to the applicative constructs of SCHEME, which consists primarily of treating a function call as a GOTO. More generally, a function call is a GOTO that can pass one or more items to its target; the special case of passing no arguments is precisely a GOTO. It is never necessary for a function call to save a return address of any kind. It is true that return addresses are generated, but we adopt one of two other points of view, depending on context. One is that the return address, plus any other data needed to carry on the computation after the called function has returned (such as previously computed intermediate values and other return addresses) are considered to be packaged up into an additional argument (the continuation) which is passed to the target. This lends itself to a nonfunctional interpretation of LAMBDA, and a method of expressing programs called the continuation-passing style (similar to the message-passing actors paradigm [Hewitt]), to be discussed further below. The other view, more intuitive in terms of the traditional stack implementation, is that the return address should be pushed before evaluating arguments rather than before calling a function. This view leads to a more uniform function-calling discipline, and is discussed in [Declarative] and [Steele].
We are led by these points of view to consider a compilation strategy in which function calling is to be considered very cheap (unlike the situation with PL/I and ALGOL, where programmers avoid procedure calls like the plague see [Steele] for a discussion of this). In this light the code produced by the sample macros above does not seem inefficient, or even particularly convoluted.
Consider the expansion of (OR a b c): ((LAMBDA (V R) (IF V V (R)))
a
(LAMBDA () ((LAMBDA (V R) (IF V V (R)))
b
(LAMBDA () ((LAMBDA (V R) (IF V V (R)))
c
(LAMBDA () 'NIL-))))))
Then we might imagine the following (slightly contrived) compilation scenario.
First, for expository purposes, we shall rename the variables in order to be able to distinguish them.
((LAMBDA (V1 R1) (IF V1 V1 (R1)))
a
(LAMBDA () ((LAMBDA (V2 R2) (IF V2 V2 (R2)))
b
(LAMBDA () ((LAMBDA (V3 R3) (IF V3 V3 (R3)))
c
(LAMBDA () 'NIL))))))
we shall assign a generated name to each LAMBDA-expression, which we shall notate by writing the name after the word LAMBDA. These names will be used as tags in the output code.
((LAMBDA namel (V1 R1) (IF V1 VI (R1)))
a (LAMBDA nameZ () ((LAMBDA name3 (V2 R2) (IF V2 V2 (R2)))
b
(LAMBDA name4 () ((LAMBDA name5 (V3 R3)
(IF V3 V3 (R3)))
c
(LAMBDA name6 () 'NIL))))))
Next, a simple analysis shows that the variables R1, R2, and R3 always denote the LAMBDA-expressions named name2, name4, and name6, respectively. Now an optimizer might simply have substituted these values into the bodies of name1, name3, and name5 using the rule of beta-conversion, but we shall not apply that technique here. Instead we shall compile the six functions in a straightforward manner. we make use of the additional fact that all six functions are closed in identical environments (we count two environments as identical if they involve the same variable bindings, regardless of the number of "frames" involved; that is, the environment is the same inside and outside a (LAMBDA () ...)). Assume a simple target machine with argument registers called reg1, reg2, etc.
main: <code for a> ;result in regl
LOAD reg2,[name2] ;[name2] is the closure for name2
CALL-FUNCTION 2,[name1] ;call name1 with 2 arguments
name1: JUMP-IF-NIL reg1,name1a
RETURN ;return the value in reg1
name1a: CALL-FUNCTION 0,reg2
name2: <code for b> ;result in reg1
LOAD reg2,[name4] ;[name4] is the closure for name4
CALL-FUNCTION 2,[name3] ;call name3 with 2 arguments
name3: JUMP-IF-NIL reg1,name3a
RETURN ;return the value in reg1
name3a: CALL-FUNCTION 0,reg2 ;call function in reg2, 0 arguments
name4: <code for c> ;result in reg1
LOAD reg2,[name6] ;[name6] is the closure for name6
CALL-FUNCTION 2,[name5] ;call name5 with 2 arguments
name5: JUMP-IF-NIL reg1,name5a
RETURN ;return the value in reg1
name5a: CALL-FUNCTION 0,reg2 ;call function in reg2, 0 arguments
name6: LOAD reg1,'NIL ;constant NIL in reg1
RETURN
Now we make use of our knowledge that functions, and convert CALL-FUNCTION of certain variables always denote certain a known function to a simple GOTO. (We have actually done things backwards here; in practice this knowledge is used before generating any code. We have fudged over this issue here, but will return to it later. Our purpose here is merely to demonstrate the treatment of function calls as GOTOs.)
main: <code for a> ;result in reg1 L
LOAD reg2,[name2] ;[name2] is the closure for nameZ
GOTO name1
name1: JUMP-IF-NIL reg1, name1a
RETURN ;return the value in reg2
name1a: GOTO name2
name2: <code for b> ;result in reg1
LOAD regZ,[name4] ;[name4] is the closure for name4
GOTO name3
name3: JUMP-IF-NIL reg1,name3a
RETURN ;return the value in reg1
name3a: GOTO name4
name4: <code for c> ;result in reg1
LOAD reg2,[name6] ;[name6] is the closure for name6
GOTO name5
name5: JUMP-IF-NIL reg1,name5a
RETURN ;return the value in reg1
name5a: GOTO name6
name6: LOAD reg1,'NIL ;constant NIL in reg1
RETURN
The construction [foo] indicates the creation of a closure for foo in the current environment. This will actually require additional instructions, but we shall ignore the mechanics of this for now since analysis will remove the need for the construction in this case. The fact that the only references to the variables Rl, RZ, and R3 are function calls can be detected and the unnecessary LOAD instructions eliminated. (Once again, this would actually be determined ahead of time, and no LOAD instructions would be generated in the first place. All of this is determined by a general pre-analysis, rather than a peephole post-pass.) Moreover, a GOTO to a tag which immediately follows the GOTO can be eliminated.
main: <code for a> ;result in regl
name1: JUMP-IF-NIL regl,namela
RETURN ;return the value in reg1
name1a:
name2: <code for b> ;result in regl
name3: JUMP-IF-NIL reg1,name3a
RETURN ;return the value in reg2
name3a:
name4: <code for c> ;result in reg1
name5: JUMP-IF-NIL reg1,name5a
RETURN ;return the value in reg1
name5a:
name6: LOAD reg1,'NIL ;constant NIL in reg1
RETURN
This code is in fact about what one would expect out of an ordinary LISP compiler. (There is admittedly room for a little more improvement.) RABBIT indeed produces code of essentially this form, by the method of analysis outlined here.
Similar considerations hold for the BLOCK macro. Consider the expression (BLOCK a b c); conceptually this should perform a, b, and c sequentially. Let us examine the code produced: '
((LAMBDA (A B) (B))
a
(LAMBDA () ((LAMBDA (A B) (B))
b
(LAMBDA () C))))
Renaming the variables and assigning names to LAMBDA-expressions:
((LANBDA name1 (A1 B1) (B1)) (LAMBDA name2 () ((LAMBDA name3 (A2 B2) (B2))
b (LAMBDA name4 () c))))
Producing code for the functions:
main: <code for a>
LOAD reg2,[name2]
CALL-FUNCTION 2,[name1]
name1: CALL-FUNCTION 0,reg2
name2: <code for b>
LOAD reg2,[name4]
CALL-FUNCTION 2.[name3]
name3: CALL-FUNCTION 0,reg2
name4: <code for c>
RETURN
Turning general function calls into direct GO's, on the basis of analysts of what variables must refer to constant functions:
main: <code for a>
LOAD reg2,[name2]
GOTO name1
name1: GOTO name2
name2: <code for b>
LOAD reg2,[name4]
GOTO name1
name3: GOTO name4
name4: <code for c>
RETURN
Eliminating useless GOTO and LOAD instructions:
main: <code for a>
name1:
name2: <code for b>
name3:
name4: <code for c>
RETURN
What more could one ask for?
Notice that this has fallen out of a general strategy involving only an approach to compiling function calls, and has involved no special knowledge of OR or BLOCK not encoded in the macro rules. The cases shown so far are actually special cases of a more general approach, special in that all the conceptual closures involved_are closed in the same environment, and called from places that have not disturbed that environment, but only used "registers". In the more general case, the environments of caller and called function will be different.
This divides into two subcases, corresponding to whether the closure was created by ax simple LAMBDA or by a LABELS construction. The latter involves circular references, and so is somewhat more complicated; but it is easy to show that in the former case the environment of the caller must be that of the (known) called function, possibly with additional values added on. This is a consequence of lexical scoping. As a result, the function call can be compiled as a GOTO preceded by an environment adjustment which consists merely of lopping off some leading portion of the current one (intuitively, one simply 'pops the unnecessary crud off the stack"). LABELS-closed functions also can be treated in this way, if one closes all the functions in the same way (which RABBIT presently does, but this is not always desirable). If one does, then it is easy to see the effect of expanding a PROG into a giant LABELS as outlined in [Imperative] and elsewhere: normally, a GOTO to a tag at the same level of PROG will involve no adjustment of environment, and so compile into a simple GOTO instruction, whereas a GOTO to a tag at aux outer level of PROG probably will involve adjusting the environment from that of the inner PROG to that of the outer. All of this falls out of the proper imperative treatment of function calls.