Rabbit: A Compiler for Scheme/Chapter 7

←

Rabbit: A Compiler for Scheme
by Guy L. Steele, Jr.

Chapter 7

→

1177438Rabbit: A Compiler for Scheme — Chapter 7Guy L. Steele, Jr.

7. The Imperative Treatment of Applicative Constructs

Given the characteristics of lexical scoping and tail-recursive invocations, it is possible to assign a peculiarly imperative interpretation to the applicative constructs of SCHEME, which consists primarily of treating a function call as a GOTO. More generally, a function call is a GOTO that can pass one or more items to its target; the special case of passing no arguments is precisely a GOTO. It is never necessary for a function call to save a return address of any kind. It is true that return addresses are generated, but we adopt one of two other points of view, depending on context. One is that the return address, plus any other data needed to carry on the computation after the called function has returned (such as previously computed intermediate values and other return addresses) are considered to be packaged up into an additional argument (the continuation) which is passed to the target. This lends itself to a nonfunctional interpretation of LAMBDA, and a method of expressing programs called the continuation-passing style (similar to the message-passing actors paradigm [Hewitt]), to be discussed further below. The other view, more intuitive in terms of the traditional stack implementation, is that the return address should be pushed before evaluating arguments rather than before calling a function. This view leads to a more uniform function-calling discipline, and is discussed in [Declarative] and [Steele].

We are led by these points of view to consider a compilation strategy in which function calling is to be considered very cheap (unlike the situation with PL/I and ALGOL, where programmers avoid procedure calls like the plague see [Steele] for a discussion of this). In this light the code produced by the sample macros above does not seem inefficient, or even particularly convoluted.

Consider the expansion of (OR a b c):

((LAMBDA (V R) (IF V V (R))) 
 a 
 (LAMBDA () ((LAMBDA (V R) (IF V V (R))) 
            b
            (LAMBDA () ((LAMBDA (V R) (IF V V (R))) 
                       c   
                       (LAMBDA () 'NIL-))))))

Then we might imagine the following (slightly contrived) compilation scenario.

First, for expository purposes, we shall rename the variables in order to be able to distinguish them.

((LAMBDA (V1 R1) (IF V1 V1 (R1))) 
 a
 (LAMBDA () ((LAMBDA (V2 R2) (IF V2 V2 (R2)))
             b
             (LAMBDA () ((LAMBDA (V3 R3) (IF V3 V3 (R3))) 
                        c 
                        (LAMBDA () 'NIL))))))

we shall assign a generated name to each LAMBDA-expression, which we shall notate by writing the name after the word LAMBDA. These names will be used as tags in the output code.

((LAMBDA namel (V1 R1) (IF V1 VI (R1))) 
 a (LAMBDA nameZ () ((LAMBDA name3 (V2 R2) (IF V2 V2 (R2)))
                     b 
                     (LAMBDA name4 () ((LAMBDA name5 (V3 R3) 
                                               (IF V3 V3 (R3))) 
                                 c
                                 (LAMBDA name6 () 'NIL))))))

Next, a simple analysis shows that the variables R1, R2, and R3 always denote the LAMBDA-expressions named name2, name4, and name6, respectively. Now an optimizer might simply have substituted these values into the bodies of name1, name3, and name5 using the rule of beta-conversion, but we shall not apply that technique here. Instead we shall compile the six functions in a straightforward manner. we make use of the additional fact that all six functions are closed in identical environments (we count two environments as identical if they involve the same variable bindings, regardless of the number of "frames" involved; that is, the environment is the same inside and outside a (LAMBDA () ...)). Assume a simple target machine with argument registers called reg1, reg2, etc.

main:   <code for a>            ;result in regl
        LOAD reg2,[name2]       ;[name2] is the closure for name2
        CALL-FUNCTION 2,[name1] ;call name1 with 2 arguments  

name1:  JUMP-IF-NIL reg1,name1a  
        RETURN                  ;return the value in reg1
name1a: CALL-FUNCTION 0,reg2    

name2:  <code for b>            ;result in reg1 
        LOAD reg2,[name4]       ;[name4] is the closure for name4 
        CALL-FUNCTION 2,[name3] ;call name3 with 2 arguments

name3:  JUMP-IF-NIL reg1,name3a 
        RETURN                  ;return the value in reg1 
name3a: CALL-FUNCTION 0,reg2    ;call function in reg2, 0 arguments 

name4:  <code for c>            ;result in reg1
        LOAD reg2,[name6]       ;[name6] is the closure for name6 
        CALL-FUNCTION 2,[name5] ;call name5 with 2 arguments 

name5:  JUMP-IF-NIL reg1,name5a 
        RETURN                  ;return the value in reg1
name5a: CALL-FUNCTION 0,reg2    ;call function in reg2, 0 arguments

name6:  LOAD reg1,'NIL          ;constant NIL in reg1
        RETURN

Now we make use of our knowledge that functions, and convert CALL-FUNCTION of certain variables always denote certain a known function to a simple GOTO. (We have actually done things backwards here; in practice this knowledge is used before generating any code. We have fudged over this issue here, but will return to it later. Our purpose here is merely to demonstrate the treatment of function calls as GOTOs.)

main:  <code for a>           ;result in reg1 L
       LOAD reg2,[name2]      ;[name2] is the closure for nameZ
       GOTO name1

name1:   JUMP-IF-NIL reg1, name1a 
         RETURN                   ;return the value in reg2 
name1a:  GOTO name2

name2:   <code for b>             ;result in reg1 
         LOAD regZ,[name4]        ;[name4] is the closure for name4 
         GOTO name3 

name3:  JUMP-IF-NIL reg1,name3a 
        RETURN                    ;return the value in reg1 
name3a: GOTO name4 

name4:  <code for c>              ;result in reg1 
        LOAD reg2,[name6]         ;[name6] is the closure for name6 
        GOTO name5 

name5:  JUMP-IF-NIL reg1,name5a 
        RETURN                    ;return the value in reg1 
name5a: GOTO name6 

name6: LOAD reg1,'NIL             ;constant NIL in reg1 
       RETURN

The construction [foo] indicates the creation of a closure for foo in the current environment. This will actually require additional instructions, but we shall ignore the mechanics of this for now since analysis will remove the need for the construction in this case. The fact that the only references to the variables Rl, RZ, and R3 are function calls can be detected and the unnecessary LOAD instructions eliminated. (Once again, this would actually be determined ahead of time, and no LOAD instructions would be generated in the first place. All of this is determined by a general pre-analysis, rather than a peephole post-pass.) Moreover, a GOTO to a tag which immediately follows the GOTO can be eliminated.

main:   <code for a>            ;result in regl 
name1:  JUMP-IF-NIL regl,namela 
        RETURN                  ;return the value in reg1 
name1a: 
name2:  <code for b>            ;result in regl 
name3:  JUMP-IF-NIL reg1,name3a 
        RETURN                  ;return the value in reg2 
name3a: 
name4:  <code for c>            ;result in reg1 
name5:  JUMP-IF-NIL reg1,name5a 
        RETURN                  ;return the value in reg1 
name5a: 
name6:  LOAD reg1,'NIL          ;constant NIL in reg1 
        RETURN

This code is in fact about what one would expect out of an ordinary LISP compiler. (There is admittedly room for a little more improvement.) RABBIT indeed produces code of essentially this form, by the method of analysis outlined here.

Similar considerations hold for the BLOCK macro. Consider the expression (BLOCK a b c); conceptually this should perform a, b, and c sequentially. Let us examine the code produced: '

((LAMBDA (A B) (B)) 
 a
 (LAMBDA () ((LAMBDA (A B) (B)) 
             b 
             (LAMBDA () C))))

Renaming the variables and assigning names to LAMBDA-expressions:

((LANBDA name1 (A1 B1) (B1)) (LAMBDA name2 () ((LAMBDA name3 (A2 B2) (B2)) 
                                               b (LAMBDA name4 () c))))

Producing code for the functions:

main:   <code for a> 
        LOAD reg2,[name2]
        CALL-FUNCTION 2,[name1] 
name1:  CALL-FUNCTION 0,reg2 
name2:  <code for b>
        LOAD reg2,[name4]
        CALL-FUNCTION 2.[name3]
name3:  CALL-FUNCTION 0,reg2
name4:  <code for c> 
        RETURN

Turning general function calls into direct GO's, on the basis of analysts of what variables must refer to constant functions:

main:   <code for a> 
        LOAD reg2,[name2] 
        GOTO name1 

name1:  GOTO name2 

name2:  <code for b> 
        LOAD reg2,[name4] 
        GOTO name1 

name3:  GOTO name4 

name4:  <code for c> 
        RETURN

Eliminating useless GOTO and LOAD instructions:

main:  <code for a> 
name1:  
name2: <code for b> 
name3: 
name4: <code for c> 
       RETURN

What more could one ask for?

Notice that this has fallen out of a general strategy involving only an approach to compiling function calls, and has involved no special knowledge of OR or BLOCK not encoded in the macro rules. The cases shown so far are actually special cases of a more general approach, special in that all the conceptual closures involved_are closed in the same environment, and called from places that have not disturbed that environment, but only used "registers". In the more general case, the environments of caller and called function will be different.

This divides into two subcases, corresponding to whether the closure was created by ax simple LAMBDA or by a LABELS construction. The latter involves circular references, and so is somewhat more complicated; but it is easy to show that in the former case the environment of the caller must be that of the (known) called function, possibly with additional values added on. This is a consequence of lexical scoping. As a result, the function call can be compiled as a GOTO preceded by an environment adjustment which consists merely of lopping off some leading portion of the current one (intuitively, one simply 'pops the unnecessary crud off the stack"). LABELS-closed functions also can be treated in this way, if one closes all the functions in the same way (which RABBIT presently does, but this is not always desirable). If one does, then it is easy to see the effect of expanding a PROG into a giant LABELS as outlined in [Imperative] and elsewhere: normally, a GOTO to a tag at the same level of PROG will involve no adjustment of environment, and so compile into a simple GOTO instruction, whereas a GOTO to a tag at aux outer level of PROG probably will involve adjusting the environment from that of the inner PROG to that of the outer. All of this falls out of the proper imperative treatment of function calls.