Tail Call

The Lua language supports tail call elimination. This section describes and supports tail calls.

First we introduce the concept of tail calls. A tail call is formed when the last action of a function is to call another function without doing any other work. For example, the following sample code:

function foo(a, b)
     return bar(a + b)
end

The last action of the foo() function (and in this case the only action) is to call the bar() function. Let's take a look at the execution process of the foo() function without introducing a tail call, as shown in the following figure:

|       |        |       |        |       |        |       |
+-------+        +-------+        +-------+        +-------+
| foo() |        | foo() |        | foo() |      / |  ret1 |
+-------<<       +-------+        +-------<<   /-+ +-------+
|   a   |        |   a   |        |   a   |    | \ |  ret2 |
+-------+        +-------+        +-------+    |   +-------+
|   b   |        |   b   |        |   b   |    |   |       |
+-------+        +-------+        +-------+    |
| bar() |        | bar() |      / |  ret1 | \  |
+-------+        +-------<<   /-+ +-------+  >-/return values
|  a+b  |        |  a+b  |    | \ |  ret2 | /
+-------+        +-------+    |   +-------+
|       |        :       :    |   |       |
                 +-------+    |
                 |  ret1 | \  |
                 +-------+  >-/return values
                 |  ret2 | /
                 +-------+
                 |       |
  • The first picture on the far left is the stack layout before calling the bar() function inside the foo() function. That is, before calling Call(bar) bytecode.

  • The second figure is the stack layout immediately after the bar() function call has completed. That is, after the Return bytecode of the bar() function is executed, but before returning to the Call(bar) bytecode of the foo() function. Suppose this function has two return values ret1 and ret2, which are currently on the top of the stack.

  • The third figure is the stack layout after the bar() function returns. That is, the Call(bar) bytecode of foo() is executed. That is, move the two return values to the entry function position of bar().

  • The fourth figure is the stack layout after the foo() function returns. That is, after the Call(foo) bytecode of the outer caller is executed. That is, move the two return values to the entry function position of foo().

The next three graphs are executed consecutively. Observe the optimization space in it:

  • An obvious optimization idea is that the copying of the last two return values can be completed in one step. But this is difficult to optimize, and it doesn't optimize much performance;

  • Another not-so-obvious point is that the stack space of the foo() function is no longer used after the bar() function in the first leftmost figure is ready to be called. Therefore, we can clean up the stack space occupied by the foo() function before calling the bar() function. According to this idea, the following redraws the calling process:

|       |        |       |        |       |        |       |
+-------+        +-------+        +-------+        +-------+
| foo() |      / | bar() |        | bar() |      / |  ret1 |
+-------<<   /-+ +-------<<       +-------<<   /-+ +-------+
|   a   |    | \ |  a+b  |        |  a+b  |    | \ |  ret2 |
+-------+    |   +-------+        +-------+    |   +-------+
|   b   |    |   |       |        :       :    |   |       |
+-------+    |                    +-------+    |
| bar() | \  |                    |  ret1 | \  |
+-------+  >-/                    +-------+  >-/
|  a+b  | /                       |  ret2 | /
+-------+                         +-------+
|       |                         |       |
  • The first picture on the left remains unchanged, and it is still the state before the bar() function call;

  • In the second picture, before calling bar(), the stack space of the foo() function is cleared;

  • The third picture, corresponding to the second picture above, is after calling bar() function.

  • The fourth picture corresponds to the last picture above. Since the stack space of the foo() function has been cleaned up just now, the third figure above is skipped.

Compared with the above ordinary process, although the operation steps of this new process have been changed, they have not been reduced, so the performance is not optimized. However, there are optimizations in the use of stack space! The stack space of foo() has been freed before the bar() function is executed. 2 layers of function calls, but only takes up 1 layer of space. The advantage brought by this is not obvious in this example, but it is obvious in recursive calls, because there are usually many layers of recursive calls. If the last item of the recursive call satisfies the above tail call, then after applying the new process, it can support unlimited recursive calls without causing stack overflow! The stack overflow here refers to the stack of the Lua virtual machine drawn in the above figure, not the stack overflow of the Rust program.

Compared with the normal process above, this new process has a small difference. The << on the stack in each figure above represents the current self.base position. It can be seen that in the above ordinary process, self.base has changed; but in the new process, the whole process has not changed.

After introducing the concept of tail call, the specific implementation is introduced below.

Syntax Analysis

Before starting the syntax analysis, clarify the rules of the next tail call again: when the last action of a function is to call another function without doing other work, it forms a tail call. Here are some counterexamples from the book "Lua Programming":

function f1(x)
     g(x) -- discard the return value of g(x) before f1() returns
end
function f2(x)
     return g(x) + 1 -- also execute +1
end
function f3(x)
     return x or g(x) -- also limit the return value of g(x) to 1
end
function f4(x)
     return (g(x)) -- also limit the return value of g(x) to 1
end

In the Lua language, only calls of the form return func(args) are tail calls. Of course, func and args here can be very complicated, such as return t.k(a+b.f()) is also a tail call.

After the rules are clarified, it is relatively simple to judge tail calls during syntax analysis. When parsing the return statement, add a judgment on the tail call:

     let iret = self.sp;
     let (nexp, last_exp) = self.explist();

     if let (0, &ExpDesc::Local(i)) = (nexp, &last_exp) {
         // There is only 1 return value and it is a local variable
         ByteCode::Return(i as u8, 1)

     } else if let (0, &ExpDesc::Call(func, narg_plus)) = (nexp, &last_exp) {
         // New tail call: only one return value, and it is a function call
         ByteCode::TailCall(func as u8, narg_plus as u8)

     } else if self. discharge_expand(last_exp) {
         // The last return value is a variable type, such as variable arguements or function calls,
         // then the number of return values cannot be known during the syntax analysis phase
         ByteCode::Return(iret as u8, 0)

     } else {
         // The last return value is fixed
         ByteCode::Return(iret as u8, nexp as u8 + 1)
     }

There are 4 cases in the above code. The second case is a newly added tail call, and the other three cases are already supported in the previous sections of this chapter, so they will not be introduced here.

The newly added bytecode TailCall is similar to the function call bytecode Call, but since the return value of the tail call must be a function call, the number of return values must be unknown, so the third associated parameter is omitted. So far, there are three bytecodes related to function calls:

pub enum ByteCode {
     Call(u8, u8, u8),
     CallSet(u8, u8, u8),
     TailCall(u8, u8), // add tail call

Virtual Machine Execution

Next, look at the virtual machine execution part of the tail call. From the tail call process introduced at the beginning of this section, it can be concluded that compared with ordinary function calls, there are three differences in the execution of tail calls:

  • Before calling the inner function, the stack space of the outer function should be cleared in advance, which is also the meaning of tail call;
  • After the inner function returns, since the outer function has been cleaned up, there is no need to return to the outer function, but directly return to the outer calling function.
  • There is no need to adjust self.base throughout.

Thus, the execution flow of TailCall bytecode can be realized as follows:

     ByteCode::TailCall(func, narg_plus) => {
         self.stack.drain(self.base-1 .. self.base+func as usize);
         return self. do_call_function(narg_plus);
     }

Very simple, just two lines of code:

Line 1, through self.stack.drain() to clean up the stack space of the outer function.

Line 2 returns directly from the current execute() through the return statement, that is to say, after the inner function is executed, it does not need to return to the current function, but directly returns to the outer caller. In addition, according to the rules of tail calls listed above, this line of Rust code itself is also a tail call. So as long as the Rust language also supports tail call elimination, then our Lua interpreter will not increase its own stack during execution.

In addition, the newly added do_call_function() method in line 2 executes the function call, which is extracted from the call_function() method called by the Call and CallSet bytecodes in the previous section, except that the update to self.base is removed. And the call_function() method is modified to wrap this new method:

     fn call_function(&mut self, func: u8, narg_plus: u8) -> usize {
         self.base += func as usize + 1; // get into new world
         let nret = self. do_call_function(narg_plus);
         self.base -= func as usize + 1; // come back
         nret
     }

Test

So far, we have completed the tail call. Verify with the following Lua code:

function f(n)
     if n > 10000 then return n end
     return f(n+1)
end
print(f(0))

But I get a stack overflow error when executing:

$ cargo r --test_lua/tailcall.lua

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1] 85084 abort cargo r -- test_lua/tailcall.lua

At first I thought that the debug version of Rust did not perform tail call optimization, but after adding --release, it can only support a greater recursion depth, which delays the stack overflow, but eventually the stack overflow will still occur. This goes back to what I just said: "So as long as the Rust language also supports tail call elimination, then...", the assumption in front of this sentence may not be true, that is, the Rust language may not support tail call elimination. Here is an article that introduces the discussion of tail calls in the Rust language. The conclusion is probably due to the implementation too complicated (may involve resource drop), and the benefits are limited (programmers can manually change recursion to loop if necessary), so in the end Rust language does not support tail call elimination. In this way, in order to make the tail call elimination of Lua completed in this section meaningful, we can only change the recursive call to the execute() function into a loop. This change itself is not difficult, but there are still two places to modify the function call process in the future, one is the calling method of the entry function of the entire program, and the other is to support the state preservation of the function in the coroutine. So we will make this change after completing the final function call process.