Tail Call
The Lua language supports tail call elimination. This section describes and supports tail calls.
First we introduce the concept of tail calls. A tail call is formed when the last action of a function is to call another function without doing any other work. For example, the following sample code:
function foo(a, b)
return bar(a + b)
end
The last action of the foo()
function (and in this case the only action) is to call the bar()
function. Let's take a look at the execution process of the foo()
function without introducing a tail call, as shown in the following figure:
| | | | | | | |
+-------+ +-------+ +-------+ +-------+
| foo() | | foo() | | foo() | / | ret1 |
+-------<< +-------+ +-------<< /-+ +-------+
| a | | a | | a | | \ | ret2 |
+-------+ +-------+ +-------+ | +-------+
| b | | b | | b | | | |
+-------+ +-------+ +-------+ |
| bar() | | bar() | / | ret1 | \ |
+-------+ +-------<< /-+ +-------+ >-/return values
| a+b | | a+b | | \ | ret2 | /
+-------+ +-------+ | +-------+
| | : : | | |
+-------+ |
| ret1 | \ |
+-------+ >-/return values
| ret2 | /
+-------+
| |
-
The first picture on the far left is the stack layout before calling the
bar()
function inside thefoo()
function. That is, before callingCall(bar)
bytecode. -
The second figure is the stack layout immediately after the
bar()
function call has completed. That is, after theReturn
bytecode of thebar()
function is executed, but before returning to theCall(bar)
bytecode of thefoo()
function. Suppose this function has two return valuesret1
andret2
, which are currently on the top of the stack. -
The third figure is the stack layout after the
bar()
function returns. That is, theCall(bar)
bytecode offoo()
is executed. That is, move the two return values to the entry function position ofbar()
. -
The fourth figure is the stack layout after the
foo()
function returns. That is, after theCall(foo)
bytecode of the outer caller is executed. That is, move the two return values to the entry function position offoo()
.
The next three graphs are executed consecutively. Observe the optimization space in it:
-
An obvious optimization idea is that the copying of the last two return values can be completed in one step. But this is difficult to optimize, and it doesn't optimize much performance;
-
Another not-so-obvious point is that the stack space of the
foo()
function is no longer used after thebar()
function in the first leftmost figure is ready to be called. Therefore, we can clean up the stack space occupied by thefoo()
function before calling thebar()
function. According to this idea, the following redraws the calling process:
| | | | | | | |
+-------+ +-------+ +-------+ +-------+
| foo() | / | bar() | | bar() | / | ret1 |
+-------<< /-+ +-------<< +-------<< /-+ +-------+
| a | | \ | a+b | | a+b | | \ | ret2 |
+-------+ | +-------+ +-------+ | +-------+
| b | | | | : : | | |
+-------+ | +-------+ |
| bar() | \ | | ret1 | \ |
+-------+ >-/ +-------+ >-/
| a+b | / | ret2 | /
+-------+ +-------+
| | | |
-
The first picture on the left remains unchanged, and it is still the state before the
bar()
function call; -
In the second picture, before calling
bar()
, the stack space of thefoo()
function is cleared; -
The third picture, corresponding to the second picture above, is after calling
bar()
function. -
The fourth picture corresponds to the last picture above. Since the stack space of the
foo()
function has been cleaned up just now, the third figure above is skipped.
Compared with the above ordinary process, although the operation steps of this new process have been changed, they have not been reduced, so the performance is not optimized. However, there are optimizations in the use of stack space! The stack space of foo()
has been freed before the bar()
function is executed. 2 layers of function calls, but only takes up 1 layer of space. The advantage brought by this is not obvious in this example, but it is obvious in recursive calls, because there are usually many layers of recursive calls. If the last item of the recursive call satisfies the above tail call, then after applying the new process, it can support unlimited recursive calls without causing stack overflow! The stack overflow here refers to the stack of the Lua virtual machine drawn in the above figure, not the stack overflow of the Rust program.
Compared with the normal process above, this new process has a small difference. The <<
on the stack in each figure above represents the current self.base
position. It can be seen that in the above ordinary process, self.base
has changed; but in the new process, the whole process has not changed.
After introducing the concept of tail call, the specific implementation is introduced below.
Syntax Analysis
Before starting the syntax analysis, clarify the rules of the next tail call again: when the last action of a function is to call another function without doing other work, it forms a tail call. Here are some counterexamples from the book "Lua Programming":
function f1(x)
g(x) -- discard the return value of g(x) before f1() returns
end
function f2(x)
return g(x) + 1 -- also execute +1
end
function f3(x)
return x or g(x) -- also limit the return value of g(x) to 1
end
function f4(x)
return (g(x)) -- also limit the return value of g(x) to 1
end
In the Lua language, only calls of the form return func(args)
are tail calls. Of course, func
and args
here can be very complicated, such as return t.k(a+b.f())
is also a tail call.
After the rules are clarified, it is relatively simple to judge tail calls during syntax analysis. When parsing the return statement, add a judgment on the tail call:
let iret = self.sp;
let (nexp, last_exp) = self.explist();
if let (0, &ExpDesc::Local(i)) = (nexp, &last_exp) {
// There is only 1 return value and it is a local variable
ByteCode::Return(i as u8, 1)
} else if let (0, &ExpDesc::Call(func, narg_plus)) = (nexp, &last_exp) {
// New tail call: only one return value, and it is a function call
ByteCode::TailCall(func as u8, narg_plus as u8)
} else if self. discharge_expand(last_exp) {
// The last return value is a variable type, such as variable arguements or function calls,
// then the number of return values cannot be known during the syntax analysis phase
ByteCode::Return(iret as u8, 0)
} else {
// The last return value is fixed
ByteCode::Return(iret as u8, nexp as u8 + 1)
}
There are 4 cases in the above code. The second case is a newly added tail call, and the other three cases are already supported in the previous sections of this chapter, so they will not be introduced here.
The newly added bytecode TailCall
is similar to the function call bytecode Call
, but since the return value of the tail call must be a function call, the number of return values must be unknown, so the third associated parameter is omitted. So far, there are three bytecodes related to function calls:
pub enum ByteCode {
Call(u8, u8, u8),
CallSet(u8, u8, u8),
TailCall(u8, u8), // add tail call
Virtual Machine Execution
Next, look at the virtual machine execution part of the tail call. From the tail call process introduced at the beginning of this section, it can be concluded that compared with ordinary function calls, there are three differences in the execution of tail calls:
- Before calling the inner function, the stack space of the outer function should be cleared in advance, which is also the meaning of tail call;
- After the inner function returns, since the outer function has been cleaned up, there is no need to return to the outer function, but directly return to the outer calling function.
- There is no need to adjust
self.base
throughout.
Thus, the execution flow of TailCall
bytecode can be realized as follows:
ByteCode::TailCall(func, narg_plus) => {
self.stack.drain(self.base-1 .. self.base+func as usize);
return self. do_call_function(narg_plus);
}
Very simple, just two lines of code:
Line 1, through self.stack.drain()
to clean up the stack space of the outer function.
Line 2 returns directly from the current execute()
through the return
statement, that is to say, after the inner function is executed, it does not need to return to the current function, but directly returns to the outer caller. In addition, according to the rules of tail calls listed above, this line of Rust code itself is also a tail call. So as long as the Rust language also supports tail call elimination, then our Lua interpreter will not increase its own stack during execution.
In addition, the newly added do_call_function()
method in line 2 executes the function call, which is extracted from the call_function()
method called by the Call
and CallSet
bytecodes in the previous section, except that the update to self.base
is removed. And the call_function()
method is modified to wrap this new method:
fn call_function(&mut self, func: u8, narg_plus: u8) -> usize {
self.base += func as usize + 1; // get into new world
let nret = self. do_call_function(narg_plus);
self.base -= func as usize + 1; // come back
nret
}
Test
So far, we have completed the tail call. Verify with the following Lua code:
function f(n)
if n > 10000 then return n end
return f(n+1)
end
print(f(0))
But I get a stack overflow error when executing:
$ cargo r --test_lua/tailcall.lua
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1] 85084 abort cargo r -- test_lua/tailcall.lua
At first I thought that the debug version of Rust did not perform tail call optimization, but after adding --release
, it can only support a greater recursion depth, which delays the stack overflow, but eventually the stack overflow will still occur. This goes back to what I just said: "So as long as the Rust language also supports tail call elimination, then...", the assumption in front of this sentence may not be true, that is, the Rust language may not support tail call elimination. Here is an article that introduces the discussion of tail calls in the Rust language. The conclusion is probably due to the implementation too complicated (may involve resource drop), and the benefits are limited (programmers can manually change recursion to loop if necessary), so in the end Rust language does not support tail call elimination. In this way, in order to make the tail call elimination of Lua completed in this section meaningful, we can only change the recursive call to the execute()
function into a loop. This change itself is not difficult, but there are still two places to modify the function call process in the future, one is the calling method of the entry function of the entire program, and the other is to support the state preservation of the function in the coroutine. So we will make this change after completing the final function call process.