Escape from Block and goto

The previous section covered upvalue escapes from functions. But in fact, the scope of local variables is the block, so whenever the block ends, upvalue escape may occur. And a function can also be regarded as a kind of block, so the escape from a function introduced in the previous section can be regarded as a special case of escape from a block.

In addition, there is another escape scenario, that is, the goto statement jumps backwards and skips the definition of local variables, and the local variables will also become invalid at this time.

In the previous section, there was too much content, so in order not to add extra details, these two escape scenes are introduced separately in this section.

Escape from block

First look at a sample code that escapes from the block:

do
     local i = 0
     c1 = function()
         i = i + 1 -- upvalue
         print(i)
     end
end -- the end of the block, the local variable `i` becomes invalid

In this example, the anonymous function defined in do .. end block refers to the local variable i defined in it as upvalue. When the block ends, the local variable i will be invalid, but because it is still referenced by the anonymous function, it needs to escape.

Although a function can be regarded as a special case of a block, a special case is a special case after all, and the more general escape from a block is still very different. When the function ends in the previous section, close all upvalues in relevant bytecodes such as Return/Return0/TailCall, because each function will have one of these bytecodes at the end. However, there is no similar fixed bytecode at the end of the block, so a new bytecode Close is added for this purpose. This bytecode closes the local variable referenced by upvalue in the current block.

The easiest way is to generate a Close bytecode at the end of each block, but since the escape from the block is very rare, it's not worth to add a bytecode to all blocks. Therefore, it is necessary to judge whether there is an escape in this block during the syntax analysis stage. If not, there is no need to generate Close bytecode.

The next step is how to judge whether there are local variables escaping in a block. There are several possible implementations. For example, refer to the multi-level function nesting method in the previous section, and also maintain a block nesting relationship. However, there is a lighter approach, which is to add a flag bit to each local variable, and set this flag bit if it is referenced by upvalue. Then at the end of the block, judge whether the local variables defined in this block have been marked, and then you can know whether you need to generate Close bytecode.

The specific definition and execution flow of Close bytecode is omitted here.

Escape from goto

I really can't think of an example of a reasonable escape from goto. But it is still possible to construct an unreasonable example:

::again::
     if c1 then -- false when the first execution reaches here
         c1() -- after assigning a value to c1 below, c1 is a closure that includes an upvalue
     end

     local i = 0
     c1 = function()
         i = i + 1 -- upvalue
         print(i)
     end
     go to again

In the above code, if is judged to be false at the first execution, the call to c1 is skipped; after assigning a value to c1 below, c1 is a closure that includes an upvalue; then goto jumps back to the beginning, and at this time we can call c1; but at this time the local variable i is also invalid, so it needs to be closed.

In the above code, from the definition of again label at the beginning to the last goto statement can also be regarded as a block, then the method of escaping from the block just introduced can be used to process the goto statement. But the goto statement has a special place. We introduced goto statement before, there are two ways to match label and goto statement:

  • Match while parsing. That is, when the label is parsed, the goto statement that has already appeared is matched; when the goto statement is parsed, the label that has already appeared is matched;
  • After the block ends (that is, when the label defined in it becomes invalid), match the existing label and goto statement at one time.

The implementation difficulty of these two methods is similar, but due to another feature of the goto statement, that is, the goto statement that jumps forward needs to ignore the void statement. In order to process the void statement more conveniently, the second solution above was adopted. However, now to support escapes, when a goto statement is parsed (precisely before the generated Jump bytecode), may generate a Close bytecode. Whether it will be generated or not depends on whether the definition of escaped local variables is skipped when goto jumps backwards. That is, only by matching the label and goto statement can we know whether the Close bytecode is required. If we still follow the second scheme to do the matching after the end of the block, even if you we that Close needs to be generated at the end of the block, it can no longer be inserted into the bytecode sequence. Therefore, it can only be changed to the first solution of matching while parsing, and judge whether it is necessary to generate Close bytecode in time during matching.