Upvalue Escape and Closure

The previous section introduced the concept of upvalue, and took the most basic usage of upvalue as an example to introduce the modification of syntax analysis to support upvalue. This section introduces the complete features of upvalue, mainly the escape of upvalue.

The following refers to the sample code in the book "Lua Programming":

local function newCounter()
     local i = 0
     return function ()
         i = i + 1 -- upvalue
         print(i)
     end
end

local c1 = newCounter()
c1() -- output: 1
c1() -- output: 2

newCounter() in the above code is a typical factory function, which creates and returns an anonymous function. What needs to be explained here is that the returned anonymous function refers to the local variable i in newCounter(), which is upvalue. The second half of the code calls the newCounter() function and assigns the returned anonymous function to c1 and calls it. At this time, the newCounter() function has ended, and it seems that the local variable i defined in it has also exceeded the scope. At this time, calling c1 to refer to the local variable i will cause problems (if you are C Language programmers should understand this). However, in Lua, the closure mechanism ensures that calling c1 here is no problem. That is, the escape of upvalue.

The book "Lua Programming" is for Lua programmers, and it is enough to introduce the concept of upvalue escape. But our purpose is to implement an interpreter (not just use an interpreter), so we must not only know that this is no problem, but also know how to do it, that is, how to realize the escape of upvalue.

Unfeasible Static Storage Solution

The easiest way is to refer to the static variable inside the function in C language. For the local variable referenced by upvalue (such as i in the newCounter() function here), it is not placed on the stack, but placed in a static area. But this solution is not feasible, because the static variable in C language is globally unique, and the upvalue in Lua will generate a new copy every time it is called. For example, following the above code, continue with the following code:

local c2 = newCounter()
c2() -- output: `1`. A new count starts.
c1() -- output: `3`. Continue with the output of c1 above.

Calling newCounter() again will generate a new counter, in which the local variable i will be re-initialized to 0 and start counting again. At this point, there are two counters: c1 and c2, each of which has an independent local variable i. So when c2() is called, it will start counting again from 1; if interspersed with c1() before calling, it will continue the previous counting. How interesting!

   --+---+--              +---+   +---+
     | i |                | i |   | i |
   --+-^-+--              +-^-+   +-^-+
       |                    |       |
   /---+---\                |       |
   |       |                |       |
+----+   +----+          +----+   +----+
| c1 |   | c2 |          | c1 |   | c2 |
+----+   +----+          +----+   +----+

The left figure above shows that i is placed in the globally unique static storage, then all counter functions point to the unique i. This does not meet our needs. What we need is a separate i for each counter function as shown on the right.

Storage Scheme on the Heap

Since it cannot be placed on the stack, nor can it be placed in the global static area, it can only be placed on the heap. The next question is, when to put it on the heap? There are several possible scenarios:

  1. When entering the function, put all local variables referenced by upvalue on the heap;
  2. When a local variable is referenced by upvalue, it is moved from the stack to the heap;
  3. When the function exits, move all local variables referenced by upvalue to the heap;

The first solution does not work, because a local variable may have been used as a local variable before it is referenced by upvalue, and related bytecodes have been generated. The second solution should be feasible, but after the local variable is referenced by upvalue, it may be used as a local variable in the current function. It is not necessary to move it to the heap in advance. After all, access to the stack is faster and more convenient. So we choose option 3.

This operation of moving local variables from the stack to the heap, we follow the code implemented by Lua official, is also called "close".

Next, in order to demonstrate that an upvalue is accessed before and after escaping, we modify the sample code based on the counter sample above. The inner function is called once inside the newCounter() function before returned. To do this, we assign this anonymous function to a local variable retf:

local function newCounter()
     local i = 0
     local function retf()
         i = i + 1 -- upvalue
         print(i)
     end
     retf() -- called inside newCounter()
     return retf -- return retf
end

This example is introduced in two parts. First, retf() is called inside the factory function, and then the upvalue escape caused by the factory function returning retf.

First of all, when retf() is called inside the factory function, i to be operated by retf is still on the stack. The schematic diagram is as follows.

     |          |
     +----------+
base |newCounter|
     +----------+
   0 |    i     |<- - - - - - - - - - \
     +----------+                     |
   1 |   retf   +--+->+-FuncProto-----+--+
     +----------+  |  |byte_codes:    |  |
   2 |   retf   +--/  | GetUpvalue(0, 0) |
     +----------+     | ...              |
     |          |     +------------------+

In the figure, the stack is on the left, where newCounter is the entry of the function call and the base position of the current function. The i and the first retf are local variables, and the second retf is the entry on the stack of the function call. The two retfs point to same function prototype. In the bytecode sequence in the retf function prototype, the first bytecode Getupvalue is to load upvalue i onto the stack to perform addition. This bytecode has two associated parameters. The first is the target address loaded onto the stack, which is ignored here; the second is the source address of upvalue, refer to the syntax analysis of upvalue in the previous section, the meaning of this parameter is: the stack of local variables of the upper function index. In this example, it is the index of i in the newCounter() function, which is 0. So far, it is still the content of the previous section, and escape has not been involved.

Now consider the escape of upvalue. After the newCounter() function exits, the three spaces on the left stack will be destroyed, and i will no longer exist. In order for the retf function to continue to access i, before the newCounter() function exits, it is necessary to close the local variable i and move i from the stack to the heap. The schematic diagram is as follows:

     |          |
     +----------+
base |newCounter|         +===+
     +----------+  close  | i |<- - - \
   0 |    i     +-------->+===+       ?
     +----------+                     ?
   1 |   retf   +---->+-FuncProto-----?--+
     +----------+     |byte_codes:    ?  |
     |          |     | GetUpvalue(0, 0) |
                      | ...              |
                      +------------------+

Although this ensures that i can continue to be accessed, there is a very obvious problem: the second parameter associated with the bytecode Getupvalue cannot locate i on the heap (the continuous ? in the figure Wire). This is also mentioned in the previous section, it is not feasible to directly use the index of the local variable on the stack to represent the upvalue scheme. Improvements need to be made on the basis of this scheme.

Improvement: Upvalue Intermediary

In order to still be able to be accessed by upvalue after closing the local variable, we need an upvalue intermediary. At the beginning, the index on the stack is used to represent the upvalue, and when the local variable of the outer function is closed, it is moved to this intermediary.

The following two figures show the situation after adding the upvalue intermediary.

     |          |             - - - - - - \
     +----------+            |            |
base |newCounter|            |      *-----+-+---
     +----------+            |      |Open(0)|
   0 |    i     |<- - - - - -       *-^-----+---
     +----------+                     |
   1 |   retf   +--+->+-FuncProto-----+--+
     +----------+  |  |byte_codes:    |  |
   2 |   retf   +--/  | GetUpvalue(0, 0) |
     +----------+     | ...              |
     |          |     +------------------+

The figure above is a schematic diagram of calling the retf() function inside the newCounter() function. Compared with the previous version, the upvalue intermediary list (the list with * as the corner in the figure) is added, and there is only one member: Open(0), which means that this local variable has not been closed and is on the stack The relative index is 0. In the function prototype of retf, although the second parameter associated with the bytecode Getupvalue has not changed, its meaning has changed, and it has become the index of the intermediary list. It just happens to be 0 in this example.

     |          |          /----------------\
     +----------+          |                |
base |newCounter|          |        *-------V-+---
     +----------+  close   |        |Closed(i)|
   0 |    i     +----------/        *-^-------+---
     +----------+                     |
   1 |   retf   +---->+-FuncProto-----+--+
     +----------+     |byte_codes:    |  |
     |          |     | GetUpvalue(0, 0) |
                      | ...              |
                      +------------------+

The figure above is a schematic diagram after the local variable i is closed before the newCounter() function returns. The members of the upvalue intermediary list added in the figure above become Closed(i), that is, the local variable i is moved to this intermediary list. In this way, Getupvalue can still locate the 0th upvalue intermediary and access the closed i.

Improvement: Shared Upvalue

The above scheme can support the current simple escape scenario, but it does not support the scenario where multiple closures share the same local variable. For example, the following sample code:

local function foo()
     local i, ip, ic = 0, 0, 0
     local function producer()
         i = i + 1
         ip = ip + 1
     end
     local function consumer()
         i = i - 1
         ic = ic + 1
     end
     return produce, consume
end

The two internal functions returned by the above foo() function both refer to the local variable i, and it is obvious that the two functions share i and operate on the same i instead of being independent i. Then when the foo() function finishes closing i, two functions are needed to share the closed i. Since these two functions have different upvalue lists, namely i, ip and i, ic, the two functions do not need to share the same upvalue list. Then it can only be shared separately for each upvalue.

The following figure shows the scheme of sharing each upvalue separately:

     |          |
     +----------+     +=======+
base |    foo   |     |Open(0)|<===============+------------\
     +----------+     +=======+                |            |
   0 |    i     |<- -/  +=======+              |            |
     +----------+       |Open(1)|<-------------|---\        |
   1 |    ip    |<- - - +=======+              |   |        |
     +----------+         +=======+            |   |        |
   2 |    ic    |<- - - - |Open(2)|<-----------|---|--------|---\
     +----------+         +=======+          *-+-+-+-+--    |   |
   3 | producer +---->+-FuncProto--------+   | i |ip |      |   |
     +----------+     |byte_codes:       |   *-^-+-^-+--    |   |
   4 | consumer +--\  | GetUpvalue(0, 0)-+----/    |        |   |
     +----------+  |  | ...              |         |        |   |
     |          |  |  | GetUpvalue(0, 1)-+---------/        |   |
                   |  | ...              |                  |   |
                   |  +------------------+                  |   |
                   |                                      *-+-+-+-+--
                   \-------------->+-FuncProto-------+    | i |ic |
                                  |byte_codes:       |    *-^-+-^-+--
                                  | GetUpvalue(0, 0)-+-----/    |
                                  | ...              |          |
                                  | GetUpvalue(0, 1)-+----------/
                                  | ...              |
                                  +------------------+

The picture above is slightly more complicated, but most of it is the same as the previous scheme. The leftmost is still the stack. Then see that the content pointed to by the producer() function is still the function prototype and the corresponding upvalue list. Since this function uses two upvalues, two bytecodes are listed. Then there is a difference: in the upvalue list, it is not directly the upvalue, but the address of the upvalue. The real upvalue is allocated on the heap alone, which is Open(0), Open(1) and Open(2) in the figure. These 3 upvalues can access local variables on the stack through indexes. The last consumer() function is similar, the difference is that different upvalues are referenced.

When the foo() function ends and all local variables referenced by upvalue are closed, Open(0), Open(1) and Open(2) in the above figure are replaced by Closed(i), Closed(ip) and Closed(ic). At this time, the i in the upvalue lists corresponding to producer() and consumer() functions point to the same Closed(i). In this way, after the outer foo() function exits, these two functions can still access the same i. Only 3 upvalues are replaced, the changes are relatively small, and the closed picture is omitted here.

Definition of Closure

Before continuing to introduce more upvalue usage scenarios, we first introduce the concept of closure based on the above scheme.

According to the above scheme, the returned retf is not only a function prototype, but also includes the corresponding upvalue list. And the function prototype plus upvalue is closure! Add Lua closure type in Value:

pub enum upvalue { // upvalue intermediary in the above figure
     Open(usize),
     Closed(Value),
}
pub struct LuaClosure {
     proto: Rc<FuncProto>,
     upvalues: Vec<Rc<RefCell<upvalue>>>,
}
pub enum Value {
     LuaFunction(Rc<FuncProto>), // Lua function
     LuaClosure(Rc<LuaClosure>), // Lua closure

In this way, although the different closures returned by multiple calls to the newCounter() function share the same function prototype, each has an independent upvalue. This is also the reason why the two counters c1 and c2 at the beginning of this section can count independently.

The following figure shows a schematic diagram of two counters:

             +-LuaClosure--+
|      |     |       proto-+----------------------+-->+-FuncProto--------+
+------+     |    upvalues-+--->+---+--           |   |byte_codes:       |
|  c1  +---->+-------------+    | i |             |   | GetUpvalue(0, 0) |
+------+                        +-+-+--           |   | ...              |
|  c2  +-\                        |               |   +------------------+
+------+ |                        V=========+     |
|      | |                        |Closed(i)|     |
         |                        +=========+     |
         \-->+-LuaClosure--+                      |
             |       proto-+----------------------/
             |    upvalues-+--->+---+--
             +-------------+    | i |
                                +-+-+--
                                  |
                                  V=========+
                                  |Closed(i)|
                                  +=========+

Similarly, we also modify the schematic diagram in the shared upvalue example above. For clarity, delete the specific content of FuncProto; then merge the function prototype and upvalue list into LuaClosure. As shown below.

     |          |
     +----------+     +=======+
base |    foo   |     |Open(0)|<========+----------\
     +----------+     +=======+         |          |
   0 |    i     |<- -/  +=======+       |          |
     +----------+       |Open(1)|<------|---\      |
   1 |    ip    |<- - - +=======+       |   |      |
     +----------+         +=======+     |   |      |
   2 |    ic    |<- - - - |Open(2)|<----|---|------|---\
     +----------+         +=======+     |   |      |   |
   3 | producer +---->+-LuaClosure--+   |   |      |   |
     +----------+     | proto       |   |   |      |   |
   4 | consumer +--\  | upvalues   -+>*-+-+-+-+--  |   |
     +----------+  |  +-------------+ | i |ip |    |   |
     |          |  |                  *---+---+--  |   |
                   |                               |   |
                   \------------>+-LuaClosure--+   |   |
                                 | proto       |   |   |
                                 | upvalues   -+>*-+-+-+-+--
                                 +-------------+ | i |ic |
                                                 *---+---+--

It can be seen from the figure that compared with the Lua function LuaFunction defined in the previous chapter, although the closure LuaClosure can have an independent upvalue list, it has one more memory allocation and pointer jump. Here we are faced with a choice: to completely replace the function with the closure, or to coexist? The official implementation of Lua is the former, which is also the source of the phrase "all functions in Lua are closures". The advantage of substitution is that there is one less type, and the code is a little simpler; the advantage of coexistence is that the function type allocates less memory and one pointer jump after all. In addition to these two advantages and disadvantages, there is a larger difference that affects behavior. For example, the following sample code:

local function foo()
     return function () print "hello, world!" end
end
local f1 = foo()
local f2 = foo()
print(f1 == f2) -- true or false?

Here the anonymous function returned by calling foo() does not include the upvalue. So the question is, are the two return values ​​of the two calls to foo() equal?

  • If the LuaFunction type is reserved, then the return value is LuaFunction type, and f1 and f2 only involve the function prototype and are equal. Validation can be performed with the code from the previous chapter.

  • If the LuaFunction type is not reserved, the returned function is of the LuaClosure type. Although it does not contain upvalue, it is also two different closures, f1 and f2 are not equal.

So which of the above behaviors meets the requirements of Lua language? The answer is: both can. The description of function comparison in the Lua manual is as follows:

Functions created at different times but with no detectable differences may be classified as equal or not (depending on internal caching details).

That is, it doesn't matter, and there is no guarantee on it. Then we can choose whatever we want. In this project, we initially chose closures instead of functions, and later added function types back. I don't feel much difference.

Syntax Snalysis of Closure

When there was no closure before and it was still LuaFunction, the processing of function definition was very intuitive:

  • Parse the function definition and generate the function prototype FuncProto;
  • Wrap FuncProto with Value::LuaFunction and put it in the constant table;
  • Generate bytecodes such as LoadConst to read the constant table.

Function definitions are treated in a similar way to other types of constants. Recall that the relevant code is as follows:

     fn funcbody(&mut self, with_self: bool) -> ExpDesc {
         // omit preparation

         // The proto returned by the chunk() function is the FuncProto type
         let proto = chunk(self. lex, has_varargs, params, Token::End);
         ExpDesc::Function(Value::LuaFunction(Rc::new(proto)))
     }
     fn discharge(&mut self, dst: usize, desc: ExpDesc) {
         let code = match desc {
             // omit other types

             // Add the function reason to the constant table and generate LoadConst bytecode
             ExpDesc::Function(f) => ByteCode::LoadConst(dst as u8, self. add_const(f) as u16),

Now in order to support closures, the following improvements need to be made:

  • The relevant Value type definition has been changed to LuaClosure(Rc<LuaClosure>), so the parsed function prototype FuncProto cannot be directly put into the constant table. Although it can be placed indirectly, it is not intuitive. It is better to add a new table in the function prototype FuncProto to save the prototype list of the inner function.

  • When the virtual machine executes the function definition, an upvalue is generated in addition to the function prototype. Then the bytecode that directly reads the constant table like LoadConst does not meet the demand. A special bytecode needs to be added to aggregate the function prototype and the generated upvalue into a closure.

  • In addition, when generating upvalue, we need to know which local variables of the upper function are used by this function. Therefore, the function prototype also needs to add a list of upvalue references to upper-level local indexes.

In summary, the newly added bytecode for creating a closure is as follows:

pub enum ByteCode {
     Closure(u8, u16),

The two parameters associated with this bytecode are similar to the LoadConst bytecode, which are the target address on the stack and the index of the internal function prototype list inner_funcs.

In addition, two new members need to be added to the function prototype as follows:

pub struct FuncProto {
     pub upindexes: Vec<usize>,
     pub inner_funcs: Vec<Rc<FuncProto>>,

where inner_funcs is a list of prototypes of the inner functions defined inside the function. upindexes is the index of the local variable that the current function refers to the upper function, and this member needs to be modified later. It should be noted that inner_funcs is used when the current function acts as an outer function, and upindexes is used when the current function acts as a inner function.

After we introduce the complete features of upvalue later, we will introduce the analysis of the upvalue index upindexes.

Now, after introducing the definition and syntax analysis of closures, let's look at other scenarios of upvalue.

Improvement: References to Upvalue

The upvalues introduced before are all references to the local variables of the upper-level functions. Now let’s look at the references to the upvalues of the upper-level functions. Make a modification to the counting closure example at the beginning of this section, and put the incremental code i = i + 1 into a layer of functions:

local function newCounter()
     local i = 0
     return function ()
         print(i) -- upvalue
         local function increase()
             i = i + 1 -- where does `i` refer?
         end
         increase()
     end
end

local c1 = newCounter()
c1()

In this example, the i in the first line of the print statement of the anonymous function returned by the newCounter() function is the ordinary upvalue introduced before, pointing to the local variable of the upper-level function. And what is i in the internal function increase() function? Also upvalue. Who is this upvalue a reference to?

Can it be regarded as a cross-layer reference to the local variable i in the outermost newCounter() function? No, because it cannot be realized when the virtual machine is executed. When the anonymous function returns, the internal increase() function has not been created; only when the anonymous function is called outside, the internal increase() function will be created and executed; at this time the outermost newCounter() has ended, and the local variable i no longer exists, so it cannot be referenced.

Since it cannot be a cross-layer reference to the local variable i in the outermost newCounter() function, it can only be a reference to the upvalue i in the anonymous function of the outer layer.

In order to support references to upvalue, first, modify the definition of the upvalue list in FuncProto just now, from only supporting local variables to also supporting upvalue:

pub enum UpIndex {
     Local(usize), // index of local variables in upper functions
     Upvalue(usize), // index of upvalues in upper functions
}

pub struct FuncProto {
     pub upindexes: Vec<UpIndex>, // change from usize to UpIndex
     pub inner_funcs: Vec<Rc<FuncProto>>,

Then, look at the schematic diagram of calling the internal increase() function when executing the returned anonymous function counter c1 in the above example:

|          |
+----------+
|    c1    +-------------------->+-LuaClosure--+
+----------+                     | proto       |
| increase |                     | upvalues    +--->+---+--
+----------+                     +-------------+    | i |
| increase +-->+-LuaClosure--+                      +-+-+--
+----------+   | proto       |                        |
|          |   | upvalues    +--->+---+--             |
               +-------------+    | i |               |
                                  +-+-+--             V=========+
                                    \---------------->|Closed(i)|
                                                      +=========+

On the left is the stack. Among them, c1 is the function call entry, and the corresponding closureThe upvalue i contained in the package is referenced in the print statement.

The first increase below the stack is a local variable in c1. The second increase is the function call entry, and the upvalue i contained in the corresponding closure is referenced in the statement that performs the increment operation. In the function prototype, this upvalue should correspond to the 0th upvalue of the upper-level function, namely UpIndex::upvalue(0), so when the virtual machine executes and generates this closure, this upvalue points to the 0th of c1 upvalue, which is Closed(i) in the figure. In this way, the increment operation of i in this function will also be reflected in the print statement of c1 function.

Improvement: References Across Multiple Layer Functions

Let's look at another scenario: cross-layer references. Slightly modifying the above use case to put the print statement after the increment operation, we get the following sample code:

local function newCounter()
     local i = 0
     return function ()
         local function increase()
             i = i + 1 -- upvalue of upper-upper local
         end
         increase()
         print(i) -- upvalue
     end
end

The difference between this example and the above example is that when the increase() function is parsed, the upvalue i has not been generated in the anonymous function to be returned, so i in the increase() function points to who? Summarize the previous upvalue types: either it refers to the local variable of the upper-level function, or the upvalue of the upper-level function, and analyzes that it cannot be referenced across multiple layers of functions. Therefore, there is only one solution: create an upvalue in the middle layer function. This upvalue is not used in the current function (by now), it is only used to reference the inner function.

The current function does not use the created upvalue "by now". But in the subsequent analysis process, it may still be used. For example, after the above example, the following print statement uses this upvalue.

In this example, the prototypes and schematic diagrams of the two functions are the same as the above example. omitted here.

At this point, all the upvalue features are finally introduced, and the final solution is given. During this period, syntax analysis and virtual machine execution are also involved. Next, according to the final plan, we will briefly organize syntax analysis and virtual machine execution.

Syntax Analysis of Upvalue Index

When introducing Syntax Analysis of Closures, it is pointed out that in the function prototype FuncProto, a new member upindexes needs to be added to represent the upvalue index of the current function.

In the previous section, the variable parsing process is listed:

  1. Match in the local variable list of the current function, if found, it is a local variable;
  2. Match in the local variable list of upper-level functions, if found, it will be upvalue;
  3. Otherwise it is a global variable.

According to the introduction of the complete features of upvalue earlier in this section, the above-mentioned step 2 is extended to the more detailed analysis steps of the upvalue index. The final process of variable analysis is as follows:

  1. Match in the local variable list of the current function, if found, it is a local variable;
  2. Match in the upvalue list of the current function, if found, the upvalue already exists; (reuse upvalue)
  3. Match in the local variable list of outer functions, if found, add an upvalue; (ordinary upvalue)
  4. Match in the upvalue list of the outer function, if found, add an upvalue; (reference to the upvalue in the upper function)
  5. Match in the local variable list of the outer functions, if found, create an upvalue in all intermediate layer functions, and add an upvalue; (references across multi-layer functions)
  6. Match in the upvalue list of the outer functions, if found, create an upvalue in all intermediate layer functions, and add an upvalue; (a reference to an upvalue across multi-layer functions)
  7. Repeat steps 5 and 6 above, if the outermost function is still not matched, it is a global variable.

There is obviously a lot of duplication in this process. The most obvious is that steps 3 and 4 are special cases of steps 5 and 6, that is, there is no intermediate layer function, so steps 3 and 4 can be removed. In addition, when the code is implemented, steps 1 and 2 can also be omitted as special cases. Since there is too much content in this section, the specific code will not be posted here.

In the syntax analysis in the previous section, in order to support upvalue, it is necessary to access the local variable list of the upper-level function, so the new context ParseContext data structure is added, which contains the local variable list of functions at all levels. This section introduces that upvalue can also refer to the upvalue of upper-level functions, so it is also necessary to add the upvalue list of functions at all levels in ParseContext.

struct ParseContext<R: Read> {
     all_locals: Vec<Vec<String>>,
     all_upvalues: Vec<Vec<(String, UpIndex)>>, // new
     lex: Lex<R>,
}

pub struct FuncProto {
     pub upindexes: Vec<UpIndex>,

In the above code, ParseContext is the parsing context, which is the internal data structure of parsing. The member type of its upvalue list all_upvalues is (String, UpIndex), where String is the name of the upvalue variable, which is used for the matching in step 4 and step 6; UpIndex is the index of upvalue.

FuncProto is the output of the syntax analysis stage, which is used by the virtual machine execution stage. At this time, the upvalue variable name is not needed, and only the UpIndex index is needed.

Virtual Machine Execution

In the front part of this section, when introducing the upvalue design scheme, it was basically introduced according to the execution phase of the virtual machine, so we will go through it again here.

First, the closure is created, that is, the function is defined. To this end, a new bytecode ByteCode::Closure is introduced, whose responsibility is to generate upvalue, package it together with the function prototype as a closure, and load it on the stack.

What needs to be explained here is that in the syntax analysis phase, in order to access the local variables of the upper-level function, the ParseContext context needs to be introduced; however, in the virtual machine execution phase, although upvalue also needs to access the stack space of the upper-level function, it does not need for a similar context. This is because when the closure is created, the upvalue list is generated by the outer function and passed into the closure, and the inner function can indirectly access the stack space of the outer function through the upvalue list.

Besides, in addition to passing the closure into the generated upvalue list, the outer function itself also needs to maintain the list for two purposes:

  • As mentioned in the Shared upvalue section above, if a function contains multiple closures, the upvalue of these closures must share local variables. Therefore, when creating an upvalue, first check whether the upvalue associated with this local variable has been created. If so, share; otherwise, create a new one.

    There is a small problem here, the check of whether this has been created is carried out during the virtual machine execution phase. There are generally not many upvalue lists, so it is not necessary to use a hash table. If Vec is used, the time complexity of this matching check is O(n). When there are many upvalues, this may affect performance. Can this matching check be placed in the syntax analysis stage? This issue will be addressed in detail in the next section.

  • When the outer function exits, upvalue needs to be closed.

    It should be noted that, theoretically speaking, only escaped upvalues need to be closed; there is no need to close unescaped upvalues. However, it is very difficult to determine whether an upvalue escapes or not at the syntax stage. Because except for the obvious escape case where the internal function is used as the return value in the above example, there are also situations such as assigning the internal function to an external table. It is also very troublesome to judge whether to escape in the virtual machine stage. So for the sake of simplicity, we refer to the official implementation of Lua here, and close all upvalues at the end of the function, regardless of whether they escape.

The timing of closing upvalue is where all functions exit, including Return, Return0 and TailCall bytecodes. The specific closing code is omitted here.

Summary

This section introduces the escape of upvalue and adds closure types. But it mainly introduces how to design and manage upvalue, but does not talk about specific operations, including how to create, read, write, and close upvalue. However, after the design plan is explained clearly, these specific operations are relatively simple. This section is already very long, so the introduction and code of this part will be omitted.

Rust DST

Now introduce a feature of the Rust language, DST.

The definition of the closure data structure LuaClosure earlier in this section is as follows:

pub struct LuaClosure {
     proto: Rc<FuncProto>,
     upvalues: Vec<Rc<RefCell<upvalue>>>,
}

The function prototype proto field is ignored here, and only the upvalue list upvalues field is concerned. In order to store any upvalue, the upvalues here are defined as a list Vec. This requires an additional allocation of memory. The memory layout of the entire closure is as follows:

+-LuaClosure--+
| proto       |
| upvalues:   |    upvalue list
|   ptr     --+--->+------+------+-
|   capacity  |    |      |      |
|   length    |    +------+------+-
+-------------+

In the figure above, the closure LuaClosure is on the left, and the extra memory on the right pointed to by ptr is the actual storage space of the upvalue list Vec. There are three disadvantages of allocating an additional memory in this way:

  • Waste of memory, each segment of memory requires additional management space and waste due to alignment;
  • When applying for memory, one more allocation needs to be performed, which affects performance;
  • When accessing upvalue, one more pointer jump is required, which also affects performance.

For the requirement of this variable-length array, the classic approach in C language is: define a zero-length array in the data structure, and then specify the actual length as needed when actually allocating memory. The sample code is as follows:

// define the data structure
struct lua_closure {
     struct func_proto *proto;
     int n_upavlue; // actual number
     struct upvalue upvalues[0]; // zero-length array
}

// request memory
struct lua_closure *c = malloc(sizeof(struct lua_closure) // basic space
         + sizeof(struct upvalue) * n_upvalue); // extra space

// initialization
c->n_upvalue = n_upvalue;
for (int i = 0; i < n_upvalue; i++) {
     c->upvalues[i] = ...
}

The corresponding memory layout is as follows:

+-------------+
| proto       |
| n_upvalue   |
:             : \
:             :  + Upvalue列表
:             : /
+-------------+

This approach can avoid the above three disadvantages. Can this be done in Rust? For example, the following definition:

pub struct LuaClosure {
     proto: Rc<FuncProto>,
     upvalues: [Rc<RefCell<upvalue>>], // slice
}

In this definition, the type of upvalues has changed from list Vec to slice []. The good news is that Rust supports the DST type (that is, the slice here) as the last field of the data structure, which means that the above definition is legal. The bad news is that such data structures cannot be initialized. A data structure that cannot be initialized, is of course useless. To quote The Rustonomicon: custom DSTs are a largely half-baked feature for now.

We can think about why it cannot be initialized? For example, Rc has Rc::new_uninit_slice() API to create slices, so can a similar API be added to create this data structure containing slices? In addition, you can also refer to dyn_struct.

However, even if it can be initialized, and the definition of the above data structure can be used, but there will be another problem: since the upvalues field is DST, then the entire LuaClosure will also become DST, so the pointer will become a fat pointer, including the actual length of the slice, Rc<LuaClosure> becomes 2 words, which in turn causes enum Value to change from 2 words to 3 words. This does not meet our requirements, just like Rc<str> cannot be used to define the string type before.

Since slice cannot be used, is there any other solution? Fixed-length arrays can be used. For example, modify the definition as follows:

enum Varupvalues {
     One(Rc<RefCell<upvalue>>), // 1 upvalue
     Two([Rc<RefCell<upvalue>>; 2]), // 2 upvalues
     Three([Rc<RefCell<upvalue>>; 3]), // 3 upvalues
     Four([Rc<RefCell<upvalue>>; 4]), // 4 upvalues
     More(Vec<Rc<RefCell<upvalue>>>), // more upvalue
}

pub struct LuaClosure {
     proto: Rc<FuncProto>,
     upvalues: Varupvalues,
}

In this way, for closures with no more than 4 upvalues, additional memory allocation can be avoided. This should satisfy most cases. In the case of more upvalues, the waste of allocating another piece of memory is relatively not that great. Another advantage of this solution is that it does not involve unsafe. Of course, the problem with this solution is that it will bring coding complexity. Since the creation of LuaClosure is only generated once when the closure is created, it is not a high-frequency operation, so there is no need to make it so complicated. Therefore, in the end, we still use the original Vec solution.