Upvalue

Before introducing closures, this section first introduces an important part of closures: upvalue.

This section mainly introduces the concept of upvalue, and introduces the changes needed to support upvalue in the syntax analysis and virtual machine execution stages. It is very complicated to realize the complete features of upvalue, so in order to focus on the change of the overall structure and process, this section only supports the most basic upvalue features, and leave it to the next section to introduce the difficult part: escape.

The sample code below shows the most basic scenario of upvalue:

local a = 1
local function foo()
     print(a) -- What type of variable is `a`? Local variable, or global variable?
end

The entire code can be seen as a top-level function, which defines two local variables: a and the function foo. The reference a in print(a) inside the foo() function refers to a local variable defined outside the function, so what kind of variable is the a inside the function? First, it is not defined inside the foo() function, so it is not a local variable; second, it is a local variable defined in the outer function, so it is not a global variable. Local variables that refer to outer functions like this are called upvalue in Lua. There is also the concept of closure in the Rust language, and local variables in the outer function can also be referenced, which is called "capture environment", which should be the same concept with upvalue.

upvalues are very common in Lua. In addition to the above-mentioned obvious cases, there is also the fact that calling local functions at the same level is also an upvalue, such as the following code:

local function foo()
     print "hello, world"
end
local function bar()
     foo() -- upvalue
end

The foo() function called in the bar() function is upvalue. In addition, recursive calls to local functions are also upvalue.

After introducing the concept of upvalue, the syntax analysis process of upvalue is as follows.

Variable Resolution Process

The previous interpreter only supports two variable types: local variables and global variables. The variable parsing process is as follows:

  1. Match in the local variable list of the current function, if found, it is a local variable;
  2. Otherwise, it is a global variable.

Now to add support for the upvalue type, the parsing process of the variable needs to be added one step, which is changed to:

  1. Match in the local variable list of the current function, if found, it is a local variable;
  2. Match in the local variables list of upper layer functions, if found, it will be upvalue; (NEW STEP)
  3. Otherwise it is a global variable.

The newly added step 2 looks simple, but the specific implementation is very complicated, and the description here is not accurate, which will be described in detail in the next section. This section focuses on the overall process, that is, how to deal with it after parsing the upvalue.

Similar to local variables and global variables, a new type of ExpDesc is also added for upvalue:

enum ExpDesc {
     Local(usize), // local variables or temporary variables on the stack
     upvalue(usize), // upvalue
     Global(usize), // global variable

To review, the associated parameter of the local variable ExpDesc::Local represents the index on the stack, and the associated parameter of the global variable ExpDesc::Global represents the index of the variable name in the constant table. What parameters do upvalue need to be associated with? Take the following sample code that contains multiple upvalues as an example:

local a, b, c = 100, 200, 300
local function foo()
     print (c, b)
end

In the above code, there are two upvalues in the foo() function, c and b, which correspond to the index 2 and 1 of the local variable in the upper function respectively (the index starts counting from 0), so naturally, they can be represented by ExpDesc::upvalue(2) and ExpDesc::upvalue(1). In this way, when the virtual machine is executing, it can also conveniently index to the local variables on the stack of the upper layer function. Simple and natural. But when the escape of upvalue is introduced in the next section, this solution cannot meet the requirements. But for the sake of simplicity, this section will be used for the time being.

Parsing Context

The new step 2 in the above variable resolution process requires access to local variables of the outer functions. In the last chapter Analysis Function, recursion is used to support multi-layer function definition. This is not only simple to implement, but also provides a certain degree of encapsulation, that is, only the information of the current function can be accessed. This is originally an advantage, but now in order to support upvalue, we need to access the local variables of the outer function, so this encapsulation becomes a disadvantage that needs to be overcome. Programs are becoming more and more complex and confusing in such ever-increasing demands.

When recursively parsing multi-layer functions before, there is one member throughout, that is, lex: Lex<R> in ParseProto, which needs to be accessed when parsing all functions. Now in order to be able to access the local variables of the outer function, a similar member is needed throughout to store the local variables of each function. To do this, we create a new data structure containing the original lex and the new list of local variables:

struct ParseContext<R: Read> {
     all_locals: Vec<Vec<String>>, // Local variables of each layer function
     lex: Lex<R>,
}

The all_locals member represents the local variable list of each layer function. Each time a function of a new layer is parsed, a new member is pushed into it; after parsing is complete, it is popped. So the last member in the list is the list of local variables for the current function.

Then in ParseProto, replace the original lex with ctx, and delete the original locals:

struct ParseProto<'a, R: Read> {
     // delete: locals: Vec<String>,
     ctx: &'a mut ParseContext<R>, // add ctx to replace the original lex
     ...

And all places where the locals field is used in the syntax analysis code must also be modified to the last member of ctx.all_locals, which is the local variable list of the current function. The specific code is omitted here.

So far, there are three data structures related to syntax analysis:

  • FuncProto, which defines the function prototype, is the output of the syntax analysis stage and the input of the virtual machine execution stage, so all fields are pub;
  • ParseProto, used internally in the parsing phase, and only in the current function;
  • ParseContext, the global state used internally by the parsing phase and accessible at all function levels.

After the transformation of ParseProto, with the ability to access the outer function, the upvalue can be parsed. But here is just saying that it has the ability to analyze, and the specific analysis process will be introduced in the next section.

Bytecode

After parsing the upvalue, for its processing, you can refer to the previous discussion of global variables, the conclusion is as follows:

  • read, first loaded on the stack, converted to a temporary variable;
  • Assignment, only supports assignment from local/temporary variables and constants. For other types of expressions, it is first loaded into a temporary variable on the stack and then assigned.

To this end, compared with global variables, add 3 upvalue-related bytecodes:

pub enum ByteCode {
     // global variable
     GetGlobal(u8, u8),
     SetGlobal(u8, u8),
     SetGlobalConst(u8, u8),

     //upvalue
     Getupvalue(u8, u8), // Load upvalue onto the stack
     Setupvalue(u8, u8), // Assign value from the stack
     SetupvalueConst(u8, u8), // assign value from constant

The generation of these three new bytecodes can also be completed by referring to global variables. The specific code is omitted here.

Virtual Machine Execution

The analysis process of upvalue is introduced above, and the corresponding bytecode has completed the syntax analysis stage. The rest is the virtual machine execution phase.

According to the above processing scheme for upvalue, that is, the associated parameter of ExpDesc::upvalue represents the local variable index of the upper-level function. When the virtual machine is executed, it will also encounter the same problem as the syntax analysis: the current function needs to access the upper-level functions' local variables. Therefore, in order to complete the virtual machine execution phase, big changes must be made to the current code structure.

However, the above-mentioned upvalue processing scheme is only a temporary scheme in this section. In the next section, in order to support the escape of upvalue, there will be a completely different scheme and a completely different virtual machine execution process. Therefore, in order to avoid useless work, the execution of the virtual machine under this scheme will not be implemented for the time being. Interested friends can try to change it.