Local Variables

This section describes the definition and access of local variables, while the assignment is covered in the next section.

For simplicity, we support only the simplified form of the local variable definition statement: local name = expression. That is, it supports neither multiple variables nor a missing initializer. We will support the full form in a later chapter. The target code is as follows:

local a = "hello, local!" -- define new local var 'a'
print(a) -- use 'a'

How are local variables managed, stored, and accessed? Let's first look at the output of luac:

main <local.lua:0,0> (6 instructions at 0x6000006e8080)
0+ params, 3 slots, 1 upvalue, 1 local, 2 constants, 0 functions
	1	[1]	VARARGPREP	0
	2	[1]	LOADK    	0 0	; "hello, world!"
	3	[2]	GETTABUP 	1 0 1	; _ENV "print"
	4	[2]	MOVE     	2 0
	5	[2]	CALL     	1 2 1	; 1 in 0 out
	6	[2]	RETURN   	1 1 1	; 0 out

Compared with the program that directly prints "hello, world!" in the previous chapter, there are several differences:

  • 1 local in the second line of the output indicates that there is 1 local variable. This is only descriptive information and does not affect the bytecode below.
  • LOADK loads a constant into index 0 of the stack. It corresponds to line [1] of the source code, that is, the definition of the local variable. This shows that variables are stored on the stack and are assigned during execution.
  • The target address of GETTABUP is 1 (it was 0 in the previous chapter), that is, print is loaded into location 1, because location 0 is used to store the local variable.
  • MOVE is a new bytecode that copies a value within the stack. Its two parameters are the destination index and the source index. Here it copies the value at index 0 to index 2, that is, it passes the local variable a as the argument of print.

After the first 4 bytecodes are executed, the layout on the stack is as follows:

  +-----------------+   MOVE
0 | local a         |----\
  +-----------------+    |
1 | print           |    |
  +-----------------+    |
2 | "hello, world!" |<---/
  +-----------------+
  |                 |

This shows that local variables live on the stack during execution. In the previous chapter the stack was used only for function calls; now it also stores local variables. Local variables are the more persistent of the two: a local stays valid until the end of its block, while the slots used by a function call become invalid as soon as the call returns.
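
To make the layout concrete, here is a minimal sketch that replays LOADK, GETTABUP and MOVE against a plain vector standing in for the stack. The Slot enum and the run_prologue() function are hypothetical names used only for illustration; they are not part of the interpreter, and the global lookup is reduced to tagging the slot with the function's name.

```rust
/// A stack slot; a simplified stand-in for the interpreter's value type.
#[derive(Clone, Debug, PartialEq)]
enum Slot {
    Nil,
    Str(String),
    Func(&'static str), // a function, identified only by name in this sketch
}

/// Replays LOADK, GETTABUP and MOVE from the listing on a 3-slot stack.
fn run_prologue() -> Vec<Slot> {
    let constants = ["hello, local!", "print"];
    let mut stack = vec![Slot::Nil; 3];

    // LOADK 0 0: constant 0 -> slot 0, the home of local `a`
    stack[0] = Slot::Str(constants[0].to_string());
    // GETTABUP 1 0 1: global named by constant 1 -> slot 1
    stack[1] = Slot::Func(constants[1]);
    // MOVE 2 0: copy slot 0 -> slot 2, the argument position of the call
    stack[2] = stack[0].clone();

    stack
}
```

After the call, slot 0 still holds the local variable, while slots 1 and 2 hold the function and its argument, matching the diagram above.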

Define Local Variables

Now we add support for local variables. First define the local variable table, locals. As shown in the value and type section, a Lua variable carries only a name and no type information, so this table stores just variable names and is defined as Vec<String>. In addition, the table is used only during syntax analysis and is not needed while the virtual machine executes, so it does not need to be added to ParseProto.

Currently, 2 statements are supported (2 formats of function calls):

Name String
Name ( exp )

Among them, exp is an expression, which currently supports a variety of constants, such as strings and numbers.

Now we add a new statement, the simplified form of defining a local variable:

local Name = exp

This also includes exp. So extract this part as a function load_exp(). Then the syntax analysis code corresponding to the definition of local variables is as follows:

     Token::Local => { // local name = exp
         let var = if let Token::Name(var) = lex.next() {
             var // can not add to locals now
         } else {
             panic!("expected variable");
         };

         if lex.next() != Token::Assign {
             panic!("expected `=`");
         }

         load_exp(&mut byte_codes, &mut constants, lex.next(), locals.len());

         // add to locals after load_exp()
         locals.push(var);
     }

The code is relatively simple and needs no explanation. The load_exp() function is described in the following section.

What needs special attention is that when the variable name var is first parsed, it cannot be directly added to the local variable table locals, but can only be added after the expression is parsed. It can be considered that when var is parsed, there is no complete definition of local variables; it needs to wait until the end of the entire statement to complete the definition and add it to the local variable table. The following subsections explain the specific reasons.

Access Local Variables

Now we handle access to a local variable, that is, the code print(a). This means adding local variable handling to exp.

In fact, in the Name ( exp ) format of the function call statement in the previous section, you can add global variables in exp. In this way, Lua code such as print(print) can be supported. It's just that at that time, I only cared about adding other types of constants, and forgot to support global variables. This also reflects the current state, that is, the addition of functional features is all based on feeling, while the completeness or even correctness cannot be guaranteed at all. We will address this issue in subsequent chapters.

So modify the code of load_exp() (the processing part of the original various constant types is omitted here):

fn load_exp(byte_codes: &mut Vec<ByteCode>, constants: &mut Vec<Value>,
         locals: &Vec<String>, token: Token, dst: usize) {

     let code = match token {
         ... // other type consts, such as Token::Float()...
         Token::Name(var) => load_var(constants, locals, dst, var),
         _ => panic!("invalid argument"),
     };
     byte_codes.push(code);
}

fn load_var(constants: &mut Vec<Value>, locals: &Vec<String>, dst: usize, name: String) -> ByteCode {
     if let Some(i) = locals.iter().rposition(|v| v == &name) {
         // local variable
         ByteCode::Move(dst as u8, i as u8)
     } else {
         // global variable
         let ic = add_const(constants, Value::String(name));
         ByteCode::GetGlobal(dst as u8, ic as u8)
     }
}

The variable handling in load_exp() is placed in a separate load_var() function because the function part of the function call statement can call load_var() as well, so that a local variable can also be used as the function being called.

The processing logic for a variable is to look up its Name in the local variable table locals:

  • if it exists, it is a local variable, so generate the Move bytecode, which is new;
  • otherwise, it is a global variable. Its handling was introduced in the previous chapter, so it is skipped here.

It is foreseeable that after supporting Upvalue, it will also be handled in this function.

When the load_var() function looks up variables in the variable table, it searches from the back to the front, that is, the .rposition() function is used. This is because we did not check for duplicate names when registering local variables. If there is a duplicate name, it will be registered as usual, that is, it will be pushed at the end of the local variable table. In this case, the reverse search will find the variable registered later, and the variable registered first will never be located. It is equivalent to the variable registered later covering the previous variable. For example, the following code is legal and outputs 456:

local a = 123
local a = 456
print(a) -- 456

I find this approach very ingenious. Checking whether a name already exists every time a local variable is added would cost performance. Repeated definitions of a local variable are rare (or maybe I am ignorant), so it is not worth checking for duplicates (whether to report an error or to reuse the slot) for such an unlikely case. The current approach (reverse lookup) keeps the common case fast and still handles repeated definitions correctly.
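
The effect of the reverse lookup can be checked in isolation. The sketch below uses a hypothetical helper, find_local(), that performs the same .rposition() search as load_var(); it is only an illustration, not the interpreter's code.

```rust
/// Returns the stack slot of the most recently defined local called `name`.
fn find_local(locals: &[String], name: &str) -> Option<usize> {
    // rposition() scans from the back, so the latest definition wins
    locals.iter().rposition(|v| v == name)
}
```

With locals equal to ["a", "b", "a"], find_local(&locals, "a") returns Some(2), the slot of the second definition, whereas a forward .position() search would return Some(0). The first a keeps its slot on the stack but can no longer be reached by name.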

Rust has similar shadowing of variables. However, Rust cannot ignore the shadowed variable as simply as we do here: a shadowed value in Rust is not dropped at the point of shadowing but stays alive until the end of its scope, so the compiler still has to keep track of it and drop it at the right time.
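
How Rust treats a shadowed binding can be observed with a small experiment. The sketch below is independent of the interpreter; the Noisy type and the shadow_drop_order() function are hypothetical names introduced for this illustration. Each Noisy value records a message in a shared log when it is dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its label in a shared log when dropped.
struct Noisy {
    label: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// Shadows `a` once and returns the events in the order they happened.
fn shadow_drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { label: "first a dropped", log: Rc::clone(&log) };
        let _ = &a; // touch the binding so it is not reported as unused
        let a = Noisy { label: "second a dropped", log: Rc::clone(&log) };
        let _ = &a;
        log.borrow_mut().push("after shadowing");
    } // both values are dropped here, in reverse order of declaration
    let events = log.borrow().clone();
    events
}
```

The log shows "after shadowing" first, then the second a, then the first a. The shadowed value is therefore still alive after it becomes unreachable by name and is only dropped at the end of the block.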

Another issue, mentioned at the end of the Define Local Variables section above, is that when the variable name var is parsed it cannot be added to the local variable table locals right away; it may only be added after the expression has been parsed. At that point there was no way to access local variables yet, so the reason was not explained. Now it can be. Consider the following code:

local print = print

This kind of statement is fairly common in Lua code: a frequently used global variable is assigned to a local variable of the same name, so that later references to the name access the local variable. Local variables are much faster than global variables (a local is accessed by stack index, while a global requires a lookup in the global variable table at run time; this is the difference between the Move and GetGlobal bytecodes), so this improves performance.
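
The difference in work between the two bytecodes can be sketched as follows. This is an illustration under assumptions: values are reduced to String, the global table is assumed to be a HashMap keyed by name, and exec_move() and exec_get_global() are hypothetical names rather than the interpreter's functions.

```rust
use std::collections::HashMap;

/// Move: copy one stack slot to another; nothing but indexed accesses.
fn exec_move(stack: &mut Vec<String>, dst: usize, src: usize) {
    stack[dst] = stack[src].clone();
}

/// GetGlobal: fetch the name from the constant table, then hash it and
/// look it up in the global table before the stack slot can be written.
fn exec_get_global(
    stack: &mut Vec<String>,
    constants: &[String],
    globals: &HashMap<String, String>,
    dst: usize,
    name_idx: usize,
) {
    let name = &constants[name_idx];
    // a missing global reads as nil
    stack[dst] = globals
        .get(name)
        .cloned()
        .unwrap_or_else(|| "nil".to_string());
}
```

Caching the global in a local pays the lookup once; every later use of the name is then just a Move.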

Going back to the question: if the variable name print were added to the local variable table as soon as it is parsed, then parsing the expression print after = would find the newly added print in the local variable table. That would amount to assigning the local variable print to itself, which is meaningless (if you really did this, print would be assigned nil).

To sum up, a variable must be added to the local variable table only after the expression after = has been parsed.
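
The two orderings can be compared side by side. The sketch below is a simplified model, not the parser itself: Resolved, resolve() and define_local() are hypothetical names, and the right-hand side is restricted to a bare name.

```rust
/// What a name on the right-hand side of `local name = exp` resolves to.
#[derive(Debug, PartialEq)]
enum Resolved {
    Local(usize),   // stack slot of a local variable
    Global(String), // name to look up in the global variable table
}

fn resolve(locals: &[String], name: &str) -> Resolved {
    match locals.iter().rposition(|v| v == name) {
        Some(i) => Resolved::Local(i),
        None => Resolved::Global(name.to_string()),
    }
}

/// Handles `local <var> = <rhs>`, registering `var` either before (wrong)
/// or after (right) the right-hand side has been resolved.
fn define_local(
    locals: &mut Vec<String>,
    var: &str,
    rhs: &str,
    register_first: bool,
) -> Resolved {
    if register_first {
        locals.push(var.to_string());
        resolve(locals.as_slice(), rhs)
    } else {
        let resolved = resolve(locals.as_slice(), rhs);
        locals.push(var.to_string());
        resolved
    }
}
```

For local print = print on an empty table, registering afterwards resolves the right-hand side to Resolved::Global("print"), while registering first resolves it to Resolved::Local(0), the still uninitialized slot of the variable being defined.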

Where the Function is Called

Previously, our interpreter supported only function call statements, so the stack was used only for function calls, and the function and its argument were fixed at positions 0 and 1 respectively. Now that local variables are supported, the stack is no longer reserved for function calls, and the function and argument positions are no longer fixed. They must instead start at the first free position on the stack, that is, the position right after the local variables. To this end:

  • During syntax analysis, we can get the number of local variables through locals.len(), that is, the first free position on the stack.

  • When the virtual machine is executing, we need to add a field func_index in ExeState, set this field before the function call to indicate this position, and use it in the function. The corresponding codes are as follows:

     ByteCode::Call(func, _) => {
         self.func_index = func as usize; // set func_index
         let func = &self.stack[self.func_index];
         if let Value::Function(f) = func {
             f(self);
         } else {
             panic!("invalid function: {func:?}");
         }
     }

fn lib_print(state: &mut ExeState) -> i32 {
     println!("{:?}", state.stack[state.func_index + 1]); // use func_index
     0
}
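
The following self-contained sketch shows func_index at work with one local variable below the call. It is a simplification under assumptions: Value has only two variants, the output is collected in a hypothetical output field instead of being printed so that it can be inspected, and call() and call_above_one_local() are names made up for this illustration.

```rust
struct ExeState {
    stack: Vec<Value>,
    func_index: usize,
    output: Vec<String>, // captured output, an assumption of this sketch
}

enum Value {
    Str(String),
    Function(fn(&mut ExeState) -> i32),
}

fn lib_print(state: &mut ExeState) -> i32 {
    // the argument sits right after the function, wherever that is
    let text = match &state.stack[state.func_index + 1] {
        Value::Str(s) => s.clone(),
        Value::Function(_) => "function".to_string(),
    };
    state.output.push(text);
    0
}

impl ExeState {
    fn call(&mut self, func: usize) {
        self.func_index = func; // remember where the function lives
        // copy the fn pointer out so the borrow of self.stack ends first
        let f = match &self.stack[func] {
            Value::Function(f) => *f,
            _ => panic!("invalid function at slot {func}"),
        };
        f(self);
    }
}

/// Stack layout for `local a = "hello, local!"; print(a)`.
fn call_above_one_local() -> Vec<String> {
    let mut state = ExeState {
        stack: vec![
            Value::Str("hello, local!".to_string()), // slot 0: local a
            Value::Function(lib_print),              // slot 1: function
            Value::Str("hello, local!".to_string()), // slot 2: argument
        ],
        func_index: 0,
        output: Vec::new(),
    };
    state.call(1); // locals.len() == 1, so the function sits at slot 1
    state.output
}
```

Because lib_print() reads its argument relative to func_index, the same function works no matter how many local variables precede the call on the stack.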

Test

So far, we have implemented the definition and access of local variables, and reorganized the code along the way, which makes the function call statement more powerful: both the function and the argument can now be global or local variables. So the 2-line target code at the beginning of this section is too simple. You can try the following code:

local a = "hello, local!" -- define a local by string
local b = a -- define a local by another local
print(b) -- print local variable
print(print) -- print global variable
local print = print --define a local by global variable with same name
print "I'm local-print!" -- call local function

Results of the execution:

[src/parse.rs:71] &constants = [
     hello, local!,
     print,
     I'm local-print!,
]
byte_codes:
   LoadConst(0, 0)
   Move(1, 0)
   GetGlobal(2, 1)
   Move(3, 1)
   Call(2, 1)
   GetGlobal(2, 1)
   GetGlobal(3, 1)
   Call(2, 1)
   GetGlobal(2, 1)
   Move(3, 2)
   LoadConst(4, 2)
   Call(3, 1)
hello, local!
function
I'm local-print!

In line with expectations! The bytecode listing is a bit long; you can compare it with the output of luac. Previously we could only analyze and imitate the bytecode sequences compiled by luac, but now we can compile and output bytecode on our own. Great progress!

OO in Syntax Analysis Code

The feature is complete. However, as features are added, the syntax analysis code becomes more chaotic. For example, the load_exp() function above takes a whole bunch of parameters. To tidy up the code, we restructure the syntax analysis in an object-oriented style, with methods defined on ParseProto. These methods get all the information they need through self, so there is no need to pass so many parameters. For the specific changes, see commit f89d2fd.

Bringing together several independent members also presents a small problem, one specific to the Rust language. For example, the original code for reading a string constant first calls load_const() to generate and return the bytecode, and then calls byte_codes.push() to save it. These two function calls can be written on one line:

byte_codes.push(load_const(&mut constants, iarg, Value::String(s)));

After changing to object-oriented mode, the code is as follows:

self.byte_codes.push(self.load_const(iarg, Value::String(s)));

But this cannot be compiled, and the error is as follows:

error[E0499]: cannot borrow `*self` as mutable more than once at a time
  --> src/parse.rs:70:38
   |
70 |                 self.byte_codes.push(self.load_const(iarg, Value::String(s)));
   |                 ---------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
   |                 |               |    |
   |                 |               |    second mutable borrow occurs here
   |                 |               first borrow later used by call
   |                 first mutable borrow occurs here
   |
help: try adding a local storing this argument...
  --> src/parse.rs:70:38
   |
70 |                 self.byte_codes.push(self.load_const(iarg, Value::String(s)));
   |                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: ...and then using that local as the argument to this call
  --> src/parse.rs:70:17
   |
70 |                 self.byte_codes.push(self.load_const(iarg, Value::String(s)));
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0499`.

Although the Rust compiler is very strict, its error message is very clear, and it even suggests the correct fix.

self is mutably borrowed twice: once through self.byte_codes for push(), and once as a whole for load_const(). Although self.load_const() does not use self.byte_codes, so there is no real conflict, the compiler does not know these details; it only knows that the two mutable borrows overlap. This is the consequence of bringing multiple members together. The solution is to introduce a local variable as Rust suggests, splitting this line of code into two:

let code = self.load_const(iarg, Value::String(s));
self.byte_codes.push(code);

The situation here is simple and easy to fix, because the returned bytecode code does not refer to self.constants and so has no connection to self; self.byte_codes can therefore be used normally afterwards. If the value returned by a method still borrows from the data structure, the solution is not so simple. We will run into that situation later, in the virtual machine execution.
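
The borrow problem and its fix can be reproduced in a few lines. The sketch below is a reduced model, not the code from the commit: constants are simplified to Vec<String>, ByteCode has a single variant, and emit_const() and load_const_split() are hypothetical names. It shows the two-line fix and, for contrast, why the one-line form compiled while load_const() was a free function.

```rust
#[derive(Debug, PartialEq)]
enum ByteCode {
    LoadConst(u8, u8),
}

struct ParseProto {
    constants: Vec<String>,
    byte_codes: Vec<ByteCode>,
}

impl ParseProto {
    /// Adds a constant and returns the bytecode that loads it into `dst`.
    /// The returned ByteCode is an owned value with no reference to self.
    fn load_const(&mut self, dst: usize, c: String) -> ByteCode {
        self.constants.push(c);
        ByteCode::LoadConst(dst as u8, (self.constants.len() - 1) as u8)
    }

    fn emit_const(&mut self, dst: usize, c: String) {
        // self.byte_codes.push(self.load_const(dst, c)); // error[E0499]
        let code = self.load_const(dst, c); // borrow of self ends here
        self.byte_codes.push(code); // a fresh borrow of one field
    }
}

/// Free-function form: it borrows only the constant table, so the caller
/// can mutably borrow byte_codes at the same time.
fn load_const_split(constants: &mut Vec<String>, dst: usize, c: String) -> ByteCode {
    constants.push(c);
    ByteCode::LoadConst(dst as u8, (constants.len() - 1) as u8)
}
```

With the free function, p.byte_codes.push(load_const_split(&mut p.constants, 1, s)) compiles on one line, because the compiler can see that the two borrows touch different fields. A method taking &mut self hides that information, which is why the local variable is needed.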