Variable Assignment

In the program that prints "hello, world!" at the beginning of Chapter 1, we support global variables, namely the print function. However, it only supports access, but not assignment or creation. Now the only global variable print is manually added to the global variable table when creating a virtual machine. In the previous section, we implemented the definition and access of local variables, but assignment is also not supported. This section will implement assignment of global variables and local variables.

The assignment of simple variables is relatively simple, but the complete assignment statement in Lua is very complicated, such as t[f()] = 123. Here we first realize the variable assignment, and then briefly introduce the difference between the complete assignment statement.

Combination of Assignments

The variable assignment statements to be supported in this section are expressed as follows:

Name = exp

The left side of the equal sign = (lvalue) currently has two categories, local variables and global variables; the right side is the expression exp in the previous chapter, which can be roughly divided into three categories: constants, local variables, and global variables. So this is a 2*3 combination:

  • local = const, load the constant to the specified location on the stack, corresponding to the bytecode LoadNil, LoadBool, LoadInt and LoadConst, etc.

  • local = local, copy the value on the stack, corresponding to the bytecode Move.

  • local = global, assign the value on the stack to the global variable, corresponding to the bytecode GetGlobal.

  • global = const, to assign a constant to a global variable, you need to add the constant to the constant table first, and then complete the assignment through the bytecode SetGlobalConst.

  • global = local, assign local variable to global variable, corresponding to bytecode SetGlobal.

  • global = global, assign global variable to global variable, corresponding to bytecode SetGlobalGlobal.

Among these 6 cases, the first 3 are assigned to local variables. The load_exp() function in the previous section has been implemented and will not be introduced here. The latter three are assigned to global variables, and three new bytecodes are added accordingly. The parameter format of these 3 bytecodes is similar, and they all have 2 parameters, which are:

  1. The index of the name of the target global variable in the constant table, similar to the second parameter of the previous GetGlobal bytecode. So in these three cases, you need to add the name of the global variable to the constant table first.
  2. Source index, the three bytecodes are: the index in the constant table, the address on the stack, and the index of the name of the global variable in the constant table.

The fourth case above, that is global = const, handles all constant types with only one bytecode, not like previous local variables which set different bytecodes for some types (such as LoadNil, LoadBool, etc.). This is because local variables are located directly through the index on the stack, and the virtual machine executes its assignment very quickly. If the source data can be inlined into the bytecode and reduce the access to the constant table once, it can be significantly proportional performance improvement. However, accessing global variables requires a table lookup, and the execution of the virtual machine is slow. At this time, the performance improvement brought by the inline source data is relatively small, so it is unnecessary. After all, more bytecodes bring more complexity in the parsing and execution stages.

Lexical Analysis

Originally, function calls and local variable definitions were supported, but now variable assignment statements are added. as follows:

Name String
Name ( exp )
localName = exp
Name = exp # add new

There is a problem here. The newly added variable assignment statement also starts with Name, which is the same as function call. Therefore, based on the indistinguishability of the first token at the beginning, it is necessary to "peek" forward at another token: if it is an equal sign =, it is a variable assignment statement, otherwise it is a function call statement. The "peek" here is in quotation marks to emphasize that it is a real peek but not take the token, because the subsequent statement analysis still needs to use this token. To this end, the lexical analysis also adds a peek() method:

    pub fn next(&mut self) -> Token {
        if self.ahead == Token::Eos {
            self.do_next()
        } else {
            mem::replace(&mut self.ahead, Token::Eos)
        }
    }

    pub fn peek(&mut self) -> &Token {
        if self.ahead == Token::Eos {
            self.ahead = self.do_next();
        }
        &self.ahead
    }

The ahead is a newly added field in the Lex structure, which is used to save the Token that is parsed from the character stream but cannot be returned. According to the convention of Rust language, this ahead should be of type Option<Token>, Some(Token) means that there is a Token read ahead, and None means there is no Token. But for the same reason as next()return value type, the Token type is directly used here, and Token::Eos is used to represent no Read Token in advance.

The original external next() function is changed to do_next() internal function, which is called by the newly added peek() and new next() functions.

The newly added peek() function returns &Token instead of Token, because the owner of the Token is still Lex, and it has not been handed over to the caller. Just "lending" it to the caller to "look". If the caller not only wants to "see" but also "change", then &mut Token is needed, but we only need to look, and do not need to change. Now that there is & borrowing, it involves lifetime in Rust. Since this function has only one input lifetime parameter, that is &mut self, according to elision rules, which is given to all output lifetime parameters, the annotation of the lifetime can be omitted below. This default lifetime means to the compiler that the legal cycle of the returned reference &Token is less than or equal to the input parameter, namely &mut self, that is, Lex itself.

I personally think that the owner of variables, borrowing (reference), and variable borrowing are the core concepts of the Rust language. The concept itself is very simple, but it takes a period of in-depth struggle with the compiler to understand it deeply. The concept of lifetime is based on the above-mentioned core concepts, but it is also slightly more complicated and needs to be understood in practice.

The new next() is a simple wrapper for the original do_next() function, which handles the Token that may be stored in ahead and peeked before: if it exists, it will directly return this Token without calling do_next(). But this "direct return" in Rust is not very straightforward. Since Token type is not Copy (because its String(String) type is not Copy), so cannot return directly. The simple solution is to use Clone, but the meaning of Clone is to tell us that there is a price to pay, for example, for string type, we need to copy the string content; and we don't need 2 copies of strings, because the Token is returned. After that, we don't need this Token anymore. So the result we need now is: return the Token in ahead, and simultaneously clean up ahead (here naturally set to represent "no" Token::Eos). This scene is very similar to the gif of "Raiders of the Lost Ark" that is widely circulated on the Internet (search for "Raiders of the Lost Ark gif" on the internet), and the sandbag in the hand "replaces" the treasure on the mechanism. "Replace" here is a keyword, and this requirement can be fulfilled with the std::mem::replace() function in the standard library. This requirement is so common (at least very common in C language projects) that it feels surprised to use such a long-name function to achieve it. But it is precisely because of these restrictions that the security promised by Rust is guaranteed. But if ahead is of Option<Token> type, then you can use the take() method of Option, which looks simpler and has exactly the same function.

Syntax Analysis

With the increase of functions, there will be more and more internal codes in the big cycle of syntax analysis, so we first put each statement into an independent function, namely function_call() and local(), and then add variable assignment statement assignment(). The peek() function added in the lexical analysis just now is used here:

     fn chunk(&mut self) {
         loop {
             match self. lex. next() {
                 Token::Name(name) => {
                     if self.lex.peek() == &Token::Assign {
                         self. assignment(name);
                     } else {
                         self. function_call(name);
                     }
                 }
                 Token::Local => self. local(),
                 Token::Eos => break,
                 t => panic!("unexpected token: {t:?}"),
             }
         }
     }

Then look at the assignment() function:

     fn assignment(&mut self, var: String) {
         self. lex. next(); // `=`

         if let Some(i) = self. get_local(&var) {
             // local variable
             self. load_exp(i);
         } else {
             // global variable
             let dst = self.add_const(var) as u8;

             let code = match self. lex. next() {
                 // from const values
                 Token::Nil => ByteCode::SetGlobalConst(dst, self.add_const(Value::Nil) as u8),
                 Token::True => ByteCode::SetGlobalConst(dst, self.add_const(Value::Boolean(true)) as u8),
                 Token::False => ByteCode::SetGlobalConst(dst, self.add_const(Value::Boolean(false)) as u8),
                 Token::Integer(i) => ByteCode::SetGlobalConst(dst, self.add_const(Value::Integer(i)) as u8),
                 Token::Float(f) => ByteCode::SetGlobalConst(dst, self.add_const(Value::Float(f)) as u8),
                 Token::String(s) => ByteCode::SetGlobalConst(dst, self.add_const(Value::String(s)) as u8),

                 // from variable
                 Token::Name(var) =>
                     if let Some(i) = self. get_local(&var) {
                         // local variable
                         ByteCode::SetGlobal(dst, i as u8)
                     } else {
                         // global variable
                         ByteCode::SetGlobalGlobal(dst, self. add_const(Value::String(var)) as u8)
                     }

                 _ => panic!("invalid argument"),
             };
             self.byte_codes.push(code);
         }
     }

For the case where the lvalue is a local variable, call load_exp() to handle it. For the case of global variables, according to the type of the expression on the right, generate SetGlobalConst, SetGlobal and SetGlobalGlobal bytecodes respectively.

Test

Use the following code to test the above six variable assignments:

local a = 456
a = 123
print(a)
a = a
print(a)
a = g
print(a)
g = 123
print(g)
g = a
print(g)
g = g2
print(g)

Execution is as expected. The specific execution results will no longer be posted.

Complete Assignment Statement

The function of the above variable assignment is very simple, but the complete assignment statement of Lua is very complicated. Mainly manifested in the following two places:

First of all, the left side of the equal sign = now only supports local variables and global variables, but the assignment of table fields is also supported in the complete assignment statement, such as t.k = 123, or the more complex t[f()+g ()] = 123. The above assignment() function is difficult to add table support. For this reason, it is necessary to add an intermediate expression layer, that is, the ExpDesc structure introduced by the subsequent chapter.

Second, the expression following the equal sign = is now divided into 3 categories, for 3 bytecodes. If we want to introduce other types of expressions later, such as upvalue, table index (such as t.k), or operation results (such as a+b), do we have to add a bytecode to each type? The answer is, no. But this will involve some problems that have not been encountered yet, so it is not easy to explain. If not, what needs to be done? This also involves the ExpDesc mentioned above.

We will implement Lua's complete assignment statement in the future, and the current assignment code will be completely discarded at that time.