ExpDesc Rewrite

The previous section introduced the concept of ExpDesc and introduced its function and role. This section is based on ExpDesc to modify the existing code. This transformation does not support new features, but only lays the foundation for the reading and writing functions of the next table and more features to follow.

First of all, the most important thing is the function load_exp() for parsing expressions. This function originally generated bytecode directly from Token. Now it needs to be split into two steps: Token to ExpDesc, and generating bytecode from ExpDesc. Then, on this basis, transform the table constructor and variable assignment statement.

exp()

Transform the load_exp() function step 1, Token to ExpDesc, create a new exp() function, the code is as follows:

     fn exp(&mut self) -> ExpDesc {
         match self. lex. next() {
             Token::Nil => ExpDesc::Nil,
             Token::True => ExpDesc::Boolean(true),
             Token::False => ExpDesc::Boolean(false),
             Token::Integer(i) => ExpDesc::Integer(i),
             Token::Float(f) => ExpDesc::Float(f),
             Token::String(s) => ExpDesc::String(s),
             Token::Name(var) => self. simple_name(var),
             Token::CurlyL => self. table_constructor(),
             t => panic!("invalid exp: {:?}", t),
         }
     }

     fn simple_name(&mut self, name: String) -> ExpDesc {
         // search reversely, so new variable covers old one with same name
         if let Some(ilocal) = self.locals.iter().rposition(|v| v == &name) {
             ExpDesc::Local(ilocal)
         } else {
             ExpDesc::Global(self. add_const(name))
         }
     }

It is relatively simple, similar to the main structure of the previous load_exp() function, or even simpler, that is, several Token types supported by the expression statement are converted into the corresponding ExpDesc. Among them, Name and table construction need further processing. Name is to be distinguished from a local variable or a global variable by the simple_name() function. The processing of the table construction branch becomes a lot more reasonable, [before] (./ch04-02.table_constructor.md#other scenes) need to add an ugly return in this branch, now because this function does not generate bytes code, so this branch can also end naturally. However, although bytecode is no longer needed, ExpDesc is required, so the table constructor table_constructor() needs to return an ExpDesc. Because the newly created table is finally put on the stack, it returns ExpDesc::Local(i). Note that the ExpDesc::Local type does not just represent "local variables", but "variables on the stack". The name "Local" is used to be consistent with the official Lua code.

In addition to not generating bytecode, this function has another change compared with load_exp(), that is, there is no dst parameter. In most cases, it is fine, but there is a problem with the constructor of the table. Because the table construction process is to create a table on the stack first, the bytecode generated by the subsequent initialization statement needs to bring the index of the table on the stack as a parameter. For example SetTable 3 4 5, the first parameter is the index of the table on the stack. So the original table_constructor() function needs a dst parameter. Now there is no such parameter, what should I do? We can assume that all table constructions create new tables at the top of the stack. So it is necessary to maintain the current top position of the stack.

Stack Top `sp`

To maintain the current stack top position, first add sp indicating the current stack top in ParseProto. In the past, the current position of the top of the stack was calculated in real time wherever it was needed, but now it is changed to a global variable, and many places are suddenly coupled. Later, as the characteristics increase, this coupling will become larger and larger, and it will become more and more out of control. But it is too cumbersome to pass the top position of the stack through parameters. In comparison, it is more convenient to maintain a global stack top delegate, but be careful.

The stack has three functions: function calls, local variables, and temporary variables. The first two have specific statements (function call statements and local variable definition statements) for specific processing. The last one, temporary variables, are used in many places, such as the table construction statement mentioned above, so they need to be carefully managed when they are used, and they cannot affect each other. In addition, because local variables also occupy the stack space, before parsing a statement each time, the value of sp on the top of the stack is initialized to the number of current local variables, which is where temporary variables are allowed to be used.

Let's look at the use of the sp in the table constructor table_constructor():

     fn table_constructor(&mut self) -> ExpDesc {
         let table = self.sp; // Create a new table at the top of the stack
         self.sp += 1; // update sp, if subsequent statements need temporary variables, use the stack position behind the table

         // omit intermediate construction code

         self.sp = table + 1; // Before returning, set the sp on the top of the stack, keep only the newly created table, and clean up other temporary variables that may be used during construction
         ExpDesc::Local(table) // return the type of the table (temporary variable on the stack) and the position on the stack
      }

Use the sp at the beginning of the function to replace the dst parameter passed in the previous version as the location of the new table. Before the function ends, reset the top position of the stack. In the following subsections, we will continue to introduce the use of the sp of the stack when this function actually builds the table.

discharge()

The second step of transforming the load_exp() function is from ExpDesc to bytecode. In fact, it is more accurate to say that ExpDesc is loaded onto the stack. We use the function name discharge in the official Lua code to represent "loading".

     // discharge @desc into @dst, and set self.sp=dst+1
     fn discharge(&mut self, dst: usize, desc: ExpDesc) {
         let code = match desc {
             ExpDesc::Nil => ByteCode::LoadNil(dst as u8),
             ExpDesc::Boolean(b) => ByteCode::LoadBool(dst as u8, b),
             ExpDesc::Integer(i) =>
                 if let Ok(i) = i16::try_from(i) {
                     ByteCode::LoadInt(dst as u8, i)
                 } else {
                     self. load_const(dst, i)
                 }
             ExpDesc::Float(f) => self. load_const(dst, f),
             ExpDesc::String(s) => self. load_const(dst, s),
             ExpDesc::Local(src) =>
                 if dst != src {
                     ByteCode::Move(dst as u8, src as u8)
                 } else {
                     return;
                 }
             ExpDesc::Global(iname) => ByteCode::GetGlobal(dst as u8, iname as u8),
         };
         self.byte_codes.push(code);
         self.sp = dst + 1;
     }

This function is also very simple. Generate the corresponding bytecode according to ExpDesc, and discharge the expression statement represented by ExpDesc onto the stack. Note that the last line of this function updates the top position of the stack to the next position of dst. In most cases, it is as expected. If it is not as expected, the caller needs to update the top position of the stack after the function returns.

In addition to this most basic function, there are several helper functions. The discharge() function is to force the expression discharge to the dst position of the stack. But sometimes you just want to discharge the expression on the stack. If the expression is already on the stack, such as ExpDesc::Local type, then you don’t need to discharge it. A new function discharge_if_need() is introduced for this purpose. In most cases, it doesn't even care where it is loaded, so create a new function discharge_top(), using the top position of the stack. The two function codes are as follows:

    // discharge @desc into the top of stack, if need
    fn discharge_top(&mut self, desc: ExpDesc) -> usize {
        self.discharge_if_need(self.sp, desc)
    }

    // discharge @desc into @dst, if need
    fn discharge_if_need(&mut self, dst: usize, desc: ExpDesc) -> usize {
        if let ExpDesc::Local(i) = desc {
            i // no need
        } else {
            self.discharge(dst, desc);
            dst
        }
    }

In addition, the discharge_const() function is added, and several constant types are added to the constant table, and other types are discharged as needed. This function will be used in the construction and assignment statements of the following tables:

    // for constant types, add @desc to constants;
    // otherwise, discharge @desc into the top of stack
    fn discharge_const(&mut self, desc: ExpDesc) -> ConstStack {
        match desc {
            // add const
            ExpDesc::Nil => ConstStack::Const(self.add_const(())),
            ExpDesc::Boolean(b) => ConstStack::Const(self.add_const(b)),
            ExpDesc::Integer(i) => ConstStack::Const(self.add_const(i)),
            ExpDesc::Float(f) => ConstStack::Const(self.add_const(f)),
            ExpDesc::String(s) => ConstStack::Const(self.add_const(s)),

            // discharge to stack
            _ => ConstStack::Stack(self.discharge_top(desc)),
        }
    }

After completing the exp() and discharge() functions, the previous load_exp() function can be combined with these two new functions:

     fn load_exp(&mut self) {
         let sp0 = self.sp;
         let desc = self. exp();
         self. discharge(sp0, desc);
     }

At the end of this chapter, the parsing of expressions in the parsing process will directly call a series of functions of exp() and discharge, instead of calling the load_exp() function.

table_constructor()

After splitting the load_exp() function into exp() and discharge(), the constructor of the table can be transformed. Or take general-purpose initialization as an example, in previous version, the key and value are directly loaded onto the stack, no matter what type. We can now call exp() to read the key and value, and then do different processing according to the type. The specific processing method can refer to the official implementation of Lua. There are three bytecodes SETTABLE, SETFIELD and SETI, corresponding to the three types of key variables on the stack, string constants, and small integer constants. In addition, these 3 bytecodes have 1 bit to mark whether the value is a variable or a constant on the stack. There are 3 key types and 2 value types, a total of 3*2=6 situation. Although we can also distinguish between variables and constants on the stack by reserving a bit in value, this will result in only 7bit address space. So we still distinguish variables and constants on the stack by adding bytecode types. It ends up as follows:

    value\key | variable      | string constant | small integer constant
   -----------+---------------+-----------------+---------------
    variable  | SetTable      | SetField        | SetInt
   -----------+---------------+-----------------+---------------
    constant  | SetTableConst | SetFieldConst   | SetIntConst

Another rule is that nil and Nan of floating-point numbers are not allowed to be used as keys. The parsing code for the key is as follows:

     let entry = match self. lex. peek() {
         Token::SqurL => { // `[` exp `]` `=` exp
             self. lex. next();

             let key = self.exp(); // read key
             self.lex.expect(Token::SqurR); // `]`
             self.lex.expect(Token::Assign); // `=`

             TableEntry::Map(match key {
                 ExpDesc::Local(i) => // variables on the stack
                     (ByteCode::SetTable, ByteCode::SetTableConst, i),
                 ExpDesc::String(s) => // string constant
                     (ByteCode::SetField, ByteCode::SetFieldConst, self. add_const(s)),
                 ExpDesc::Integer(i) if u8::try_from(i).is_ok() => // small integer
                     (ByteCode::SetInt, ByteCode::SetIntConst, i as usize),
                 ExpDesc::Nil =>
                     panic!("nil can not be table key"),
                 ExpDesc::Float(f) if f.is_nan() =>
                     panic!("NaN can not be table key"),
                 _ => // For other types, discharge them onto the stack uniformly and turn them into variables on the stack
                     (ByteCode::SetTable, ByteCode::SetTableConst, self.discharge_top(key)),
             })
         }

The above code handles three types of keys: local variables, string constants, and small integers. In addition, nil and floating-point numbers Nan are prohibited. For other types, they are uniformly discharged to the top of the stack and converted to variables on the stack.

Then parse the value to distinguish between variables and constants on the stack. code show as below:

     match entry {
         TableEntry::Map((op, opk, key)) => {
             let value = self.exp(); // read value
             let code = match self. discharge_const(value) {
                 // value is a constant, use opk, such as `ByteCode::SetTableConst`
                 ConstStack::Const(i) => opk(table as u8, key as u8, i as u8),

                 // value is not a constant, then discharge to the stack, and use op, such as `ByteCode::SetTable`
                 ConstStack::Stack(i) => op(table as u8, key as u8, i as u8),
             };
             self.byte_codes.push(code);

             nmap += 1;
             self.sp = sp0;
         }

The logic of the above two pieces of code itself is very clear, but the parameter types associated with TableEntry::Map are somewhat special. The first piece of code deals with the type of the key, and determines the type of 2 bytecodes, or the tag of ByteCode. This tag is to be used as an associated parameter of TableEntry::Map. Then what type is that? It must not be ByteCode, because the enum type includes not only the tag, but also the associated value. If it is a ByteCode type, then it is not ByteCode::SetTable but a complete ByteCode::SetTable(table,key,0), that is, first generate a complete bytecode, and then read Modify the bytecode when the value is reached. That would be too complicated.

《Rust Programming Language》 introduces these enums with () as initialization syntax, looks like a function call, they are indeed implemented as functions returning an instance constructed from parameters. That is to say ByteCode::SetTable can be regarded as a function, and its parameter type is fn(u8,u8,u8)->ByteCode. When I read this book for the first time, I was confused by the countless new concepts in it, so I had no impression of reading this sentence at all, and even if I saw it, I couldn’t understand it or remember it. When I wrote this project for more than half, I read this book completely again. This time, I understood most of the concepts in it very smoothly, and I can also notice the introduction of function pointers. And it just happened to work, what a find!

Build a Lua Interpreter in Rust

ExpDesc Rewrite

exp()

Stack Top sp

discharge()

table_constructor()

Stack Top `sp`