`if` statement

The biggest difference between the conditional judgment statement and the previously implemented statement is that the bytecode is no longer executed sequentially, and jumps may occur. To this end, we add a new bytecode Test, associated with 2 parameters:

The first parameter, u8 type, determines the location of the condition on the stack;
The second parameter, u16 type, the number of bytecodes to jump forward.

The semantics of this bytecode is: if the statement represented by the first parameter is false, then jump forward to the bytecode of the number specified by the second parameter. The control structure diagram is as follows:

+-------------------+
| if condition then |---\ skip the block if $condition is false
+-------------------+   |
                        |
    block               |
                        |
+-----+                 |
| end |                 |
+-----+                 |
<-----------------------/

The definition of Test bytecode is as follows:

pub enum ByteCode {
     // condition structures
     Test(u8, u16),

The second parameter is the number of bytecodes to jump to, that is, the relative position. If absolute positions are used, the code to parse and execute is slightly simpler, but less expressive. The range of 16bit is 65536. If absolute position is used, the code beyond 65536 in a function cannot use jump bytecode. And if you use the relative position, then it supports jumping within the range of 65536 of the bytecode itself, and you can support very long functions. So we use relative positions. This also introduces a problem that has been ignored, which is the range of parameters in the bytecode. For example, the stack index parameters are all of the u8 type, so if there are more than 256 local variables in a function, it will overflow and cause bugs. In the follow-up, the range of parameters needs to be specially dealt with.

According to the above control structure diagram, the syntax analysis code for completing the if statement is as follows:

     fn if_stat(&mut self) {
         let icond = self.exp_discharge_top(); // read condition statement
         self.lex.expect(Token::Then); // `then` keyword

         // generate `Test` placeholder, and the 2 parameters will be added later
         self.byte_codes.push(ByteCode::Test(0, 0));
         let itest = self.byte_codes.len() - 1;

         // parse the block! And it is expected to return the `end` keyword,
         // does not support `elseif` and `else` branches temporarily.
         assert_eq!(self. block(), Token::End);

         // Fix Test bytecode parameter.
         // `iend` is the current position of the bytecode sequence,
         // `itest` is the position of the Test bytecode, and the difference
         // between the two is the number of bytecodes that need to be jumped.
         let iend = self.byte_codes.len() - 1;
         self.byte_codes[itest] = ByteCode::Test(icond as u8, (iend - itest) as u16);
     }

The code flow has been explained line by line in the comments. What needs to be explained in detail here is the block() function called recursively.

End of Block

The original block() function is actually the entry point of the entire syntax analysis, which is executed only once (without recursive calls), and reads to the end of the source code Token::Eos as the end:

     fn block(&mut self) {
         loop {
             match self. lex. next() {
                 // Other statement parsing is omitted here
                 Token::Eos => break, // Eos exits
             }
         }
     }

The expected end of the code block in the if statement to be supported is the keyword end; other keywords such as elseif and else will be included in the future. The end of the code block is not just Token::Eos, we need to modify the block() function, and consider the Token that is not the beginning of a legal statement (such as Eos, keyword end, etc.) as a block End, and it is up to the caller to determine whether it is the expected end. There are 2 ways to modify the specific code:

Use lex.peek() instead of lex.next() in the above code. If the Token you see is not the beginning of a legal statement, exit the loop. At this time, the Token has not been read by consumption. The external caller then calls lex.next() to read the Token for judgment. If this is done, then all the current statement processing codes must add a lex.next() at the very beginning to skip the seen Token, which is more verbose. For example, in the if_stat() function in the previous paragraph, it is necessary to use lex.next() to skip the keyword if.
Still use lex.next(), for the Token that is not read at the beginning of a legal statement, it will be returned to the caller as the function return value. We adopt this method, the code is as follows:

     fn block(&mut self) -> Token {
         loop {
             match self. lex. next() {
                 // Other statement parsing is omitted here
                 t => break t, // return t
             }
         }
     }

So in the if_stat() function above, it is necessary to judge the return value of block() as Token::End:

         // parse syntax block! And it is expected to return the end keyword, temporarily does not support elseif and else branches
         assert_eq!(self. block(), Token::End);

The original syntax analysis entry function chunk() also needs to increase the judgment of the return value of block():

     fn chunk(&mut self) {
         assert_eq!(self. block(), Token::Eos);
     }

Variable Scope in Block

Another area of the block() function that needs to be changed is the scope of local variables. That is, local variables defined inside the block are not visible outside.

This feature is very core! But the implementation is very simple. Just record the number of current local variables at the entry of block(), and then clear the newly added local variables before exiting. code show as below:

     fn block(&mut self) -> Token {
         let nvar = self.locals.len(); // record the original number of local variables
         loop {
             match self. lex. next() {
                 // Other statement parsing is omitted here
                 t => {
                     self.locals.truncate(nvar); // invalidate local variables defined inside the block
                     break t;
                 }
             }
         }
     }

After the Upvalue is introduced later, other processing is required.

`do` Statement

The above two subsections deal with the problem of blocks. The simplest statement to create a block is the do statement. Because it is too simple, we introduce it here by the way. The syntax analysis code is as follows:

     // BNF:
     // do block end
     fn do_stat(&mut self) {
         assert_eq!(self. block(), Token::End);
     }

Virtual Machine Execution

The previous virtual machine execution was to execute bytecodes sequentially, and use Rust's for statement to loop through:

     pub fn execute<R: Read>(&mut self, proto: &ParseProto<R>) {
         for code in proto.byte_codes.iter() {
             match *code {
                 // All bytecode pre-defined logic is omitted here
             }
         }
     }

Now to support the jump of the Test bytecode, it is necessary to be able to modify the position of the next traversal during the loop traversal of the bytecode sequence. Rust's for statement does not supported modifies the traversal position during the loop, so we need to manually control the loop:

     pub fn execute<R: Read>(&mut self, proto: &ParseProto<R>) {
         let mut pc = 0; // bytecode index
         while pc < proto.byte_codes.len() {
             match proto.byte_codes[pc] {
                 // The pre-defined logic of other bytecodes is omitted here

                 // condition structures
                 ByteCode::Test(icond, jmp) => {
                     let cond = &self. stack[icond as usize];
                     if matches!(cond, Value::Nil | Value::Boolean(false)) {
                         pc += jmp as usize; // jump if false
                     }
                 }
             }

             pc += 1; // next bytecode
         }
     }

Loop execution is controlled by the bytecode location pc. After all bytecodes are executed, pc will be incremented by 1, pointing to the next bytecode; for the jump bytecode Test, pc will be modified additionally. Since Test bytecode will also execute pc auto-increment at the end, so its jump position is actually the target address minus 1. In fact, you can add a continue; statement here to skip the last auto-increment of pc. I don't know which of these two approaches is better.

As can be seen from the judgment of the above code, there are only two false values in the Lua language: nil and false. Other values, such as 0, empty table, etc., are all true values.

Test

So far we have implemented the simplest if statement.

Since we do not yet support relational operations, the judgment condition after if can only use other statements. The test code is as follows:

if a then
    print "skip this"
end
if print then
    local a = "I am true"
    print(a)
end

print (a) -- should be nil

The conditional statement a in the first judgment statement is an undefined global variable, the value is nil, which is false, so the internal statement is not executed.

The conditional statement print in the second judgment statement is a defined global variable and is true, so the internal statement will execute. The local variable a is defined inside the block, which is executed normally inside, but after the end of the block, a is invalid, and then it is used as an undefined global variable, and the print is nil.

Build a Lua Interpreter in Rust

if statement