numerical-for Statement

Lua's for statement supports two types:

  • Numeric: for Name '=' exp ',' exp [',' exp] do block end
  • Generics: for namelist in explist do block end

Generic-for requires function support, and it will be implemented after introducing functions in the next chapter. This section implements numeric-for. It can be seen from the BNF definition that the first two tokens of the two types are the same, and the third token of the numeric type is =. By this distinction two types can be distinguished:

     fn for_stat(&mut self) {
         let name = self. read_name();
         if self.lex.peek() == &Token::Assign {
             self.for_numerical(name); // numerical
         } else {
             todo!("generic for"); // generic
         }
     }

Control Structure

The semantics of the numerical-for statement is obvious. The three expressions after the equal sign = are the initial value init, the limit, and the step. step can be positive or negative, but not 0. The control structure diagram is as follows (assuming step>0 in the diagram):

     +--------------+
/--->| i <= limit ? |--No--\ jump to the end if exceed limit
|    +--------------+      |
|                          |
|        block             |
|                          |
|    +-----------+         |
\----| i += step |         |
     +-----------+         |
         <-----------------/

The execution logic in the boxes can be implemented with 1 bytecode respectively, so 2 bytecodes must be executed in each loop: first i+=step, and then judge i<=limit. For performance, the judgment function of the first bytecode can also be added to the bottom bytecode, so that only one bytecode is executed each loop. The control structure diagram is as follows:

       +--------------+
       | i <= limit ? |--No--\ jump to the end if exceed limit
       +--------------+      |
/------>                     |
|       block                |
|                            |
|       +--------------+     |
|       | i += step    |     |
\--Yes--| i <= limit ? |     |
        +--------------+     |
            <----------------/

Add 2 new bytecodes:

pub enum ByteCode {
     // for-loop
     ForPrepare(u8, u16),
     ForLoop(u8, u16),

These two bytecodes correspond to the bytecodes of the two boxes in the above figure, and the two associated parameters are the stack start position and jump position respectively. Later, we will see that the first bytecode needs to do other preparations besides judging the jump, so it is called prepare.

Variable Storage

The first parameter associated with the above two bytecodes is the starting position of the stack. To be precise, it is the location where the above three values (init, limit, step) are stored. These 3 values naturally need to be stored on the stack, because one of the functions of the stack is to store temporary variables, and because there is no other place available. The 3 values are stored sequentially, so only one parameter is needed to locate 3 values.

In addition, the for statement also has a control variable, which can reuse the position on the stack of init. During parsing, create an internal temporary variable whose name is Name in BNF, pointing to the position of the first variable on the stack. In order to keep the positions of the other 2 temporary variables from being occupied, 2 more anonymous local variables need to be created. Therefore, the stack at execution time is as follows:

      |        |
sp    +--------+
      | init/i |  control variable Name
sp+1  +--------+
      | limit  |  anonymous variable ""
sp+2  +--------+
      | step   |  anonymous variable ""
      +--------+
      |        |

The numerical-for statement is special only in the above three temporary variables, and the rest is similar to the control structure introduced before, which is nothing more than jumping according to the conditional judgment statement. The syntax analysis code is as follows:

     fn for_numerical(&mut self, name: String) {
         self.lex.next(); // skip `=`

         // Read 3 expressions: init, limit, step (default is 1), and place
         // them on the stack in turn
         match self.explist() {
             2 => self.discharge(self.sp, ExpDesc::Integer(1)),
             3 => (),
             _ => panic!("invalid numerical-for exp"),
         }

         // Create 3 local variables to occupy the position on the stack.
         // Subsequent if the internal block needs local or temporary variables,
         // The position after these 3 variables on the stack will be used.
         self.locals.push(name); // control variable, can be referenced in internal block
         self.locals.push(String::from("")); // anonymous variable, purely for placeholder
         self.locals.push(String::from("")); // Same as above

         self.lex.expect(Token::Do);

         // Generate ForPrepare bytecode
         self.byte_codes.push(ByteCode::ForPrepare(0, 0));
         let iprepare = self.byte_codes.len() - 1;
         let iname = self.sp - 3;

         self. push_loop_block();

         // inner block
         assert_eq!(self. block(), Token::End);

         // delete 3 temporary variables
         self. locals. pop();
         self. locals. pop();
         self. locals. pop();

         // Generate ForLoop bytecode and fix the previous ForPrepare
         let d = self.byte_codes.len() - iprepare;
         self.byte_codes.push(ByteCode::ForLoop(iname as u8, d as u16));
         self.byte_codes[iprepare] = ByteCode::ForPrepare(iname as u8, d as u16);

         self.pop_loop_block(self.byte_codes.len() - 1);
     }

Integer and Float-point Types

The previously supported control statements(such as if, while) mainly introduce the syntax analysis part; while the virtual machine execution part only performs simple operations on the stack according to the bytecode. However, the syntax analysis part of the numerical-for loop is relatively simple (mainly because it is similar to the previous control structures), while the virtual machine execution part is very complicated. In fact, it is not difficult, it is just cumbersome. The reason is that Lua supports 2 numeric types, integers and floats. There are a total of 3 expressions (or called variables) in the numeric-for statement, init, limit, and step, each of which may be one of two types, and there are 8 possibilities in total. Although in some cases the type of some variables (such as constants) can be determined in the syntax analysis stage, it is of little significance to deal with this special case alone, and finally it is necessary to deal with all three variables of unknown type situation, which needs to be handled during the execution phase of the virtual machine.

It is too complicated to deal with 8 types one by one; and they cannot be completely classified into one type, because the representation ranges of integers and floating-point numbers are different. In this regard, the Lua language regulations is divided into two categories:

  • If init and step are integers, then treat them as integers;
  • Otherwise, handle them as floating point numbers.

As for why the second limit variable is not considered in the first category, it is not clear. I think there are some possible reasons, but I'm not sure about them, so I won't discuss them here. It can be realized according to the regulations of Lua. But it does introduce some complications.

Somewhere the 8 possibilities need to be grouped into the 2 types above. It can't be done in the syntax analysis phase, and it is too costly to perform each time the loop is executed, so it is classified once at the beginning of the loop. This is what the ForPrepare bytecode does:

  • If init and step are integers, then convert limit to an integer;
  • Otherwise, convert all 3 variables to floats.

In this way, each time the loop is executed, that is, the ForLoop bytecode, only two cases need to be handled.

It is easy to convert integers to floating-point numbers in the second category, but to convert the floating-point limit to integers in the first category, you must pay attention to the following two points:

  • If step is positive, limit is rounded down; if step is negative, limit is rounded up.
  • If the limit exceeds the representation range of the integer, then it is converted to the maximum or minimum value of the integer. There is an extreme situation here, such as step is negative, init is the maximum value of an integer, and limit exceeds the maximum value of an integer, then init is smaller than limit, and because Lua clearly stipulates that the control variable of the numerical-for loop will not overflow and reverse, So the expectation is that the loop will not be executed. But according to the above conversion, limit is converted to the maximum value because it exceeds the maximum value of the integer, which is equal to init, and 1 cycle will be executed. Therefore, for special treatment, you can set init and limit to 0 and 1 respectively, so that the loop will not be executed.

The specific code for limit variable conversion is as follows:

fn for_int_limit(limit: f64, is_step_positive: bool, i: &mut i64) -> i64 {
     if is_step_positive {
         if limit < i64::MIN as f64 {
             *i = 0; // Modify init together to ensure that the loop will not be executed
             -1
         } else {
             limit.floor() as i64 // round down
         }
     } else {
         if limit > i64::MAX as f64 {
             *i = 0;
             1
         } else {
             limit.ceil() as i64 // round up
         }
     }
}

Virtual Machine Execution

After introducing the above integer and floating-point number types and conversion details, the next step is to implement the virtual machine execution part of the two bytecodes.

The ForPrepare bytecode does two things: first, it is divided into integer and floating point type loops according to the variable type; Then compare init and limit to determine whether to execute the first cycle. code show as below:

                ByteCode::ForPrepare(dst, jmp) => {
                    // clear into 2 cases: integer and float
                    // stack: i, limit, step
                    if let (&Value::Integer(mut i), &Value::Integer(step)) =
                            (&self.stack[dst as usize], &self.stack[dst as usize + 2]) {
                        // integer case
                        if step == 0 {
                            panic!("0 step in numerical for");
                        }
                        let limit = match self.stack[dst as usize + 1] {
                            Value::Integer(limit) => limit,
                            Value::Float(limit) => {
                                let limit = for_int_limit(limit, step>0, &mut i);
                                self.set_stack(dst+1, Value::Integer(limit));
                                limit
                            }
                            // TODO convert string
                            _ => panic!("invalid limit type"),
                        };
                        if !for_check(i, limit, step>0) {
                            pc += jmp as usize;
                        }
                    } else {
                        // float case
                        let i = self.make_float(dst);
                        let limit = self.make_float(dst+1);
                        let step = self.make_float(dst+2);
                        if step == 0.0 {
                            panic!("0 step in numerical for");
                        }
                        if !for_check(i, limit, step>0.0) {
                            pc += jmp as usize;
                        }
                    }
                }

The ForLoop bytecode also does two things: first, add step to the control variable; then compare the control variable and limit to determine whether to execute the next loop. The code is omitted here.

So far, we have completed the numeric-for statement.