Binary operations

Compared with the unary operation in the previous section, although the binary operation only has one more operand, it introduces many problems, mainly including BNF left recursion, priority, operand type, and evaluation order, etc.

BNF Left Recursive

The complete syntax of the binary operation statement in Lua is as follows:

exp ::= nil | false | true | Numeral | LiteralString | '...' | functiondef |
prefixexp | tableconstructor | exp binop exp | unop exp

For simplicity, the other parts are simplified to OTHERS, then we get:

exp ::= exp binop exp | OTHERS

It is a left recursion rule, we need to eliminate left recursion according to the method introduced before, and get:

exp ::= OTHERS A'
A' := binop exp A' | Epsilon

The previous exp() function only implemented the OTHERS part of the first line above, and now we need to add the A' part of the second line, which is also a recursive reference, which is implemented using a loop. Modify the exp() function structure as follows:

     fn exp(&mut self) -> ExpDesc {
         // OTHERS
         let mut desc = match self. lex. next() {
             // The original various OTHERS type processing is omitted here
         };

         // A' := binop exp A' | Epsilon
         while is_binop(self. lex. peek()) {
             let binop = self.lex.next(); // operator
             let right_desc = self.exp(); // second operand
             desc = self. process_binop(binop, desc, right_desc);
         }
         desc
     }

Among them, the second operand right_desc is also recursively called exp() function to read, which leads to a problem: priority.

Priority

In the unary operation statement in the previous section, the exp() function is also called recursively to read the operand, but because there is only one operand, so no need for priority. Or we can say that all unary operators have the same priority. And unary operators are right associative. For example, the following two examples of consecutive unary operations are executed in order from right to left, regardless of the specific operator:

~ -10, take negative first, then invert bit by bit,
- ~10, first bitwise invert, then negative.

But for the binary operation statement, it is necessary to consider the priority. For example, the following two statements:

a + b - c, perform the previous addition first, and then perform the subsequent subtraction,
a + b * c, perform the subsequent multiplication first, and then perform the previous addition.

Corresponding to the exp() function code above, the OTHERS part at the beginning reads the first operand a; then reads the operator + in the while loop; and then calls the exp() function recursively to read the right operand, so it needs to be calculated at this time. Also take the above two sentences as an example:

a + b - c, end after reading b and use it as the right operand; then perform addition a + b; and then loop through the following - c part again;
a + b * c, after reading b, continue down, read and execute the entire b * c and use the execution result as the right operand; then perform addition; and end the loop.

     -             +
   /   \         /   \
  +     c       a     *
/   \               /   \
a   b               b   c

So in syntax analysis, how to judge which of the above situations is the case? After reading b, should we stop parsing and calculate addition first, or continue parsing? It depends on the priorities of the next operator and the current operator:

When the priority of the next operator is not greater than the current operator, it is the first case, stop parsing and complete the current operation first;
When the priority of the next operator is greater than the current operator, it is the second case and needs to continue parsing.

For this, refer to the list of all operator precedence in the Lua language:

or
and
<     >     <=    >=    ~=    ==
|
~
&
<<    >>
..
+     -
*     /     //    %
unary operators (not   #     -     ~)
^

From top to bottom, the priority becomes higher. The connectors .. and exponentiation ^ are right associative, and other operators are left associative. In the judging rules listed above, parsing is stopped (instead of continuing parsing) for cases of equal priority, so the default is left associative. Therefore, special treatment is required for two right-associated operators, that is, different priorities are defined for them to the left and to the right, and the one to the left is higher, which will become a right-association.

In summary, define the priority function:

fn binop_pri(binop: &Token) -> (i32, i32) {
    match binop {
        Token::Pow => (14, 13), // right associative
        Token::Mul | Token::Mod | Token::Div | Token::Idiv => (11, 11),
        Token::Add | Token::Sub => (10, 10),
        Token::Concat => (9, 8), // right associative
        Token::ShiftL | Token::ShiftR => (7, 7),
        Token::BitAnd => (6, 6),
        Token::BitNot => (5, 5),
        Token::BitOr => (4, 4),
        Token::Equal | Token::NotEq | Token::Less | Token::Greater | Token::LesEq | Token::GreEq => (3, 3),
        Token::And => (2, 2),
        Token::Or => (1, 1),
        _ => (-1, -1)
    }
}

For Tokens that are not binary operators, -1 is returned, which is the lowest priority, and parsing can be stopped no matter what the current operator is. According to Rust's customary practice, this function should return Option<(i32, i32)> type, and then return None for tokens that are not binary operators. But it is simpler to return -1 at the calling place, and there is no need to process Option one more time.

This function appears to be a property of the Token type, so it seems to be a suitable method defined as Token. But Token type is defined in lex.rs; while priority is a concept of syntax, it should be implemented in parse.rs. The Rust language does not allow methods to be added to a type's non-defining file. So the above function is defined as an ordinary function in the parse.rs file (rather than the method of ParseProto like other functions).

Now, according to the priority, modify the exp() function again:

     fn exp(&mut self) -> ExpDesc {
         self.exp_limit(0)
     }
     fn exp_limit(&mut self, limit: i32) -> ExpDesc {
         // OTHERS
         let mut desc = match self. lex. next() {
             // The original various OTHERS type processing is omitted here
         };

         // A' := binop exp A' | Epsilon
         loop {
             let (left_pri, right_pri) = binop_pri(self. lex. peek());
             if left_pri <= limit {
                 return desc; // stop parsing
             }

             // continue parsing
             let binop = self. lex. next();
             let right_desc = self.exp_limit(right_pri);
             desc = self. process_binop(binop, desc, right_desc);
         }
     }

First, add a limit parameter to exp(), as the priority of the current operator, and limit the subsequent parsing range. However, this parameter belongs to the internal concept of the statement, and the caller of this function does not need to know this parameter; therefore, the actual processing function exp_limit() is added, and exp() is turned into an outer encapsulation function, using limit=0 to call the former. The reason why the initial call uses limit=0 is that 0 is less than any binary operator priority defined in the binop_pri() function, so the first operator will continue to be parsed (rather than return to exit the loop ); but 0 is greater than the priority -1 of the non-operator, so if it is followed by the non-operator, it will also exit normally.

The above parsing code combines loops and recursive calls, which is very difficult for those who are not familiar with the algorithm (like me), and it is difficult to write the complete code directly. However, according to the BNF specification after eliminating left recursion, the loop and recursion can be completed, and then the function can be easily completed according to the priority and conditional exit.

In addition, it should be noted that unary operators are also listed in the operator precedence table above, so when parsing unary operation statements in the previous section, the exp() function cannot be used when reading the operand expression (initial Priority 0), instead specify an initial priority of 12:

    fn exp_unop(&mut self) -> ExpDesc {
        self.exp_limit(12) // 12 is all unary operators' priority
    }

The priority of the exponentiation operation ^ is actually higher than that of the unary operator, so the execution order of the statement -a^10 is: first exponentiation, and then negation.

Evaluation Order

There is a very subtle bug in the parsing code above, which concerns the order in which the operands are evaluated.

The processing of each operand requires 2 steps: first call the exp() function to read the operand and return ExpDesc, and then call the discharge() function to discharge the operand to the stack for bytecode operation. The binary operation has 2 operands, so a total of 4 steps are required. Now discuss the sequence of these 4 steps.

According to the processing logic of the binary operation in the exp() function of the current version:

read the first operand first, desc;
After judging that it is a binary operation, call exp_limit() recursively, and read the second operand, right_desc;
Then discharge the ExpDesc of the above two operands to the stack in the process_binop() function.

Simplified is:

parse the first operand;
parse the second operand;
discharge the first operand;
discharge the second operand.

During the parsing and discharge stages, bytecode may be generated. So in this order, the bytecodes related to the two operands may be interspersed. Like the following example:

local a = -g1 + -g2

Ignoring the previous local variable definition, and ignoring the operation of undefined global variables will throw an exception. Here, the focus is only on the subsequent addition statement. Generates the following bytecode sequence with the current version of the interpreter:

constants: ['g1', 'g2']
byte_codes:
   GetGlobal(0, 0) # parse the first operand
   GetGlobal(1, 1) # parse the second operand
   Neg(2, 0)       # discharge the first operand
   Neg(3, 1)       # discharge the second operand
   Add(0, 2, 3)

It can be seen that the bytecodes related to the two operands are interspersed here. In this example, interleaving is fine. But in some cases, parsing the second operand will affect the evaluation of the first operand, and interleaving will cause problems at this time. Like the following example:

local t = { k = 1 }
local function f(t) t.k = 100; return 2 end -- modify the value of t.k
local r = t.k + f(t)*3

For the last sentence, we expected 1 + 2*3, but if we follow the current order of evaluation:

First parse the left operand t.k to generate ExpDesc::IndexField, but not discharge;
Then parse the right operand f(t)*2, and execute f(t) during the parsing process, thus modifying the value of t.k to 100;
Then discharge the left operationNumber, generate GetField bytecode, but at this time t.k has been modified by the previous step! Here comes the error. What is actually executed is 100 + 2*3.

In summary, we need to ensure that the bytecodes of the two operands cannot be interspersed! Then modify the exp_limit() function as follows:

     fn exp_limit(&mut self, limit: i32) -> ExpDesc {
         // The original various OTHERS type processing is omitted here

         loop {
             // Omit the processing of judging the priority

             // discharge the first operand! ! !
             if !matches!(desc, ExpDesc::Integer(_) | ExpDesc::Float(_) | ExpDesc::String(_)) {
                 desc = ExpDesc::Local(self. discharge_top(desc));
             }

             // continue parsing
             let binop = self. lex. next();
             let right_desc = self.exp_limit(right_pri); // parse the second operand
             desc = self. process_binop(binop, desc, right_desc);
         }
     }

Discharge the first operand onto the stack before parsing the second operand. However, this is not necessary for constant types, because:

the constant will not be affected by the second operand as in the above example;
Constants are also to be directly folded in subsequent attempts.

So far, the transformation of exp_limit() function for binary operation syntax analysis has been completed. As for the specific processing process_binop() function of the binary operation, it is introduced below.

Bytecode

The unary operation introduced in the previous section has only one operand, which can be divided into two cases: constants and variables. Constants are evaluated directly, and variables generate bytecodes. So each unary operation has only one bytecode. Binary operations are more complicated because they involve 2 operands.

First of all, although binary operators are mostly numerical calculations, because Lua's metatable is similar to operator overloading, other types of constants (such as strings, bools, etc.) may be legal operands. When parsing unary operations, these types of constants will directly report an error, but for binary operations, it needs to be executed at the execution stage to determine whether it is legal.

Secondly, if both operands are constants of numeric type (integer and floating point), then the result can be directly calculated during syntax analysis, which is called constant folding.

Otherwise, bytecode is generated and executed by the virtual machine. Similar to Read Global Variables and Read Table operations that have been supported before, each binary operator is also set to 3 types of right operands: variables on the stack, constants, and small integers.

The left operand is uniformly discharged to the stack, because it is rare for the left operand to be a constant. If we also add corresponding bytecodes for constants and small integer types, such as 10-a, then there are too many bytecode types.

Finally, for addition and multiplication that satisfy the commutative law, if the left operation is a constant, then it can be exchanged. For example, 10+a can be converted to a+10 first. Since the right operand 10 is a small integer, it can be use AddInt bytecode then.

ExpDesc

Similar to the new ExpDesc type introduced by the unary operation introduced in the previous section, the binary operation also needs a new type because it has one more operand:

enum ExpDesc {
     UnaryOp(fn(u8,u8)->ByteCode, usize), // (opcode, operand)
     BinaryOp(fn(u8,u8,u8)->ByteCode, usize, usize), // (opcode, left-operand, right-operand)

Syntax analysis

So far, the basic requirements of the binary operation statement have been introduced. Let's look at the code implementation, that is, the process_binop() function called in the exp() function:

     fn process_binop(&mut self, binop: Token, left: ExpDesc, right: ExpDesc) -> ExpDesc {
         if let Some(r) = fold_const(&binop, &left, &right) { // constant fold
             return r;
         }

         match binop {
             Token::Add => self.do_binop(left, right, ByteCode::Add, ByteCode::AddInt, ByteCode::AddConst),
             Token::Sub => self.do_binop(left, right, ByteCode::Sub, ByteCode::SubInt, ByteCode::SubConst),
             Token::Mul => self.do_binop(left, right, ByteCode::Mul, ByteCode::MulInt, ByteCode::MulConst),
             // omit more types
         }
     }

Try constant folding first. This part of the function is introduced in the next section because it involves the processing of integer and floating point types. Because the two operands are not necessarily constants, they may not be able to be folded. If the fold is not successful, then the operator and the two operands will be used later, so the fold_const() function here can only pass in references.

If it is not a constant and cannot be folded, then call the do_binop() function to return ExpDesc. Here, the enum tag is used as a function, which has been introduced before, and will not be introduced here.

Let's look at the do_binop() function:

    fn do_binop(&mut self, mut left: ExpDesc, mut right: ExpDesc, opr: fn(u8,u8,u8)->ByteCode,
            opi: fn(u8,u8,u8)->ByteCode, opk: fn(u8,u8,u8)->ByteCode) -> ExpDesc {

        if opr == ByteCode::Add || opr == ByteCode::Mul { // commutative
            if matches!(left, ExpDesc::Integer(_) | ExpDesc::Float(_)) {
                // swap the left-const-operand to right, in order to use opi/opk
                (left, right) = (right, left);
            }
        }

        let left = self.discharge_top(left);

        let (op, right) = match right {
            ExpDesc::Integer(i) =>
                if let Ok(i) = u8::try_from(i) {
                    (opi, i as usize)
                } else {
                    (opk, self.add_const(i))
                }
            ExpDesc::Float(f) => (opk, self.add_const(f)),
            _ => (opr, self.discharge_top(right)),
        };

        ExpDesc::BinaryOp(op, left, right)
    }

First, judge if it is addition or multiplication, and the left operand is a numeric constant, then exchange the two operands, so that the bytecode of xxCoust or xxInt can be generated later.

Then, discharge the left operand onto the stack;

Then, judge whether the type of the right operand is a numeric constant, or discharge it to the stack.

Finally, ExpDesc::BinaryOp is generated.

So far, the grammatical analysis of the binary operation statement is basically completed.

Integer and Float

So far, we have introduced the general analysis process of binary operations, but there is still a detail, that is, the different processing rules for integer and floating point types. Since there is a lot of content in this aspect, and it is relatively independent from the above-mentioned main analysis process, it will be introduced separately in the next section.

Build a Lua Interpreter in Rust