Relational Operations in Conditional Judgment

The previous two sections describe logical operations, and the next two describe relational operations.

Relational operations, that is, compares, have 6 operators: equal, not equal, greater than, less than, greater than or equal to, less than or equal to. When introducing logical operations in the previous two sections, it was said that logical operations cannot use the analysis process of binary numerical operations in Chapter 5 because of the short-circuit feature. The relational operations did not use the parsing process in Chapter 5, for a different reason: for performance.

If performance is not considered, relational operations can use the parsing process in Chapter 5. For example, for the equal operation, the following bytecode can be generated: EQ $r $a $b, that is, compare a and b, and assign the Boolean result to r. If performance is to be considered, it depends on the application scenarios of relational operations. This part is almost the same as the logical operations introduced in the previous two sections, and there are also two application scenarios:

  1. As a judgment condition, such as the judgment condition statement in the if, while and other statements in the previous chapter, such as if a == b then ...;
  2. Evaluation, such as print(a == b).

Like logical operations, the first scenario can be regarded as a simplified version of the second scenario. It does not require specific evaluation, but only needs to judge whether it is true or false. For example, the example of the if statement above can also be interpreted according to the second scenario. It is considered that a == b is first evaluated to a temporary variable, and then it is judged whether the temporary variable is true to decide whether to jump. Temporary variables can be omitted here! Since most applications of relational computing are in the first scenario, it is worthwhile to separate this scenario from the second general scenario for special optimization, by omitting temporary variables and directly judging whether to jump based on the evaluation result.

As the title of this section indicates, this section only introduces the first scenario; the next section will introduce the second scenario.

Bytecode

Still using the if statement and the equal operation as an example, in the if a == b then ... end scenario, the first bytecode sequence that comes to mind is as follows:

EQ $tmp $a $b    # Compare whether a and b are equal, and the result is stored in a temporary variable
TEST $tmp $jmp   # Determine whether to jump according to the temporary variable

Now save the temporary variable $tmp and merge the two bytecodes, as follows:

EQ $a $b $jmp   # Compare whether a and b are equal to decide whether to jump

But the problem is that this requires 3 parameters, leaving only 1 byte of space for the last jump parameter, indicating that the range is too small. For this reason, it can be split into 2 bytecodes:

EQ $a $b      # Determine whether a and b are equal, if they are equal, skip the next statement, ie pc++
JUMP $jmp     # unconditional jump

In this way, 2 bytes can be used to represent the jump parameter. However, since 2 bytecodes are still needed, what is the difference from the original "EQ+TEST" scheme? Why make it so complicated?

  • When the virtual machine is executing, if it is judged that a and b are equal and the following JUMP bytecode is skipped, then only 1 bytecode is executed; while the original "EQ+TEST" scheme always executes 2 bytes code. I don’t know the probability that the if statement is true, but the probability of the while statement is true is still very high, so this is equivalent to saving the execution of 1 bytecode with a high probability;

  • Even if the judgment is false and the following JUMP bytecode needs to be executed, then the next bytecode can be read directly when the EQ bytecode is executed, without having to go through another instruction distribution. The JUMP bytecode here is equivalent to an extended parameter of the EQ bytecode, rather than an independently executed bytecode. This is what Lua's official implementation does. This is also because the type of bytecode can be ignored in C language, and the parameters in the bytecode can be directly read through bit operations. But in the Rust language, if unsafe is not used, the enum tag cannot be ignored and the parameters can be read directly, so this optimization cannot be implemented in our interpreter.

  • We can directly decide whether to jump or not according to the judgment result. In the original "EQ+TEST" scheme, it is necessary to write the judgment result into a temporary variable on the stack first, then read the temporary variable when the TEST bytecode is executed, and then judge true or false again, thus adding a temporary variable Reading and writing, but also a true or false judgment.

The advantage is such an advantage. Yes, but not much. Especially compared with the implementation complexity it brings, it is even less. The original "EQ+TEST" scheme only needs to add a few operators to the Binary Numerical Operation introduced earlier; but the new scheme needs to be described earlier logical operation coordination. However, we still choose to follow the official implementation of Lua, and trade the complexity of the implementation for some execution efficiency optimization.

In addition, regarding the types of the two operands in the bytecode, according to the previous description of Bytecode Parameter Type, it is similar to the bytecode of the binary value operation, each relational operator also corresponds to 3 bytecodes, for example, for equality operators: Equal, EqualInt and EqualConst, a total of 3 bytecodes. A total of 6 relational operators are 18 bytecodes.

Combined with Logical Operations

Combining relational and logical operations is very common. Take the a>b and b<c statement as an example. According to the introduction in the previous two sections, this is a logical operation statement. The two operands are a>b and b<c respectively. The operand discharges to a temporary variable on the stack in order to judge true or false. In order to avoid the use of temporary variables here, it is necessary to make relational operations and logical operations cooperate with each other.

For relational operation statements, the ExpDesc type needs to be added: Compare. Let's see what parameters need to be associated with this type if it is to be combined with logical operations, that is, for logical operation statements that use relational operations as operands.

First of all, if it is not converted to the ExpDesc::Test type, then the Compare type needs to maintain two jump lists of True and False;

Secondly, for the two jumps of True and False, the previous logical operations are distinguished by 2 bytecodes, TestAndJump and TestOrJump. The same can be done for relational operations, such as EqualTrue and EqualFalse bytecodes for equal operations. However, the relational operators have a total of 18 bytecodes. If each bytecode needs to distinguish between True and False jumps, then 36 bytecodes are required. That's too many! Fortunately, there is another method. The EQ bytecode introduced above only has 2 parameters, and a Boolean parameter can be added to indicate whether to jump True or False.

Finally, for the two jumps of True and False, it needs to be determined according to the logical operator behind it. For example, in the above example of a>b and b<c, it cannot be determined when it is parsed to a>b, but it can only be determined when it is parsed to and. Therefore, the complete bytecode cannot be generated when parsing the relational operation statement, so the relevant information can only be stored in the Compare type first, and then the bytecode is generated after the jump type is determined.

In summary, the new types of relational operations are defined as follows:

enum ExpDesc {
     Compare(fn(u8,u8,bool)->ByteCode, usize, usize, Vec<usize>, Vec<usize>),

The first 3 parameters are bytecode type and the first 2 parameters are used to generate bytecode after determining the jump type; the latter 2 parameters are True and False jump lists. The whole type is equivalent to the combination of BinaryOp and Test types.

Here is the same problem as the logical operation introduced earlier. When the bytecode is generated, the destination address of the jump cannot be determined, and the complete bytecode cannot be generated immediately. It needs to be processed after determining the destination address. . However, this is different from the previous logical operation solution. The previous logical operation method is: first generate a bytecode placeholder, and only leave the parameters of the jump destination address blank; after determining the destination address, fix the corresponding parameters in the bytecode (fix_test_list() function ). The method of relational operation here is to store all the information in ExpDesc::Compare (causing the definition of this type to be very long), and then directly generate the complete bytecode after the destination address is determined later.

In fact, for the processing of relational operations, theoretically, logical operations can also be used to generate bytecodes and then repair them. However, there are 18 bytecodes corresponding to relational operations, which is too many. If you still follow fix_test_list() the method of function matching first and then generating bytecode, the code is too complicated. If it is in the C language, the parameters in the bytecode can be directly corrected by bit operations, regardless of the bytecode type; while directly modifying the associated parameters in the enum in Rust requires unsafe.

Another difference is that when parsing logical operations, bytecodes must be generated immediately to take place. The Compare type operand of the relational operation will determine the jump type in the test_or_jump() function immediately after, and then the bytecode can be generated, so there is no need to occupy a place, and there is no need to generate a word first. section code then fixed it again.

Syntax Analysis

The syntax analysis of relational operations is divided into two parts:

  • The parsing operation itself generates the corresponding ExpDesc::Compare according to the operator. This part is similar to Binary Numerical Operation, which is skipped here.

  • The combination of relational operations and logical operations, that is, the combination of ExpDesc::Compare and ExpDesc::Test. In the previous analysis of logical operations, the processing of ExpDesc::Compare has been added.

For example, when the left operand is logically operated, bytecode is generated and two jump lists are processed:

     fn test_or_jump(&mut self, condition: ExpDesc) -> Vec<usize> {
         let (code, true_list, mut false_list) = match condition {
             ExpDesc::Boolean(true) | ExpDesc::Integer(_) | ExpDesc::Float(_) | ExpDesc::String(_) => {
                 return Vec::new();
             }
             // Add a Compare type.
             // Generate 2 bytecodes.
             // The two jump lists are handled in the same way as `ExpDesc::Test` below.
             ExpDesc::Compare(op, left, right, true_list, false_list) => {
                 // If it is determined to be a True jump, that is, the associated
                 // third parameter, the complete bytecode can be generated.
                 self.byte_codes.push(op(left as u8, right as u8, true));

                 // Generate Jump bytecode, but the jump destination address is not
                 // yet known, and subsequent repairs are required. to this end,
                 // Add processing of Jump bytecode in fix_test_list().
                 (ByteCode::Jump(0), Some(true_list), false_list)
             }
             ExpDesc::Test(condition, true_list, false_list) => {
                 let icondition = self.discharge_any(*condition);
                 (ByteCode::TestOrJump(icondition as u8, 0), Some(true_list), false_list)
             }
             _ => {
                 let icondition = self.discharge_any(condition);
                 (ByteCode::TestOrJump(icondition as u8, 0), None, Vec::new())
             }
         };

now dealing with the right operand:

     fn process_binop(&mut self, binop: Token, left: ExpDesc, right: ExpDesc) -> ExpDesc {
         match binop {
             Token::And | Token::Or => {
                 if let ExpDesc::Test(_, mut left_true_list, mut left_false_list) = left {
                     match right {
                         // Add a Compare type.
                         // The processing method is similar to the `ExpDesc::Test` type below.
                         ExpDesc::Compare(op, l, r, mut right_true_list, mut right_false_list) => {
                             left_true_list.append(&mut right_true_list);
                             left_false_list.append(&mut right_false_list);
                             ExpDesc::Compare(op, l, r, left_true_list, left_false_list)
                         }
                         ExpDesc::Test(condition, mut right_true_list, mut right_false_list) => {
                             left_true_list.append(&mut right_true_list);
                             left_false_list.append(&mut right_false_list);
                             ExpDesc::Test(condition, left_true_list, left_false_list)
                         }
                         _ => ExpDesc::Test(Box::new(right), left_true_list, left_false_list),
                     }
                 } else {
                     panic!("impossible");
                 }
             }

Virtual Machine Execution

There are 6 relational operators in total. Since we have previously implemented the Eq trait for Value, the equal and not equal operations can use == and != to directly compare the Value operands. But for the other 4 operators, you need to implement a new trait for Value, which is PartialOrd. The reason why it is not Ord is because different types of Value cannot be compared in size. There is no need to use PartialEq because different types of Value can be compared for equality, and the return result is False. For example, the following two statements:

print (123 == 'hello') -- prints false
print (123 > 'hello') -- throw exception

Lua's comparison operators only support numeric and string types. So the PartialOrd implementation of Value is as follows:

impl PartialOrd for Value {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        match (self, other) {
            // numbers
            (Value::Integer(i1), Value::Integer(i2)) => Some(i1.cmp(i2)),
            (Value::Integer(i), Value::Float(f)) => (*i as f64).partial_cmp(f),
            (Value::Float(f), Value::Integer(i)) => f.partial_cmp(&(*i as f64)),
            (Value::Float(f1), Value::Float(f2)) => f1.partial_cmp(f2),

            // strings
            (Value::ShortStr(len1, s1), Value::ShortStr(len2, s2)) => Some(s1[..*len1 as usize].cmp(&s2[..*len2 as usize])),
            (Value::MidStr(s1), Value::MidStr(s2)) => Some(s1.1[..s1.0 as usize].cmp(&s2.1[..s2.0 as usize])),
            (Value::LongStr(s1), Value::LongStr(s2)) => Some(s1.cmp(s2)),

            // strings of different types
            (Value::ShortStr(len1, s1), Value::MidStr(s2)) => Some(s1[..*len1 as usize].cmp(&s2.1[..s2.0 as usize])),
            (Value::ShortStr(len1, s1), Value::LongStr(s2)) => Some(s1[..*len1 as usize].cmp(s2)),
            (Value::MidStr(s1), Value::ShortStr(len2, s2)) => Some(s1.1[..s1.0 as usize].cmp(&s2[..*len2 as usize])),
            (Value::MidStr(s1), Value::LongStr(s2)) => Some(s1.1[..s1.0 as usize].cmp(s2)),
            (Value::LongStr(s1), Value::ShortStr(len2, s2)) => Some(s1.as_ref().as_slice().cmp(&s2[..*len2 as usize])),
            (Value::LongStr(s1), Value::MidStr(s2)) => Some(s1.as_ref().as_slice().cmp(&s2.1[..s2.0 as usize])),

            (_, _) => None,
        }
    }
}

For floating-point numbers, the partial_cmp() method needs to be called because the Nan of floating-point numbers cannot be compared.

Types that implement the PartialOrd trait can directly use several relatively large symbols such as >, <, >=, and <=. But PartialOrd actually has 3 return results for comparison: true, false, and not comparable. Corresponding to the Lua language, they are true, false, and throw an exception. However, the above-mentioned 4 comparison symbols can only give 2 results, and return false if they cannot be compared. So in order to be able to judge the situation that cannot be compared, we cannot use these 4 symbols directly, but use the original partial_cmp() function. The following is the execution code of LesEq and Less two bytecodes:

     ByteCode::LesEq(a, b, r) => {
         let cmp = &self.stack[a as usize].partial_cmp(&self.stack[b as usize]).unwrap();
         if !matches!(cmp, Ordering::Greater) == r {
             pc += 1;
         }
     }
     ByteCode::Less(a, b, r) => {
         let cmp = &self.stack[a as usize].partial_cmp(&self.stack[b as usize]).unwrap();
         if matches!(cmp, Ordering::Less) == r {
             pc += 1;
         }
     }

Here unwarp() is used to throw an exception. In the follow-up, when standardizing error handling, improvements need to be made here.