Environment _ENV

Go back to the first "hello, world!" example in Chapter 1. In the output of luac -l displayed at that time, the bytecode for reading the global variable print is as follows:

2 [1] GETTABUP 0 0 0 ; _ENV "print"

Looking at the complex name of the bytecode and the strange _ENV comment behind it, it is not simple. At that time, this bytecode was not introduced, but the more intuitive bytecode GetGlobal was redefined to read global variables. In this section, let’s supplement the introduction of _ENV.

Current Global Variables

Our current handling of global variables is straightforward:

  • In the syntax analysis stage, variables that are not local variables and upvalues are considered as global variables, and corresponding bytecodes are generated, including GetGlobal, SetGlobal and SetGlobalConst;

  • In the execution phase of the virtual machine, define global: HashMap<String, Value> in the execution state ExeState data structure to represent the global variable table. Subsequent reads and writes to global variables operate on this table.

This approach is intuitive and has no downsides. However, there is another way to bring more powerful features, which is the environment _ENV introduced in Lua version 5.2. "Lua Programming" has a very detailed description of _ENV, including why _ENV should be used instead of global variables and application scenarios. We will not go into details here, but directly introduce its design and implementation.

How _ENV Works

The principle of _ENV:

  • In the parsing phase, convert all global variables into indexes of table "_ENV", such as g1 = g2 to _ENV.g1 = _ENV.g2;

  • So what is _ENV itself? Since all Lua code segments can be considered as a function, _ENV can be considered as a local variable outside the code segment, which is upvalue. For example, for the above code segment g1 = g2, a more complete conversion result is as follows:

local _ENV = XXX -- predefined global variable table
return function (...)
     _ENV.g1 = _ENV.g2
end

All "global variables" have become indexes of _ENV, and _ENV itself is also an upvalue, so there are no global variables! In addition, the key point is that _ENV itself has nothing special except that it is preset in advance, it is just an ordinary variable. This means that it can be manipulated like ordinary variables, which brings a lot of flexibility, such as a sandbox can be easily implemented. The specific usage scenarios will not be expanded here. If you are interested, you can refer to "Lua Programming".

Implementation of _ENV

According to the above introduction, use _ENV to transform global variables.

First, in the syntax analysis stage, the global variable is transformed into an index to _ENV. The relevant code is as follows:

fn simple_name(&mut self, name: String) -> ExpDesc {
     // Omit the matching of local variables and upvalue, and return directly if they match.

     // If there is no match,
     // - Previously considered to be a global variable, return ExpDesc::Global(name)
     // - Now transformed into _ENV.name, the code is as follows:
     let env = self.simple_name("_ENV".into()); // call recursively, look for _ENV
     let ienv = self. discharge_any(env);
     ExpDesc::IndexField(ienv, self. add_const(name))
}

In the above code, first try to match the variable name from local variables and upvalue. This part was introduced in detail in upvalue and is omitted here. Here we only look at the case where the matching fails. In this case, name was previously considered to be a global variable, and ExpDesc::Global(name) was returned. Now to transform it into _ENV.name, it is necessary to locate _ENV first. Since _ENV is also an ordinary variable, the simple_name() function is called recursively with _ENV as an argument. In order to ensure that this call does not recurse infinitely, it is necessary to pre-set _ENV in the preparation phase of syntax analysis. So in this recursive call, _ENV will definitely be matched as a local variable or upvalue, and will not be called recursively again.

So how to pre-set _ENV? In the above introduction, _ENV is the upvalue as the whole code block. But for the sake of convenience, we can use _ENV as a parameter in the load() function to achieve the same effect:

pub fn load(input: impl Read) -> FuncProto {
     let mut ctx = ParseContext { /* omitted */ };

     // _ENV as the first and only parameter
     chunk(&mut ctx, false, vec!["_ENV".into()], Token::Eos)
}

In this way, when parsing the outermost code of the code block, when the simple_name() function is called, a local variable of _ENV will be matched; and an upvalue of _ENV will be matched for the code inside the function.

This is just a promise that there must be a _ENV variable. And the fulfillment of this promise needs to be performed in the virtual machine execution stage. When creating an execution state ExeState, immediately after the function entry, _ENV must be pushed onto the stack as the first parameter. In fact, the previous initialization of the global member in ExeState is transferred to the stack. code show as below:

impl ExeState {
     pub fn new() -> Self {
         // global variable table
         let mut env = Table::new(0, 0);
         env.map.insert("print".into(), Value::RustFunction(lib_print));
         env.map.insert("type".into(), Value::RustFunction(lib_type));
         env.map.insert("ipairs".into(), Value::RustFunction(ipairs));
         env.map.insert("new_counter".into(), Value::RustFunction(test_new_counter));

         ExeState {
             // Push 2 values on the stack: the virtual function entry, and the global variable table _ENV
             stack: vec![Value::Nil, Value::Table(Rc::new(RefCell::new(env)))],
             base: 1, // for entry function
         }
     }

In this way, the transformation of _ENV is basically completed. This transformation is very simple, but the function it brings is very powerful, so _ENV is a very beautiful design.

In addition, since there is no concept of global variables, the previous codes related to global variables, such as ExpDesc::Global and the generation and execution of 3 bytecodes related to global variables, can be deleted. Note that no new ExpDesc or bytecode is introduced to implement _ENV. But just not yet.

Optimization

Although the above transformation is fully functional, there is a performance problem. Since _ENV is upvalue in most cases, for global variables, two bytecodes will be generated in the above simple_name() function:

Getupvalue ($tmp_table, _ENV) # first load _ENV onto the stack
GetField ($dst, $tmp_table, $key) # before indexing

In the original scheme that does not use _ENV, only one bytecode GetGlobal is needed. This new solution obviously reduces performance. To make up for the performance loss here, it is only necessary to provide bytecodes that can directly index the upvalue table. To do this, add 3 bytecodes:

pub enum ByteCode {
     // Deleted 3 old bytecodes that directly manipulate the global variable table
     // GetGlobal(u8, u8),
     // SetGlobal(u8, u8),
     // SetGlobalConst(u8, u8),

     // Add 3 corresponding bytecodes for operating the upvalue table
     GetUpField(u8, u8, u8),
     SetUpField(u8, u8, u8),
     SetUpFieldConst(u8, u8, u8),

Correspondingly, the expression of the upvalue table index should also be increased:

enum ExpDesc {
     // deleted global variable
     // Global(usize),

     // Added index to upvalue table
     IndexUpField(usize, usize),

The index to the upvalue table here only supports string constants, which is also the scenario for global variables. Although this IndexUpField is added for global variable optimization, it can also be applied to ordinary upvalue table indexes. Therefore, in the function of parsing table indexes, IndexUpField optimization can also be added. The specific code is omitted here.

After defining IndexUpField, the original variable parsing function can be modified:

fn simple_name(&mut self, name: String) -> ExpDesc {
     // Omit the matching of local variables and upvalue, and return directly if they match.

     // If there is no match,
     // - Previously considered to be a global variable, return ExpDesc::Global(name)
     // - Now transformed into _ENV.name, the code is as follows:
     let iname = self. add_const(name);
     match self. simple_name("_ENV".into()) {
         ExpDesc::Local(i) => ExpDesc::IndexField(i, iname),
         ExpDesc::upvalue(i) => ExpDesc::IndexUpField(i, iname), // new IndexUpField
         _ => panic!("no here"), // because "_ENV" must exist!
     }
}

As before, after a variable fails to match both the local variable and the upvalue, it still uses _ENV as a parameter to recursively call the simple_name() function. But here we know that the result returned by _ENV must be a local variable or an upvalue. In these two cases, ExpDesc::IndexField and ExpDesc::IndexUpField are generated respectively. Then generate the 3 new bytecodes above when reading and writing ExpDesc::IndexUpField.

In this way, it is equivalent to replacing ExpDesc::Global with ExpDesc::IndexUpField. The processing of ExpDesc::Global was deleted before, and now it is added back from ExpDesc::IndexUpField.