Environment _ENV
Recall the first "hello, world!" example in Chapter 1. In the luac -l output shown there, the bytecode that reads the global variable print is as follows:
2 [1] GETTABUP 0 0 0 ; _ENV "print"
Judging by the elaborate bytecode name and the odd _ENV comment after it, this bytecode is not simple. We did not explain it at the time; instead we defined a more intuitive bytecode, GetGlobal, to read global variables. This section fills in the missing introduction to _ENV.
Current Global Variables
Our current handling of global variables is straightforward:
- In the syntax analysis stage, variables that are neither local variables nor upvalues are considered global variables, and the corresponding bytecodes are generated, including GetGlobal, SetGlobal and SetGlobalConst;
- In the execution stage of the virtual machine, a field global: HashMap<String, Value> is defined in the execution state ExeState to represent the global variable table. Subsequent reads and writes of global variables operate on this table.
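The scheme described in the list above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: Value is cut down to two variants, the bytecodes carry the variable name inline instead of a constant index, and execute() is a hypothetical helper.

```rust
use std::collections::HashMap;

// Simplified stand-in for the interpreter's Value type (assumption).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
}

// Simplified bytecodes: the name is stored inline to keep the sketch short,
// whereas the real bytecodes would refer to the constant table by index.
enum ByteCode {
    GetGlobal(u8, String), // stack[dst] = global[name]
    SetGlobal(String, u8), // global[name] = stack[src]
}

struct ExeState {
    global: HashMap<String, Value>, // the global variable table
    stack: Vec<Value>,
}

impl ExeState {
    fn new(stack_size: usize) -> Self {
        ExeState {
            global: HashMap::new(),
            stack: vec![Value::Nil; stack_size],
        }
    }

    // Every global access goes straight to the `global` map.
    fn execute(&mut self, codes: &[ByteCode]) {
        for code in codes {
            match code {
                ByteCode::GetGlobal(dst, name) => {
                    // A missing global reads as nil.
                    let v = self.global.get(name).cloned().unwrap_or(Value::Nil);
                    self.stack[*dst as usize] = v;
                }
                ByteCode::SetGlobal(name, src) => {
                    let v = self.stack[*src as usize].clone();
                    self.global.insert(name.clone(), v);
                }
            }
        }
    }
}
```

The point to notice is that the table is a dedicated field of the execution state, so the compiled code has no way to refer to it as a value.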
This approach is intuitive and has no downsides. However, there is another way that brings more powerful features: the environment _ENV, introduced in Lua 5.2. "Programming in Lua" describes _ENV in detail, including why it should be used instead of global variables and where it is useful. We will not repeat that here, and go directly to its design and implementation.
How _ENV Works
_ENV works as follows:
- In the parsing phase, every global variable is converted into an index into the table _ENV. For example, g1 = g2 becomes _ENV.g1 = _ENV.g2;
- So what is _ENV itself? Since every Lua chunk can be regarded as a function, _ENV can be regarded as a local variable defined outside the chunk, that is, an upvalue of it. For example, a more complete conversion of the chunk g1 = g2 above is as follows:
local _ENV = XXX -- predefined global variable table
return function (...)
    _ENV.g1 = _ENV.g2
end
All "global variables" have become indexes into _ENV, and _ENV itself is an upvalue, so there are no global variables left! The key point is that, apart from being preset in advance, _ENV is nothing special: it is just an ordinary variable. It can therefore be manipulated like any other variable, which brings a lot of flexibility; for example, a sandbox becomes easy to implement. We will not expand on the usage scenarios here. If you are interested, refer to "Programming in Lua".
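To make the conversion concrete, here is a small Rust model of it. It is only an analogy under stated assumptions: Table is a hypothetical stand-in for a Lua table holding integers, and load_chunk() plays the role of the wrapper "local _ENV = XXX; return function (...) ... end", with the returned closure capturing env the way a Lua function captures an upvalue.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// A shared, mutable table standing in for a Lua table (assumption).
type Table = Rc<RefCell<HashMap<String, i64>>>;

// Models `local _ENV = env; return function() _ENV.g1 = _ENV.g2 end`.
// The closure owns a reference to `env`, just like an upvalue.
fn load_chunk(env: Table) -> impl Fn() {
    move || {
        // Read _ENV.g2 (None models nil).
        let g2 = env.borrow().get("g2").copied();
        match g2 {
            Some(v) => {
                env.borrow_mut().insert("g1".to_string(), v);
            }
            None => {
                // Assigning nil removes the key.
                env.borrow_mut().remove("g1");
            }
        }
    }
}
```

Because the table is an ordinary value handed to the chunk, passing a different table gives the same code a different set of "globals". That is the property a sandbox relies on.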
Implementation of _ENV
Following the introduction above, we now use _ENV to replace global variables.
First, in the syntax analysis stage, each global variable is converted into an index into _ENV. The relevant code is as follows:
fn simple_name(&mut self, name: String) -> ExpDesc {
    // Matching against local variables and upvalues is omitted here;
    // if the name matches, the function returns directly.

    // If there is no match:
    //  - previously the name was treated as a global variable, returning ExpDesc::Global(name);
    //  - now it is converted into _ENV.name, as follows:
    let env = self.simple_name("_ENV".into()); // recursive call to look up _ENV
    let ienv = self.discharge_any(env);
    ExpDesc::IndexField(ienv, self.add_const(name))
}
The code above first tries to match the variable name against local variables and upvalues. That part was covered in detail in the upvalue section and is omitted here; we only look at the case where matching fails. Previously, name would then be treated as a global variable and ExpDesc::Global(name) would be returned. To convert it into _ENV.name instead, _ENV must be located first. Since _ENV is also an ordinary variable, simple_name() is called recursively with _ENV as the argument. To guarantee that this call does not recurse infinitely, _ENV must be preset in the preparation phase of syntax analysis. The recursive call is then certain to match _ENV as a local variable or an upvalue, and does not recurse any further.
So how is _ENV preset? In the introduction above, _ENV is an upvalue of the whole chunk. For convenience, however, we can make _ENV a parameter of the chunk in the load() function and achieve the same effect:
pub fn load(input: impl Read) -> FuncProto {
    let mut ctx = ParseContext { /* omitted */ };
    // _ENV as the first and only parameter
    chunk(&mut ctx, false, vec!["_ENV".into()], Token::Eos)
}
This way, when simple_name() is called while parsing the outermost code of the chunk, _ENV is matched as a local variable; for code inside a function, _ENV is matched as an upvalue.
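The resolution rule just described can be sketched in isolation. The types below are hypothetical simplifications, not the project's ParseContext or ExpDesc: Level tracks one function being parsed, and Resolved::EnvField records how _ENV itself was found instead of emitting bytecode.

```rust
// Result of resolving a name (simplified stand-in for ExpDesc).
#[derive(Debug, PartialEq)]
enum Resolved {
    Local(usize),
    Upvalue(usize),
    EnvField(Box<Resolved>, String), // _ENV.name, plus how _ENV was found
}

// One entry per function being parsed, outermost first (hypothetical layout).
struct Level {
    locals: Vec<String>,
    upvalues: Vec<String>,
}

struct ParseContext {
    levels: Vec<Level>,
}

impl ParseContext {
    // Mirrors load(): the outermost chunk starts with `_ENV` as its only parameter.
    fn new() -> Self {
        ParseContext {
            levels: vec![Level { locals: vec!["_ENV".into()], upvalues: vec![] }],
        }
    }

    fn enter_function(&mut self) {
        self.levels.push(Level { locals: vec![], upvalues: vec![] });
    }

    // Look for `name` among locals, then known upvalues, then enclosing functions.
    fn find(&mut self, depth: usize, name: &str) -> Option<Resolved> {
        if let Some(i) = self.levels[depth].locals.iter().rposition(|n| n.as_str() == name) {
            return Some(Resolved::Local(i));
        }
        if let Some(i) = self.levels[depth].upvalues.iter().position(|n| n.as_str() == name) {
            return Some(Resolved::Upvalue(i));
        }
        if depth == 0 {
            return None;
        }
        // Found in an enclosing function: record it as a new upvalue here.
        self.find(depth - 1, name)?;
        self.levels[depth].upvalues.push(name.to_string());
        Some(Resolved::Upvalue(self.levels[depth].upvalues.len() - 1))
    }

    fn simple_name(&mut self, name: &str) -> Resolved {
        let depth = self.levels.len() - 1;
        if let Some(r) = self.find(depth, name) {
            return r;
        }
        // Not a local or an upvalue: treat it as _ENV.name. The recursive call
        // always succeeds in find(), because `_ENV` was preset in new().
        let env = self.simple_name("_ENV");
        Resolved::EnvField(Box::new(env), name.to_string())
    }
}
```

In this sketch, resolving print in the outermost chunk yields a field of Local(0), while resolving it after enter_function() yields a field of Upvalue(0).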
This is only a promise that a variable named _ENV exists. The promise has to be fulfilled in the virtual machine execution stage: when the execution state ExeState is created, _ENV must be pushed onto the stack right after the function entry, as the first parameter. In effect, the initialization of the former global member of ExeState moves onto the stack. The code is as follows:
impl ExeState {
    pub fn new() -> Self {
        // global variable table
        let mut env = Table::new(0, 0);
        env.map.insert("print".into(), Value::RustFunction(lib_print));
        env.map.insert("type".into(), Value::RustFunction(lib_type));
        env.map.insert("ipairs".into(), Value::RustFunction(ipairs));
        env.map.insert("new_counter".into(), Value::RustFunction(test_new_counter));

        ExeState {
            // Push 2 values onto the stack: the virtual function entry, and the global variable table _ENV
            stack: vec![Value::Nil, Value::Table(Rc::new(RefCell::new(env)))],
            base: 1, // for entry function
        }
    }
}
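The following self-contained sketch shows why this stack layout is enough for the outermost chunk to reach its "globals". It makes several assumptions: Value is reduced to three variants, the table is a plain HashMap, the entry "answer" stands in for library functions such as print, and get_field() is a hypothetical helper modelling what a field-read bytecode does.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Simplified stand-in for the interpreter's Value type (assumption).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
    Table(Rc<RefCell<HashMap<String, Value>>>),
}

struct ExeState {
    stack: Vec<Value>,
    base: usize, // stack index of the current function's first parameter
}

impl ExeState {
    fn new() -> Self {
        let mut env = HashMap::new();
        // Stand-in for print, type, ... in the real table.
        env.insert("answer".to_string(), Value::Integer(42));
        ExeState {
            // slot 0: placeholder for the entry function
            // slot 1: _ENV, the entry function's first parameter
            stack: vec![Value::Nil, Value::Table(Rc::new(RefCell::new(env)))],
            base: 1,
        }
    }

    // Index the table held in local slot `t` by a string key,
    // which is what reading _ENV.key amounts to in the outermost chunk.
    fn get_field(&self, t: usize, key: &str) -> Value {
        match &self.stack[self.base + t] {
            Value::Table(table) => table.borrow().get(key).cloned().unwrap_or(Value::Nil),
            _ => panic!("attempt to index a non-table value"),
        }
    }
}
```

Since base is 1, local slot 0 of the entry function is exactly the table pushed in new(), so the parser's promise that _ENV is the first parameter holds at run time.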
With this, the conversion to _ENV is basically complete. The change is very simple, yet the capability it brings is very powerful, which makes _ENV a beautiful design.
In addition, since the concept of global variables no longer exists, the earlier code related to global variables, such as ExpDesc::Global and the generation and execution of the 3 global-variable bytecodes, can be deleted. Note that no new ExpDesc or bytecode has been introduced to implement _ENV. At least, not yet.
Optimization
Although the conversion above is fully functional, it has a performance problem. Since _ENV is an upvalue in most cases, the simple_name() function above generates two bytecodes for each global variable access:
GetUpvalue($tmp_table, _ENV)      # first load _ENV onto the stack
GetField($dst, $tmp_table, $key)  # then index it
The original scheme without _ENV needed only one bytecode, GetGlobal, so the new scheme is clearly slower. To make up for the loss, we only need bytecodes that index an upvalue table directly. To do this, add 3 bytecodes:
pub enum ByteCode {
    // Deleted 3 old bytecodes that directly manipulate the global variable table
    // GetGlobal(u8, u8),
    // SetGlobal(u8, u8),
    // SetGlobalConst(u8, u8),

    // Added 3 corresponding bytecodes that operate on an upvalue table
    GetUpField(u8, u8, u8),
    SetUpField(u8, u8, u8),
    SetUpFieldConst(u8, u8, u8),

    // other bytecodes omitted
}
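To see what the new read bytecode saves, the sketch below runs both instruction sequences on a toy virtual machine. Everything here is a simplified assumption: tables hold only integers, upvalues are passed in as a plain slice, and the operand order (destination, table or upvalue index, key constant index) is chosen for illustration and may differ from the project's.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

type Table = Rc<RefCell<HashMap<String, i64>>>;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
    Table(Table),
}

#[derive(Clone, Copy)]
enum ByteCode {
    GetUpvalue(u8, u8),     // stack[dst] = upvalues[up]
    GetField(u8, u8, u8),   // stack[dst] = stack[t][constants[k]]
    GetUpField(u8, u8, u8), // stack[dst] = upvalues[up][constants[k]]
}

// Index a table value by a string key; a missing key reads as nil.
fn index(table: &Value, key: &str) -> Value {
    match table {
        Value::Table(t) => t.borrow().get(key).map_or(Value::Nil, |v| Value::Integer(*v)),
        _ => panic!("attempt to index a non-table value"),
    }
}

fn execute(codes: &[ByteCode], constants: &[&str], upvalues: &[Value], stack: &mut [Value]) {
    for code in codes {
        match *code {
            ByteCode::GetUpvalue(dst, up) => {
                stack[dst as usize] = upvalues[up as usize].clone();
            }
            ByteCode::GetField(dst, t, k) => {
                stack[dst as usize] = index(&stack[t as usize], constants[k as usize]);
            }
            ByteCode::GetUpField(dst, up, k) => {
                // Index the upvalue table directly, with no temporary stack slot.
                stack[dst as usize] = index(&upvalues[up as usize], constants[k as usize]);
            }
        }
    }
}
```

With _ENV as upvalue 0 and the name as constant 0, the pair GetUpvalue(1, 0), GetField(0, 1, 0) and the single GetUpField(0, 0, 0) leave the same value in slot 0, but the second form needs one instruction and no temporary slot.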
Correspondingly, an ExpDesc variant for indexing an upvalue table is added:
enum ExpDesc {
    // deleted global variable
    // Global(usize),

    // Added index into an upvalue table
    IndexUpField(usize, usize),

    // other variants omitted
}
The upvalue table index here only supports string constant keys, which is exactly the global variable case. Although IndexUpField was added to optimize global variables, it also applies to indexing ordinary upvalue tables, so the same optimization can be added to the function that parses table indexes. The code is omitted here.
With IndexUpField defined, the original variable parsing function can be modified:
fn simple_name(&mut self, name: String) -> ExpDesc {
    // Matching against local variables and upvalues is omitted here;
    // if the name matches, the function returns directly.

    // If there is no match:
    //  - previously the name was treated as a global variable, returning ExpDesc::Global(name);
    //  - now it is converted into _ENV.name, as follows:
    let iname = self.add_const(name);
    match self.simple_name("_ENV".into()) {
        ExpDesc::Local(i) => ExpDesc::IndexField(i, iname),
        ExpDesc::Upvalue(i) => ExpDesc::IndexUpField(i, iname), // new IndexUpField
        _ => panic!("no here"), // unreachable, because "_ENV" must exist!
    }
}
As before, when a variable matches neither a local variable nor an upvalue, simple_name() is called recursively with _ENV as the argument. This time, however, we rely on the fact that _ENV must resolve to a local variable or an upvalue, and generate ExpDesc::IndexField or ExpDesc::IndexUpField respectively. Reading and writing ExpDesc::IndexUpField then generates the 3 new bytecodes above.
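How reads and writes of ExpDesc::IndexUpField might map onto the three bytecodes is sketched below. This is an illustration under assumptions, not the project's code generator: discharge() and assign() are hypothetical free functions, the ExpDesc variants are reduced to the ones needed here, and the operand order of the Set bytecodes (upvalue index, key constant, value) is assumed.

```rust
#[derive(Debug, PartialEq)]
enum ByteCode {
    GetField(u8, u8, u8),        // stack[dst] = stack[t][constants[k]]
    GetUpField(u8, u8, u8),      // stack[dst] = upvalues[up][constants[k]]
    SetUpField(u8, u8, u8),      // upvalues[up][constants[k]] = stack[src]
    SetUpFieldConst(u8, u8, u8), // upvalues[up][constants[k]] = constants[c]
}

#[derive(Clone, Copy)]
enum ExpDesc {
    Local(usize),               // value already in a stack slot
    Constant(usize),            // value in the constant table
    IndexField(usize, usize),   // local table, string-constant key
    IndexUpField(usize, usize), // upvalue table, string-constant key
}

// Read: load the described field into stack slot `dst`.
fn discharge(dst: usize, desc: &ExpDesc) -> Option<ByteCode> {
    match *desc {
        ExpDesc::IndexField(t, k) => Some(ByteCode::GetField(dst as u8, t as u8, k as u8)),
        ExpDesc::IndexUpField(up, k) => Some(ByteCode::GetUpField(dst as u8, up as u8, k as u8)),
        _ => None, // other cases are outside this sketch
    }
}

// Write: store `value` into the upvalue-table field described by `target`.
fn assign(target: &ExpDesc, value: &ExpDesc) -> Option<ByteCode> {
    match (target, value) {
        (&ExpDesc::IndexUpField(up, k), &ExpDesc::Local(src)) => {
            Some(ByteCode::SetUpField(up as u8, k as u8, src as u8))
        }
        (&ExpDesc::IndexUpField(up, k), &ExpDesc::Constant(c)) => {
            Some(ByteCode::SetUpFieldConst(up as u8, k as u8, c as u8))
        }
        _ => None, // other cases are outside this sketch
    }
}
```

The shape mirrors the three bytecodes that were removed: one for reading, one for writing a stack value, and one for writing a constant.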
In effect, ExpDesc::Global has been replaced by ExpDesc::IndexUpField: the handling that was deleted earlier for ExpDesc::Global is now added back for ExpDesc::IndexUpField.