To Be Continued
We have implemented the core features of the Lua interpreter. Still, we are far from our original goal: a complete, performant, production-grade interpreter. I will continue to improve this interpreter, but because work is busy and spare time is scarce, I am suspending this series of articles. Writing articles is more tiring than writing code. This also matches my experience of reading Yu Yuan's "Handwriting Operating System by Yourself" when I was in school: I only followed and practiced the first half of the book. Once I had mastered the basic development methods and gotten started with writing an operating system, I wrote the rest on my own. I think the parts of this series completed so far should likewise provide introductory knowledge of implementing a Lua interpreter, and interested readers can implement the remaining parts independently.
Here is a partial list of unfinished features:
-
Metatables are a very important feature of the Lua language, flexible and powerful. However, the implementation principle is simple: the virtual machine only needs an extra check when it executes the relevant bytecodes, and syntax analysis does not need to change at all. One implementation detail: our interpreter manages memory with reference counting (RC), which cannot reclaim circular references and therefore leaks them. A table that sets itself as its own metatable is a common circular reference, so this case needs special handling to avoid the leak.
-
UserData is one of the basic types of Lua. However, we have not yet needed it. We can implement this type later, when the need arises while implementing the standard library. In the official implementation of Lua, creating a new UserData allocates a block of memory in Lua and then hands it to a C function to initialize. However, safe Rust does not allow uninitialized memory to be used, so we have to think about how to create a UserData value.
-
LightUserData is also one of the basic types of Lua. It is just a raw pointer and needs no special handling.
-
Error handling. At present we handle every error by panicking, which is not acceptable. At the very least we need to distinguish between expected errors and program bugs. The former may need to be subdivided further by source: lexical analysis, syntax analysis, virtual machine execution, Rust API, and so on. Error handling is also a distinctive feature of the Rust language, so this is a great opportunity to experience it.
-
Performance optimization. High performance is one of our initial goals, and some optimizations were made along the way, such as the design of the string type. The actual result is not yet known, though, and only testing will tell. There are benchmark programs for Lua available on the Internet, and we can run them against the official Lua implementation as a comparison. Incidentally, this also verifies correctness.
-
Optimized table construction. For a table constructor whose elements are all constants, there is no need to load the elements onto the stack; the table could even be created directly during syntax analysis.
-
Rust API. Lua is used mostly as a glue language, so the external API is very important. Our interpreter is mainly meant for programs written in Rust, so it should provide a set of APIs that fits Rust calling conventions rather than mirroring the C API of the official Lua implementation. We have already implemented some basic APIs, such as reading values from the stack, using generics, which simplifies both the API and its call sites. There is also a comparative survey of the calling conventions of scripting languages implemented in Rust.
-
Library. The current interpreter is a stand-alone program, but Lua is most commonly used as a library called by other programs. So we need to turn our project into a library.
-
Support passing arguments to, and returning values from, an entire chunk of code.
-
The standard library. It lies outside the core of the interpreter and touches on many more areas. In addition to the packages listed below, the standard library has some basic functions, such as type() and ipairs(), which we have already implemented, and most of the rest are not difficult. The only troublesome one is the pairs() function. Its efficient implementation in the official Lua implementation depends on how the table is implemented. Since we use Rust's HashMap for the dictionary part of the table, there may be no simple way to implement it.
-
The math library. Most functions have corresponding implementations in the Rust standard library; the only one that must be implemented manually is random number generation. Since the C language standard does not provide a suitable one, Lua's official implementation implements this function itself. Although we could also use the random crate, it is better to follow the official Lua implementation and write the generator ourselves. In addition, generating random numbers requires maintaining global state. In the official implementation of Lua, this state is a UserData value added to Lua's registry. We, however, can take advantage of Rust closures and keep this state inside a closure, which is more convenient and efficient.
-
The string library. The troublesome part is pattern matching. For convenience, Lua defines and implements its own set of matching rules, so we can only follow its definition and reimplement it in Rust. This will probably be quite complicated, but after finishing it we will have a deeper understanding of pattern matching.
-
The io library. The troublesome part is the representation of files. The C standard provides the FILE type, which can represent every kind of file, including standard input and output and ordinary files, and every mode: read-only, write-only, and read-write. In Rust, however, these appear to be separate types. If we want to provide an API consistent with the io library, we need to wrap them.
-
The coroutine library. It requires a thorough understanding of Lua's coroutines, and it will also require major adjustments to the existing function call process.
-
The debug library. I have not used this library and do not know much about it, but my feeling is that implementing it would require either a lot of unsafe code or many changes to the existing process. So in the end we may choose not to implement it.
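To make the metatable item above more concrete, here is a minimal sketch of one possible special case for a table that is its own metatable. The types Table and Meta and both functions are hypothetical and heavily simplified; they are not the interpreter's actual definitions. The idea is to store a weak reference when the metatable is the table itself, so the reference count never forms a cycle.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical, simplified table: only the metatable slot is modeled.
struct Table {
    metatable: Option<Meta>,
}

// Hypothetical metatable slot that never holds a strong reference
// to the table that owns it.
enum Meta {
    // A different table: an ordinary strong reference.
    Other(Rc<RefCell<Table>>),
    // The table itself: a weak reference, so no cycle is created.
    Itself(Weak<RefCell<Table>>),
}

fn set_metatable(t: &Rc<RefCell<Table>>, mt: &Rc<RefCell<Table>>) {
    let meta = if Rc::ptr_eq(t, mt) {
        Meta::Itself(Rc::downgrade(t))
    } else {
        Meta::Other(Rc::clone(mt))
    };
    t.borrow_mut().metatable = Some(meta);
}

fn get_metatable(t: &Rc<RefCell<Table>>) -> Option<Rc<RefCell<Table>>> {
    let table = t.borrow();
    match table.metatable.as_ref()? {
        Meta::Other(mt) => Some(Rc::clone(mt)),
        // Upgrading succeeds as long as the caller still holds the table.
        Meta::Itself(weak) => weak.upgrade(),
    }
}
```

After set_metatable(&t, &t), the strong count of t stays at 1, so the table is freed as soon as its last external owner drops it. Longer cycles (two tables referring to each other) are not covered by this sketch.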
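For the error handling item above, a first step away from panicking could look like the following sketch. The LuaError type, its variants, and the two helper functions are assumptions made for illustration; the variants simply mirror the categories named in the list.

```rust
use std::fmt;

// Hypothetical error type; one variant per category of expected error.
#[derive(Debug, PartialEq)]
enum LuaError {
    Lex(String),
    Parse(String),
    Runtime(String),
    Api(String),
}

impl fmt::Display for LuaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LuaError::Lex(msg) => write!(f, "lexical error: {}", msg),
            LuaError::Parse(msg) => write!(f, "syntax error: {}", msg),
            LuaError::Runtime(msg) => write!(f, "runtime error: {}", msg),
            LuaError::Api(msg) => write!(f, "API error: {}", msg),
        }
    }
}

impl std::error::Error for LuaError {}

// Stand-in for a VM operation that would otherwise panic on bad input.
// Division truncates here, which is a simplification.
fn int_div(a: i64, b: i64) -> Result<i64, LuaError> {
    if b == 0 {
        return Err(LuaError::Runtime("attempt to divide by zero".to_string()));
    }
    Ok(a.wrapping_div(b))
}

// The `?` operator passes an expected error up to the caller.
fn eval(a: i64, b: i64, c: i64) -> Result<i64, LuaError> {
    let q = int_div(a, b)?;
    int_div(q, c)
}
```

With this split, expected errors travel through Result, while panics remain reserved for genuine bugs in the interpreter.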
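The Rust API item above mentions reading stack values through generics. The following sketch shows the general shape such an API could take. Value, FromValue, State, and get are hypothetical names chosen for this example and may differ from what the interpreter actually uses.

```rust
// Hypothetical value type, reduced to three variants.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
    String(String),
}

// Hypothetical conversion trait: each Rust type states how to read
// itself out of a Value.
trait FromValue: Sized {
    fn from_value(v: &Value) -> Option<Self>;
}

impl FromValue for i64 {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::Integer(i) => Some(*i),
            _ => None,
        }
    }
}

impl FromValue for String {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::String(s) => Some(s.clone()),
            _ => None,
        }
    }
}

// Hypothetical interpreter state holding only a value stack.
struct State {
    stack: Vec<Value>,
}

impl State {
    // One generic method replaces a family of per-type getters.
    fn get<T: FromValue>(&self, index: usize) -> Option<T> {
        self.stack.get(index).and_then(T::from_value)
    }
}
```

A caller would write state.get::<i64>(0) or state.get::<String>(1), whereas a C-style API needs one function per type. Returning Option keeps a type mismatch or an out-of-range index from panicking.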
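For the math library item above, keeping the generator state inside a closure could be sketched as follows. The function name make_random is hypothetical, and xorshift64 is used only as a compact stand-in; it is not the algorithm of the official Lua implementation.

```rust
// Returns a generator closure that owns its own state, so no global
// variable and no UserData value is needed.
fn make_random(seed: u64) -> impl FnMut() -> u64 {
    // xorshift needs a non-zero state; fall back to an arbitrary constant.
    let mut state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
    move || {
        // One xorshift64 step (shift triple 13, 7, 17).
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    }
}
```

Each call to make_random yields an independent generator, and two generators created with the same seed produce the same sequence. In the interpreter, such a closure could be registered as the Rust function behind math.random, assuming the function type allows closures that capture mutable state.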
In addition to the unfinished features listed above, there are some small improvements to the current code, such as refining the comments, applying the let..else syntax supported in newer versions of Rust, and some small code optimizations. For these, we add to_be_continued, which can also be seen as the final version of the code corresponding to this series of articles.
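As a small illustration of the let..else syntax mentioned above (available since Rust 1.65), the sketch below extracts an integer from a value or returns early. The Value type and the to_integer function are hypothetical and reduced to the minimum.

```rust
// Hypothetical value type, reduced to two variants for illustration.
enum Value {
    Integer(i64),
    Nil,
}

fn to_integer(v: &Value) -> Result<i64, String> {
    // If the pattern does not match, the else block runs; it must
    // diverge (return, break, continue, or panic).
    let Value::Integer(i) = v else {
        return Err("integer expected".to_string());
    };
    Ok(*i)
}
```

Compared with a match that has one meaningful arm and one early return, this keeps the successful path unindented.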