Garbage Collection and Rc
In the above section String Definition, we used Rc
to define the string type in Lua, which involves an important topic: garbage collection(GC). Garbage collection is a very general and in-depth topic. Here we only introduce the parts related to our interpreter implementation.
GC vs RC
The Lua language manages memory automatically, meaning it releases unused memory automatically through garbage collection. There are two main ways to implement garbage collection: mark-and-sweep and reference counting (RC). Sometimes RC is not considered GC, so the narrow GC refers to the former, that is, the mark-clear scheme. The GC mentioned below in this section is its narrow meaning.
In comparison, RC has two disadvantages:
-
It is impossible to judge circular references, which can lead to memory leaks. This is fatal. In fact,
Rc
in Rust also has this problem. Rust's strategy for this is: it's up to the programmer to avoid circular references. -
Performance is worse than GC. This is not absolute, but it seems to be the mainstream view. The main reason is that each clone or drop operation needs to update the reference counter, which in turn affects the CPU cache.
Based on the above reasons, the mainstream languages will not adopt the RC scheme, but chose the GC scheme, including the official implementation version of Lua. However, we still chose to use Rc
in the definition of strings in this chapter, that is, to adopt the RC scheme, because of two shortcomings of GC:
-
Implement complex. Although it may be relatively simple to implement a simple GC solution, it is very difficult to pursue performance. Many languages (such as Python, Go, Lua) also continuously improve their GC during versions. It's hard to get there in one step.
-
More complex in Rust. Originally, the biggest feature of the Rust language is automatic memory management. The manual memory management function of the GC scheme is contrary to this feature of Rust, will make it more complicated. There are many discussions and projects on the Internet about implementing GC with Rust (such as 1, 2, [3](https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe-tracing-gc-designs-in-rust/), [4](https://coredumped.dev/2022/04/11/implementing-a-safe-garbage-collector-in-rust/), etc.), obviously beyond Rust beginners range of capabilities.
In contrast, if you adopt the RC scheme, you only need to use Rc
in Rust, and no additional memory management is required. That is, the garbage collection part can be avoided entirely.
Countermeasures for the two shortcomings of the above-mentioned RC scheme: one is circular references, which can only be handed over to Lua programmers to avoid circular references, but in common cases, such as a table setting itself as a metatable, we can handle it specially to avoid memory leaks. The second is performance, and we can only give up the pursuit of this aspect.
It is a difficult decision to adopt the RC scheme to realize garbage collection. Because the goal of this project from the beginning is to fully comply with the Lua manual and be fully compatible with the official implementation version. After adopting the RC scheme, the scenario of circular reference cannot be handled, which destroys this goal. Due to my limited ability, I have to do this for the time being. However, GC solutions may also be tried in the future. Alternative GC schemes have very little impact on the rest of our interpreter. Interested readers can try it out by themselves first.
Rc in Rust
Ok, now let's leave the topic of garbage collection and discuss Rc
in Rust simply.
In many articles introducing Rust, it is mentioned that Rc
should be avoided as much as possible, because the unique ownership mechanism of Rust language not only provides automatic memory management at compile time, but also optimizes program design. Other languages that support pointers (such as C, C++) can use pointers to point at will, and each object may be pointed to by many other objects, and the entire program can easily form a chaotic "Object Sea". However, Rust's ownership mechanism forces Rust programmers to have only one owner for each object when designing a program, and the entire program forms a clear "Object Tree". In most scenarios, the latter (Object Tree) is obviously a better design. However, Rc
breaks this specification, and the whole program becomes a chaotic "Object Sea" again. So try to avoid using Rc
.
I agree with this point of view very much. In the process of implementing this Lua interpreter project, in order to follow Rust's ownership mechanism, I had to adjust the previous C language design ideas, and the adjusted results were often indeed clearer.
From a certain point of view, the Lua interpreter project can be divided into two parts:
- The interpreter itself, mainly lexical analysis and syntax analysis;
- The Lua code to be interpreted and executed, including the value, stack, and preset execution flow corresponding to the bytecode, which is the virtual machine part.
For the former part, we fully follow the design requirements of Object Tree and strive for a clear program structure. For the latter, since we cannot limit the Lua code written by Lua programmers (for example, Lua code can easily realize the data structure of the graph, which obviously does not conform to Object Tree), so we will not go into this part pursue Object Tree. Even if GC is used to achieve garbage collection, it will inevitably involve a lot of unsafe code, which is more contrary to the design intent of Rust than Rc
.