Drafting a JVM With Rust
How to implement a stack-based virtual machine.
By Nishanth Shetty
Virtual machines emulate computer systems by providing a high-level abstraction of the underlying operating system, and therefore provide the same functionality.
Broadly, they’re of two types — one is the System Virtual Machine that emulates an entire system and allows users to run an Operating System (ex: VMWare).
The other kind is the Process Virtual Machine, or Managed Runtime Environment (MRE), which runs a user program on a host machine. Ex : JVM, CLR (Common Language Runtime) a .Net runtime.
We’ll be focusing on the latter in this post.
Implementation
There are two ways to implement the Virtual Machine — Stack-based and Register-based. JVM is a stack-based virtual machine whereas Dalvik VM is register-based.
Stack-based VM operates on two well-known stack operations — push and pop. Data is pushed on to the stack and operation is performed on the data at the top of the stack. The result will be pushed back to stack.
We will try to implement our own stack-based virtual machine which will perform basic arithmetic operations. But first, we need to define runtime specification for our VM.
1. VM Will perform addition, subtraction, multiplication, and division.
2. VM will use wordcode of size 32bit for data and instruction set. Bytecode for the VM will use 32bit word size, hence the wordcode.
3. VM will reach halt state when `HALT` instruction is encountered and every program must end with `HALT` instruction.
Operation
VM Performs fetch-decode-execute
cycle,
fetch
Retrieve data/instruction from VM program memorydecode
Identify the type of the data fetched, eitherVM Instruction
(Operation Instruction
) orData
execute
Perform the operation based on the decode type.
Operation Instruction
Performs the operation defined by the instruction on the data present in the top of the stack.
Data
Push the data to stack.
Defining Instruction Set
As per specification, our VM will use 32 bit to store the instruction and data. To separate instruction and data in our program, we need to have a way of communicating the same to the VM. We will allocate 2 bits to indicate the type; that leaves us with 30bits to store the data (i.e we can have numbers value range 2³⁰ in positive and negative values).
We will use 2 MSB bits for code type as follows:
|bits| Type |
|----|-----------------|
| 00 | Data (Positive number)|
| 01 | Operation Instruction (OI)|
| 11 | Data (negative number)|
| 10 | undefined|
Stack-VM Instructions:
0x40000001 ADD
0x40000010 SUB
0x40000011 MUL
0x40000100 DIV
0x40000000 HALT
The code to perform 22 + 32
will look like this:
| Hex word | VM Value |
|0x00000020 | Data (32 in decimal)|
|0x00000016 | Data (22 in decimal)|
|0x40000001 | OI (ADD)|
|0x40000000 | OI (HALT)|
Now that we have defined basic specification for the VM, its time for some code. We will use rust-lang for the implementation.
Create new Rust project with cargo new Stack-VM
Before we start, we will make following assumptions:
- VM uses little endian byte order.
- VM uses separate memory for runtime stack and program memory to load program.
Add the byteorder
crate to your cargo.toml file as follows:
[dependencies]
byteorder = "1.0.0"
Create runtime stack
Create lib module in your src
folder, we will create all required modules here.
Create stack
module and implement the stack with its primitive operation — push()
and pop()
.
Create VM Runtime.
Stack VM will have its internal state structure to store current program counter, vm state, runtime stack, and program memory. Create the module called vm
and create the below struct:
pub struct VM {
//program counter
pc: usize,
//typ of value read
type_info: i32,
//data/instruction
data: i32,
running: bool,
stack: VMStack,
program_memory: Vec<i32>,
}
Initialise the VM with default state as follows:
VM {
program_memory: Vec::new(),
pc: 0,
type_info: 0,
data: 0,
running: true,
}
We will implement a few methods on VM as follows:
1. load_program
Loads the user program from file to VM program_memory
. When we run fetch as a first task or run cycle, it will advance the program counter, so PC will start the code execution at memory index 1. Hence, we need to load our program to code area starting from index 1. For this reason, we will push magic bit 0xBADC0DE
to first position of vector, at index 0.
pub fn load_program(&mut self, instructions: &Vec<u32>) {
debug!("Loading program...");
//Insert magic bits to beginning of the program memory, so that we can start pc from 1
self.program_memory.push(0xBADC0DE);
for instruction in instructions {
self.program_memory.push(*instruction);
}
debug!("Memory content : {:?}", self.program_memory);
}
2. fetch
Loads instructions from the program memory. We will just advance the program counter in this step:
fn fetch(&mut self) {
if self.pc < self.program_memory.len() {
self.pc += 1;
} else {
panic!("Incomplete code execution, memory boundary reached without reading HALT instruction");
}
}
3. decode
Extracts the operation instruction and data from the word(byte):
fn decode(&mut self) {
let word = self.program_memory[self.pc];
self.data = VM::get_data(word);
self.type_info = VM::get_type(word);
}
4. run
Executes fetch-decode-execute
cycle till program halts.
5. exec
Based on the decode information, it either pushes the data to stack or performs operation if the code type is Operation Instruction
.
fn exec(&mut self) {
if self.current_instruction_type() == CODE_TYPE_DATA_POSITIVE || self.current_instruction_type() == CODE_TYPE_DATA_NEGATIVE {
debug!("Instruction type Data ({} = {} ) ", self.current_instruction_type(), self.current_data());
self.stack.push(self.data);
} else {
//execute instruction
debug!("Instruction type Operation ({}) , ", self.current_instruction_type());
self.do_primitive();
}
}
6. do_primitive
Executes instruction (do_primitive()
).
This method will perform the arithmetic operation as defined in the spec. It will pop values from the top of the stack, perform the operation specified in instruction, and push the result back to the top of the stack. This is the Arithmetic Unit of our VM.
Put together all the above, and create main.rs as follows:
To run, we need to generate binary for our VM. We can use stack-langc compiler to generate the binary for us.
stack-langc uses simple postfix expression to generate the binary, so in order to get our program to add two numbers, create program.vm
file with below content:
32 22 +
To generate binary, move to stack-langc
repo and run the following:
$cargo run program.vm
This will generate a.out binary file with Stack-VM bytecode.
$hexdump a.out
will show the following content for the a.out.
0000000 20 00 00 00 16 00 00 00 01 00 00 40 00 00 00 400000010
Now run Stack-VM program using:
$cargo run a.out
As we are using debug mode, you should see debug logs printed with stack content and the operation the VM is performing.
Memory content : [195936478, 32, 22, 1073741825, 1073741824]
Executing instructions...
Instruction type Data (0 = 32 )
Instruction type Data (0 = 22 )
Instruction type Operation (1) ,
[ ADD ] : 22 32
TOS now : 54
Instruction type Operation (1) ,
[ HALT ] : Stopping VM
That’s all there is to it. This exercise is an attempt to understand Rust and stack-based VM implementation better. You can find the full project here at my Github page.