Drafting a JVM With Rust

How to implement a stack-based virtual machine.

Drafting a JVM With Rust

By Nishanth Shetty

Virtual machines emulate computer systems by providing a high-level abstraction of the underlying operating system, and therefore provide the same functionality.

Broadly, they’re of two types — one is the System Virtual Machine that emulates an entire system and allows users to run an Operating System (ex: VMWare).

The other kind is the Process Virtual Machine, or Managed Runtime Environment (MRE), which runs a user program on a host machine. Ex : JVM, CLR (Common Language Runtime) a .Net runtime.

We’ll be focusing on the latter in this post.

Implementation

There are two ways to implement the Virtual Machine — Stack-based and Register-based. JVM is a stack-based virtual machine whereas Dalvik VM is register-based.

Stack-based VM operates on two well-known stack operations — push and pop. Data is pushed on to the stack and operation is performed on the data at the top of the stack. The result will be pushed back to stack.

We will try to implement our own stack-based virtual machine which will perform basic arithmetic operations. But first, we need to define runtime specification for our VM.

1. VM Will perform addition, subtraction, multiplication, and division.
2. VM will use wordcode of size 32bit for data and instruction set. Bytecode for the VM will use 32bit word size, hence the wordcode.
3. VM will reach halt state when `HALT` instruction is encountered and every program must end with `HALT` instruction.

Operation

VM Performs fetch-decode-execute cycle,

  • fetch Retrieve data/instruction from VM program memory
  • decode Identify the type of the data fetched, either VM Instruction(Operation Instruction) or Data
  • execute Perform the operation based on the decode type.

Operation Instruction Performs the operation defined by the instruction on the data present in the top of the stack.

Data Push the data to stack.

Defining Instruction Set

As per specification, our VM will use 32 bit to store the instruction and data. To separate instruction and data in our program, we need to have a way of communicating the same to the VM. We will allocate 2 bits to indicate the type; that leaves us with 30bits to store the data (i.e we can have numbers value range 2³⁰ in positive and negative values).

We will use 2 MSB bits for code type as follows:

|bits| Type |
|----|-----------------|
| 00 | Data (Positive number)|
| 01 | Operation Instruction (OI)|
| 11 | Data (negative number)|
| 10 | undefined|

Stack-VM Instructions:

0x40000001    ADD           
0x40000010    SUB           
0x40000011    MUL           
0x40000100    DIV           
0x40000000    HALT

The code to perform 22 + 32 will look like this:

| Hex word  | VM Value |
|0x00000020 | Data (32 in decimal)|
|0x00000016 | Data (22 in decimal)|
|0x40000001 | OI (ADD)|
|0x40000000 | OI (HALT)|

Now that we have defined basic specification for the VM, its time for some code. We will use rust-lang for the implementation.

Create new Rust project with cargo new Stack-VM

Before we start, we will make following assumptions:

  1. VM uses little endian byte order.
  2. VM uses separate memory for runtime stack and program memory to load program.

Add the byteorder crate to your cargo.toml file as follows:

[dependencies]
byteorder = "1.0.0"

Create runtime stack

Create lib module in your src folder, we will create all required modules here.

Create stack module and implement the stack with its primitive operation — push() and pop() .

Create VM Runtime.

Stack VM will have its internal state structure to store current program counter, vm state, runtime stack, and program memory. Create the module called vm and create the below struct:

pub struct VM {
   
    //program counter
    pc: usize,
   
    //typ of value read
    type_info: i32,

    //data/instruction
    data: i32,
    running: bool,
    stack: VMStack,
    program_memory: Vec<i32>,
}

Initialise the VM with default state as follows:

VM {
    program_memory: Vec::new(),
    pc: 0,
    type_info: 0,
    data: 0,
    running: true,
}

We will implement a few methods on VM as follows:

1. load_program

Loads the user program from file to VM program_memory . When we run fetch as a first task or run cycle, it will advance the program counter, so PC will start the code execution at memory index 1. Hence, we need to load our program to code area starting from index 1. For this reason, we will push magic bit 0xBADC0DE to first position of vector, at index 0.

pub fn load_program(&mut self, instructions: &Vec<u32>) {
    debug!("Loading program...");
    //Insert magic bits to beginning of the program memory, so that we can start pc from 1
    self.program_memory.push(0xBADC0DE);
    for instruction in instructions {
        self.program_memory.push(*instruction);
    }
    debug!("Memory content : {:?}", self.program_memory);
}

2. fetch

Loads instructions from the program memory. We will just advance the program counter in this step:

fn fetch(&mut self) {
   if self.pc < self.program_memory.len() {
        self.pc += 1;
    } else {
        panic!("Incomplete code execution, memory boundary reached   without reading HALT instruction");
    }
}

3. decode

Extracts the operation instruction and data from the word(byte):

fn decode(&mut self) {
    let word = self.program_memory[self.pc];
    self.data = VM::get_data(word);
    self.type_info = VM::get_type(word);
}

4. run

Executes fetch-decode-execute cycle till program halts.

5. exec

Based on the decode information, it either pushes the data to stack or performs operation if the code type is Operation Instruction.

fn exec(&mut self) {
    if self.current_instruction_type() == CODE_TYPE_DATA_POSITIVE || self.current_instruction_type() == CODE_TYPE_DATA_NEGATIVE {
        debug!("Instruction type Data ({} = {} ) ", self.current_instruction_type(), self.current_data());
        self.stack.push(self.data);
    } else {
        //execute instruction
        debug!("Instruction type Operation ({}) , ", self.current_instruction_type());
        self.do_primitive();
    }
}

6. do_primitive

Executes instruction (do_primitive()).

This method will perform the arithmetic operation as defined in the spec. It will pop values from the top of the stack, perform the operation specified in instruction, and push the result back to the top of the stack. This is the Arithmetic Unit of our VM.

Put together all the above, and create main.rs as follows:

To run, we need to generate binary for our VM. We can use stack-langc compiler to generate the binary for us.

stack-langc uses simple postfix expression to generate the binary, so in order to get our program to add two numbers, create program.vm file with below content:

32 22 +

To generate binary, move to stack-langc repo and run the following:

$cargo run program.vm

This will generate a.out binary file with Stack-VM bytecode.

$hexdump a.out will show the following content for the a.out.

0000000 20 00 00 00 16 00 00 00 01 00 00 40 00 00 00 400000010

Now run Stack-VM program using:

$cargo run a.out

As we are using debug mode, you should see debug logs printed with stack content and the operation the VM is performing.

Memory content : [195936478, 32, 22, 1073741825, 1073741824]
Executing instructions...
Instruction type Data (0 = 32 ) 
Instruction type Data (0 = 22 ) 
Instruction type Operation (1) , 
[ ADD ] : 22 32 
TOS now : 54
Instruction type Operation (1) , 
[ HALT ] : Stopping VM

That’s all there is to it. This exercise is an attempt to understand Rust and stack-based VM implementation better. You can find the full project here at my Github page.