# Compilation and Code Generation
This document outlines the process of compiling Ribbon code into executable bytecode, and the optional, high-performance path of Just-In-Time (JIT) compilation to native machine code. The entire system is designed around a central principle: simplicity and modularity. The core VM interpreter is completely unaware of the JIT's existence; it only knows how to execute bytecode and call functions that adhere to the `builtin` ABI.
## The Compilation Artifact: `core.Bytecode`
The fundamental unit of compiled Ribbon code is a `core.Bytecode` object. This is a self-contained block of memory that holds everything needed to execute one or more functions. It consists of two main parts:

- `core.Header`: A metadata section at the beginning of the block. The header contains:
  - `AddressTable`: The heart of the linking strategy. This table is an array of pointers that maps the `Id`s used in instruction operands (like `Id.of(core.Function)`) to the actual memory addresses of functions, globals, and constants. This indirection is what makes dynamic linking and JIT compilation possible.
  - `SymbolTable`: Optional metadata that maps human-readable names (e.g., `"my_module/my_function"`) to their corresponding `Id`s. This is primarily used for debugging, reflection, and dynamic linking by name.
- Instruction Data: A contiguous block of raw `InstructionBits` that immediately follows the header. The `AddressTable` contains pointers into this region for each defined function.
## The Standard Compilation Process (AOT)
The default Ahead-of-Time (AOT) compilation path transforms Ribbon source code into an executable `core.Bytecode` unit in memory.

- Parsing: Source code is parsed into an Abstract Syntax Tree (AST).
- Code Generation: A compiler backend traverses the AST and uses the `bytecode.Builder` API to generate a high-level, unencoded representation of the instructions for each function.
- Encoding: The `bytecode.Encoder` takes the unencoded instructions and writes them out as a stream of `InstructionBits` into a memory buffer. It populates the `AddressTable` and `SymbolTable` within the header as it goes.
- Finalization: The final result is a single, contiguous block of memory representing the `core.Bytecode` unit, ready to be loaded and executed by a `Fiber`.
At this stage, all functions in the unit are standard bytecode functions, and the `call` instructions targeting them will use the `bytecode` ABI.
## The JIT Compilation Process (Optional Optimization)
The JIT compiler is an optional layer that dramatically improves performance by translating “hot” functions from bytecode into native machine code. Its integration is designed to be seamless and invisible to the core VM.
### The Base Strategy: From `bytecode` to `builtin`

The JIT compiler is a specialized function that:
- Takes as Input: A `core.Function` (i.e., a pointer to a function's bytecode and its associated metadata) within a `core.Bytecode` unit.
- Produces as Output: A `core.AllocatedBuiltinFunction`. This is a block of executable machine code with a function signature that matches the `builtin` ABI: `fn (*mem.FiberHeader) callconv(.c) BuiltinSignal`.
The "magic" happens in the final step:

The "Swap": After generating the new native code, the JIT (or the host environment managing it) modifies the `AddressTable` within the `Header`. It finds the entry for the original `core.Function` and replaces it with a pointer to the new `core.AllocatedBuiltinFunction`.
From this point on, any `call_c` instruction that uses the ID of the JIT-compiled function will be dispatched by the interpreter as a standard `builtin` call. The interpreter simply calls the function pointer; it has no knowledge that the native code it's executing was generated just moments before. This allows JIT-compiled code and interpreted code to coexist and call each other transparently.
Note on Dynamic Calls: This "swap" strategy is most effective for direct, statically-known calls (via `call_c`). Handling dynamic calls (via the `call` instruction, where the function address is in a register) is more complex, as the `AddressTable` is not consulted at the call site. Ensuring that dynamically-loaded function pointers are also JIT-aware is a subject for future design and optimization.
## Handling Foreign (C) Calls
Directly calling C functions requires conforming to the platform’s specific C ABI (e.g., passing arguments in specific physical registers). The VM interpreter cannot perform this complex task. This is a primary responsibility of the JIT compiler.
When the JIT encounters a `call` instruction with the `foreign` ABI, it generates the necessary "glue" code to marshal arguments from the `Fiber`'s virtual registers into the correct physical registers and stack layout required by the C ABI, and then performs the native call.
## JIT-only Instructions
The ISA includes instructions marked with a `jit` tag. These instructions serve as hints or directives for an optimizing JIT compiler.
- Behavior: The interpreter must treat all `jit` instructions as `nop`. This ensures that bytecode containing these optimizations can still be executed correctly by the baseline VM, maintaining forward compatibility.
- Purpose: These instructions can be used to pass higher-level information to the JIT that would be difficult to recover from the bytecode alone, such as aliasing information, loop metadata, or profile-guided optimization hints.