Twitter: @Darkfloyd1014
Happy Lunar New Year to everyone who supports us; we wish you all a healthy and wealthy year.
Beyond fuzzing for vulnerabilities, as in our previous blog post, it is also worth understanding how the Maglev compiler works via debugging. In the following article, we briefly introduce it.
Figure 1 shows a simple JavaScript program. It defines a function that sets up a new array and then uses a for loop to access and update elements of that array. In the main program, we call the function f0 once, trigger Maglev optimization for f0 with a native-syntax call, and finally call the function again.
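Since Figure 1 is a screenshot, here is a minimal sketch of what such a PoC typically looks like (our reconstruction based on the description above; the exact source in Figure 1 may differ, and the native-call and flag names can vary between V8 versions):

// Run under d8 with --allow-natives-syntax --maglev (flag names may vary by V8 version).
function f0() {
  const a = new Array(7);
  for (let i = 0; i < 25; i++) {
    a[i] = i; // placeholder body; the actual key/value pattern is reconstructed after the Figure 2 walkthrough
  }
}
%PrepareFunctionForOptimization(f0);
f0();                            // first call: collect type feedback
%OptimizeMaglevOnNextCall(f0);   // native call requesting Maglev optimization of f0
f0();                            // second call: runs the Maglev-compiled code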
Figure 2 shows the disassembly of the bytecode generated for this function by V8's interpreter tier (Ignition); this bytecode is what the Maglev JIT compiler later takes as its input. It represents the low-level operations that the V8 interpreter executes. Each line corresponds to a different operation or instruction. The hexadecimal numbers on the left are memory addresses, and the numbers in square brackets are operands (constant pool indices, immediates, or feedback slots) for the instructions. We can walk through each instruction and relate it to the JavaScript program in Figure 1; it is fairly straightforward:
LdaGlobal [0], [0]: Load a global variable into the accumulator. The first operand is the index of the variable name in the constant pool and the second is a feedback slot.
Star2: Store the value from the accumulator into register r2.
LdaSmi [7]: Load a small integer (Smi) with value 7 into the accumulator.
Star3: Store the value from the accumulator into register r3.
Ldar r2: Load the value from register r2 into the accumulator.
Construct r2, r3-r3, [2]: An interesting instruction: invoke a constructor. The constructor is in r2, new.target is taken from the accumulator (which also holds r2's value thanks to the preceding Ldar r2), the arguments are in registers r3 to r3 (meaning a single argument, the 7), and [2] is the feedback vector slot. In effect, this builds the new array.
Star0: Store the value from the accumulator into register r0.
LdaZero: Load the integer value 0 into the accumulator.
Star1: Store the value from the accumulator into register r1.
LdaSmi [25]: Load a small integer (Smi) with value 25 into the accumulator.
TestLessThan r1, [4]: Test whether the value in register r1 is less than the value in the accumulator (the 25 loaded by the previous instruction); [4] is the feedback slot.
JumpIfFalse [26]: If the value in the accumulator (the comparison result) is false, jump forward by 26 bytes, i.e. to (0x57a0000224c @ 48).
LdaConstant [1]: Load a value from the constant pool at index 1 into the accumulator.
Star3: Store the value from the accumulator into register r3.
Ldar r1: Load the value from register r1 into the accumulator.
Add r3, [5]: Add the value in register r3 (the constant loaded above) to the value in the accumulator (the loop counter loaded from r1), leaving the result in the accumulator; [5] is the feedback slot.
Star3: Store the value from the accumulator into register r3.
Ldar r1: Load the value from register r1 into the accumulator.
SetKeyedProperty r0, r3, [6]: Set a property on the object in register r0 (the array) with the key in register r3 and the value currently in the accumulator (the loop counter loaded from r1); [6] is the feedback slot. This is the array element update inside the loop.
Ldar r1: Load the value from register r1 into the accumulator.
Inc [8]: Increment the value in the accumulator by one; [8] is the feedback slot.
Star1: Store the value from the accumulator into register r1.
JumpLoop [27], [0], [9]: Jump backward by 27 bytes to the loop start (i.e. 0x57a0000222d @ 17), with loop nesting depth 0; [9] is the feedback slot.
LdaUndefined: Load the undefined value into the accumulator.
Return: Return the value in the accumulator as the result of the function.
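Reading the bytecode back into source form, a plausible shape for the function body is the following (our reconstruction from the instruction sequence above; the actual Figure 1 source may differ, and the "p" string is the single-character constant that appears later in Figure 6-1):

function f0() {
  const a = new Array(7);   // LdaGlobal Array, LdaSmi [7], Construct, Star0
  for (let i = 0;           // LdaZero, Star1
       i < 25;              // LdaSmi [25], TestLessThan r1, JumpIfFalse
       i++) {               // Ldar r1, Inc, Star1, JumpLoop
    a["p" + i] = i;         // LdaConstant [1] ("p"), Star3, Ldar r1, Add r3,
                            // Star3, Ldar r1, SetKeyedProperty r0, r3
  }
}                           // LdaUndefined, Return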
Core of Maglev Compilation
To be honest, we had no idea how Maglev works internally until we referred to the Black Hat Europe 2023 presentation (pp. 11-20) by Bohan Liu from Tencent. Here are the compilation stages of Maglev (captured from Liu's presentation). We can also obtain more background on the underlying concepts from the LLVM documentation:
Figures 3.x to 7.x are complete screenshots covering the entire Maglev compilation pipeline.
Fundamentals
It is worth covering the following core concepts about SSA and Phi before discussing further.
SSA: Static single assignment form (SSA) is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used.
Phi (Φ): Generates a new definition by “choosing” either y1 or y2, depending on which control-flow path was taken to reach it.
In SSA, all assignments are to variables with distinct names, hence the term static single assignment, and the Phi function combines two definitions of a variable x (shown in the diagram above) (reference: pp. 369-370, Compilers: Principles, Techniques & Tools).
The code example from https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/control-structures/ssa-phi.html shows that the Phi instruction preserves SSA form, so each variable is defined exactly once, while avoiding unnecessary store/load operations to and from memory by keeping the value in a register.
This Phi instruction means: if the predecessor block was %btrue, use value %a; if it was %bfalse, use value %b.
%retval = phi i32 [%a, %btrue], [%b, %bfalse]
The Phi diagram above shows how a new definition is generated based on the incoming control flow.
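To relate this to JavaScript, here is a small sketch (our own example, not taken from the referenced page) of source code whose SSA form needs exactly such a Phi at the merge point:

// `retval` is assigned on two different paths; in SSA each assignment becomes a
// separate value, and the Phi at the merge point selects one of them depending
// on which branch actually executed.
function select(cond, a, b) {
  let retval;
  if (cond) {
    retval = a;   // definition reaching the merge from the "true" predecessor (%btrue)
  } else {
    retval = b;   // definition reaching the merge from the "false" predecessor (%bfalse)
  }
  return retval;  // in SSA: retval = phi [a from %btrue], [b from %bfalse]
}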
Graph Building
During this stage, the compiler attempts to do the following:
Turn bytecode into SSA nodes
Create Phi nodes for loops and try/catch (see the sketch after this list)
Split nodes into basic blocks
Store a snapshot copy of the interpreter frame
Inlining
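As a rough illustration of the loop case (our own sketch; the block names and boundaries are illustrative, not Maglev's actual output), a simple loop is split into basic blocks and gets a Phi node in its header:

function demo(n) {
  let i = 0;         // entry block: initial definition of i
  while (i < n) {    // loop header block: Phi(i_initial, i_incremented),
                     // followed by a branch to the loop body or the exit block
    i = i + 1;       // loop body block: new definition of i, jump back to the header
  }
  return i;          // exit block: uses the value produced by the Phi
}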
Here’s an explanation of the various parts of the IR (Figures 3–1 to 3–3):
Constants: These are constant values that have been identified in the code. They are labeled with their memory addresses, descriptions, and the number of times they are used within the function.
Blocks: These represent basic blocks, a sequence of instructions that will be executed sequentially without any jumps or branches (except at the end). For instance, Block b1 is the entry point of the function, and Block b2, Block b3, Block b4, and Block b5 represent other parts of the control flow within the function.
Operations: Each line within a block represents an operation or instruction. Some key instructions include:
LdaGlobal: Load a global variable.
InitialValue: The initial value of a parameter or this.
FunctionEntryStackCheck: A check performed to prevent stack overflow.
Jump: A control flow operation to jump to another block.
Construct: An operation to construct a new object, probably using the new keyword.
Add: An addition operation.
SetKeyedProperty: Setting a property of an object with a dynamic key.
Inc: Increment an integer value.
Int32IncrementWithOverflow: Same as Inc, but specifically for 32-bit integers and checking for overflow.
Int32ToNumber: Convert a 32-bit integer to a JavaScript number.
TestLessThan: A comparison operation to test if one value is less than another.
CheckedSmiUntag: Check that a value is a small integer (Smi) and convert it from its tagged form to an untagged machine integer.
Int32Compare: A comparison operation using 32-bit integers.
JumpIfFalse: Jump to a block if a condition is false.
BranchIfInt32Compare: Branch to one of two blocks based on a 32-bit integer comparison.
Return: Return from the function.
Deoptimization Points: The ↳ lazy @ and ↱ eager @ notations indicate points in the code where lazy or eager deoptimization information is recorded. Eager deoptimization means the optimized code checks an assumption at that point and bails out to the interpreter immediately if the assumption no longer holds. Lazy deoptimization is attached to calls: if the optimized code is invalidated while execution is inside such a call, the function deoptimizes when control returns to it rather than right away. (A hypothetical example of what can trigger such a bailout is sketched after this list.)
Live Variables: The mentions of “live vars” show which variables are “live” at each operation, meaning they hold values that are needed for future operations.
Gap Moves: These are used to move values around to satisfy the register allocation constraints during the compilation. They typically occur during control flow changes (like jumps).
φᵀ (Phi) Nodes: These are used in SSA (Static Single Assignment) form, which Maglev uses, to merge different values coming from different predecessors in the control flow graph. In this case, φᵀ r1 (n22, n33) merges the values from n22 and n33 into r1; the superscript T presumably marks the Phi as tagged, which ties into the tagging/untagging stage discussed below.
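To make the deoptimization idea concrete, here is a hypothetical example (our illustration, not taken from the figures) of optimized code bailing out when a type assumption stops holding:

function addOne(x) {
  return x + 1;               // Maglev can specialize this add for small integers (Smis)
}
for (let i = 0; i < 10000; i++) addOne(i);   // warm up with Smi feedback, then optimize
// A call that violates the Smi assumption makes the inserted check fail, so the
// optimized code bails out (deoptimizes) to the interpreter at that point:
addOne("not a number");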
At the beginning, we could not find any documentation from Google about Phi. After checking a Stack Overflow post and the LLVM example mentioned earlier, we got a clearer picture of it.
As for tagging, it is related to analysis passes; the LLVM documentation about Phi and passes gives a lot of detail. Phi tagging is part of the Representation Selecting stage.
Representation Selecting: Phi Tagging and Untagging
In the previous section, we saw that all Phi nodes are tagged after graph building. Phi untagging also helps to remove unnecessary nodes that have a single predecessor. However, with this JavaScript code example, nothing changes after the untagging stage.
Phi untagging means removing the tagging of some Phis based on their input and output representations.
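As a hypothetical illustration (ours, not from the figures) of a case where untagging could apply: every input to the loop Phi for s (the initial 0 and the value from the back edge) and every use of it (the addition and the return) can be treated as an Int32, so based on its input and output representations the Phi would not need to stay tagged.

function sumTo(n) {
  let s = 0;          // Int32 input to the loop Phi for s
  for (let i = 0; i < n; i++) {
    s = s + i;        // Int32 use, and the Int32 back-edge input to the Phi
  }
  return s;
}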
Use Marking
The purpose of this stage is to annotate the intermediate representation (IR) of the code with information about how values (constants, variables, etc.) are used within the program. This helps the compiler make decisions about optimizing the code, such as register allocation, instruction scheduling, and identifying opportunities for code simplification and dead code elimination.
In this context, the “uses” refers to how many times a particular value is used in the code after being defined. This information is critical for the compiler to make efficient use of CPU registers and to minimize the need to access memory (which is slower than accessing registers). The more a value is used, the more beneficial it may be to keep it in a register.
Here’s a breakdown of the “After Use Marking” information provided (Figures 5–1 and 5–2):
Constant references (like 4: Constant(…) → (x), 14 uses) indicate that a particular constant value is used 14 times within the scope of the generated code.
The “After Use Marking” stage is a form of data flow analysis that informs later optimization passes in the Maglev JIT compilation process. It allows the compiler to make more informed decisions about which values to keep in registers and when to spill them to the stack, as well as identifying when certain values are no longer needed and can be discarded.
Register Allocation Pre-Processing
Before a function can be executed, the variables it uses must be allocated to specific locations in memory or registers. This process involves determining the lifetime of each variable, which is known as its “live range.” The live range is the portion of the program where the variable holds a value that may be needed.
During this stage, the compiler attempts to carry out the following activities:
Dead code marking and removing
Collect input/output location constraints
Find the maximum number of stack arguments passed to calls
Collect use information, for SSA liveness and next use distance
For Figure 6–1, let us break down the details of the function’s execution as below:
Constant(0x125000143c89 <NativeContext[285]>) is a reference to some native context, likely related to internal V8 context for global object or functions. It is assigned to a virtual register v-1 and is alive from the very beginning of the function (point 1) to point 30 in the execution flow.
Constant(0x12500015af55 <JSFunction f0 (sfi = 0x12500015aea9)>) represents a JavaScript function f0, also assigned to v-1, and has a live range from point 2 to point 30.
Constant(0x12500014e62d <JSFunction Array (sfi = 0x1250002adde5)>) is the built-in JavaScript Array constructor function, assigned to v-1 and needed from point 3 to point 15.
Constant(0x125000002b39 <String[1]: #p>) is a constant string with a single character "p", assigned to v-1, and its value is required from point 4 to point 30.
RootConstant(undefined_value) represents the JavaScript undefined value, assigned to v-1, with a live range from point 5 to point 32.
SmiConstant(0) indicates a small integer (Smi) constant with the value 0, assigned to v-1, and is needed from point 6 to point 18.
SmiConstant(7) is another Smi constant with the value 7, assigned to v-1, and is alive from point 7 to point 15.
SmiConstant(25) represents a Smi with the value 25, assigned to v-1, and its value is required from point 8 to point 30.
Int32Constant(0) and Int32Constant(25) are 32-bit integer constants with values 0 and 25, respectively, both assigned to v-1 and have live ranges from points 9 to 18 and 10 to 30, respectively.
All the constants are allocated to a virtual register v-1, which suggests that they are not assigned to physical registers at this point in time. This is part of the register allocation preprocessing where the compiler is determining the necessary lifetimes of values but hasn't yet made any final decisions about physical register assignments.
The → v-1 notation indicates that these values are assigned to a temporary virtual register by the register allocator. During the register allocation process, virtual registers are used as placeholders before the allocator maps them to physical registers or stack slots.
The numbers preceding each constant (e.g., 1/4: or 2/3:) are identifiers for each operation or constant value, along with a count that might represent the number of references or uses of that constant within the live range.
The live range is critical for the register allocator to efficiently use the limited number of CPU registers and to determine when values need to be stored to or loaded from memory. The allocation has to balance the need for fast access to these values (ideally keeping them in registers as much as possible) with the constraints of the target machine’s architecture.
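For a rough intuition about live ranges (our annotation, not Maglev output), consider how long the values in a small loop have to stay available:

function fillDemo() {
  const a = new Array(7);   // the array value becomes live here and stays live
                            // until its last use inside the loop
  let i = 0;                // the value 0 becomes live here
  while (i < 25) {          // use of i: it must be kept in a register or stack slot
    a["p" + i] = i;         // further uses of i (and of a)
    i = i + 1;              // last use of the old value of i; the incremented
                            // value's live range starts here and reaches the back edge
  }
}                           // after the loop, i and a are dead, so their locations can be reused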
We break down the details of Block 1 and Block 2 (according to Figures 6–1 and 6–2) as below:
Block b1
This block is labeled b1.
It references a SharedFunctionInfo for function f0 and the source file poc_maglev_1.js.
The line 0 : LdaGlobal [0], [0] is a load accumulator instruction to load a global variable.
11/1: InitialValue(<this>) → v-1(=-6S) indicates the initial value of this is assigned to virtual register v-1, which corresponds to a stack slot -6S. Its live range extends from instruction 11 to 30.
12/2: InitialValue(a0) → v-1(=-7S) is similar, but for the first argument a0, also assigned to v-1 and a stack slot -7S, with the same live range as the this value.
13/6: FunctionEntryStackCheck is a stack check at the function entry point.
14/7: Jump b2 indicates a jump to block b2.
Block b2
This block is labeled b2.
9 : Construct r2, r3-r3, [2] is a constructor call that creates a new object. The constructor is in r2, the argument range is r3 to r3, and [2] is the feedback slot, consistent with the bytecode walkthrough above.
15/10: 🐢 Construct [...] → v-1(=rax) indicates a constructor call with a turtle (🐢) emoji, which might symbolize a slow path or a deoptimization checkpoint. The result of the constructor call is placed in v-1, which corresponds to the register rax. The list within the brackets are the arguments and their corresponding registers. The live range is from 15 to 30, and a lazy deoptimization point is noted at instruction 9 with 4 live variables at that point.
29 : Add r3, [5] is an addition operation.
16/17: 🐢 GenericAdd [...] → v-1(=rax) represents a generic addition operation with a potential deoptimization point at instruction 29. The result is again in v-1 (rax). The live range is only from 16 to 17, indicating a very short usage window for the result.
35 : SetKeyedProperty r0, r3, [6] is a keyed property set operation.
17/18: 🐢 CallBuiltin(KeyedStoreIC_Megamorphic) [...] → v-1(=rax) suggests a call to a built-in megamorphic inline cache (IC) for setting a keyed property. The input operands and the output are indicated, along with the register assignments. The live range is not indicated, but there's a deoptimization point at instruction 35 with 5 live variables.
Each 🐢 followed by an operation (like Construct, GenericAdd, or CallBuiltin) indicates that these operations have deoptimization points. The "lazy" tag signifies that the deoptimization information is only considered if those operations indeed deoptimize.
The notation vX/nY: v-Z(=rW) seems to represent virtual register vX with some node ID nY and a mapping to a stack slot or CPU register rW. The v-1 indicates the use of a single virtual register for the result of these operations. The actual CPU register in which this virtual register is placed is indicated in parentheses (e.g., rax, rbx, rdi, rdx, rsi).
The presence of stack slots (e.g., -6S or -7S) shows that some values are stored on the stack instead of registers, which can happen when there are not enough physical registers available or when preparing for a possible deoptimization.
Register Allocation
Tracing the register allocation is especially useful for checking whether there are conditions under which a register value is accessed or overwritten by other operations.
We break down the analysis for Block 1 and Block 2 (Figures 7–1 and 7–2) as below:
Block b1
The block starts with a reference to the SharedFunctionInfo for function f0 and its source file poc_maglev_1.js.
0 : LdaGlobal [0], [0]: This operation loads a global variable into the accumulator.
11/1: InitialValue(<this>) → [stack:-6|t]: The initial value of this is stored on the stack at position -6. The |t indicates that the value is tagged, which in V8 typically means it’s a pointer to a JavaScript object or a small integer. Its live range is from instruction 11 to 30.
12/2: InitialValue(a0) → [stack:-7|t]: The initial value of the first argument a0 is also stored on the stack at position -7, with the same tagging convention and live range as the this value.
13/6: FunctionEntryStackCheck: This is a check to ensure there’s enough stack space for the function to execute without causing a stack overflow.
14/7: Jump b2: Control jumps to block b2.
Block b2
36: ConstantGapMove(v3/n8 → [rdi|R|t]): This moves a constant value v3/n8 into the rdi register. The R indicates it’s a register, and |t again indicates tagging.
37: GapMove([rdi|R|t] → [rdx|R|t]): This moves the value from rdi to rdx, both registers.
38: ConstantGapMove(v1/n4 → [rsi|R|t]): This moves another constant value v1/n4 into the rsi register.
9 : Construct r2, r3-r3, [2]: This operation constructs a new object with the Construct instruction, likely calling a constructor function with arguments.
15/10: 🐢 Construct [...] → [rax|R|t] (spilled: [stack:0|t]): This represents a constructor call that may trigger a deoptimization (indicated by the turtle emoji). The result of the constructor call is assigned to the rax register and also spilled to the stack at position 0. The live range is from 15 to 30, and a lazy deoptimization checkpoint is at instruction 9 with 4 live variables.
39: GapMove([rax|R|t] → [rdx|R|t]): This moves the value from rax to rdx.
40: ConstantGapMove(v4/n16 → [rax|R|t]): This moves a constant value v4/n16 into rax.
41: ConstantGapMove(v6/n11 → [rbx|R|t]): This moves another constant value v6/n11 into rbx.
Figure 7–1. After register allocation
Summary and Credits
It has been a really good experience to write an article about the compilation stages; it gave us a way to get to know every stage. We will keep this as a series showing Maglev vulnerability analysis with debugging for any interesting cases. Moreover, this blog post can be extended by referencing the Maglev source code and verifying every action.
We are very thankful for the discussion with and knowledge sharing from Bohan Liu (Twitter: @P4nda20371774) of Tencent Security Xuanwu Lab; his presentation about Maglev vulnerabilities at Black Hat Europe 2023 is insightful and detailed. He is also going to give a talk at Black Hat Asia 2024, and we are looking forward to it.
In addition, we are thankful to @wwkenwong and to Carl Smith from Google V8 security for their review and advice.