From 85b5c28700d18fbf8256914475ef7ded670935d2 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 27 May 2025 18:18:38 -0300 Subject: [PATCH 01/28] Add SPECS.md with instructions info --- docs/SPECS.md | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 219 insertions(+) create mode 100644 docs/SPECS.md diff --git a/docs/SPECS.md b/docs/SPECS.md new file mode 100644 index 0000000000..a0032886aa --- /dev/null +++ b/docs/SPECS.md @@ -0,0 +1,219 @@ +# Specifications + +## Instructions +### Operations +Each Cairo instruction represents one of the following operations: + +- `AddAP`: Increases the AP register. +- `AssertEq`: Asserts that two values are equal. Also used to write memory cells. +- `Call`: A relative or absolute call. +- `Jnz`: A conditional jump. +- `Jump`: An unconditional jump. +- `Ret`: Returns from a call operation. + +However, the binary encoding is not centered around the operation to perform, but around specific aspects of the instruction execution (i.e. how to modify the AP register). + +### Encoding + +The instruction encoding is specified in the [Cario whitepaper](https://eprint.iacr.org/2021/1063.pdf), page 32. + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ off dst (biased representation) │ +├─────────────────────────────────────────────────────────────────────────┤ +│ off op0 (biased representation) │ +├─────────────────────────────────────────────────────────────────────────┤ +│ off op1 (biased representation) │ +├─────┬─────┬───────────┬───────┬───────────┬─────────┬──────────────┬────┤ +│ dst │ op0 │ op1 src │ res │ pc update │ ap │ opcode │ 0 │ +│ reg │ reg │ │ logic │ │ update │ │ │ +├─────┼─────┼───┬───┬───┼───┬───┼───┬───┬───┼────┬────┼────┬────┬────┼────┤ +│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │ 10 │ 11 │ 12 │ 13 │ 14 │ 15 │ +└─────┴─────┴───┴───┴───┴───┴───┴───┴───┴───┴────┴────┴────┴────┴────┴────┘ +``` +The figure shows the structure of the 63-bit that form the first word of each instruction. +- The bits are ordered in a little-endian-like encoding, this implies that `off_dst` is located at bits `[0;15]`, while `dst_reg` is located at bit `48`. +- The last bit (`63`) is unused. +- **TODO**: document the opcode extensions. + +Each sets of fields determine different aspects of the instruction execution: +- `off_op0`, `off_op1`, `op0_reg`, `op1_src` determine the location of each operand. +- `off_dst`, `dst_reg` determine the location of the destionation. +- `res_logic` determine which operation to compute with the operands. +- `pc_update` determines how the PC register is updated. +- `ap_update` determines how the AP register is updated. +- `opcode` determines both how the FP register is updated, and how some memory cell values are deduced. + +> [!NOTE] +> In our VM: +> off0 = off_dst +> off1 = off_op0 +> off2 = off_op1 + +### Auxiliary Variables + +The instruction execution uses four auxiliary variables, that can be computed from the memory values, and the instruction fields: +- `dst`: Destination. +- `op0`: First operand. +- `op1`: Second operand. +- `res`: Operation result. + +Depending on the instruction, the values of `dst`, `op0` and `op1` may be unknown at the start of execution, and will be deduced during it. + +### Computing `dst` + +The value of `dst` is computed as the value at address `register + off_dst`, where `register` depends on the value of `dst_reg`: +- `dst_reg == 0`: We use `AP`. +- `dst_reg == 1`: We use `FP`. + +If the value at the specified address is undefined, then it must be deduced instead. + +### Computing `op0` + +The value of `op0` is computed as the value at address `register + off_op0`, where `register` depends on the value of `op0_reg`: +- `op0_reg == 0`: We use `AP` +- `op0_reg == 1`: We use `FP`. + +If the value at the specified address is undefined, then it must be deduced instead. + +### Computing `op1` + +The value of `op1` is computed as the value at address `base + off_op1`, where `base` depends on the value of `op1_src`: +- `op1_src == 0`: We use `op0`. +- `op1_src == 1`: We use `pc`. +- `op1_src == 2`: We use `FP`. +- `op1_src == 4`: We use `AP`. +- Otherwise: The instruction is invalid. + +If the value at the specified address is undefined, then it must be deduced instead. + +> [!NOTE] +> When `op1_src == 1` we must assert that off_op1 == 1, so that op1 is an immediate value. This constraint is not specified in the whitepaper, but enforced by our VM. + +### Computing `res` + +The variable `res` computation depends on `res_logic`. +- `res_logic == 0`: We set `res = op1`. +- `res_logic == 1`: We set `res = op0 + op1`. +- `res_logic == 2`: We set `res = op0 * op1`. +- Otherwise: The instruction is invalid. + +> [!NOTE] +> The value of `res` won’t always be used. For example, it won’t be used when `pc_update == 4`. + +### Additional Constraints + +1. When `opcode == 1` (Call), the following conditions must be met: +- `off_dst == 0` +- `dst_reg == AP` +- `off_op0 == 1` +- `op0_reg == AP` +- `ap_update == 0` (add 2) + +2. When `opcode == 2` (Return), the following conditions must be met: +- `off_dst == -2` +- `dst_reg == FP` +- `off_op1 == -1` +- `op1_src == FP` +- `res_logic == 0` (op1) +- `pc_update == 1` (absolute jump) + + +> [!NOTE] +> These constraints are not specified in the whitepaper, but enforced by our VM. If +> they are not met, then the instruction is **invalid**. + +### Deductions + +Some values may be undefined because the associated memory locations haven’t been set. In this case, they must be *deduced*, or *asserted*. + +An **assertion** verifies that two values are equal. This implies deducing one of the values if its undefined. When deducing a value, the corresponding memory cell must be updated with the deduced value. + +If the memory cell corresponds to a builtin segment, the associated builtin runner should be used to assert the value of that memory cell. + +Otherwise, the value will be deduced based on the `opcode`. There are 4 different types of operations: + +- `opcode == 0`: A no-op (no operation). +- `opcode == 1`: A call operation. + - Asserts that `op0 == pc + instruction_size`. + - Asserts that `dst == fp`. +- `opcode == 2`: A ret operation. +- `opcode == 3`: An assert equal operation. + - Asserts that `dst == res`. This may imply deducing `op0` value so that `res == dst`, by performing the inverse operation. + +The `instruction_size` is always `1`, unless `op1` is an immediate, in which case it will be `2`. + + +> [!NOTE] +> If a value is undefined and cannot be deduced, the instruction execution must fail. +> This constraint is not specified in the whitepaper, but enforced by our VM. + +### Updating Registers + +At the end of the execution, the registers must be updated according to the instruction flags. + +### Updating the PC + +The updated PC will be denoted by `next_pc`. + +When updating the program counter, we depend primarily on the `pc_update` field: +- `pc_update == 0`: Advance program counter. + - Set `next_pc = pc + instruction_size`. +- `pc_update == 1`: Absolute jump. + - Set `next_pc = res`. +- `pc_update == 2`: Relative jump. + - Set `next_pc = pc + res`. +- `pc_update == 4`: Conditional relative jump. + - If `dst == 0`: Set `next_pc = pc + instruction_size`. + - If `dst != 0`: Set `next_pc = pc + op1`. +- Otherwise: The instruction is invalid. + +Additionally, when `pc_update == 4`, the following conditions must be met: + +- `res_logic == 0`. (op1) +- `opcode == 0`. (no-op) +- `ap_update != 1`. (add res) + +Otherwise: The instruction is invalid. + +> [!NOTE] +> In our VM: +> `new_pc` = `next_pc` + +### Updating the AP + +The updated AP will be denoted by `next_ap`. + +When updating the allocation pointer, we depend primarily on the `ap_update` field, but also on the current operation: + +- If the `opcode` is *call*, then we must assert that `ap_update == 0`: + - Set `next_ap = ap + 2` +- Else, depending on `ap_update`. + - `ap_update == 0`: Set `next_ap = ap` + - `ap_update == 1`: Set `next_ap = ap + res` + - `ap_update == 2`: Set `next_ap = ap + 1` +- Otherwise: The instruction is invalid. + +> [!NOTE] +> In our VM: +> `new_apset` = `next_ap` + +### Updating the FP + +The updated FP will be denoted by `next_fp`. + +When updating the frame pointer, we depend on the `opcode`: + +- `opcode == 0`: A no-op (no operation). + - Set `next_fp = fp`. +- `opcode == 1`: A call operation. + - Set `next_fp = ap + 2`. +- `opcode == 2`: A return operation. + - Set `next_fp = dst`. +- `opcode == 3`: An assert equal operation. + - Set `next_fp = fp`. +- Otherwise: The instruction is invalid. + +> [!NOTE] +> In our VM: +> `new_fp_offset` = `next_fp` From 4c76d36c8a21c43a910aa615cf5adc45f14153cf Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 28 May 2025 09:27:55 -0300 Subject: [PATCH 02/28] Minor changes --- docs/SPECS.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index a0032886aa..f038832ef2 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -1,7 +1,7 @@ # Specifications ## Instructions -### Operations +## Operations Each Cairo instruction represents one of the following operations: - `AddAP`: Increases the AP register. @@ -13,7 +13,7 @@ Each Cairo instruction represents one of the following operations: However, the binary encoding is not centered around the operation to perform, but around specific aspects of the instruction execution (i.e. how to modify the AP register). -### Encoding +## Encoding The instruction encoding is specified in the [Cario whitepaper](https://eprint.iacr.org/2021/1063.pdf), page 32. @@ -46,11 +46,11 @@ Each sets of fields determine different aspects of the instruction execution: > [!NOTE] > In our VM: -> off0 = off_dst -> off1 = off_op0 -> off2 = off_op1 +> - `off0` = `off_dst` +> - `off1` = `off_op0` +> - `off2` = `off_op1` -### Auxiliary Variables +## Auxiliary Variables The instruction execution uses four auxiliary variables, that can be computed from the memory values, and the instruction fields: - `dst`: Destination. @@ -88,7 +88,7 @@ The value of `op1` is computed as the value at address `base + off_op1`, where ` If the value at the specified address is undefined, then it must be deduced instead. > [!NOTE] -> When `op1_src == 1` we must assert that off_op1 == 1, so that op1 is an immediate value. This constraint is not specified in the whitepaper, but enforced by our VM. +> When `op1_src == 1` we must assert that `off_op1 == 1`, so that `op1` is an immediate value. This constraint is not specified in the whitepaper, but enforced by our VM. ### Computing `res` @@ -101,7 +101,7 @@ The variable `res` computation depends on `res_logic`. > [!NOTE] > The value of `res` won’t always be used. For example, it won’t be used when `pc_update == 4`. -### Additional Constraints +## Additional Constraints 1. When `opcode == 1` (Call), the following conditions must be met: - `off_dst == 0` @@ -123,7 +123,7 @@ The variable `res` computation depends on `res_logic`. > These constraints are not specified in the whitepaper, but enforced by our VM. If > they are not met, then the instruction is **invalid**. -### Deductions +## Deductions Some values may be undefined because the associated memory locations haven’t been set. In this case, they must be *deduced*, or *asserted*. @@ -148,7 +148,7 @@ The `instruction_size` is always `1`, unless `op1` is an immediate, in which cas > If a value is undefined and cannot be deduced, the instruction execution must fail. > This constraint is not specified in the whitepaper, but enforced by our VM. -### Updating Registers +## Updating Registers At the end of the execution, the registers must be updated according to the instruction flags. From 922fd22b9343e5457aae3ef07b7680c6c0ad5cf3 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Thu, 29 May 2025 16:03:00 -0300 Subject: [PATCH 03/28] Add brief explanation of Opcode Extensions --- docs/SPECS.md | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index f038832ef2..6b03003180 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -34,7 +34,6 @@ The instruction encoding is specified in the [Cario whitepaper](https://eprint.i The figure shows the structure of the 63-bit that form the first word of each instruction. - The bits are ordered in a little-endian-like encoding, this implies that `off_dst` is located at bits `[0;15]`, while `dst_reg` is located at bit `48`. - The last bit (`63`) is unused. -- **TODO**: document the opcode extensions. Each sets of fields determine different aspects of the instruction execution: - `off_op0`, `off_op1`, `op0_reg`, `op1_src` determine the location of each operand. @@ -50,6 +49,39 @@ Each sets of fields determine different aspects of the instruction execution: > - `off1` = `off_op0` > - `off2` = `off_op1` +## Opcode Extensions + +With the realease of Stwo, a new field was introduced named `opcode_extension`. Execution of instructions vary depending on its extension. The 4 types are `Stone`, `Blake`, `BlakeFinalize` and `QM310`. + +> [!NOTE] +> These are not specified on the whitepaper + +### Stone + +This type is used when `opcode_extension == 0`. Its does not add new behaviour since its the original extension + +### Blake & BlakeFinalize + +Blake is used when `opcode_extension == 1` and BlakeFinalize is used when `opcode_extension == 2` and they have some constraints, if these are not met while having one of the the blake extensions, the instructions becomes invalid: +- `opcode == 0` (no operation) +- `op1_src == 2` (FP) or `op1_src == 4` (AP) +- `res_logic == 0` (op1) +- `pc_update == 0` (Regular) +- `ap_update == 0` (Regular) || `ap_update == 2` (Add1) + +After checking the constraints, if we are working with a `Blake` extension the blake2 algorithm is applied. The operands are contained as follows: +- `op0_addr` points to a sequence of 8 -> represents a state +- `op1_addr` points to a sequence of 16 felts -> represents a message +- `dst_addr` holds the value of the counter +- `ap` points to a sequence of 8 cells which each should either be uninitialised or already contain a value matching that of the output + +The output consists of 8 felts that represent the output of the Blake2s compression. + +On the other side, if we are working with the `BlakeFinalize` extension the operands are the same as with `Blake` with only one change: +- `op0_addr` points to a sequence of 9 felts -> first 8 felts represent the state the last one represents the state + +The out here represents the Blake2s compression of the last block. + ## Auxiliary Variables The instruction execution uses four auxiliary variables, that can be computed from the memory values, and the instruction fields: From 82c6a705e962735e74f8d071c5f8f83a02119601 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Thu, 29 May 2025 16:07:31 -0300 Subject: [PATCH 04/28] Minor changes --- docs/SPECS.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 6b03003180..c29311b821 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -51,7 +51,7 @@ Each sets of fields determine different aspects of the instruction execution: ## Opcode Extensions -With the realease of Stwo, a new field was introduced named `opcode_extension`. Execution of instructions vary depending on its extension. The 4 types are `Stone`, `Blake`, `BlakeFinalize` and `QM310`. +With the realease of Stwo, a new field was introduced named `opcode_extension`. Execution of instructions vary depending on its extension. The 4 types are `Stone`, `Blake`, `BlakeFinalize` and `QM31Operation`. > [!NOTE] > These are not specified on the whitepaper @@ -82,6 +82,14 @@ On the other side, if we are working with the `BlakeFinalize` extension the oper The out here represents the Blake2s compression of the last block. +### QM31 + +In this case, when `opcode_extension == 3` we are working with the `QM31Operation` which changes how the arithmetic (add, mul, sub, div) works on the VM by doing it with QM31 elements in reduced form. Again there are some constraints, if these are not met the instruction becomes invalid: +- `res_logic == 1` (Add) || `res_logic == 2` (Mul) +- `op1_src != 0` (Op0) +- `pc_update == 0` (Regular) +- `ap_update == 0` (Regular) || `ap_update == 2` (Add1) + ## Auxiliary Variables The instruction execution uses four auxiliary variables, that can be computed from the memory values, and the instruction fields: From 04093af98a8d34d4df09ea761e5f90c311331fe2 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Thu, 29 May 2025 17:26:34 -0300 Subject: [PATCH 05/28] Remove incorrect constraint --- docs/SPECS.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index c29311b821..319390a5ce 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -208,14 +208,6 @@ When updating the program counter, we depend primarily on the `pc_update` field: - If `dst != 0`: Set `next_pc = pc + op1`. - Otherwise: The instruction is invalid. -Additionally, when `pc_update == 4`, the following conditions must be met: - -- `res_logic == 0`. (op1) -- `opcode == 0`. (no-op) -- `ap_update != 1`. (add res) - -Otherwise: The instruction is invalid. - > [!NOTE] > In our VM: > `new_pc` = `next_pc` From b282b86156807a3688a9c81012ab85d2aababe1e Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 3 Jun 2025 15:27:13 -0300 Subject: [PATCH 06/28] Fix titles and subtitles sizes --- docs/SPECS.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 319390a5ce..ded9c07b47 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -1,7 +1,7 @@ # Specifications ## Instructions -## Operations +### Operations Each Cairo instruction represents one of the following operations: - `AddAP`: Increases the AP register. @@ -13,7 +13,7 @@ Each Cairo instruction represents one of the following operations: However, the binary encoding is not centered around the operation to perform, but around specific aspects of the instruction execution (i.e. how to modify the AP register). -## Encoding +### Encoding The instruction encoding is specified in the [Cario whitepaper](https://eprint.iacr.org/2021/1063.pdf), page 32. @@ -49,18 +49,18 @@ Each sets of fields determine different aspects of the instruction execution: > - `off1` = `off_op0` > - `off2` = `off_op1` -## Opcode Extensions +### Opcode Extensions With the realease of Stwo, a new field was introduced named `opcode_extension`. Execution of instructions vary depending on its extension. The 4 types are `Stone`, `Blake`, `BlakeFinalize` and `QM31Operation`. > [!NOTE] > These are not specified on the whitepaper -### Stone +#### Stone This type is used when `opcode_extension == 0`. Its does not add new behaviour since its the original extension -### Blake & BlakeFinalize +#### Blake & BlakeFinalize Blake is used when `opcode_extension == 1` and BlakeFinalize is used when `opcode_extension == 2` and they have some constraints, if these are not met while having one of the the blake extensions, the instructions becomes invalid: - `opcode == 0` (no operation) @@ -82,7 +82,7 @@ On the other side, if we are working with the `BlakeFinalize` extension the oper The out here represents the Blake2s compression of the last block. -### QM31 +#### QM31 In this case, when `opcode_extension == 3` we are working with the `QM31Operation` which changes how the arithmetic (add, mul, sub, div) works on the VM by doing it with QM31 elements in reduced form. Again there are some constraints, if these are not met the instruction becomes invalid: - `res_logic == 1` (Add) || `res_logic == 2` (Mul) @@ -90,7 +90,7 @@ In this case, when `opcode_extension == 3` we are working with the `QM31Operatio - `pc_update == 0` (Regular) - `ap_update == 0` (Regular) || `ap_update == 2` (Add1) -## Auxiliary Variables +### Auxiliary Variables The instruction execution uses four auxiliary variables, that can be computed from the memory values, and the instruction fields: - `dst`: Destination. @@ -100,7 +100,7 @@ The instruction execution uses four auxiliary variables, that can be computed fr Depending on the instruction, the values of `dst`, `op0` and `op1` may be unknown at the start of execution, and will be deduced during it. -### Computing `dst` +#### Computing `dst` The value of `dst` is computed as the value at address `register + off_dst`, where `register` depends on the value of `dst_reg`: - `dst_reg == 0`: We use `AP`. @@ -108,7 +108,7 @@ The value of `dst` is computed as the value at address `register + off_dst`, whe If the value at the specified address is undefined, then it must be deduced instead. -### Computing `op0` +#### Computing `op0` The value of `op0` is computed as the value at address `register + off_op0`, where `register` depends on the value of `op0_reg`: - `op0_reg == 0`: We use `AP` @@ -116,7 +116,7 @@ The value of `op0` is computed as the value at address `register + off_op0`, whe If the value at the specified address is undefined, then it must be deduced instead. -### Computing `op1` +#### Computing `op1` The value of `op1` is computed as the value at address `base + off_op1`, where `base` depends on the value of `op1_src`: - `op1_src == 0`: We use `op0`. @@ -130,7 +130,7 @@ If the value at the specified address is undefined, then it must be deduced inst > [!NOTE] > When `op1_src == 1` we must assert that `off_op1 == 1`, so that `op1` is an immediate value. This constraint is not specified in the whitepaper, but enforced by our VM. -### Computing `res` +#### Computing `res` The variable `res` computation depends on `res_logic`. - `res_logic == 0`: We set `res = op1`. @@ -141,7 +141,7 @@ The variable `res` computation depends on `res_logic`. > [!NOTE] > The value of `res` won’t always be used. For example, it won’t be used when `pc_update == 4`. -## Additional Constraints +### Additional Constraints 1. When `opcode == 1` (Call), the following conditions must be met: - `off_dst == 0` @@ -163,7 +163,7 @@ The variable `res` computation depends on `res_logic`. > These constraints are not specified in the whitepaper, but enforced by our VM. If > they are not met, then the instruction is **invalid**. -## Deductions +### Deductions Some values may be undefined because the associated memory locations haven’t been set. In this case, they must be *deduced*, or *asserted*. @@ -188,11 +188,11 @@ The `instruction_size` is always `1`, unless `op1` is an immediate, in which cas > If a value is undefined and cannot be deduced, the instruction execution must fail. > This constraint is not specified in the whitepaper, but enforced by our VM. -## Updating Registers +### Updating Registers At the end of the execution, the registers must be updated according to the instruction flags. -### Updating the PC +#### Updating the PC The updated PC will be denoted by `next_pc`. @@ -212,7 +212,7 @@ When updating the program counter, we depend primarily on the `pc_update` field: > In our VM: > `new_pc` = `next_pc` -### Updating the AP +#### Updating the AP The updated AP will be denoted by `next_ap`. @@ -230,7 +230,7 @@ When updating the allocation pointer, we depend primarily on the `ap_update` fie > In our VM: > `new_apset` = `next_ap` -### Updating the FP +#### Updating the FP The updated FP will be denoted by `next_fp`. From 5ade55babe3c99bfa2004347ffea4fb8a8f58655 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 3 Jun 2025 16:49:22 -0300 Subject: [PATCH 07/28] Add some more constraints we have --- docs/SPECS.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index ded9c07b47..16b0dc1742 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -149,6 +149,8 @@ The variable `res` computation depends on `res_logic`. - `off_op0 == 1` - `op0_reg == AP` - `ap_update == 0` (add 2) +- `op0 == PC + instruction_size` +- `FP == dst` 2. When `opcode == 2` (Return), the following conditions must be met: - `off_dst == -2` @@ -158,6 +160,9 @@ The variable `res` computation depends on `res_logic`. - `res_logic == 0` (op1) - `pc_update == 1` (absolute jump) +3. When `opcode == 4` (AssertEq), the following conditions must be met: +- `res == dst` + > [!NOTE] > These constraints are not specified in the whitepaper, but enforced by our VM. If From 84de8267e881c2f82b774dcec236e87c18c9bd8d Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 3 Jun 2025 17:26:10 -0300 Subject: [PATCH 08/28] Start memory model section --- docs/SPECS.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index 16b0dc1742..a1652e6930 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -254,3 +254,13 @@ When updating the frame pointer, we depend on the `opcode`: > [!NOTE] > In our VM: > `new_fp_offset` = `next_fp` + +## Memory Model + +### Nondeterministic Memory + +Cairo VM uses a Nondeterministic Read-Only memory model. This means - the prover chooses all the values of the memory, and the memory is immutable. The Cairo program may only read from it - Cairo Whitepaper section 2.6. + +### Memory Layout + +Requirement 1: given a pair of accessed memory addresses *x and y*, any address *a* which satisfies that *x < a < y* must have also been accessed. This implies that any given set of accessed memory addresses must be contiguous. From b1084427a1e44b6b8455d142b02e4cff64d94c78 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 4 Jun 2025 15:35:42 -0300 Subject: [PATCH 09/28] Some initial info about memory structures --- docs/SPECS.md | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index a1652e6930..61d499480d 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -264,3 +264,31 @@ Cairo VM uses a Nondeterministic Read-Only memory model. This means - the prover ### Memory Layout Requirement 1: given a pair of accessed memory addresses *x and y*, any address *a* which satisfies that *x < a < y* must have also been accessed. This implies that any given set of accessed memory addresses must be contiguous. + +#### Public Memory + +#### Memory Segments Manager + +To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. +- `segment_sizes`: A HashMap that contains the size of each segment +- `segment_used_sizes`: ?????????? +- `memory`: The memory itself, containing the data, temporary data, relocation rules, among other things +- `public_memory_offsets`: ???????? +- `zero_segment_index`: ???????? +- `zero_segment_size`: ???????? + +#### Memory + +The VM memory containing: +- `data`: A vector of vectors. Each vector representing a different `segment` with its own data +- `temp_data`: A vector of vector. Each vector representing a `temporary segment` +- `relocation_rules`: ?????? +- `validated_addresses`: ?????? +- `validation_rules`: ?????? + +#### Memory Cell + +Represents a unit of data in the memory + + +#### Memory Relocation From 77b6a4c4c6cf3436043d2619ff68b2f737727cbb Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 4 Jun 2025 16:17:38 -0300 Subject: [PATCH 10/28] Some more info about memory structures --- docs/SPECS.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index 61d499480d..750a2f046d 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -267,6 +267,8 @@ Requirement 1: given a pair of accessed memory addresses *x and y*, any address #### Public Memory +Represented by a vector of `PublicMemoryEntry` which contains a value, an address and its page + #### Memory Segments Manager To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. @@ -286,6 +288,9 @@ The VM memory containing: - `validated_addresses`: ?????? - `validation_rules`: ?????? +#### CairoPieMemory + + #### Memory Cell Represents a unit of data in the memory From 0a2b4ecd7447feda9c16c66ce980e6bfc7ae23ca Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 4 Jun 2025 18:13:36 -0300 Subject: [PATCH 11/28] A little bit of memory info --- docs/SPECS.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 750a2f046d..329d4afedb 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -267,9 +267,11 @@ Requirement 1: given a pair of accessed memory addresses *x and y*, any address #### Public Memory -Represented by a vector of `PublicMemoryEntry` which contains a value, an address and its page +After executing the cairo program and relocating the memory of the runner, that relocated memory is used to create a `PublicInput` which contains a vector of `PublicMemoryEntry`. Each of this contains a value, an address and its page, the whole vector represents the public memory. -#### Memory Segments Manager +### Memory Implementation + +#### MemorySegmentManager To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. - `segment_sizes`: A HashMap that contains the size of each segment @@ -288,12 +290,14 @@ The VM memory containing: - `validated_addresses`: ?????? - `validation_rules`: ?????? -#### CairoPieMemory - +#### MemoryCell & MaybeRelocatable -#### Memory Cell +`MaybeRelocatable` represents a unit of data that can be a pointer or a felt: +- `RelocatableValue(Relocatable)`: Contains the segment index and the offset in that segment +- `Int(Felt252)`: Contains just a value of data -Represents a unit of data in the memory +`MemoryCell` represents a unit of data in the memory. +### Memory relationships -#### Memory Relocation +### Memory Relocation From 7484df8b7e8f236609e3b182e038f65b1da13691 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Thu, 5 Jun 2025 09:18:20 -0300 Subject: [PATCH 12/28] Relocation first steps --- docs/SPECS.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 329d4afedb..e288c67d58 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -300,4 +300,14 @@ The VM memory containing: ### Memory relationships -### Memory Relocation +### Relocation + +The memory relocation has the following steps: +- Compute the sizes of the segments in the memory +- Create the relocation table +- Relocate the memory (In cairo0 this depends on the config, but in cairo1 is always done) +- Relocate the trace + +#### Memory Relocation + +Segments from the memory are iterated in order and for each cell of a segment the new relocated address and value are calculated. With this, the continuous memory is created. From e9e1e88df614630511d09521f0ca9b4eeb4e303c Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Thu, 5 Jun 2025 17:00:13 -0300 Subject: [PATCH 13/28] More on memory relocation --- docs/SPECS.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index e288c67d58..576a027826 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -311,3 +311,26 @@ The memory relocation has the following steps: #### Memory Relocation Segments from the memory are iterated in order and for each cell of a segment the new relocated address and value are calculated. With this, the continuous memory is created. + +Each segment gets its index from the order they have in the data of `Memory`. The same happens with the `MemoryCells` in which their offset in the segment is represented by their index. For example: + +``` +Memory = [ + [Cell0, Cell1], # Segment0 + [Cell2], # Segment1 + [Cell3] # Segment2 +] + +Cell0 -> segment = 0 and offset = 0 +Cell1 -> segment = 0 and offset = 1 +Cell3 -> segment = 2 and offset = 0 +``` + +- First we relocate the address as `relocation_table[segment] + offset` +- Then we relocate the value by turning a `MaybeRelocatable` into a `Felt252`: + - If the cell is a `MaybeRelocatable::Int(n)`, then the new value is `num`. + - If the cell is a `MaybeRelocatable::RelocatableValue(relocatable)`, then the new value is `relocation_table[relocatable.segment_index] + relocatable.offset` + +> [!NOTE] +> In our VM: +> relocation mem starts at index 1 From dc0186712c1fe5e6c6f358faec759e4d90489b14 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Fri, 6 Jun 2025 11:08:10 -0300 Subject: [PATCH 14/28] Creation of relocation table --- docs/SPECS.md | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 576a027826..d7518e5084 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -308,6 +308,28 @@ The memory relocation has the following steps: - Relocate the memory (In cairo0 this depends on the config, but in cairo1 is always done) - Relocate the trace +#### Creation of Relocation Table + +The relocation table is a `Vec`, that represents the offsets in the relocated memory for each of the segments in the memory. For its creation, we use the field `segment_used_sizes` of the `MemorySegmentManager` and get the relocation value as `relocation_table[i] + segment_usize` where `i` is the segment index. One thing to take in mind is that since relocated memory starts from index 1, the relocation table starts with a 1 which is the relocation offset of the first segment. + +Example: + +``` +segment_used_sizes = [2,4,5] +relocation_table = [1] + +1. Calculate the first segment +i = 0 +relocation_offset = relocation_table[0] + first_segment_size = 1 + 2 = 3 + +2. Calculate the second segment +i = 1 +relocation_offset = relocation_table[1] + second_segment_size = 3 + 4 = 7 + +### Relocation Table ### +[1,3,7] +``` + #### Memory Relocation Segments from the memory are iterated in order and for each cell of a segment the new relocated address and value are calculated. With this, the continuous memory is created. @@ -331,6 +353,8 @@ Cell3 -> segment = 2 and offset = 0 - If the cell is a `MaybeRelocatable::Int(n)`, then the new value is `num`. - If the cell is a `MaybeRelocatable::RelocatableValue(relocatable)`, then the new value is `relocation_table[relocatable.segment_index] + relocatable.offset` +It can happen that the cell is empty and has no value because it was just filling an unused gap, in that case the relocated memory is also filled with a `None`. + > [!NOTE] > In our VM: -> relocation mem starts at index 1 +> Relocation memory starts at index 1. Index 0 is filled with a `None` From e84ed44905a3487257bfb8cb606434923e8efccd Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Fri, 6 Jun 2025 14:32:32 -0300 Subject: [PATCH 15/28] Explain MemoryCell --- docs/SPECS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index d7518e5084..f19b21e06a 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -275,7 +275,7 @@ After executing the cairo program and relocating the memory of the runner, that To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. - `segment_sizes`: A HashMap that contains the size of each segment -- `segment_used_sizes`: ?????????? +- `segment_used_sizes`: A Vector of each segment size. Item on index i is the segment size of segment i - `memory`: The memory itself, containing the data, temporary data, relocation rules, among other things - `public_memory_offsets`: ???????? - `zero_segment_index`: ???????? @@ -296,7 +296,7 @@ The VM memory containing: - `RelocatableValue(Relocatable)`: Contains the segment index and the offset in that segment - `Int(Felt252)`: Contains just a value of data -`MemoryCell` represents a unit of data in the memory. +`MemoryCell` is a `[u64; 4]` that represents a value in the memory. However cells can also represent a part in memory that holds no value and it is just used to fill gaps. This is done by useng a mask -> Cell with no value = `[NONE_MASK, 0, 0, 0]` ### Memory relationships From 38b317906bba7b2e5446e39fb54ecfd1d2fae4ad Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Fri, 6 Jun 2025 17:25:16 -0300 Subject: [PATCH 16/28] Explain some memory methods --- docs/SPECS.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index f19b21e06a..c02de1697f 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -290,6 +290,22 @@ The VM memory containing: - `validated_addresses`: ?????? - `validation_rules`: ?????? +Some methods: +- `insert(key, val)`: + - Inserts the value into a memory address. If the address is not contiguous with previously inserted data, the memory gaps are filled with `None`. + - To get the value, first gets the segment index and then the offset with `from_relocatable_to_indexes()` + - Verifies if that address is already used, in that case it returns a `MemoryError::InconsistencyMemory()` + - Validates the memory cell with the validation rule + +- `get(key)`: + - Returns a value from memory + - Relocates the value + +- `relocate_memory()`: + - If `relocation_rules` or `temp_data` are empty, it does nothing + - The cells from the data and temporary data are relocated + - TODO! + #### MemoryCell & MaybeRelocatable `MaybeRelocatable` represents a unit of data that can be a pointer or a felt: From f15ce1b71313b9d483e9d67d079af03d95650f4c Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 10 Jun 2025 12:22:18 -0300 Subject: [PATCH 17/28] Explain Temporary Memory Relocation --- docs/SPECS.md | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index c02de1697f..d78e2201a0 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -269,6 +269,13 @@ Requirement 1: given a pair of accessed memory addresses *x and y*, any address After executing the cairo program and relocating the memory of the runner, that relocated memory is used to create a `PublicInput` which contains a vector of `PublicMemoryEntry`. Each of this contains a value, an address and its page, the whole vector represents the public memory. +#### Temporary Memory + +During execution, temporary segments are stored separated from the memory data and their segment indexes are represented with negative numbers. For their relocation, we use rules that will tell us how to map them to a memory segment. + +> [!NOTE] +> Relocation rules start at index 0 and temp data segment index start at -1. So for the mapping, segment_index = -1 maps to key 0, segment_index = -2 to key 1, and so on. + ### Memory Implementation #### MemorySegmentManager @@ -285,8 +292,8 @@ To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we The VM memory containing: - `data`: A vector of vectors. Each vector representing a different `segment` with its own data -- `temp_data`: A vector of vector. Each vector representing a `temporary segment` -- `relocation_rules`: ?????? +- `temp_data`: A vector of vectors. Each vector representing a `temporary segment` +- `relocation_rules`: A hashmap that tells how a temporary segment maps to a to a memory segment - `validated_addresses`: ?????? - `validation_rules`: ?????? @@ -374,3 +381,18 @@ It can happen that the cell is empty and has no value because it was just fillin > [!NOTE] > In our VM: > Relocation memory starts at index 1. Index 0 is filled with a `None` + +#### Temporary Memory Relocation + +In this case, we will use the `relocation_rules` that determines to which relocated memory segment a temporary segment will map. + +First, the `data` and `temp_data` are iterated searching for any `Relocatable` that needs to have their address relocated. They are differentiated by having their `segment_index < 0`. When one of them is found, it is relocated with the `relocation_rules` in the following way: +``` +old_cell = Relocatable {segment_index: x, offset: y} + +new_cell = relocation_rules[-old_cell.segment_index + 1] + old_cell.offset +``` + +Once the addresses are relocated, we start moving the temporary memory into the VM's memory. By iterating the `temp_data` from **right to left**. From the `relocation_rules` we get the the base address for the temporary segment and start adding the cells from that point. + +Keep in mind that if a temporary segment does not map to a real memory address (in other words, it does not have its mapping in `relocation_rules`) it won´t be relocated to real memory. From 0790382b5ebcae496a2f39801e831266f5f6a0bb Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 10 Jun 2025 12:42:27 -0300 Subject: [PATCH 18/28] Fix relocation explanation order --- docs/SPECS.md | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index d78e2201a0..ed843872a4 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -319,18 +319,33 @@ Some methods: - `RelocatableValue(Relocatable)`: Contains the segment index and the offset in that segment - `Int(Felt252)`: Contains just a value of data -`MemoryCell` is a `[u64; 4]` that represents a value in the memory. However cells can also represent a part in memory that holds no value and it is just used to fill gaps. This is done by useng a mask -> Cell with no value = `[NONE_MASK, 0, 0, 0]` +`MemoryCell` is a `[u64; 4]` that represents a value in the memory. However cells can also represent a part in memory that holds no value and it is just used to fill gaps. This is done by using a mask -> Cell with no value = `[NONE_MASK, 0, 0, 0]` ### Memory relationships ### Relocation The memory relocation has the following steps: -- Compute the sizes of the segments in the memory -- Create the relocation table +- Relocate temporary data +- Compute the sizes of the segments in the memory and create the relocation table - Relocate the memory (In cairo0 this depends on the config, but in cairo1 is always done) - Relocate the trace +#### Temporary Memory Relocation + +In this case, we will use the `relocation_rules` that determines to which relocated memory segment a temporary segment will map. + +First, the `data` and `temp_data` are iterated searching for any `Relocatable` that needs to have their address relocated. They are differentiated by having their `segment_index < 0`. When one of them is found, it is relocated with the `relocation_rules` in the following way: +``` +old_cell = Relocatable {segment_index: x, offset: y} + +new_cell = relocation_rules[-old_cell.segment_index + 1] + old_cell.offset +``` + +Once the addresses are relocated, we start moving the temporary memory into the VM's memory. By iterating the `temp_data` from **right to left**. From the `relocation_rules` we get the the base address for the temporary segment and start adding the cells from that point. + +Keep in mind that if a temporary segment does not map to a real memory address (in other words, it does not have its mapping in `relocation_rules`) it won´t be relocated to real memory. + #### Creation of Relocation Table The relocation table is a `Vec`, that represents the offsets in the relocated memory for each of the segments in the memory. For its creation, we use the field `segment_used_sizes` of the `MemorySegmentManager` and get the relocation value as `relocation_table[i] + segment_usize` where `i` is the segment index. One thing to take in mind is that since relocated memory starts from index 1, the relocation table starts with a 1 which is the relocation offset of the first segment. @@ -382,17 +397,4 @@ It can happen that the cell is empty and has no value because it was just fillin > In our VM: > Relocation memory starts at index 1. Index 0 is filled with a `None` -#### Temporary Memory Relocation - -In this case, we will use the `relocation_rules` that determines to which relocated memory segment a temporary segment will map. - -First, the `data` and `temp_data` are iterated searching for any `Relocatable` that needs to have their address relocated. They are differentiated by having their `segment_index < 0`. When one of them is found, it is relocated with the `relocation_rules` in the following way: -``` -old_cell = Relocatable {segment_index: x, offset: y} - -new_cell = relocation_rules[-old_cell.segment_index + 1] + old_cell.offset -``` -Once the addresses are relocated, we start moving the temporary memory into the VM's memory. By iterating the `temp_data` from **right to left**. From the `relocation_rules` we get the the base address for the temporary segment and start adding the cells from that point. - -Keep in mind that if a temporary segment does not map to a real memory address (in other words, it does not have its mapping in `relocation_rules`) it won´t be relocated to real memory. From 85987d174f889288a15b52cba5371cf7ad8ed600 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 10 Jun 2025 14:53:54 -0300 Subject: [PATCH 19/28] Minor changes --- docs/SPECS.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index ed843872a4..8dfb4a093c 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -271,7 +271,7 @@ After executing the cairo program and relocating the memory of the runner, that #### Temporary Memory -During execution, temporary segments are stored separated from the memory data and their segment indexes are represented with negative numbers. For their relocation, we use rules that will tell us how to map them to a memory segment. +During execution, temporary segments are stored separated from the memory data and their segment indexes are represented with negative numbers. For their relocation, we use rules that will tell us how to map them to a real memory segment. > [!NOTE] > Relocation rules start at index 0 and temp data segment index start at -1. So for the mapping, segment_index = -1 maps to key 0, segment_index = -2 to key 1, and so on. @@ -284,9 +284,9 @@ To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we - `segment_sizes`: A HashMap that contains the size of each segment - `segment_used_sizes`: A Vector of each segment size. Item on index i is the segment size of segment i - `memory`: The memory itself, containing the data, temporary data, relocation rules, among other things -- `public_memory_offsets`: ???????? -- `zero_segment_index`: ???????? -- `zero_segment_size`: ???????? +- `public_memory_offsets`: A HashMap that maps a segment index to a list of offsets for memory cells that represent the public memory +- `zero_segment_index`: The index of the zero segment that is used for builtins +- `zero_segment_size`: Size of the zero segment #### Memory @@ -294,8 +294,8 @@ The VM memory containing: - `data`: A vector of vectors. Each vector representing a different `segment` with its own data - `temp_data`: A vector of vectors. Each vector representing a `temporary segment` - `relocation_rules`: A hashmap that tells how a temporary segment maps to a to a memory segment -- `validated_addresses`: ?????? -- `validation_rules`: ?????? +- `validated_addresses`: A vector of `BitVec` +- `validation_rules`: A vector containing the validation rules Some methods: - `insert(key, val)`: From 8981742d95e8da354a1ef5a96fab79415255f86b Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 10 Jun 2025 15:17:42 -0300 Subject: [PATCH 20/28] MemorySegmentManager methods --- docs/SPECS.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 8dfb4a093c..f9a8d90589 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -288,6 +288,17 @@ To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we - `zero_segment_index`: The index of the zero segment that is used for builtins - `zero_segment_size`: Size of the zero segment +Some methods: +- `add()`: + - Adds a new segment to the memory by adding an empty vector to the data + - It returns a `Relocatable` that represents the starting location of the new segement + +- `add_temporary_segment()`: + - Similar to the method above, just adds a new vector to the temporary data and again returns its starting location. However, in this case the segment index will be a negative value + +- `relocate_segments()`: + - Creates the relocation table. This is explained in detail [Creation of Relocation Table](#creation-of-relocation-table) + #### Memory The VM memory containing: @@ -309,9 +320,9 @@ Some methods: - Relocates the value - `relocate_memory()`: + - Relocates temporary data - If `relocation_rules` or `temp_data` are empty, it does nothing - - The cells from the data and temporary data are relocated - - TODO! + - Explained with more detail in [Temporary Memory Relocation](#temporary-memory-relocation) #### MemoryCell & MaybeRelocatable From c5f57cde4bb9bd7f0889e2a6aa87fe272e90ab8e Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 11 Jun 2025 09:26:48 -0300 Subject: [PATCH 21/28] Add info about real memory --- docs/SPECS.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index f9a8d90589..e985a14e99 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -265,6 +265,10 @@ Cairo VM uses a Nondeterministic Read-Only memory model. This means - the prover Requirement 1: given a pair of accessed memory addresses *x and y*, any address *a* which satisfies that *x < a < y* must have also been accessed. This implies that any given set of accessed memory addresses must be contiguous. +#### Real Memory + +The VM's memory is represented with the struct `Memory` which contains each segment, temporary or not. Segments are represented by a `Vec` and their index in the vector represents their segment index which will be then used for relocation. To manage all the segments and their data, we have the `MemorySegmentManager`. + #### Public Memory After executing the cairo program and relocating the memory of the runner, that relocated memory is used to create a `PublicInput` which contains a vector of `PublicMemoryEntry`. Each of this contains a value, an address and its page, the whole vector represents the public memory. @@ -408,4 +412,6 @@ It can happen that the cell is empty and has no value because it was just fillin > In our VM: > Relocation memory starts at index 1. Index 0 is filled with a `None` +## Output + From 93899fd050a1deec722883a88ccd837fb8d68f5b Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 11 Jun 2025 15:57:12 -0300 Subject: [PATCH 22/28] Info about trace --- docs/SPECS.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index e985a14e99..725aa26501 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -414,4 +414,11 @@ It can happen that the cell is empty and has no value because it was just fillin ## Output +During execution the trace is represented as a vector of `TraceEntry` where each of them has the values of the `pc`, `ap` and `fp`. In the case of the `pc` it is a `Relocatable` the others are just an `usize`. During the execution of each instruction, a new `TraceEntry` is created with the values of the registers (these are updated after). +After execution, the trace needs to be relocated. From a `Vec` we get a `Vec`. This process is simple and does the following: +- relocated_pc = pc.offset + relocation_table[pc.segment_index] +- relocated_ap = ap + relocation_table[1] +- relocated_fp = fp + relocation_table[1] + +For `ap` and `fp` we use segment 1 since it is the execution segement From 23f4a134e118887e9bfb8c6c4fc9133af4011cc6 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 11 Jun 2025 17:32:33 -0300 Subject: [PATCH 23/28] Info about CairoPie and PublicInput --- docs/SPECS.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 725aa26501..6a8e6ec743 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -414,6 +414,8 @@ It can happen that the cell is empty and has no value because it was just fillin ## Output +### Trace + During execution the trace is represented as a vector of `TraceEntry` where each of them has the values of the `pc`, `ap` and `fp`. In the case of the `pc` it is a `Relocatable` the others are just an `usize`. During the execution of each instruction, a new `TraceEntry` is created with the values of the registers (these are updated after). After execution, the trace needs to be relocated. From a `Vec` we get a `Vec`. This process is simple and does the following: @@ -421,4 +423,12 @@ After execution, the trace needs to be relocated. From a `Vec` we ge - relocated_ap = ap + relocation_table[1] - relocated_fp = fp + relocation_table[1] -For `ap` and `fp` we use segment 1 since it is the execution segement +For `ap` and `fp` we use segment 1 since it is the execution segment. + +### Public Input + +Its the output of the of the VM execution and the input for the verifier. For its creation, it uses the VMs public memory information to create a `Vec` which represents the public memory. `PublicMemoryEntry` has an `address`, a `page` and a `value`. + +### CairoPie + +Like [Public Input](#public-input), this output will also be used as an input in the VM itself. It includes a `CairoPieMemory` which is a simplified version of the VM memory and it is possible to easily create it from it. It is simplified in the sense that it only contains the address-value pairs of the memory, and nothing else. From 748a3033ce1236089411732094c04dfce24a677d Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 17 Jun 2025 17:24:37 -0300 Subject: [PATCH 24/28] Minor changes --- docs/SPECS.md | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 6a8e6ec743..0c094ff00d 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -1,6 +1,7 @@ # Specifications ## Instructions + ### Operations Each Cairo instruction represents one of the following operations: @@ -62,7 +63,7 @@ This type is used when `opcode_extension == 0`. Its does not add new behaviour s #### Blake & BlakeFinalize -Blake is used when `opcode_extension == 1` and BlakeFinalize is used when `opcode_extension == 2` and they have some constraints, if these are not met while having one of the the blake extensions, the instructions becomes invalid: +Blake is used when `opcode_extension == 1` and BlakeFinalize is used when `opcode_extension == 2` and they have some constraints, if these are not met while having one of the the blake extensions, the instruction becomes invalid: - `opcode == 0` (no operation) - `op1_src == 2` (FP) or `op1_src == 4` (AP) - `res_logic == 0` (op1) @@ -70,25 +71,25 @@ Blake is used when `opcode_extension == 1` and BlakeFinalize is used when `opcod - `ap_update == 0` (Regular) || `ap_update == 2` (Add1) After checking the constraints, if we are working with a `Blake` extension the blake2 algorithm is applied. The operands are contained as follows: -- `op0_addr` points to a sequence of 8 -> represents a state +- `op0_addr` points to a sequence of 8 felts -> represents a state - `op1_addr` points to a sequence of 16 felts -> represents a message - `dst_addr` holds the value of the counter -- `ap` points to a sequence of 8 cells which each should either be uninitialised or already contain a value matching that of the output +- `ap` points to a sequence of 8 cells where each should either be uninitialised or already contain a value matching that of the output The output consists of 8 felts that represent the output of the Blake2s compression. On the other side, if we are working with the `BlakeFinalize` extension the operands are the same as with `Blake` with only one change: -- `op0_addr` points to a sequence of 9 felts -> first 8 felts represent the state the last one represents the state +- `op0_addr` points to a sequence of 9 felts -> first 8 felts represent the state and the last one represents a counter -The out here represents the Blake2s compression of the last block. +The output here represents the Blake2s compression of the last block. #### QM31 In this case, when `opcode_extension == 3` we are working with the `QM31Operation` which changes how the arithmetic (add, mul, sub, div) works on the VM by doing it with QM31 elements in reduced form. Again there are some constraints, if these are not met the instruction becomes invalid: -- `res_logic == 1` (Add) || `res_logic == 2` (Mul) +- `res_logic == 1` (Add) or `res_logic == 2` (Mul) - `op1_src != 0` (Op0) - `pc_update == 0` (Regular) -- `ap_update == 0` (Regular) || `ap_update == 2` (Add1) +- `ap_update == 0` (Regular) or `ap_update == 2` (Add1) ### Auxiliary Variables @@ -136,7 +137,8 @@ The variable `res` computation depends on `res_logic`. - `res_logic == 0`: We set `res = op1`. - `res_logic == 1`: We set `res = op0 + op1`. - `res_logic == 2`: We set `res = op0 * op1`. -- Otherwise: The instruction is invalid. + +Otherwise: The instruction is invalid. > [!NOTE] > The value of `res` won’t always be used. For example, it won’t be used when `pc_update == 4`. @@ -197,7 +199,7 @@ The `instruction_size` is always `1`, unless `op1` is an immediate, in which cas At the end of the execution, the registers must be updated according to the instruction flags. -#### Updating the PC +#### Updating the `PC` The updated PC will be denoted by `next_pc`. @@ -211,13 +213,14 @@ When updating the program counter, we depend primarily on the `pc_update` field: - `pc_update == 4`: Conditional relative jump. - If `dst == 0`: Set `next_pc = pc + instruction_size`. - If `dst != 0`: Set `next_pc = pc + op1`. -- Otherwise: The instruction is invalid. + +Otherwise: The instruction is invalid. > [!NOTE] > In our VM: > `new_pc` = `next_pc` -#### Updating the AP +#### Updating the `AP` The updated AP will be denoted by `next_ap`. @@ -229,13 +232,14 @@ When updating the allocation pointer, we depend primarily on the `ap_update` fie - `ap_update == 0`: Set `next_ap = ap` - `ap_update == 1`: Set `next_ap = ap + res` - `ap_update == 2`: Set `next_ap = ap + 1` -- Otherwise: The instruction is invalid. + +Otherwise: The instruction is invalid. > [!NOTE] > In our VM: > `new_apset` = `next_ap` -#### Updating the FP +#### Updating the `FP` The updated FP will be denoted by `next_fp`. @@ -249,7 +253,8 @@ When updating the frame pointer, we depend on the `opcode`: - Set `next_fp = dst`. - `opcode == 3`: An assert equal operation. - Set `next_fp = fp`. -- Otherwise: The instruction is invalid. + +Otherwise: The instruction is invalid. > [!NOTE] > In our VM: @@ -301,7 +306,7 @@ Some methods: - Similar to the method above, just adds a new vector to the temporary data and again returns its starting location. However, in this case the segment index will be a negative value - `relocate_segments()`: - - Creates the relocation table. This is explained in detail [Creation of Relocation Table](#creation-of-relocation-table) + - Creates the relocation table. This is explained in detail on [Creation of Relocation Table](#creation-of-relocation-table) #### Memory @@ -336,8 +341,6 @@ Some methods: `MemoryCell` is a `[u64; 4]` that represents a value in the memory. However cells can also represent a part in memory that holds no value and it is just used to fill gaps. This is done by using a mask -> Cell with no value = `[NONE_MASK, 0, 0, 0]` -### Memory relationships - ### Relocation The memory relocation has the following steps: @@ -348,7 +351,7 @@ The memory relocation has the following steps: #### Temporary Memory Relocation -In this case, we will use the `relocation_rules` that determines to which relocated memory segment a temporary segment will map. +In this case, we will use the `relocation_rules` that determine to which relocated memory segment a temporary segment will map. First, the `data` and `temp_data` are iterated searching for any `Relocatable` that needs to have their address relocated. They are differentiated by having their `segment_index < 0`. When one of them is found, it is relocated with the `relocation_rules` in the following way: ``` From c466ab820914b7f6af44f63b39539782b75e1269 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 17 Jun 2025 18:21:54 -0300 Subject: [PATCH 25/28] Add diagram with relocation flow --- docs/SPECS.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index 0c094ff00d..7904efe2ae 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -349,6 +349,19 @@ The memory relocation has the following steps: - Relocate the memory (In cairo0 this depends on the config, but in cairo1 is always done) - Relocate the trace +```mermaid +stateDiagram-v2 + state "Relocate addresses" as iterate_data + state "Relocate temporary data into real memory" as into_real_mem + [*] --> RelocateTempData + state RelocateTempData { + iterate_data --> into_real_mem + } + RelocateTempData --> CreateRelocationTable + CreateRelocationTable --> RelocateMemory + RelocateMemory --> [*] +``` + #### Temporary Memory Relocation In this case, we will use the `relocation_rules` that determine to which relocated memory segment a temporary segment will map. From e754878d71e8e0b8f5181cd87839b132389d38a2 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 18 Jun 2025 15:00:33 -0300 Subject: [PATCH 26/28] Add diagrams for instructions --- docs/SPECS.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/docs/SPECS.md b/docs/SPECS.md index 7904efe2ae..8a90355f86 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -260,6 +260,55 @@ Otherwise: The instruction is invalid. > In our VM: > `new_fp_offset` = `next_fp` +### Instruction Execution Flow + +#### General Overview + +The runner is the one that starts everything with `run_until_pc()` which uses the VM to execute instructions until the +PC reaches the end address. For each step done by the VM, an instruction is decoded and executed. + +```mermaid +stateDiagram-v2 + state "run_until_pc()" as run_steps + state "decode_instruction()" as decode + state "run_instruction()" as run_instr + state end <> + + [*] --> CairoRunner + state CairoRunner { + run_steps + } + + state VirtualMachine { + step_instruction() + } + + state step_instruction() { + decode --> run_instr + } + + CairoRunner --> VirtualMachine + VirtualMachine --> end + end --> [*] : PC == end_addr + end --> CairoRunner : PC != end_addr + +``` +#### Flow of `run_instruction()` + +```mermaid +stateDiagram-v2 + state "Compute Operands" as compute_ops + state "Deduce Operands" as deduce_ops + state "Add new TraceEntry" as new_trace_entry + state "Update registers" as update_regs + [*] --> RunInstruction + state RunInstruction { + compute_ops --> deduce_ops + deduce_ops --> new_trace_entry + new_trace_entry --> update_regs + } +``` + ## Memory Model ### Nondeterministic Memory From 00de59f68a14cd76840726349c6be593f10f9ae0 Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Wed, 18 Jun 2025 15:22:29 -0300 Subject: [PATCH 27/28] Add links to functions --- docs/SPECS.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index 8a90355f86..f3460a697c 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -264,7 +264,7 @@ Otherwise: The instruction is invalid. #### General Overview -The runner is the one that starts everything with `run_until_pc()` which uses the VM to execute instructions until the +The runner is the one that starts everything with [`run_until_pc()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/runners/cairo_runner.rs#L672) which uses the VM to execute instructions until the PC reaches the end address. For each step done by the VM, an instruction is decoded and executed. ```mermaid @@ -293,7 +293,7 @@ stateDiagram-v2 end --> CairoRunner : PC != end_addr ``` -#### Flow of `run_instruction()` +#### Flow of [`run_instruction()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_core.rs#L407) ```mermaid stateDiagram-v2 @@ -347,14 +347,14 @@ To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we - `zero_segment_size`: Size of the zero segment Some methods: -- `add()`: +- [`add()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory_segments.rs#L49): - Adds a new segment to the memory by adding an empty vector to the data - It returns a `Relocatable` that represents the starting location of the new segement -- `add_temporary_segment()`: +- [`add_temporary_segment()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory_segments.rs#L58): - Similar to the method above, just adds a new vector to the temporary data and again returns its starting location. However, in this case the segment index will be a negative value -- `relocate_segments()`: +- [`relocate_segments()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory_segments.rs#L112): - Creates the relocation table. This is explained in detail on [Creation of Relocation Table](#creation-of-relocation-table) #### Memory @@ -367,17 +367,17 @@ The VM memory containing: - `validation_rules`: A vector containing the validation rules Some methods: -- `insert(key, val)`: +- [`insert(key, val)`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory.rs#L192): - Inserts the value into a memory address. If the address is not contiguous with previously inserted data, the memory gaps are filled with `None`. - To get the value, first gets the segment index and then the offset with `from_relocatable_to_indexes()` - Verifies if that address is already used, in that case it returns a `MemoryError::InconsistencyMemory()` - Validates the memory cell with the validation rule -- `get(key)`: +- [`get(key)`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory.rs#L241): - Returns a value from memory - Relocates the value -- `relocate_memory()`: +- [`relocate_memory()`](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/vm/vm_memory/memory.rs#L291): - Relocates temporary data - If `relocation_rules` or `temp_data` are empty, it does nothing - Explained with more detail in [Temporary Memory Relocation](#temporary-memory-relocation) @@ -496,4 +496,4 @@ Its the output of the of the VM execution and the input for the verifier. For it ### CairoPie -Like [Public Input](#public-input), this output will also be used as an input in the VM itself. It includes a `CairoPieMemory` which is a simplified version of the VM memory and it is possible to easily create it from it. It is simplified in the sense that it only contains the address-value pairs of the memory, and nothing else. +Like [Public Input](#public-input), this output will also be used as an input, in this case in the VM itself (this can be done with [cairo_run_pie()](https://github.com/lambdaclass/cairo-vm/blob/06708faf0e44f4ddae7c3c53a38ce94f70a1966e/vm/src/cairo_run.rs#L154)). It includes a `CairoPieMemory` which is a simplified version of the VM memory and it is possible to easily create it from it. It is simplified in the sense that it only contains the address-value pairs of the memory, and nothing else. From 7d3298c71f349d9df3959ab4c0ffa17ac849e5bc Mon Sep 17 00:00:00 2001 From: Diego Civini Date: Tue, 16 Sep 2025 15:04:35 -0300 Subject: [PATCH 28/28] Fix typos --- docs/SPECS.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/SPECS.md b/docs/SPECS.md index f3460a697c..f859d12f7d 100644 --- a/docs/SPECS.md +++ b/docs/SPECS.md @@ -32,14 +32,14 @@ The instruction encoding is specified in the [Cario whitepaper](https://eprint.i │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │ 10 │ 11 │ 12 │ 13 │ 14 │ 15 │ └─────┴─────┴───┴───┴───┴───┴───┴───┴───┴───┴────┴────┴────┴────┴────┴────┘ ``` -The figure shows the structure of the 63-bit that form the first word of each instruction. +The figure shows the structure of the 63-bits that form the first word of each instruction. - The bits are ordered in a little-endian-like encoding, this implies that `off_dst` is located at bits `[0;15]`, while `dst_reg` is located at bit `48`. - The last bit (`63`) is unused. Each sets of fields determine different aspects of the instruction execution: - `off_op0`, `off_op1`, `op0_reg`, `op1_src` determine the location of each operand. -- `off_dst`, `dst_reg` determine the location of the destionation. -- `res_logic` determine which operation to compute with the operands. +- `off_dst`, `dst_reg` determine the location of the destination. +- `res_logic` determines which operation to compute with the operands. - `pc_update` determines how the PC register is updated. - `ap_update` determines how the AP register is updated. - `opcode` determines both how the FP register is updated, and how some memory cell values are deduced. @@ -317,7 +317,7 @@ Cairo VM uses a Nondeterministic Read-Only memory model. This means - the prover ### Memory Layout -Requirement 1: given a pair of accessed memory addresses *x and y*, any address *a* which satisfies that *x < a < y* must have also been accessed. This implies that any given set of accessed memory addresses must be contiguous. +Requirement: given a pair of accessed memory addresses *x and y*, any address *a* which satisfies that *x < a < y* must have also been accessed. This implies that any given set of accessed memory addresses must be contiguous. #### Real Memory @@ -338,7 +338,7 @@ During execution, temporary segments are stored separated from the memory data a #### MemorySegmentManager -To satisfy requirement 1, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. +To satisfy the memory requirement, Cairo organizes its memory into segments. In our VM we have the `MemorySegmentManager` which contains everything to manage this special parts of the memory. - `segment_sizes`: A HashMap that contains the size of each segment - `segment_used_sizes`: A Vector of each segment size. Item on index i is the segment size of segment i - `memory`: The memory itself, containing the data, temporary data, relocation rules, among other things