Installing the ZKsync Solidity Compiler Toolchain

To compile contracts for ZKsync, you need the ZKsync Solidity compiler toolchain. It consists of two components:

  1. The main component: zksolc.
  2. The additional component: solc, which produces Solidity artifacts used by zksolc.

We are using our fork of the upstream solc compiler. The fork is necessary to support several ZKsync-specific features and workarounds.

System Requirements

It is recommended to have at least 4 GB of RAM to compile large projects. The compilation process is parallelized by default, so the number of threads used is equal to the number of CPU cores.

Large projects can consume a lot of RAM during compilation on machines with a high number of cores. If you encounter memory issues, consider reducing the number of threads using the --threads option.
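
As a rough sketch of the advice above, the snippet below caps the thread count at half of the detected cores before invoking the compiler. The halving heuristic and the helper name pick_threads are illustrative assumptions, not a zksolc recommendation; only the --threads option comes from the text above.

```shell
#!/bin/sh
# Hypothetical helper: halve the given core count, never going below 1.
pick_threads() {
  half=$(( $1 / 2 ))
  if [ "$half" -lt 1 ]; then half=1; fi
  echo "$half"
}

# Detect the number of online cores; fall back to 1 if detection fails.
cores="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
threads="$(pick_threads "$cores")"

# Only call the compiler when it is installed and the input file exists.
if command -v zksolc >/dev/null 2>&1 && [ -f './Greeter.sol' ]; then
  zksolc './Greeter.sol' --bin --threads "$threads"
else
  echo "would run: zksolc './Greeter.sol' --bin --threads ${threads}"
fi
```

If memory pressure persists, lowering the value further is a reasonable next step.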

The table below outlines the supported platforms and architectures:

CPU/OS | MacOS | Linux | Windows
x86_64
arm64

Please avoid using outdated distributions of operating systems, as they may lack the necessary dependencies or include outdated versions of them. zksolc is only tested on recent versions of popular distributions, such as MacOS 11.0 and Windows 10.

musl-based builds are deprecated, but they are still supported to preserve tooling compatibility.
Starting from zksolc v1.5.3, we are shipping builds statically linked with the GNU C library.

Installing zksolc

You can install the ZKsync Solidity compiler toolchain using the following methods:

  1. Use Foundry, Hardhat, or other popular toolkits, which manage the compiler installation and its dependencies for you. See Ethereum Development Toolkits.
  2. Download pre-built binaries of solc and zksolc. See Static Executables.
  3. Build zksolc from sources. See Building from Source.

For small projects, learning and research purposes, zksolc and solc executables without a toolkit are sufficient.

Installing solc

Running zksolc requires the fork of the Solidity compiler solc where we fixed several issues with lowering of EVM assembly to LLVM IR. The fork is called by zksolc as a child process. To point zksolc to the location of solc, use one of the following methods:

  1. Add the location of solc to the environment variable PATH.

    For example, if you have downloaded solc to the directory /home/username/opt, you can execute the following command, or append it to the configuration file of your shell:

    export PATH="/home/username/opt:${PATH}"
    
  2. Alternatively, when you run zksolc, provide the full path to solc using the --solc option.

    For example, if solc is located in your current working directory, you can point to it with this command:

    zksolc --solc './solc' --bin 'Greeter.sol'
    

The second option is more convenient if you are using different versions of solc for different projects. The current version of zksolc supports solc versions from 0.4.12 to 0.8.28.
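
The two methods can be combined in a small wrapper, sketched below under stated assumptions: SOLC_BIN is a made-up variable name used here for a per-project compiler path, and resolve_solc is a hypothetical helper. zksolc itself only knows about PATH and the --solc option.

```shell
#!/bin/sh
# Hypothetical helper: print the solc path to use.
# SOLC_BIN is an illustrative variable name, not something zksolc reads.
resolve_solc() {
  if [ -n "${SOLC_BIN:-}" ] && [ -x "$SOLC_BIN" ]; then
    echo "$SOLC_BIN"          # explicit per-project compiler
  else
    command -v solc           # fall back to whatever is on PATH
  fi
}

if solc_path="$(resolve_solc)"; then
  # Print the resulting command instead of running it.
  echo "zksolc --solc '${solc_path}' --bin 'Greeter.sol'"
else
  echo "solc not found: set SOLC_BIN or add solc to PATH" >&2
fi
```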

Versioning

The zksolc versioning scheme does not yet follow the Semantic Versioning specification. Instead, its major and minor versions match those of the EraVM protocol for which zksolc produces bytecode. The patch version is incremented with each release, regardless of whether breaking changes are introduced. Therefore, please consult the changelog before updating the compiler.

Versions of our solc fork consist of two semver-compatible parts:

  1. Original upstream version
  2. ZKsync revision

For instance, the latest revision of the latest version of solc is 0.8.28-1.0.1. Here are the ZKsync revisions released so far:

Revision | Features
v1.0.0 | Fixed compatibility between EVM assembly and LLVM IR
v1.0.1 | Fixed a compiler crash with nested try-catch patterns
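
To illustrate the two-part scheme, the sketch below splits a fork version string at the hyphen using shell parameter expansion. The helper names are hypothetical; the sample version is the one quoted above.

```shell
#!/bin/sh
# Split a fork version such as "0.8.28-1.0.1" at the first hyphen.
solc_upstream() { echo "${1%%-*}"; }   # part before the hyphen
solc_revision() { echo "${1#*-}"; }    # part after the hyphen

version='0.8.28-1.0.1'
echo "upstream: $(solc_upstream "$version")"
echo "revision: $(solc_revision "$version")"
```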

We recommend always using the latest version of zksolc and solc to benefit from the latest features and bug fixes. Starting from zksolc v1.5.8, it is not possible to use the original solc with zksolc anymore.

Ethereum Development Toolkits

For large codebases, it is more convenient to use the ZKsync compiler toolchain via toolkits like Foundry and Hardhat. These tools manage the compiler executables and their dependencies, and provide additional features like incremental compilation and caching.

The ZKsync toolchain is supported by the following toolkits:

  1. Foundry
  2. Hardhat

Static Executables

We ship zksolc binaries on the releases page of the matter-labs/era-compiler-solidity repository. This repository maintains intuitive and stable naming for the executables and provides a changelog for each release. Tools using zksolc will download the binaries from this repository and cache them locally.

The matter-labs/era-compiler-solidity repository only contains builds for versions 1.4.0 and newer.
You can download older versions from the main branch or the releases page of the deprecated repository for zksolc executables.
If any of your projects are still using the old locations, please change their download URLs to the new one.

All binaries are statically linked and are expected to work on all recent platforms without issues. zksolc is fully written in Rust, aiming to minimize incompatibilities with the environment.

Building from Source

Please consider using the pre-built executables before building from source. Building from source is only necessary for development, research, and debugging purposes. Deployment and production use cases should rely only on the officially released executables.

  1. Install the necessary system-wide dependencies.

    • For Linux (Debian):
    apt install cmake ninja-build curl git libssl-dev pkg-config clang lld
    
    • For Linux (Arch):
    pacman -Syu which cmake ninja curl git pkg-config clang lld
    
    • For MacOS:

      1. Install the Homebrew package manager by following the instructions at brew.sh.

      2. Install the necessary system-wide dependencies:

        brew install cmake ninja coreutils
        
      3. Install a recent build of the LLVM/Clang compiler using one of the following tools:

  2. Install Rust.

    The easiest way is to follow the latest official instructions.

The Rust version used for building is pinned in the rust-toolchain.toml file at the repository root. cargo will automatically download the pinned version of rustc when you start building the project.

  3. Clone and check out this repository.

    git clone https://github.com/matter-labs/era-compiler-solidity
    
  4. Install the ZKsync LLVM framework builder. This tool clones the ZKsync LLVM Framework repository and runs a sequence of build commands tuned for the needs of the ZKsync compiler toolchain.

    cargo install compiler-llvm-builder
    

    To fine-tune your build of the ZKsync LLVM framework, refer to the section Tuning the ZKsync LLVM build.

Always use the latest version of the builder to benefit from the latest features and bug fixes. To check for new versions and update the builder, simply run cargo install compiler-llvm-builder again, even if you have already installed the builder. The builder is not the ZKsync LLVM framework itself, but a tool to build it. By default, it is installed in ~/.cargo/bin/, which is usually added to your PATH during the Rust installation process.

  5. Clone and build the ZKsync LLVM framework using the zksync-llvm tool.

    # Navigate to the root of your local copy of this repository.
    cd era-compiler-solidity
    # Clone the ZKsync LLVM framework. The branch is specified in the file `LLVM.lock`.
    zksync-llvm clone
    # Build the ZKsync LLVM framework.
    zksync-llvm build
    

    For more information and available build options, run zksync-llvm build --help.

    You can also clone and build the LLVM framework outside of the repository root. In this case, do the following:

    1. Provide an LLVM.lock file in the directory where you run zksync-llvm. See the default LLVM.lock for an example.

    2. Ensure that LLVM.lock selects the correct branch of the ZKsync LLVM Framework repository.

    3. Before proceeding to the next step, set the environment variable LLVM_SYS_170_PREFIX to the path of the directory with the LLVM build artifacts. Typically, it ends with target-llvm/build-final, which is the default LLVM target directory of the LLVM builder. For example:

      export LLVM_SYS_170_PREFIX=~/repositories/era-compiler-solidity/target-llvm/build-final 
      
  6. Build the zksolc executable.

    cargo build --release
    

    The zksolc executable will appear at ./target/release/zksolc, where you can run it directly or move it to another location.

    If cargo cannot find the LLVM build artifacts, return to the previous step and ensure that the LLVM_SYS_170_PREFIX environment variable is set to the absolute path of the directory target-llvm/build-final.
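
When cargo cannot find the LLVM build artifacts, a quick pre-flight check can save a failed build. The sketch below only verifies that LLVM_SYS_170_PREFIX holds an absolute path to an existing directory. The helper name check_llvm_prefix is hypothetical, and the check does not inspect the directory contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the given path
# is absolute and points to an existing directory.
check_llvm_prefix() {
  case "$1" in
    /*) [ -d "$1" ] ;;
    *)  return 1 ;;
  esac
}

if check_llvm_prefix "${LLVM_SYS_170_PREFIX:-}"; then
  echo "LLVM artifacts directory found at ${LLVM_SYS_170_PREFIX}"
else
  echo "LLVM_SYS_170_PREFIX must be an absolute path to target-llvm/build-final" >&2
fi
```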

Tuning the ZKsync LLVM build

  • For more information and available build options, run zksync-llvm build --help.

  • Use the --use-ccache option to speed up the build process if you have ccache installed.

  • To build the ZKsync LLVM framework using specific C and C++ compilers, pass additional arguments to CMake using the --extra-args option:

    # Pay special attention to character escaping.
    
    zksync-llvm build \
      --use-ccache \
      --extra-args \
        '\-DCMAKE_C_COMPILER=/opt/homebrew/Cellar/llvm@18/18.1.8/bin/clang' \
        '\-DCMAKE_BUILD_TYPE=Release' \
        '\-DCMAKE_CXX_COMPILER=/opt/homebrew/Cellar/llvm@18/18.1.8/bin/clang++' 
    

Building LLVM manually

  • If you prefer building your ZKsync LLVM manually, include the following flags in your CMake command:

    # We recommend using the latest version of CMake.
    
    -DLLVM_TARGETS_TO_BUILD='EraVM;EVM'
    -DLLVM_ENABLE_PROJECTS='lld'
    -DBUILD_SHARED_LIBS='Off'
    

For most users, the ZKsync LLVM builder is the recommended way to build the ZKsync LLVM framework. This section exists for the ZKsync toolchain developers and researchers with specific requirements and experience with the LLVM framework. We are going to present a more detailed guide for LLVM contributors in the future.

Command Line Interface (CLI)

The CLI of zksolc is designed to resemble the CLI of solc. There are several main input/output (I/O) modes in the zksolc interface:

  • Basic CLI
  • Combined JSON
  • Standard JSON

The basic CLI and combined JSON modes are more lightweight and suitable for calling from the shell. The standard JSON mode is similar to client-server interaction, and thus more suitable for use from other applications, such as Foundry.

All toolkits using zksolc must operate in standard JSON mode and follow its specification. This makes the toolkits more robust and future-proof, as the standard JSON mode is the most versatile and is used by the majority of popular projects.

This page focuses on the basic CLI mode. For more information on the other modes, see the corresponding combined JSON and standard JSON pages.

Basic CLI

Basic CLI mode is the simplest way to compile a file with the source code.

To compile a basic Solidity contract, make sure that the solc compiler is present in your environment and run the example from the --bin section.

The rest of this section describes the available CLI options and their usage. You may also check out zksolc --help for a quick reference.

--solc

Specifies the path to the solc compiler. Useful when the solc compiler is not available in the system path.

Usage:

zksolc './Simple.sol' --bin --solc '/path/to/solc'

Examples in the subsequent sections assume that solc is installed and available in the system path. If you prefer specifying the full path to solc, use the --solc option with the examples below.

--bin

Enables the output of compiled bytecode. The following command compiles a Solidity file and prints the bytecode:

zksolc './Simple.sol' --bin

Output:

======= Simple.sol:Simple =======
Binary:
0000008003000039000000400030043f0000000100200190000000130000c13d...

It is possible to dry-run the compilation without writing any output. To do this, simply omit --bin and other output options:

zksolc './Simple.sol'

Output:

Compiler run successful. No output requested. Use flags --metadata, --asm, --bin.

Input Files

zksolc supports multiple input files. The following command compiles two Solidity files and prints the bytecode:

zksolc './Simple.sol' './Complex.sol' --bin

Solidity import remappings are passed in the same way as input files, but they are distinguished by a = symbol between source and destination. The following command compiles a Solidity file with a remapping and prints the bytecode:

zksolc './Simple.sol' 'github.com/ethereum/dapp-bin/=/usr/local/lib/dapp-bin/' --bin

zksolc does not handle remappings itself, but only passes them through to solc. Visit the solc documentation to learn more about the processing of remappings.
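
The distinction between input files and remappings can be sketched as follows. This is an illustrative classifier based only on the = rule described above, not the parser used by zksolc or solc, and classify_arg is a hypothetical name.

```shell
#!/bin/sh
# Illustrative classifier, not zksolc's actual parser:
# an argument containing "=" is treated as a remapping,
# anything else as an input file.
classify_arg() {
  case "$1" in
    *=*) echo 'remapping' ;;
    *)   echo 'file' ;;
  esac
}

for arg in './Simple.sol' 'github.com/ethereum/dapp-bin/=/usr/local/lib/dapp-bin/'; do
  echo "$(classify_arg "$arg"): ${arg}"
done
```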

--libraries

Specifies the libraries to link with compiled contracts. The option accepts multiple string arguments. The safest way is to wrap each argument in single quotes, and separate them with a space.

The specifier has the following format: <ContractPath>:<ContractName>=<LibraryAddress>.

Usage:

zksolc './Simple.sol' --bin --libraries 'Simple.sol:Test=0x1234567890abcdef1234567890abcdef12345678'

There are two ways of linking libraries:

  1. At compile time, immediately after the contract is compiled.
  2. At deploy time (a.k.a. post-compile time), right before the contract is deployed.

The use case above describes linking at compile time. For linking at deploy time, see the linker documentation.
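
A small helper can assemble the specifier in the format shown above and catch malformed addresses early. The sketch below assumes a library address is 0x followed by 40 hexadecimal digits. This validation and the name library_spec are illustrative additions, not part of zksolc.

```shell
#!/bin/sh
# Hypothetical helper: build "<ContractPath>:<ContractName>=<LibraryAddress>".
# Rejects addresses that are not "0x" followed by exactly 40 hex digits.
library_spec() {
  if ! printf '%s' "$3" | grep -Eq '^0x[0-9a-fA-F]{40}$'; then
    echo "invalid library address: $3" >&2
    return 1
  fi
  echo "$1:$2=$3"
}

spec="$(library_spec 'Simple.sol' 'Test' '0x1234567890abcdef1234567890abcdef12345678')"
# Print the resulting command instead of running it.
echo "zksolc './Simple.sol' --bin --libraries '${spec}'"
```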

--base-path, --include-path, --allow-paths

These options are used to specify Solidity import resolution settings. They are not used by zksolc and only passed through to solc like import remappings.

Visit the solc documentation to learn more about the processing of these options.

--asm

Enables the output of contract assembly. The assembly format depends on the --target architecture the contract is compiled for.

For the EraVM assembly specification, visit the EraVM documentation.

EVM assembly is not supported yet.

Usage:

zksolc Simple.sol --asm

Output:

======= Simple.sol:Simple =======
EraVM assembly:
        .text
        .file   "Simple.sol:Simple"
        .globl  __entry
__entry:
.func_begin0:
        add     128, r0, r3
        stm.h   64, r3
...

The --asm option can be combined with other output options, such as --bin:

zksolc './Simple.sol' --asm --bin

--metadata

Enables the output of contract metadata. The metadata is a JSON object that contains information about the contract, such as its name, source code hash, the list of dependencies, compiler versions, and so on.

The zksolc metadata format is compatible with the Solidity metadata format. This means that the metadata output can be used with other tools that support Solidity metadata. Essentially, solc metadata is a part of zksolc metadata, and it is included as source_metadata without any modifications.

Usage:

zksolc './Simple.sol' --metadata

Output:

======= Simple.sol:Simple =======
Metadata:
{"llvm_options":[],"optimizer_settings":{"is_debug_logging_enabled":false,"is_fallback_to_size_enabled":false,"is_verify_each_enabled":false,"level_back_end":"Aggressive","level_middle_end":"Aggressive","level_middle_end_size":"Zero"},"solc_version":"x.y.z","solc_zkvm_edition":null,"source_metadata":{...},"zk_version":"x.y.z"}

--output-dir

Specifies the output directory for build artifacts. Can only be used in basic CLI and combined JSON modes.

Usage in basic CLI mode:

zksolc './Simple.sol' --bin --asm --metadata --output-dir './build/'
ls './build/Simple.sol'

Output:

Compiler run successful. Artifact(s) can be found in directory "build".
...
Test.zasm       Test.zbin       Test_meta.json

Usage in combined JSON mode:

zksolc './Simple.sol' --combined-json 'bin,asm,metadata' --output-dir './build/'
ls './build/'

Output:

Compiler run successful. Artifact(s) can be found in directory "build".
...
combined.json

--overwrite

Overwrites the output files if they already exist in the output directory. By default, zksolc does not overwrite existing files.

Can only be used in combination with the --output-dir option.

Usage:

zksolc './Simple.sol' --combined-json 'bin,asm,metadata' --output-dir './build/' --overwrite

If the --overwrite option is not specified and the output files already exist, zksolc will print an error message and exit:

Error: Refusing to overwrite an existing file "build/combined.json" (use --overwrite to force).

--version

Prints the version of zksolc and the hash of the LLVM commit it was built with.

Usage:

zksolc --version

--help

Prints the help message.

Usage:

zksolc --help

Other I/O Modes

The mode-altering CLI options are mutually exclusive. This means that only one of the options below can be enabled at a time:

  • --standard-json
  • --combined-json
  • --yul
  • --llvm-ir
  • --eravm-assembly
  • --disassemble
  • --link
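
A wrapper script can enforce this rule before calling the compiler. The sketch below counts the mode-altering flags from the list above in an argument list. The function name count_mode_flags is hypothetical, and the check only looks at exact flag spellings.

```shell
#!/bin/sh
# Illustrative pre-check: count mode-altering flags in an argument list.
count_mode_flags() {
  count=0
  for arg in "$@"; do
    case "$arg" in
      --standard-json|--combined-json|--yul|--llvm-ir|--eravm-assembly|--disassemble|--link)
        count=$(( count + 1 )) ;;
    esac
  done
  echo "$count"
}

count_mode_flags --yul './Simple.yul' --bin      # a single mode flag
count_mode_flags --yul --llvm-ir './Simple.ll'   # two mode flags: a conflict
```

A count greater than one indicates a combination that zksolc would not accept.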

--standard-json

For the standard JSON mode usage, see the Standard JSON page.

--combined-json

For the combined JSON mode usage, see the Combined JSON page.

zksolc Compilation Settings

The options in this section only configure the zksolc compiler and do not affect the underlying solc compiler.

--optimization / -O

Sets the optimization level of the LLVM optimizer. Available values are:

Level | Meaning | Hints
0 | No optimization | Best compilation speed: for active development
1 | Performance: basic | For optimization research
2 | Performance: default | For optimization research
3 | Performance: aggressive | Default value. Best performance: for production
s | Size: default | For optimization research
z | Size: aggressive | Best size: for contracts with size constraints

For most cases, it is fine to use the default value of 3. You should only use the level z if you are ready to deliberately sacrifice performance and optimize for size.

Large contracts may hit the EraVM or EVM bytecode size limit. In this case, it is recommended to use the --fallback-Oz option rather than set the z level.

--fallback-Oz

Sets the optimization level to z for contracts that failed to compile due to overrunning the bytecode size constraints.

Under the hood, this option automatically triggers recompilation of contracts with level z. Contracts that were successfully compiled with the original --optimization setting are not recompiled.

It is recommended to have this option always enabled to prevent compilation failures due to bytecode size constraints. There are no known downsides to using this option.
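
The two optimization options can be combined in one invocation, as in the sketch below. It assumes that the level is passed as a separate value after --optimization. The helper is_valid_level is hypothetical and only mirrors the levels documented for --optimization. The command is printed rather than executed so that the snippet runs without the compiler installed.

```shell
#!/bin/sh
# Hypothetical helper: accept only the documented optimization levels.
is_valid_level() {
  case "$1" in
    0|1|2|3|s|z) return 0 ;;
    *)           return 1 ;;
  esac
}

level='3'
if is_valid_level "$level"; then
  # Assumed value syntax: the level follows --optimization as a separate argument.
  echo "zksolc './Simple.sol' --bin --optimization '${level}' --fallback-Oz"
else
  echo "unknown optimization level: ${level}" >&2
fi
```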

--metadata-hash

Specifies the hash function used for contract metadata.

The following values are allowed:

Value | Size | Padding | Reference
none | 0 B | 0-32 B |
keccak256 | 32 B | 0-32 B | SHA-3 Wikipedia Page
ipfs | 44 B | 20-52 B | IPFS Documentation

The default value is keccak256.

EraVM requires its bytecode size to be an odd number of 32-byte words. If the size after appending the hash does not satisfy this requirement, the hash is prepended with zeros according to the Padding column in the table above.

Usage:

zksolc './Simple.sol' --bin --metadata-hash 'ipfs'
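
The padding rule can be checked with a little arithmetic. The sketch below is an interpretation of the odd-word requirement, not the compiler's actual code. It assumes the bytecode before the hash already occupies a whole number of 32-byte words, and metadata_padding is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the padding rule, assuming the code before the hash
# already occupies a whole number of 32-byte words.
# $1: code size in 32-byte words, $2: hash size in bytes.
metadata_padding() {
  total=$(( $1 * 32 + $2 ))            # bytes before padding
  words=$(( (total + 31) / 32 ))       # round up to whole words
  if [ $(( words % 2 )) -eq 0 ]; then  # EraVM needs an odd word count
    words=$(( words + 1 ))
  fi
  echo $(( words * 32 - total ))       # padding in bytes
}

metadata_padding 3 32   # keccak256 after 3 code words
metadata_padding 3 44   # ipfs after 3 code words
metadata_padding 4 44   # ipfs after 4 code words
```

Under these assumptions the three calls print 32, 20, and 52, which matches the bounds in the Padding column.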

--enable-eravm-extensions

Enables the EraVM extensions.

If this flag is set, calls to addresses 0xFFFF and below are substituted by special EraVM instructions.

In Yul mode, the verbatim_* instruction family becomes available.

The full list of EraVM extensions and their usage can be found here.

Usage:

zksolc './Simple.sol' --bin --enable-eravm-extensions

--suppress-errors

Tells the compiler to suppress specified errors. The option accepts multiple string arguments, so make sure they are properly separated by whitespace.

Only one error can be suppressed with this option: sendtransfer.

Usage:

zksolc './Simple.sol' --bin --suppress-errors 'sendtransfer'

--suppress-warnings

Tells the compiler to suppress specified warnings. The option accepts multiple string arguments, so make sure they are properly separated by whitespace.

Only one warning can be suppressed with this option: txorigin.

Usage:

zksolc './Simple.sol' --bin --suppress-warnings 'txorigin'

--llvm-options

Specifies additional options for the LLVM framework. The argument must be a single quoted string following a = separator.

Usage:

zksolc './Simple.sol' --bin --llvm-options='-eravm-jump-table-density-threshold=10'

The --llvm-options option is experimental and must only be used by experienced users. All supported options will be documented in the future.

solc Compilation Settings

The options in this section only configure solc, so they are passed directly to its child process and do not affect the zksolc compiler.

--codegen

Specifies the solc codegen. The following values are allowed:

Value | Description | Hints
evmla | EVM legacy assembly | solc default for EVM/L1
yul | Yul a.k.a. IR | zksolc default for ZKsync

solc uses the evmla codegen by default. However, zksolc uses the yul codegen by default for historical reasons. Codegens are not equivalent and may lead to different behavior in production. Make sure that this option is set to evmla if you want your contracts to behave as they would on L1. For codegen differences, visit the solc IR breaking changes page. zksolc is going to switch to the evmla codegen by default in the future in order to have more parity with L1.

Usage:

zksolc './Simple.sol' --bin --codegen 'evmla'

--evm-version

Specifies the EVM version solc will produce artifacts for. Only artifacts such as Yul and EVM assembly are known to be affected by this option. For instance, if the EVM version is set to cancun, then Yul and EVM assembly may contain MCOPY instructions.

EVM version only affects IR artifacts produced by solc and does not affect EraVM bytecode produced by zksolc.

The default value is chosen by solc. For instance, solc v0.8.20 through v0.8.24 use shanghai by default, whereas v0.8.25 and newer use cancun.

The following values are allowed; however, keep in mind that newer EVM versions are only supported by newer versions of solc:

  • homestead
  • tangerineWhistle
  • spuriousDragon
  • byzantium
  • constantinople
  • petersburg
  • istanbul
  • berlin
  • london
  • paris
  • shanghai
  • cancun
  • prague

Usage:

zksolc './Simple.sol' --bin --evm-version 'cancun'

For more information on how solc handles EVM versions, see its EVM version documentation.

--metadata-literal

Tells solc to store referenced sources as literal data in the metadata output.

This option only affects the contract metadata output produced by solc, and does not affect artifacts produced by zksolc.

Usage:

zksolc './Simple.sol' --bin --metadata-literal

Multi-Language Support

zksolc supports input in multiple programming languages:

  • Solidity
  • Yul
  • LLVM IR
  • EraVM assembly

The following sections outline how to use zksolc with these languages.

--yul

Enables the Yul mode. In this mode, input is expected to be in the Yul language. The output works the same way as with Solidity input.

Usage:

zksolc --yul './Simple.yul' --bin

Output:

======= Simple.yul =======
Binary:
0000000100200190000000060000c13d0000002a01000039000000000010043f...

zksolc is able to compile Yul without solc. However, using solc is still recommended as it provides additional validation, diagnostics and better error messages:

zksolc --yul './Simple.yul' --bin --solc '/path/to/solc'

zksolc features its own dialect of Yul with extensions for EraVM. If the extensions are enabled, it is not possible to use solc for validation.

--llvm-ir

Enables the LLVM IR mode. In this mode, input is expected to be in the LLVM IR language. The output works the same way as with Solidity input.

Unlike solc, zksolc is an LLVM-based compiler toolchain, so it uses LLVM IR as an intermediate representation. It is not recommended to write LLVM IR manually, but it can be useful for debugging and optimization purposes. LLVM IR is more low-level than Yul in the ZKsync compiler toolchain IR hierarchy, so solc is not used for compilation.

Usage:

zksolc --llvm-ir './Simple.ll' --bin

Output:

======= Simple.ll =======
Binary:
000000000002004b000000070000613d0000002001000039000000000010043f...

--eravm-assembly

Enables the EraVM Assembly mode. In this mode, input is expected to be in the EraVM assembly language. The output works the same way as with Solidity input.

EraVM assembly is the representation closest to EraVM bytecode. It is not recommended to write EraVM assembly manually, but it can be even more useful for debugging and optimization purposes than LLVM IR.

For the EraVM assembly specification, visit the EraVM documentation.

Usage:

zksolc --eravm-assembly './Simple.zasm' --bin

Output:

======= Simple.zasm =======
Binary:
000000000120008c000000070000613d00000020010000390000000000100435...

Multi-Target Support

zksolc is an LLVM-based compiler toolchain, so it is easily extensible to support multiple target architectures. The following targets are supported:

  • eravm: EraVM (default).
  • evm: EVM (under development and only available for testing).

--target

Specifies the target architecture for the compiled contract.

The --target option is experimental and must be passed as a CLI argument in all modes including combined JSON and standard JSON.

Usage:

zksolc Simple.sol --bin --target evm

Output:

======= Simple.sol:Simple =======
Binary:
0000008003000039000000400030043f0000000100200190000000130000c13d...

Integrated Tooling

zksolc includes several tools provided by the LLVM framework out of the box, such as a disassembler and a linker. The following sections describe the usage of these tools.

--disassemble

Enables the disassembler mode.

zksolc includes an LLVM-based disassembler that can be used to disassemble compiled bytecode.

Each disassembler input file must contain a hexadecimal string. The disassembler output is a human-readable representation of the bytecode, also known as EraVM assembly.

Usage:

cat './input.zbin'

Output:

0x0000008003000039000000400030043f0000000100200190000000140000c13d00000000020...

zksolc --disassemble './input.zbin'

Output:

File `input.zbin` disassembly:

       0: 00 00 00 80 03 00 00 39       add     128, r0, r3
       8: 00 00 00 40 00 30 04 3f       stm.h   64, r3
      10: 00 00 00 01 00 20 01 90       and!    1, r2, r0
      18: 00 00 00 14 00 00 c1 3d       jump.ne 20
      20: 00 00 00 00 02 01 00 19       add     r1, r0, r2
      28: 00 00 00 0b 00 20 01 98       and!    code[11], r2, r0
      30: 00 00 00 23 00 00 61 3d       jump.eq 35
      38: 00 00 00 00 01 01 04 3b       ldp     r1, r1

--link

Enables the linker mode.

For the linker usage, visit the linker documentation.

Debugging

--debug-output-dir

Specifies the directory to store intermediate build artifacts. The artifacts can be useful for debugging and research.

The directory is created if it does not exist. If artifacts are already present in the directory, they are overwritten.

The intermediate build artifacts can be:

Name | Codegen | File extension
EVM Assembly | evmla | evmla
EthIR | evmla | ethir
Yul | yul | yul
LLVM IR | evmla, yul | ll
EraVM Assembly | evmla, yul | zasm

Usage:

zksolc './Simple.sol' --bin --debug-output-dir './debug/'
ls './debug/'

Output:

Compiler run successful. No output requested. Use flags --metadata, --asm, --bin.
...
Simple.sol.C.runtime.optimized.ll
Simple.sol.C.runtime.unoptimized.ll
Simple.sol.C.yul
Simple.sol.C.zasm
Simple.sol.Test.runtime.optimized.ll
Simple.sol.Test.runtime.unoptimized.ll
Simple.sol.Test.yul
Simple.sol.Test.zasm

The output file name is constructed as follows: <ContractPath>.<ContractName>.<Modifiers>.<Extension>.
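
When scripting over the debug directory, the naming scheme above can be taken apart with plain shell pattern matching. The helpers below are hypothetical and assume that the contract path and name are known in advance, since both may themselves contain dots.

```shell
#!/bin/sh
# Extract the extension (text after the last dot) of a debug artifact name.
artifact_extension() { echo "${1##*.}"; }

# Hypothetical helper: succeed if the artifact name starts with
# "<ContractPath>.<ContractName>." for the given path ($2) and name ($3).
artifact_matches() {
  case "$1" in
    "$2.$3".*) return 0 ;;
    *)         return 1 ;;
  esac
}

name='Simple.sol.Test.runtime.optimized.ll'
artifact_extension "$name"
if artifact_matches "$name" 'Simple.sol' 'Test'; then
  echo 'belongs to Simple.sol:Test'
fi
```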

--llvm-verify-each

Enables the verification of the LLVM IR after each optimization pass. This option is useful for debugging and research purposes.

Usage:

zksolc './Simple.sol' --bin --llvm-verify-each

--llvm-debug-logging

Enables the debug logging of the LLVM IR optimization passes. This option is useful for debugging and research purposes.

Usage:

zksolc './Simple.sol' --bin --llvm-debug-logging

Standard JSON

Standard JSON is a protocol for interaction with the zksolc and solc compilers. This protocol must be implemented by toolkits such as Hardhat and Foundry.

The protocol uses two data formats for communication: input JSON and output JSON.

Usage

Input JSON can be provided as a file path via the --standard-json option:

zksolc --standard-json './input.json'

Alternatively, the input JSON can be fed to zksolc via stdin:

cat './input.json' | zksolc --standard-json

After receiving output JSON, the calling program can process it according to its needs. For projects with deployable libraries, calling the linker is usually required before compiled contracts are ready for deployment.

For the sake of interface unification, zksolc will always return with exit code 0 and have its standard JSON output printed to stdout. This differs from solc, which may return with exit code 1 and a free-form error in some cases, such as when the standard JSON input file is missing, even though the solc documentation claims otherwise.

The formats below are modifications of the original standard JSON input and output formats implemented by solc. This means that there are:
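
Because the exit code is always 0, a calling script has to inspect the output JSON to detect failures. The sketch below assumes the output carries an errors array whose entries have a severity field, as in the solc standard JSON output. The sample payloads are invented for illustration, and the text match is a simplification of proper JSON parsing.

```shell
#!/bin/sh
# Succeed if the JSON text on stdin contains an entry with "severity": "error".
# A plain-text match is a simplification; a JSON-aware tool would be more robust.
has_compile_errors() {
  grep -Eq '"severity"[[:space:]]*:[[:space:]]*"error"'
}

# Sample outputs, made up for illustration.
ok_output='{"errors":[{"severity":"warning","message":"unused variable"}]}'
bad_output='{"errors":[{"severity":"error","message":"parser error"}]}'

for output in "$ok_output" "$bad_output"; do
  if printf '%s' "$output" | has_compile_errors; then
    echo 'compilation failed'
  else
    echo 'compilation succeeded'
  fi
done
```

In a real pipeline, the function would read the stdout of zksolc --standard-json instead of the sample strings.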

  • zksolc-specific options that are not present in the original format: they are marked as zksolc in the specifications below.
  • solc-specific options that are not supported by zksolc: they are not mentioned in the specifications below.

Input JSON

The input JSON provides the compiler with the source code and settings for the compilation. The example below serves as the specification of the input JSON format.

Internally, zksolc extracts all zksolc-specific options and converts the input JSON to the subset expected by solc before calling it.

{
  // Required: Source code language.
  // Currently supported: "Solidity", "Yul", "LLVM IR", "EraVM Assembly".
  "language": "Solidity",
  // Required: Source code files to compile.
  // The keys here are the "global" names of the source files. Imports can use other file paths via remappings.
  "sources": {
    // In a source file entry, either "urls" or "content" must be specified, but not both.
    "myFile.sol": {
      // Required (unless "content" is used): URL(s) to the source file.
      "urls": [
        // In Solidity mode, directories must be added to the command-line via "--allow-paths <path>" for imports to work.
        // It is possible to specify multiple URLs for a single source file. In this case the first successfully resolved URL will be used.
        "/tmp/path/to/file.sol"
      ],
      // Required (unless "urls" is used): Literal contents of the source file.
      "content": "contract settable is owned { uint256 private x = 0; function set(uint256 _x) public { if (msg.sender == owner) x = _x; } }"
    }
  },

  // Required: Compilation settings.
  "settings": {
    // Optional: Optimizer settings.
    "optimizer": {
      // Optional, zksolc: Set the zksolc LLVM optimizer level.
      // Available options:
      // -0: do not optimize
      // -1: basic optimizations for gas usage
      // -2: advanced optimizations for gas usage
      // -3: all optimizations for gas usage
      // -s: basic optimizations for deployment cost
      // -z: all optimizations for deployment cost
      // Default: 3.
      "mode": "3",
      // Optional, zksolc: Re-run the compilation with "mode": "z" if the compilation with "mode": "3" fails due to EraVM bytecode size limit.
      // Used on a per-contract basis and applied automatically, so some contracts will end up compiled with "mode": "3", and others with "mode": "z".
      // Default: false.
      "fallbackToOptimizingForSize": false
    },

    // Optional: Sorted list of remappings.
    // Important: Only used with Solidity input.
    "remappings": [ ":g=/dir" ],
    // Optional: Addresses of the libraries.
    // If not all library addresses are provided here, it will result in unlinked bytecode files that will require post-compile-time linking before deployment.
    // Important: Only used with Solidity, Yul, and LLVM IR input.
    "libraries": {
      // The top level key is the name of the source file where the library is used.
      // If remappings are used, this source file should match the global path after remappings were applied.
      "myFile.sol": {
        // Source code library name and address where it is deployed.
        "MyLib": "0x123123..."
      }
    },

    // Optional: Version of the EVM solc will produce IR for.
    // Affects type checking and code generation.
    // Can be "homestead", "tangerineWhistle", "spuriousDragon", "byzantium", "constantinople", "petersburg", "istanbul", "berlin", "london", "paris", "shanghai", "cancun" or "prague" (experimental).
    // Only used with Solidity, and only affects Yul and EVM assembly codegen. For instance, with version "cancun", solc will produce `MCOPY` instructions, whereas with older EVM versions it will not.
    // Default: chosen by solc, is version-dependent.
    "evmVersion": "cancun",
    // Optional: Select the desired output.
    // Important: zksolc does not support per-file and per-contract selection.
    //
    // Available file-level options, must be listed under "*"."":
    //   ast                       AST of all source files
    //
    // Available contract-level options, must be listed under "*"."*":
    //   abi                       Solidity ABI
    //   evm.methodIdentifiers     Solidity function hashes
    //   storageLayout             Slots, offsets and types of the contract's state variables in storage
    //   transientStorageLayout    Slots, offsets and types of the contract's state variables in transient storage
    //   devdoc                    Developer documentation (natspec)
    //   userdoc                   User documentation (natspec)
    //   metadata                  Metadata
    //   evm.legacyAssembly        EVM assembly produced by solc
    //   irOptimized               Yul produced by solc
    //   eravm.assembly            EraVM assembly produced by zksolc
    //
    // Default: no flags are selected, so only bytecode is emitted.
    "outputSelection": {
      "*": {
        "": [
          "ast" // Enable the AST output for the project.
        ],
        "*": [
          "metadata", // Enable the metadata output for the project.
          "irOptimized", // Enable the Yul output for the project.
          "eravm.assembly" // Enable the EraVM assembly output for the project.
        ]
      }
    },
    // Optional: Metadata settings.
    "metadata": {
      // Optional: Use the given hash method for the metadata hash that is appended to the bytecode.
      // Available options: "none", "keccak256", "ipfs".
      // The metadata hash can be removed from the bytecode via option "none".
      // Default: "keccak256".
      "hashType": "ipfs",
      // Optional: Use only literal content and not URLs.
      // Passed through to solc and does not affect the zksolc-specific metadata.
      // Default: false.
      "useLiteralContent": true
    },

    // Optional: Solidity codegen.
    // Can be "evmla" or "yul".
    // In contrast to solc, zksolc uses "Yul" codegen by default for solc v0.8.0 and newer. This will be fixed soon, so that the solc and zksolc defaults are the same.
    // Default: "evmla" for solc <0.8.0, "yul" for solc >=0.8.0.
    "codegen": "Yul",
    // Optional, Deprecated, zksolc: Use "codegen" instead.
    // Default: false.
    "forceEVMLA": true,
    // Optional, zksolc: Enables the EraVM extensions in Solidity and Yul modes.
    // The extensions include EraVM-specific opcodes and features, such as call forwarding and usage of additional memory spaces.
    // Default: false.
    "enableEraVMExtensions": true,

    // Optional, zksolc: extra LLVM settings.
    "LLVMOptions": [
      "-eravm-jump-table-density-threshold", "10",
      "-tail-dup-size", "6",
      "-eravm-enable-split-loop-phi-live-ranges",
      "-tail-merge-only-bbs-without-succ",
      "-join-globalcopies",
      "-disable-early-taildup"
    ],
    // Optional, zksolc: suppressed errors.
    // Available options: "sendtransfer", "assemblycreate".
    "suppressedErrors": [
      "sendtransfer",
      "assemblycreate"
    ],
    // Optional, zksolc: suppressed warnings.
    // Available options: "txorigin".
    "suppressedWarnings": [
      "txorigin"
    ]
  }
}
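The settings above can also be assembled programmatically. The following is a minimal Python sketch, not part of the zksolc tooling: the helper name `build_standard_json_input` and its parameters are hypothetical, and the top-level `language` key is assumed to follow the upstream solc standard JSON layout. Only a few of the documented settings are covered.

```python
import json

# Optimizer modes listed in the specification above.
ALLOWED_MODES = {"0", "1", "2", "3", "s", "z"}


def build_standard_json_input(sources, mode="3", fallback_to_size=False, selection=None):
    """Assemble a standard JSON input dictionary (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported optimizer mode: {mode!r}")
    settings = {
        "optimizer": {
            "mode": mode,
            "fallbackToOptimizingForSize": fallback_to_size,
        },
    }
    if selection:
        # zksolc has no per-file or per-contract selection, so everything goes under "*".
        settings["outputSelection"] = {
            "*": {"": selection.get("file", []), "*": selection.get("contract", [])}
        }
    return {
        "language": "Solidity",  # assumption: same key as in upstream solc standard JSON
        "sources": {path: {"content": text} for path, text in sources.items()},
        "settings": settings,
    }


standard_input = build_standard_json_input(
    {"Example.sol": "contract Example { function main() public pure returns (uint256) { return 42; } }"},
    mode="z",
    selection={"file": ["ast"], "contract": ["metadata", "eravm.assembly"]},
)
print(json.dumps(standard_input, indent=2))
```

Serializing with `json.dumps` produces plain JSON without the explanatory comments used in the specification.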

Output JSON

The output JSON contains all artifacts produced by both zksolc and solc compilers. The example below serves as the specification of the output JSON format.

If solc is provided to zksolc, the output JSON is initially generated by solc, and ZKsync-specific data is appended by zksolc afterwards. If solc is not provided, the output JSON is generated by zksolc alone.

{
  // Required: File-level outputs.
  "sources": {
    "sourceFile.sol": {
      // Required: Identifier of the source.
      "id": 1,
      // Optional: The AST object.
      // Corresponds to "ast" in the outputSelection settings.
      "ast": {/* ... */}
    }
  },

  // Required: Contract-level outputs.
  "contracts": {
    // The source name.
    "sourceFile.sol": {
      // The contract name.
      // If the language only supports one contract per file, this field equals the source name.
      "ContractName": {
        // Optional: The Ethereum Contract ABI (object).
        // See https://docs.soliditylang.org/en/develop/abi-spec.html.
        // Corresponds to "abi" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "abi": [/* ... */],
        // Optional: Storage layout (object).
        // Corresponds to "storageLayout" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "storageLayout": {/* ... */},
        // Optional: Transient storage layout (object).
        // Corresponds to "transientStorageLayout" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "transientStorageLayout": {/* ... */},
        // Optional: Developer documentation (natspec object).
        // Corresponds to "devdoc" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "devdoc": {/* ... */},
        // Optional: User documentation (natspec object).
        // Corresponds to "userdoc" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "userdoc": {/* ... */},
        // Optional: Contract metadata (object).
        // Corresponds to "metadata" in the outputSelection settings.
        // Provided by solc and wrapped with additional data by zksolc.
        "metadata": {/* ... */},
        // Optional: Yul produced by solc (string).
        // Corresponds to "irOptimized" in the outputSelection settings.
        // Provided by solc and passed through by zksolc.
        "irOptimized": "/* ... */",
        // Required: EraVM target outputs.
        "eravm": {
          // Required: EraVM bytecode (string).
          "bytecode": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
          // Optional: EraVM assembly produced by zksolc (string).
          // Corresponds to "eravm.assembly" in the outputSelection settings.
          "assembly": "/* ... */"
        },
        // Required: EVM target outputs.
        // Warning: EraVM artifacts "bytecode" and "assembly" are still returned here within the "evm" object for backward compatibility, but all new applications must read them from the "eravm" object.
        "evm": {
          // Required, Deprecated(EraVM): EVM bytecode.
          "bytecode": {
            // Required: Bytecode (string).
            "object": "0000008003000039000000400030043f0000000100200190000000130000c13d..."
          },
          // Optional: List of function hashes (object).
          // Corresponds to "evm.methodIdentifiers" in the outputSelection settings.
          // Provided by solc and passed through by zksolc.
          "methodIdentifiers": {
            // Mapping between the function signature and its hash.
            "delegate(address)": "5c19a95c"
          },
          // Optional: EVM assembly produced by solc (object).
          // Corresponds to "evm.legacyAssembly" in the outputSelection settings.
          // Provided by solc and passed through by zksolc.
          "legacyAssembly": {/* ... */},

          // Optional, Deprecated: EraVM assembly produced by zksolc (string).
          // Corresponds to "eravm.assembly" in the outputSelection settings.
          "assembly": "/* ... */"
        },

        // Required, zksolc(eravm): Bytecode hash.
        // Used to identify bytecode on ZKsync chains.
        "hash": "5ab89dcf...",
        // Required, zksolc(eravm): All factory dependencies, both linked and unlinked.
        // This field is useful if the full list of dependencies is needed, including those that could not have been linked yet.
        // Example: [ "default.sol:Test" ].
        "factoryDependenciesUnlinked": [/* ... */],
        // Required, zksolc(eravm): Mapping between bytecode hashes and full contract identifiers.
        // Only linked contracts are listed here due to the requirement of bytecode hash.
        // Example: { "5ab89dcf...": "default.sol:Test" }.
        "factoryDependencies": {/* ... */},
        // Required, zksolc(eravm): Mapping between full contract identifiers and library identifiers that must be linked after compilation.
        // Only unlinked libraries are listed here.
        // Example: { "default.sol:Test": "library.sol:Library" }.
        "missingLibraries": {/* ... */},
        // Required, zksolc: Binary object format.
        // Tells whether the bytecode has been linked.
        // Possible values: "elf" (unlinked), "raw" (linked).
        "objectFormat": "elf"
      }
    }
  },

  // Optional: Unset if no messages were emitted.
  "errors": [
    {
      // Optional: Location within the source file.
      // Unset if the error is unrelated to input sources.
      "sourceLocation": {
        // Required: The source path.
        "file": "sourceFile.sol",
        // Required: The source location start. Equals -1 if unknown.
        "start": 0,
        // Required: The source location end. Equals -1 if unknown.
        "end": 100
      },
      // Required: Message type.
      // zksolc only produces "Error" and "Warning" types.
      // solc message types are listed at https://docs.soliditylang.org/en/latest/using-the-compiler.html#error-types.
      "type": "Error",
      // Required: Component the error originates from.
      // zksolc only produces "general".
      // solc may produce other values as well.
      "component": "general",
      // Required: Message severity.
      // zksolc only produces "error" and "warning".
      // solc produces "error", "warning", or "info". May be extended in the future.
      "severity": "error",
      // Optional: Unique code for the cause of the error.
      // Only solc produces error codes for now.
      // zksolc error classification is coming soon.
      "errorCode": "3141",
      // Required: Message.
      "message": "Invalid keyword",
      // Required: Message formatted using the source location.
      "formattedMessage": "sourceFile.sol:100: Invalid keyword"
    }
  ],

  // Required: Short semver-compatible solc compiler version.
  "version": "0.8.28",
  // Required: Full solc compiler version.
  "long_version": "0.8.28+commit.7893614a.Darwin.appleclang",
  // Required: Short semver-compatible zksolc compiler version.
  "zk_version": "1.5.8"
}
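A program consuming this output typically needs the messages grouped by severity and the bytecode of each contract. The sketch below is one way to do that in Python, assuming the output has already been parsed into a dictionary. The function name `summarize_output` is hypothetical, and the sample data is made up for illustration.

```python
def summarize_output(output):
    """Group messages by severity and collect EraVM bytecode per contract."""
    messages = output.get("errors", [])
    errors = [m["formattedMessage"] for m in messages if m["severity"] == "error"]
    warnings = [m["formattedMessage"] for m in messages if m["severity"] == "warning"]
    bytecodes = {}
    for path, contracts in output.get("contracts", {}).items():
        for name, contract in contracts.items():
            # Prefer the "eravm" object; "evm.bytecode.object" is the backward-compatible location.
            eravm = contract.get("eravm", {})
            legacy = contract.get("evm", {}).get("bytecode", {})
            bytecodes[f"{path}:{name}"] = {
                "bytecode": eravm.get("bytecode", legacy.get("object")),
                # "raw" means linked, "elf" means library references are still unresolved.
                "linked": contract.get("objectFormat") == "raw",
            }
    return errors, warnings, bytecodes


# Illustrative sample, not real compiler output.
sample_output = {
    "contracts": {
        "sourceFile.sol": {
            "ContractName": {"eravm": {"bytecode": "00000080"}, "objectFormat": "elf"}
        }
    },
    "errors": [
        {"severity": "error", "formattedMessage": "sourceFile.sol:100: Invalid keyword"}
    ],
}
errors, warnings, bytecodes = summarize_output(sample_output)
print(errors, warnings, bytecodes)
```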

Combined JSON

Combined JSON is an I/O mode designed to provide a middle-ground experience between basic CLI and standard JSON. In this mode, input data is provided by the user via CLI, and JSON output can be easily read by both humans and programs calling zksolc as a child process.

Usage

To use combined JSON, pass the --combined-json flag to zksolc with the desired comma-separated output selectors:

zksolc './MyContract.sol' --combined-json 'ast,abi,metadata'

The following selectors are supported:

| Selector                 | Description                                  | Type               | Origin |
| ------------------------ | -------------------------------------------- | ------------------ | ------ |
| abi                      | Solidity ABI                                 | JSON               | solc   |
| hashes                   | Solidity function hashes                     | JSON               | solc   |
| metadata                 | Metadata                                     | Stringified JSON   | solc   |
| devdoc                   | Developer documentation                      | JSON (NatSpec)     | solc   |
| userdoc                  | User documentation                           | JSON (NatSpec)     | solc   |
| storage-layout           | Solidity storage layout                      | JSON               | solc   |
| transient-storage-layout | Solidity transient storage layout            | JSON               | solc   |
| ast                      | AST of the source file                       | JSON               | solc   |
| asm                      | EVM assembly                                 | JSON               | solc   |
| eravm-assembly           | EraVM assembly                               | String             | zksolc |
| bin                      | Deploy bytecode (always enabled)             | Hexadecimal string | zksolc |
| bin-runtime              | Runtime bytecode (EVM-only, always enabled)  | Hexadecimal string | zksolc |

Warning: Combined JSON can only be used with Solidity input, so the path to solc must always be provided to zksolc. Support for other languages is planned for future releases.
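When zksolc is called as a child process, the selector list is usually built by the caller. The following Python sketch validates selectors against the list above and assembles the argument vector. The helper name `combined_json_command` is hypothetical, and the sketch does not run the compiler.

```python
import shlex

# Selectors taken from the list above.
SELECTORS = {
    "abi", "hashes", "metadata", "devdoc", "userdoc",
    "storage-layout", "transient-storage-layout",
    "ast", "asm", "eravm-assembly", "bin", "bin-runtime",
}


def combined_json_command(source_paths, selectors, zksolc="zksolc"):
    """Build the argument vector for a combined JSON run (hypothetical helper)."""
    unknown = sorted(set(selectors) - SELECTORS)
    if unknown:
        raise ValueError(f"unsupported selectors: {', '.join(unknown)}")
    return [zksolc, *source_paths, "--combined-json", ",".join(selectors)]


command = combined_json_command(["./MyContract.sol"], ["ast", "abi", "metadata"])
# shlex.join renders the vector as a shell-safe string for logging.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` without going through a shell.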

Output Format

The format below is a modification of the original combined JSON output format implemented by solc. It means that there are:

  • zksolc-specific options that are not present in the original format: they are marked as zksolc in the specification below.
  • solc-specific options that are not supported by zksolc: they are not mentioned in the specification below.
{
  // Required: Contract outputs.
  "contracts": {
    "MyContract.sol:Test": {
      // Optional: Emitted if "hashes" selector is provided.
      "hashes": {/* ... */},
      // Optional: Emitted if "abi" selector is provided.
      "abi": [/* ... */],
      // Optional: Emitted if "metadata" selector is provided.
      "metadata": "/* ... */",
      // Optional: Emitted if "devdoc" selector is provided.
      "devdoc": {/* ... */},
      // Optional: Emitted if "userdoc" selector is provided.
      "userdoc": {/* ... */},
      // Optional: Emitted if "storage-layout" selector is provided.
      "storage-layout": {/* ... */},
      // Optional: Emitted if "transient-storage-layout" selector is provided.
      "transient-storage-layout": {/* ... */},
      // Optional: Emitted if "ast" selector is provided.
      "ast": {/* ... */},
      // Optional: Emitted if "asm" selector is provided.
      "asm": {/* ... */},

      // Optional: Emitted if "eravm-assembly" selector is provided.
      "assembly": "/* ... */",
      // Required: Bytecode is always emitted.
      "bin": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
      // Required: Bytecode is always emitted.
      "bin-runtime": "0000008003000039000000400030043f0000000100200190000000130000c13d...",

      // Required, zksolc(eravm): All factory dependencies, both linked and unlinked.
      // This field is useful if the full list of dependencies is needed, including those that could not have been linked yet.
      // Example: [ "default.sol:Test" ].
      "factory-deps-unlinked": [/* ... */],
      // Required, zksolc(eravm): Mapping between bytecode hashes and full contract identifiers.
      // Only linked contracts are listed here due to the requirement of bytecode hash.
      // Example: { "5ab89dcf...": "default.sol:Test" }.
      "factory-deps": {/* ... */},
      // Required, zksolc(eravm): Unlinked EraVM libraries.
      // Example: [ "library.sol:Library" ].
      "missing-libraries": [/* ... */],
      // Required, zksolc: Binary object format.
      // Tells whether the bytecode has been linked.
      // Possible values: "elf" (unlinked), "raw" (linked).
      "object-format": "elf"
    }
  },
  // Optional: List of input files.
  // Only emitted if "ast" selector is provided.
  "sourceList": [
    "MyContract.sol"
  ],
  // Optional: List of input sources.
  // Only emitted if "ast" selector is provided.
  "sources": {
    "MyContract.sol": {
      // Required: Contract AST.
      "AST": {/* ... */},
      // Required: Contract index in "sourceList".
      "id": 0
    }
  },
  // Required: Version of solc.
  "version": "0.8.28+commit.acc7d8f9.Darwin.appleclang",
  // Required, zksolc: Version of zksolc.
  "zk_version": "1.5.8"
}
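As an illustration of reading this format, the Python sketch below lists the contracts that still require linking, together with their missing libraries. It assumes the combined JSON has been parsed into a dictionary. The function name `unlinked_contracts` and the sample data are hypothetical.

```python
def unlinked_contracts(combined):
    """Map (path, contract name) to missing libraries for every unlinked contract."""
    result = {}
    for full_name, contract in combined.get("contracts", {}).items():
        if contract.get("object-format") == "elf":
            # Keys look like "MyContract.sol:Test"; split on the last colon.
            path, _, name = full_name.rpartition(":")
            result[(path, name)] = list(contract.get("missing-libraries", []))
    return result


# Illustrative sample, not real compiler output.
sample_combined = {
    "contracts": {
        "MyContract.sol:Test": {
            "bin": "7f454c46",
            "object-format": "elf",
            "missing-libraries": ["library.sol:Library"],
        },
        "MyContract.sol:Other": {"bin": "00000080", "object-format": "raw"},
    },
    "version": "0.8.28",
    "zk_version": "1.5.8",
}
pending = unlinked_contracts(sample_combined)
print(pending)
```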

Linker

zksolc includes an LLVM-based linker that can be used for post-compile-time library linking.

For unlinked bytecode, the ZKsync compiler toolchain uses an ELF wrapper, which is the standard in the LLVM framework. ELF-wrapped bytecode cannot be deployed to the blockchain as-is; all library references must first be resolved. Once they are resolved, the ELF wrapper is stripped, leaving only the raw bytecode ready for deployment. This approach also results in unlinked and linked bytecode differing in size.
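The ELF wrapper can be recognized by the standard ELF magic number, the bytes 0x7f 'E' 'L' 'F', which is also how the unlinked bytecode in the linker examples in this document begins. The Python sketch below is a heuristic check only, with a hypothetical function name. The "objectFormat" and "object-format" fields of the compiler output remain the documented way to tell linked from unlinked bytecode.

```python
# Standard ELF magic number in hexadecimal: 0x7f, 'E', 'L', 'F'.
ELF_MAGIC_HEX = "7f454c46"


def is_elf_wrapped(bytecode_hex):
    """Return True if the hexadecimal bytecode string starts with the ELF magic."""
    text = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    return text[:8].lower() == ELF_MAGIC_HEX


print(is_elf_wrapped("7f454c46010101ff"))  # unlinked, ELF-wrapped
print(is_elf_wrapped("0000008003000039"))  # raw bytecode
```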

When compiling to EraVM, provide all build artifacts to the linker. Unlike EVM ones, EraVM dependencies are linked using the bytecode hash, so the linker must be able to derive the bytecode hash of all contracts in order to automatically resolve all dependencies.

The zksolc linker can be used in several ways:

JSON Protocol

This mode is suitable for integration with tooling such as Foundry. The linker features its own JSON protocol, whose input and output formats are described in the Input and Output sections below.

Input JSON can be provided by-value via the --standard-json option:

zksolc --link --standard-json './input.json'

Alternatively, the input JSON can be fed to zksolc via stdin:

cat './input.json' | zksolc --link --standard-json

Input

{
  // Input bytecode files mapping.
  "bytecodes": {
    // Input bytecode must be a valid ELF object.
    "tests/data/bytecodes/linker.zbin": "7f454c46010101ff000000000000000001000401010000000000000000000000..."
  },
  // Library specifiers array.
  "libraries": [
    // The format is following that of solc: "filename:libraryName=address".
    "Greeter.sol:GreeterHelper=0x1234567890abcdef1234567890abcdef12345678"
  ]
}
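Tooling that drives the linker needs to produce this input from its own data. The following is a minimal Python sketch, assuming library addresses are kept in a dictionary keyed by (file name, library name). The helper name `linker_input` and the address check are illustrative choices, not part of the linker.

```python
import json
import re

# A 20-byte address written as 0x followed by 40 hexadecimal digits.
ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def linker_input(bytecodes, libraries):
    """Build the linker input from {path: hex} and {(file, library): address} mappings."""
    specifiers = []
    for (file_name, library_name), address in sorted(libraries.items()):
        if not ADDRESS.fullmatch(address):
            raise ValueError(f"malformed address for {library_name}: {address!r}")
        # Same format as in solc: "filename:libraryName=address".
        specifiers.append(f"{file_name}:{library_name}={address}")
    return {"bytecodes": dict(bytecodes), "libraries": specifiers}


payload = linker_input(
    {"tests/data/bytecodes/linker.zbin": "7f454c46010101ff"},
    {("Greeter.sol", "GreeterHelper"): "0x1234567890abcdef1234567890abcdef12345678"},
)
print(json.dumps(payload, indent=2))
```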

Output

{
  // Bytecode files where all library references have been successfully resolved.
  "linked": {
    "tests/data/bytecodes/linked.zbin": {
      // Linked EraVM bytecode, stripped of the ELF wrapper.
      "bytecode": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
      // Hash of the bytecode used to identify EraVM dependencies during deployment.
      "hash": "010000d5bf4dd6262304eb67a95a76e6e4b0e9f1dc3d2c524c129c6464939407",
      // Resolved library specifiers.
      "linker_symbols": [
        // The format is following that of solc: "libraryPath:libraryName".
        "Greeter.sol:GreeterHelper"
      ],
      // Resolved factory dependency (CREATE/CREATE2) specifiers.
      "factory_dependencies": [
        // The format is "contractPath:contractName".
        // Dependencies are resolved automatically if all bytecode objects are passed to the linker.
        "Dependency.sol:GreeterDependency"
      ]
    }
  },
  // Lists of unresolved symbols, such as those not provided to the linker.
  // The linker caller must add the missing specifiers and call the linker again.
  "unlinked": {
    "tests/data/bytecodes/linker.zbin": {
      // Unresolved library specifiers.
      "linker_symbols": [
        // The format is following that of solc: "libraryPath:libraryName".
        "Greeter.sol:GreeterHelper"
      ],
      // Unresolved factory dependency (CREATE/CREATE2) specifiers.
      "factory_dependencies": [
        // The format is "contractPath:contractName".
        // Dependencies are resolved automatically if all bytecode objects are passed to the linker.
        "Dependency.sol:GreeterDependency"
      ]
    }
  },
  // Linked raw bytecode files that do not require linking, so they were not processed in the current call.
  "ignored": {
    "tests/data/bytecodes/ignored.zbin": {
      // Linked raw EraVM bytecode.
      "bytecode": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
      // Hash of the bytecode used to identify EraVM dependencies during deployment.
      "hash": "010000d5bf4dd6262304eb67a95abcdefc3d2c524c129c6464939407"
    }
  }
}
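Since the caller is expected to add the missing specifiers and call the linker again, a first step is to collect everything listed under "unlinked". The Python sketch below does that for an already parsed output. The function name `missing_symbols` is hypothetical, and the sample mirrors the structure shown above.

```python
def missing_symbols(linker_output):
    """Collect unresolved libraries and factory dependencies across all unlinked files."""
    libraries, dependencies = set(), set()
    for entry in linker_output.get("unlinked", {}).values():
        libraries.update(entry.get("linker_symbols", []))
        dependencies.update(entry.get("factory_dependencies", []))
    return sorted(libraries), sorted(dependencies)


# Illustrative sample following the output structure above.
sample_linker_output = {
    "linked": {},
    "unlinked": {
        "tests/data/bytecodes/linker.zbin": {
            "linker_symbols": ["Greeter.sol:GreeterHelper"],
            "factory_dependencies": ["Dependency.sol:GreeterDependency"],
        }
    },
    "ignored": {},
}
libraries, dependencies = missing_symbols(sample_linker_output)
print(libraries, dependencies)
```

An empty pair of lists means that no further linker call is needed.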

Basic CLI

This mode is suitable for experiments and quick checks. Linking is done in several steps:

  1. A contract with a library dependency is compiled to bytecode:
// SPDX-License-Identifier: Unlicensed

pragma solidity ^0.8.0;

library GreeterHelper {
    function addPrefix(Greeter greeter, string memory great) public view returns (string memory) {
        return string.concat(greeter._prefix(), great);
    }
}

contract Greeter {
    string public greeting;
    string public _prefix;

    constructor(string memory _greeting) {
        greeting = _greeting;
        _prefix = "The greating is:";
    }

    function greet() public view returns (string memory) {
        return GreeterHelper.addPrefix(this, greeting);
    }
}
zksolc './Greeter.sol' --output-dir './output' --bin
  2. Check for unlinked library and factory dependency references.

This can be done with the following command, where the --libraries argument is intentionally omitted:

zksolc --link './output/Greeter.sol/Greeter.zbin'

Output:

{
  "linked": {},
  "unlinked": {
    "./output/Greeter.sol/Greeter.zbin": {
      "linker_symbols": ["Greeter.sol:GreeterHelper"],
      "factory_dependencies": []
    }
  },
  "ignored": {}
}
  3. Provide library addresses to the linker.

The library addresses must be provided in the --libraries argument:

zksolc --link './output/Greeter.sol/Greeter.zbin' --libraries 'Greeter.sol:GreeterHelper=0x1234567812345678123456781234567812345678'

Output:

{
  "linked": {
    "./output/Greeter.sol/Greeter.zbin": {
      "bytecode": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
      "hash": "010000bd2bcef5602ae1ebc0b812cc65d88655a8d972ac10227f142e1838093c",
      "linker_symbols": ["Greeter.sol:GreeterHelper"],
      "factory_dependencies": []
    }
  },
  "unlinked": {},
  "ignored": {}
}

If you run the last command above once again, nothing will happen, and the previously linked file will show up as ignored:

{
  "linked": {},
  "unlinked": {},
  "ignored": {
    "./output/Greeter.sol/Greeter.zbin": {
      "bytecode": "0000008003000039000000400030043f0000000100200190000000130000c13d...",
      "hash": "010000bd2bcef5602ae1ebc0b812cc65d88655a8d972ac10227f142e1838093c"
    }
  }
}
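The same steps can be driven from a script. The Python sketch below wraps the two linker invocations shown above. It rests on several assumptions: that zksolc is on the PATH, that the linker report is printed to stdout as JSON, and that several specifiers may follow a single --libraries argument. The wrapper name `link_files` is hypothetical, and the demonstration uses a stand-in runner instead of calling the real compiler.

```python
import json
import subprocess
from types import SimpleNamespace


def link_files(paths, libraries=(), run=subprocess.run, zksolc="zksolc"):
    """Invoke the linker and return its parsed JSON report (hypothetical wrapper)."""
    command = [zksolc, "--link", *paths]
    if libraries:
        # Assumption: all specifiers are passed after one --libraries argument.
        command += ["--libraries", *libraries]
    completed = run(command, capture_output=True, text=True, check=True)
    # Assumption: the report is written to stdout.
    return json.loads(completed.stdout)


# Stand-in runner so the sketch can be exercised without zksolc installed.
recorded = []


def fake_run(command, **kwargs):
    recorded.append(command)
    return SimpleNamespace(stdout='{"linked": {}, "unlinked": {}, "ignored": {}}')


report = link_files(
    ["./output/Greeter.sol/Greeter.zbin"],
    ["Greeter.sol:GreeterHelper=0x1234567812345678123456781234567812345678"],
    run=fake_run,
)
print(recorded[0], report)
```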

EraVM Target Compilation Specification

This is a technical deep dive into the specifics of compiling for the EraVM target.

The deep dive outlines the concepts, modules, and terms used in the EraVM target compilation process, provides a detailed specification of the compilation of each individual instruction, and describes the binary layout of EraVM bytecode.

Glossary

| Term                | Definition |
| ------------------- | ---------- |
| zksolc              | Solidity compiler developed by Matter Labs. |
| solc                | High-level Solidity compiler developed by the Ethereum community. Called by zksolc to get IRs and other auxiliary data. |
| LLVM                | The world's most popular and powerful compiler framework, used for optimizations and assembly generation. |
| Assembler           | Tool that translates assembly to bytecode. |
| Linker              | Tool that links dependencies, such as libraries, before final bytecode can be emitted. |
| Virtual Machine     | ZKsync Era virtual machine with a custom instruction set. |
| EraVM Specification | A combination of human readable documentation and formal description of EraVM, including its structure, semantics, and encoding. |
| IR                  | Intermediate representation used by the compiler internally to represent source code. |
| Yul                 | One of two Solidity IRs. A superset of assembly available in Solidity. Used by default for contracts written in Solidity ≥0.8. |
| EVM Assembly        | One of two Solidity IRs. A predecessor of Yul that is closer to EVM bytecode. Used by default for contracts written in Solidity <0.8. |
| LLVM IR             | IR native to the LLVM framework. |
| EraVM Assembly      | Text representation of EraVM bytecode. Emitted by the LLVM framework. Translated into EraVM bytecode by the EraVM assembler. |
| EraVM Bytecode      | Contract bytecode executed by EraVM. |
| Stack               | Segment of non-persistent contract memory. Consists of two parts: global data and function stack frame. |
| Heap                | Segment of non-persistent contract memory. Allocation is handled by solc's allocator only. |
| Auxiliary heap      | Segment of non-persistent contract memory. Introduced to avoid conflicts with solc's allocator. |
| Calldata            | Segment of non-persistent contract memory. Heap or auxiliary heap of the parent/caller contract. |
| Return data         | Segment of non-persistent contract memory. Heap or auxiliary heap of the child/callee contract. |
| Storage             | Persistent contract memory with no important differences from that of EVM. |
| Transient storage   | Transient contract memory with no important differences from that of EVM. |
| System contracts    | Set of ZKsync kernel contracts written in Solidity by Matter Labs. |
| Contract context    | Storage of the VM keeping data such as current address, caller's address, block timestamp, etc. |

Code Separation

In both EVM and EraVM, contract bytecode is divided into two segments: deploy and runtime. The deploy code — also known as the constructor — runs only once when the contract is first deployed. In contrast, the runtime code executes every time the contract is called.

However, on EraVM, both segments are deployed together rather than split into two separate chunks. The constructor is simply added to the contract as a standard public function, which the System Contracts invoke during deployment.

Just like on the EVM, the deploy code on EraVM takes the form of a single constructor. Our compiler merges this constructor into the runtime code while generating LLVM IR, as illustrated in the minimal example below.

LLVM IR

In the EraVM subset of LLVM IR, the @__entry function’s arguments %0 through %11 correspond to EraVM registers r1 through r12.

Specifically, register r2 maps to the argument %1. This register contains a bit that indicates whether the call is for deploy code, and that flag is used to branch between deploy and runtime code blocks.

define i256 @__entry(ptr addrspace(3) nocapture readnone %0, i256 %1, i256 %2, i256 %3, i256 %4, i256 %5, i256 %6, i256 %7, i256 %8, i256 %9, i256 %10, i256 %11) local_unnamed_addr #1 personality ptr @__personality {
entry:
  %is_deploy_code_call_flag_truncated = and i256 %1, 1                                                          ; check if the call is a deploy code call
  %is_deploy_code_call_flag.not = icmp eq i256 %is_deploy_code_call_flag_truncated, 0                           ; invert the flag
  br i1 %is_deploy_code_call_flag.not, label %runtime_code_call_block, label %deploy_code_call_block            ; branch to the deploy code block if the flag is set

deploy_code_call_block:                           ; preds = %entry
  store i256 32, ptr addrspace(2) inttoptr (i256 256 to ptr addrspace(2)), align 256                            ; store the offset of the array of immutables
  store i256 0, ptr addrspace(2) inttoptr (i256 288 to ptr addrspace(2)), align 32                              ; store the length of the array of immutables
  tail call void @llvm.eravm.return(i256 53919893334301279589334030174039261352344891250716429051063678533632) ; return the array of immutables using EraVM return ABI data encoding
  unreachable

runtime_code_call_block:                          ; preds = %entry
  store i256 42, ptr addrspace(1) null, align 4294967296                                                        ; store a value to return
  tail call void @llvm.eravm.return(i256 2535301200456458802993406410752)                                      ; return the value using EraVM return ABI data encoding
  unreachable
}
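The two large constants passed to @llvm.eravm.return pack several fields into one 256-bit word. The Python sketch below decodes them under an assumed layout that is inferred from the constants themselves rather than stated in this document: the start offset in bits 64 to 95, the length in bits 96 to 127, and a memory selector in bits 224 to 231, where 0 is read as the heap and 2 as the auxiliary heap. The function names are hypothetical.

```python
DEPLOY_RETURN = 53919893334301279589334030174039261352344891250716429051063678533632
RUNTIME_RETURN = 2535301200456458802993406410752


def decode_return_abi(value):
    """Split a return ABI word into (start, length, memory selector) under the assumed layout."""
    start = (value >> 64) & 0xFFFFFFFF
    length = (value >> 96) & 0xFFFFFFFF
    selector = (value >> 224) & 0xFF
    return start, length, selector


def is_deploy_call(r2):
    """Mirror the entry check: the lowest bit of r2 marks a deploy code call."""
    return (r2 & 1) == 1


print(decode_return_abi(DEPLOY_RETURN))   # start 256, length 64, selector 2
print(decode_return_abi(RUNTIME_RETURN))  # start 0, length 32, selector 0
```

Under this reading, the deploy block returns the two 32-byte words it stored at offsets 256 and 288, and the runtime block returns the single 32-byte word stored at offset 0, which agrees with the store instructions in the listing.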

EraVM Assembly

In EraVM assembly, the branching logic appears as follows:

__entry:
.func_begin0:
	and!	    1, r2, r0
	jump.ne	  @.BB0_1
	add	      r0, r0, r1
	retl	    @DEFAULT_FAR_RETURN
.BB0_1:
	add	      32, r0, r1
	stm.ah	  256, r1
	stm.ah	  288, r0
	add	      code[@CPI0_0], r0, r1
	retl	    @DEFAULT_FAR_RETURN

EVM Assembly Translator

Our toolchain uses two Solidity code generators:

| Name         | solc support | solc default |
| ------------ | ------------ | ------------ |
| EVM Assembly | >=0.4.12     | <0.8         |
| Yul          | >=0.8.0      | >=0.8        |
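The default selection in the table can be expressed as a small version check. The following Python sketch is illustrative only: the function name `default_codegen` is hypothetical, and the returned strings reuse the "evmla" and "yul" values of the "codegen" setting in standard JSON.

```python
def default_codegen(solc_version):
    """Return the default codegen for a solc version such as "0.8.28" or "0.8.28+commit.7893614a"."""
    # Drop any build metadata after "+", then compare numerically.
    core = solc_version.split("+")[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    if (major, minor, patch) < (0, 4, 12):
        raise ValueError(f"solc {core} is older than the supported 0.4.12")
    return "yul" if (major, minor) >= (0, 8) else "evmla"


print(default_codegen("0.7.6"))
print(default_codegen("0.8.28+commit.7893614a.Darwin.appleclang"))
```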

ZKsync Fork of solc

EVM assembly is challenging to translate to LLVM IR because it obscures the contract’s control flow and relies heavily on dynamic jumps.

solc's EVM assembly representation introduces several challenges for our LLVM IR translator:

  1. Internal function pointers are stored in memory or storage, then dynamically loaded and called.
  2. Each iteration of local recursion allocates an additional stack frame.
  3. Some try-catch patterns leave values on the stack, complicating stack analysis.

All of these issues have been resolved in our fork of solc, where we removed dynamic jumps and added the necessary metadata in the code generation process.

Source Code

In this and the following sections, you will find a minimal example of a Solidity contract, its EVM assembly, and its translation to LLVM IR, which is then compiled into EraVM bytecode.

contract Example {
  function main() public pure returns (uint256 result) {
    result = 42;
  }
}

EVM Legacy Assembly

Produced by solc v0.7.6.

| Line | Instruction  | Value/Tag |
| ---- | ------------ | --------- |
| 000  | PUSH         | 80        |
| 001  | PUSH         | 40        |
| 002  | MSTORE       |           |
| 003  | CALLVALUE    |           |
| 004  | DUP1         |           |
| 005  | ISZERO       |           |
| 006  | PUSH         | [tag] 1   |
| 007  | JUMPI        |           |
| 008  | PUSH         | 0         |
| 009  | DUP1         |           |
| 010  | REVERT       |           |
| 011  | Tag 1        |           |
| 012  | JUMPDEST     |           |
| 013  | POP          |           |
| 014  | PUSH         | 4         |
| 015  | CALLDATASIZE |           |
| 016  | LT           |           |
| 017  | PUSH         | [tag] 2   |
| 018  | JUMPI        |           |
| 019  | PUSH         | 0         |
| 020  | CALLDATALOAD |           |
| 021  | PUSH         | E0        |
| 022  | SHR          |           |
| 023  | DUP1         |           |
| 024  | PUSH         | 5A8AC02D  |
| 025  | EQ           |           |
| 026  | PUSH         | [tag] 3   |
| 027  | JUMPI        |           |
| 028  | Tag 2        |           |
| 029  | JUMPDEST     |           |
| 030  | PUSH         | 0         |
| 031  | DUP1         |           |
| 032  | REVERT       |           |
| 033  | Tag 3        |           |
| 034  | JUMPDEST     |           |
| 035  | PUSH         | [tag] 4   |
| 036  | PUSH         | [tag] 5   |
| 037  | JUMP         | [in]      |
| 038  | Tag 4        |           |
| 039  | JUMPDEST     |           |
| 040  | PUSH         | 40        |
| 041  | DUP1         |           |
| 042  | MLOAD        |           |
| 043  | SWAP2        |           |
| 044  | DUP3         |           |
| 045  | MSTORE       |           |
| 046  | MLOAD        |           |
| 047  | SWAP1        |           |
| 048  | DUP2         |           |
| 049  | SWAP1        |           |
| 050  | SUB          |           |
| 051  | PUSH         | 20        |
| 052  | ADD          |           |
| 053  | SWAP1        |           |
| 054  | RETURN       |           |
| 055  | Tag 5        |           |
| 056  | JUMPDEST     |           |
| 057  | PUSH         | 2A        |
| 058  | SWAP1        |           |
| 059  | JUMP         | [out]     |

EthIR

EthIR (Ethereal IR) is an intermediate representation developed specifically for our translator. It serves several key purposes:

  1. Tracking the stack state to identify jump destinations.
  2. Duplicating blocks that are reachable from predecessors with different stack states.
  3. Reconstructing the complete control-flow graph of the contract using the aforementioned data.
  4. Resolving dependencies and static data chunks.
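The first two purposes can be illustrated with a toy symbolic stack tracker. The Python sketch below is a heavily simplified model, not the translator's implementation: it handles only a handful of opcodes, and the function name `track_block` and the tuple-based instruction encoding are hypothetical. Each conditional jump records the destination tag together with the stack state at that point, which is the information needed to decide whether a block has to be duplicated for a new stack state.

```python
def track_block(instructions, stack=()):
    """Symbolically run one block; return (final stack, max depth, scheduled (tag, stack) pairs)."""
    stack = list(stack)
    max_depth = len(stack)
    scheduled = []
    for opcode, operand in instructions:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "PUSH_TAG":
            stack.append(f"T_{operand}")
        elif opcode == "CALLVALUE":
            stack.append("V_CALLVALUE")
        elif opcode == "DUP1":
            stack.append(stack[-1])
        elif opcode == "ISZERO":
            stack.pop()
            stack.append("V_ISZERO")
        elif opcode == "MSTORE":
            del stack[-2:]  # pops the offset and the value
        elif opcode == "JUMPI":
            tag = stack.pop()  # the destination tag is on top
            stack.pop()        # followed by the condition
            scheduled.append((tag, tuple(stack)))
        elif opcode == "REVERT":
            del stack[-2:]
            break  # the block ends here
        else:
            raise ValueError(f"opcode not modelled in this sketch: {opcode}")
        max_depth = max(max_depth, len(stack))
    return stack, max_depth, scheduled


entry_block = [
    ("PUSH", "80"), ("PUSH", "40"), ("MSTORE", None), ("CALLVALUE", None),
    ("DUP1", None), ("ISZERO", None), ("PUSH_TAG", "1"), ("JUMPI", None),
    ("PUSH", "0"), ("DUP1", None), ("REVERT", None),
]
final_stack, depth, scheduled = track_block(entry_block)
print(final_stack, depth, scheduled)
```

In a fuller model, each scheduled pair would key a separate block instance, so the same tag reached with two different stacks would be analysed twice.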

Meaning of EthIR data:

  1. V_<name> - a value returned by an instruction.
  2. T_<tag> - the tag of an assembly block.
  3. 40 - a hexadecimal constant.
  4. tests/solidity/simple/default.sol:Test - a contract full path definition.

Stack legend format: [ <current_1> | <current_2> | ... | <current_N> ] - [ <popped_1> | <popped_2> | ... | <popped_N> ] + [ <pushed_1> | <pushed_2> | ... | <pushed_N> ].

// The default entry function of the contract.
function main {
// The maximum stack size in the function.
    stack_usage: 6
block_dt_0/0:                           // Deploy Code Tag 0, Instance 0.
// PUSHed 0x80 onto the stack.
    PUSH           80                                                               [  ] + [ 80 ]
// PUSHed 0x40 onto the stack.
    PUSH           40                                                               [ 80 ] + [ 40 ]
// POPped 0x40 and 0x80 from the stack to store 0x80 at 0x40.
    MSTORE                                                                          [  ] - [ 80 | 40 ]
// PUSHed CALLVALUE onto the stack.
    CALLVALUE                                                                       [  ] + [ V_CALLVALUE ]
    DUP1                                                                            [ V_CALLVALUE ] + [ V_CALLVALUE ]
    ISZERO                                                                          [ V_CALLVALUE ] - [ V_CALLVALUE ] + [ V_ISZERO ]
    PUSH [tag]     1                                                                [ V_CALLVALUE | V_ISZERO ] + [ T_1 ]
// JUMPI schedules dt_1/0 for analysis with the current stack state.
    JUMPI                                                                           [ V_CALLVALUE ] - [ V_ISZERO | T_1 ]
    PUSH           0                                                                [ V_CALLVALUE ] + [ 0 ]
    DUP1                                                                            [ V_CALLVALUE | 0 ] + [ 0 ]
    REVERT                                                                          [ V_CALLVALUE ] - [ 0 | 0 ]
block_dt_1/0: (predecessors: dt_0/0)    // Deploy Code Tag 1, Instance 0; the only predecessor of this block is dt_0/0.
// JUMPDESTs are ignored as we are only interested in the stack state and tag destinations.
    JUMPDEST                                                                        [ V_CALLVALUE ]
    POP                                                                             [  ] - [ V_CALLVALUE ]
    PUSH #[$]      tests/solidity/simple/default.sol:Test                           [  ] + [ tests/solidity/simple/default.sol:Test ]
    DUP1                                                                            [ tests/solidity/simple/default.sol:Test ] + [ tests/solidity/simple/default.sol:Test ]
    PUSH [$]       tests/solidity/simple/default.sol:Test                           [ tests/solidity/simple/default.sol:Test | tests/solidity/simple/default.sol:Test ] + [ tests/solidity/simple/default.sol:Test ]
    PUSH           0                                                                [ tests/solidity/simple/default.sol:Test | tests/solidity/simple/default.sol:Test | tests/solidity/simple/default.sol:Test ] + [ 0 ]
    CODECOPY                                                                        [ tests/solidity/simple/default.sol:Test ] - [ tests/solidity/simple/default.sol:Test | tests/solidity/simple/default.sol:Test | 0 ]
    PUSH           0                                                                [ tests/solidity/simple/default.sol:Test ] + [ 0 ]
    RETURN                                                                          [  ] - [ tests/solidity/simple/default.sol:Test | 0 ]
// The runtime code is analyzed in the same control-flow graph as the deploy code, as it is possible to call its functions from the constructor.
block_rt_0/0:                           // Runtime Code Tag 0, Instance 0.
    PUSH           80                                                               [  ] + [ 80 ]
    PUSH           40                                                               [ 80 ] + [ 40 ]
    MSTORE                                                                          [  ] - [ 80 | 40 ]
    CALLVALUE                                                                       [  ] + [ V_CALLVALUE ]
    DUP1                                                                            [ V_CALLVALUE ] + [ V_CALLVALUE ]
    ISZERO                                                                          [ V_CALLVALUE ] - [ V_CALLVALUE ] + [ V_ISZERO ]
    PUSH [tag]     1                                                                [ V_CALLVALUE | V_ISZERO ] + [ T_1 ]
    JUMPI                                                                           [ V_CALLVALUE ] - [ V_ISZERO | T_1 ]
    PUSH           0                                                                [ V_CALLVALUE ] + [ 0 ]
    DUP1                                                                            [ V_CALLVALUE | 0 ] + [ 0 ]
    REVERT                                                                          [ V_CALLVALUE ] - [ 0 | 0 ]
block_rt_1/0: (predecessors: rt_0/0)    // Runtime Code Tag 1, Instance 0; the only predecessor of this block is rt_0/0.
    JUMPDEST                                                                        [ V_CALLVALUE ]
    POP                                                                             [  ] - [ V_CALLVALUE ]
    PUSH           4                                                                [  ] + [ 4 ]
    CALLDATASIZE                                                                    [ 4 ] + [ V_CALLDATASIZE ]
    LT                                                                              [  ] - [ 4 | V_CALLDATASIZE ] + [ V_LT ]
    PUSH [tag]     2                                                                [ V_LT ] + [ T_2 ]
    JUMPI                                                                           [  ] - [ V_LT | T_2 ]
    PUSH           0                                                                [  ] + [ 0 ]
    CALLDATALOAD                                                                    [  ] - [ 0 ] + [ V_CALLDATALOAD ]
    PUSH           E0                                                               [ V_CALLDATALOAD ] + [ E0 ]
    SHR                                                                             [  ] - [ V_CALLDATALOAD | E0 ] + [ V_SHR ]
    DUP1                                                                            [ V_SHR ] + [ V_SHR ]
    PUSH           5A8AC02D                                                         [ V_SHR | V_SHR ] + [ 5A8AC02D ]
    EQ                                                                              [ V_SHR ] - [ V_SHR | 5A8AC02D ] + [ V_EQ ]
    PUSH [tag]     3                                                                [ V_SHR | V_EQ ] + [ T_3 ]
    JUMPI                                                                           [ V_SHR ] - [ V_EQ | T_3 ]
    Tag 2                                                                           [ V_SHR ]
// This instance is entered with an empty stack through the first JUMPI of rt_1/0 above.
block_rt_2/0: (predecessors: rt_1/0)    // Runtime Code Tag 2, Instance 0.
    JUMPDEST                                                                        [  ]
    PUSH           0                                                                [  ] + [ 0 ]
    DUP1                                                                            [ 0 ] + [ 0 ]
    REVERT                                                                          [  ] - [ 0 | 0 ]
// This instance is also called from rt_1/0, but using a fallthrough 'Tag 2'.
// Given different stack states, we create a new instance of the block operating on different data
// and potentially different tag destinations, although usually such blocks are merged back by LLVM.
block_rt_2/1: (predecessors: rt_1/0)    // Runtime Code Tag 2, Instance 1.
    JUMPDEST                                                                        [ V_SHR ]
    PUSH           0                                                                [ V_SHR ] + [ 0 ]
    DUP1                                                                            [ V_SHR | 0 ] + [ 0 ]
    REVERT                                                                          [ V_SHR ] - [ 0 | 0 ]
block_rt_3/0: (predecessors: rt_1/0)    // Runtime Code Tag 3, Instance 0.
    JUMPDEST                                                                        [ V_SHR ]
    PUSH [tag]     4                                                                [ V_SHR ] + [ T_4 ]
    PUSH [tag]     5                                                                [ V_SHR | T_4 ] + [ T_5 ]
    JUMP           [in]                                                             [ V_SHR | T_4 ] - [ T_5 ]
block_rt_4/0: (predecessors: rt_5/0)    // Runtime Code Tag 4, Instance 0.
    JUMPDEST                                                                        [ V_SHR | 2A ]
    PUSH           40                                                               [ V_SHR | 2A ] + [ 40 ]
    DUP1                                                                            [ V_SHR | 2A | 40 ] + [ 40 ]
    MLOAD                                                                           [ V_SHR | 2A | 40 ] - [ 40 ] + [ V_MLOAD ]
    SWAP2                                                                           [ V_SHR | V_MLOAD | 40 | 2A ]
    DUP3                                                                            [ V_SHR | V_MLOAD | 40 | 2A ] + [ V_MLOAD ]
    MSTORE                                                                          [ V_SHR | V_MLOAD | 40 ] - [ 2A | V_MLOAD ]
    MLOAD                                                                           [ V_SHR | V_MLOAD ] - [ 40 ] + [ V_MLOAD ]
    SWAP1                                                                           [ V_SHR | V_MLOAD | V_MLOAD ]
    DUP2                                                                            [ V_SHR | V_MLOAD | V_MLOAD ] + [ V_MLOAD ]
    SWAP1                                                                           [ V_SHR | V_MLOAD | V_MLOAD | V_MLOAD ]
    SUB                                                                             [ V_SHR | V_MLOAD ] - [ V_MLOAD | V_MLOAD ] + [ V_SUB ]
    PUSH           20                                                               [ V_SHR | V_MLOAD | V_SUB ] + [ 20 ]
    ADD                                                                             [ V_SHR | V_MLOAD ] - [ V_SUB | 20 ] + [ V_ADD ]
    SWAP1                                                                           [ V_SHR | V_ADD | V_MLOAD ]
    RETURN                                                                          [ V_SHR ] - [ V_ADD | V_MLOAD ]
block_rt_5/0: (predecessors: rt_3/0)    // Runtime Code Tag 5, Instance 0.
    JUMPDEST                                                                        [ V_SHR | T_4 ]
    PUSH           2A                                                               [ V_SHR | T_4 ] + [ 2A ]
    SWAP1                                                                           [ V_SHR | 2A | T_4 ]
// JUMP [out] usually marks a return from a function.
    JUMP           [out]                                                            [ V_SHR | 2A ] - [ T_4 ]

Unoptimized LLVM IR

In LLVM IR, the required stack space is allocated at the start of the main function, and every stack operation uses a statically known stack pointer with an offset derived from EthIR.
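
The idea of a statically known stack pointer can be sketched in Python as follows. The stack height is tracked at translation time, so each stack access is emitted against a fixed slot. The `%stack_var_NNN` naming follows the listing in this section; the `slot` and `lower` helpers, the three supported opcodes, and the `%dupN` value names are assumptions made for the example, and the output is pseudo-IR, not the exact compiler output.

```python
def slot(index):
    """Name of the alloca that backs stack position `index`."""
    return f"%stack_var_{index:03d}"


def lower(ops):
    """Translate (opcode, operand) pairs into pseudo-IR lines.

    The stack height is tracked while translating, so every emitted access
    refers to a fixed slot and no stack pointer exists at run time.
    """
    height = 0
    lines = []
    for number, (opcode, operand) in enumerate(ops):
        if opcode == "PUSH":
            lines.append(f"store i256 {operand}, ptr {slot(height)}, align 32")
            height += 1
        elif opcode == "DUP":  # DUPn copies the n-th element from the top
            lines.append(f"%dup{number} = load i256, ptr {slot(height - operand)}, align 32")
            lines.append(f"store i256 %dup{number}, ptr {slot(height)}, align 32")
            height += 1
        elif opcode == "POP":
            height -= 1
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return lines


# PUSH 0x80, PUSH 0x40, drop both, then push a value and DUP1 it.
program = [("PUSH", 128), ("PUSH", 64), ("POP", None), ("POP", None),
           ("PUSH", 7), ("DUP", 1)]
for line in lower(program):
    print(line)
```

The first two emitted lines match the stores at the start of block_dt_0/0. After the two POPs the height is back at zero, so the next PUSH reuses slot 0 without clearing anything.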

; Function Attrs: nofree null_pointer_is_valid
define i256 @__entry(ptr addrspace(3) %0, i256 %1, i256 %2, i256 %3, i256 %4, i256 %5, i256 %6, i256 %7, i256 %8, i256 %9, i256 %10, i256 %11) #7 personality ptr @__personality {
entry:
  store ptr addrspace(3) %0, ptr @ptr_calldata, align 32
  %abi_pointer_value = ptrtoint ptr addrspace(3) %0 to i256
  %abi_pointer_value_shifted = lshr i256 %abi_pointer_value, 96
  %abi_length_value = and i256 %abi_pointer_value_shifted, 4294967295
  store i256 %abi_length_value, ptr @calldatasize, align 32
  %calldatasize = load i256, ptr @calldatasize, align 32
  %ptr_calldata = load ptr addrspace(3), ptr @ptr_calldata, align 32
  %calldata_end_pointer = getelementptr i8, ptr addrspace(3) %ptr_calldata, i256 %calldatasize
  store ptr addrspace(3) %calldata_end_pointer, ptr @ptr_return_data, align 32
  store ptr addrspace(3) %calldata_end_pointer, ptr @ptr_decommit, align 32
  %calldatasize1 = load i256, ptr @calldatasize, align 32
  %ptr_calldata2 = load ptr addrspace(3), ptr @ptr_calldata, align 32
  %calldata_end_pointer3 = getelementptr i8, ptr addrspace(3) %ptr_calldata2, i256 %calldatasize1
  store ptr addrspace(3) %calldata_end_pointer3, ptr @ptr_active, align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 1), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 2), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 3), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 4), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 5), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 6), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 7), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 8), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 9), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 10), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 11), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 12), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 13), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 14), align 32
  store ptr addrspace(3) %calldata_end_pointer3, ptr getelementptr inbounds ([16 x ptr addrspace(3)], ptr @ptr_active, i256 0, i256 15), align 32
  store i256 %1, ptr @call_flags, align 32
  store i256 %2, ptr @extra_abi_data, align 32
  store i256 %3, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 1), align 32
  store i256 %4, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 2), align 32
  store i256 %5, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 3), align 32
  store i256 %6, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 4), align 32
  store i256 %7, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 5), align 32
  store i256 %8, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 6), align 32
  store i256 %9, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 7), align 32
  store i256 %10, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 8), align 32
  store i256 %11, ptr getelementptr inbounds ([10 x i256], ptr @extra_abi_data, i256 0, i32 9), align 32
  %is_deploy_code_call_flag_truncated = and i256 %1, 1
  %is_deploy_code_call_flag = icmp eq i256 %is_deploy_code_call_flag_truncated, 1
  br i1 %is_deploy_code_call_flag, label %deploy_code_call_block, label %runtime_code_call_block

return:                                           ; preds = %runtime_code_call_block, %deploy_code_call_block
  ret i256 0

deploy_code_call_block:                           ; preds = %entry
  call void @__deploy()
  br label %return

runtime_code_call_block:                          ; preds = %entry
  call void @__runtime()
  br label %return
}

; Function Attrs: nofree null_pointer_is_valid
define private void @__deploy() #7 personality ptr @__personality {
entry:
  call void @main(i1 true)
  br label %return

return:                                           ; preds = %entry
  ret void
}

; Function Attrs: nofree null_pointer_is_valid
define private void @__runtime() #7 personality ptr @__personality {
entry:
  call void @main(i1 false)
  br label %return

return:                                           ; preds = %entry
  ret void
}

; Function Attrs: nofree null_pointer_is_valid
define private void @main(i1 %0) #7 personality ptr @__personality {
entry:
  %stack_var_000 = alloca i256, align 32
  store i256 0, ptr %stack_var_000, align 32
  %stack_var_001 = alloca i256, align 32
  store i256 0, ptr %stack_var_001, align 32
  %stack_var_002 = alloca i256, align 32
  store i256 0, ptr %stack_var_002, align 32
  %stack_var_003 = alloca i256, align 32
  store i256 0, ptr %stack_var_003, align 32
  %stack_var_004 = alloca i256, align 32
  store i256 0, ptr %stack_var_004, align 32
  %stack_var_005 = alloca i256, align 32
  store i256 0, ptr %stack_var_005, align 32
  %stack_var_006 = alloca i256, align 32
  store i256 0, ptr %stack_var_006, align 32
  br i1 %0, label %"block_dt_0/0", label %"block_rt_0/0"

return:                                           ; No predecessors!
  ret void

"block_dt_0/0":                                   ; preds = %entry
  store i256 128, ptr %stack_var_000, align 32
  store i256 64, ptr %stack_var_001, align 32
  %argument_0 = load i256, ptr %stack_var_001, align 32
  %argument_1 = load i256, ptr %stack_var_000, align 32
  %memory_store_pointer = inttoptr i256 %argument_0 to ptr addrspace(1)
  store i256 %argument_1, ptr addrspace(1) %memory_store_pointer, align 1
  %get_u128_value = call i256 @llvm.eravm.getu128()
  store i256 %get_u128_value, ptr %stack_var_000, align 32
  %dup1 = load i256, ptr %stack_var_000, align 32
  store i256 %dup1, ptr %stack_var_001, align 32
  %argument_01 = load i256, ptr %stack_var_001, align 32
  %comparison_result = icmp eq i256 %argument_01, 0
  %comparison_result_extended = zext i1 %comparison_result to i256
  store i256 %comparison_result_extended, ptr %stack_var_001, align 32
  store i256 1, ptr %stack_var_002, align 32
  %conditional_dt_1_condition = load i256, ptr %stack_var_001, align 32
  %conditional_dt_1_condition_compared = icmp ne i256 %conditional_dt_1_condition, 0
  br i1 %conditional_dt_1_condition_compared, label %"block_dt_1/0", label %conditional_dt_1_join_block

"block_dt_1/0":                                   ; preds = %"block_dt_0/0"
  store i256 2, ptr %stack_var_000, align 32
  br label %"block_dt_2/0"

"block_dt_2/0":                                   ; preds = %"block_dt_1/0"
  store i256 0, ptr %stack_var_000, align 32
  %dup14 = load i256, ptr %stack_var_000, align 32
  store i256 %dup14, ptr %stack_var_001, align 32
  store i256 0, ptr %stack_var_002, align 32
  store i256 0, ptr %stack_var_003, align 32
  %argument_05 = load i256, ptr %stack_var_003, align 32
  %argument_16 = load i256, ptr %stack_var_002, align 32
  %argument_2 = load i256, ptr %stack_var_001, align 32
  %calldata_copy_destination_pointer = inttoptr i256 %argument_05 to ptr addrspace(1)
  %calldata_pointer = load ptr addrspace(3), ptr @ptr_calldata, align 32
  %calldata_source_pointer = getelementptr i8, ptr addrspace(3) %calldata_pointer, i256 %argument_16
  call void @llvm.memcpy.p1.p3.i256(ptr addrspace(1) align 1 %calldata_copy_destination_pointer, ptr addrspace(3) align 1 %calldata_source_pointer, i256 %argument_2, i1 false)
  store i256 0, ptr %stack_var_001, align 32
  %argument_07 = load i256, ptr %stack_var_001, align 32
  %argument_18 = load i256, ptr %stack_var_000, align 32
  store i256 32, ptr addrspace(2) inttoptr (i256 256 to ptr addrspace(2)), align 1
  store i256 0, ptr addrspace(2) inttoptr (i256 288 to ptr addrspace(2)), align 1
  call void @__return(i256 256, i256 64, i256 2)
  unreachable

"block_rt_0/0":                                   ; preds = %entry
  store i256 128, ptr %stack_var_000, align 32
  store i256 64, ptr %stack_var_001, align 32
  %argument_09 = load i256, ptr %stack_var_001, align 32
  %argument_110 = load i256, ptr %stack_var_000, align 32
  %memory_store_pointer11 = inttoptr i256 %argument_09 to ptr addrspace(1)
  store i256 %argument_110, ptr addrspace(1) %memory_store_pointer11, align 1
  %get_u128_value12 = call i256 @llvm.eravm.getu128()
  store i256 %get_u128_value12, ptr %stack_var_000, align 32
  %dup113 = load i256, ptr %stack_var_000, align 32
  store i256 %dup113, ptr %stack_var_001, align 32
  %argument_014 = load i256, ptr %stack_var_001, align 32
  %comparison_result15 = icmp eq i256 %argument_014, 0
  %comparison_result_extended16 = zext i1 %comparison_result15 to i256
  store i256 %comparison_result_extended16, ptr %stack_var_001, align 32
  store i256 1, ptr %stack_var_002, align 32
  %conditional_rt_1_condition = load i256, ptr %stack_var_001, align 32
  %conditional_rt_1_condition_compared = icmp ne i256 %conditional_rt_1_condition, 0
  br i1 %conditional_rt_1_condition_compared, label %"block_rt_1/0", label %conditional_rt_1_join_block

"block_rt_1/0":                                   ; preds = %"block_rt_0/0"
  store i256 4, ptr %stack_var_000, align 32
  %calldatasize = load i256, ptr @calldatasize, align 32
  store i256 %calldatasize, ptr %stack_var_001, align 32
  %argument_019 = load i256, ptr %stack_var_001, align 32
  %argument_120 = load i256, ptr %stack_var_000, align 32
  %comparison_result21 = icmp ult i256 %argument_019, %argument_120
  %comparison_result_extended22 = zext i1 %comparison_result21 to i256
  store i256 %comparison_result_extended22, ptr %stack_var_000, align 32
  store i256 2, ptr %stack_var_001, align 32
  %conditional_rt_2_condition = load i256, ptr %stack_var_000, align 32
  %conditional_rt_2_condition_compared = icmp ne i256 %conditional_rt_2_condition, 0
  br i1 %conditional_rt_2_condition_compared, label %"block_rt_2/0", label %conditional_rt_2_join_block

"block_rt_2/0":                                   ; preds = %"block_rt_1/0"
  store i256 0, ptr %stack_var_000, align 32
  store i256 0, ptr %stack_var_001, align 32
  %argument_032 = load i256, ptr %stack_var_001, align 32
  %argument_133 = load i256, ptr %stack_var_000, align 32
  call void @__revert(i256 %argument_032, i256 %argument_133, i256 0)
  unreachable

"block_rt_2/1":                                   ; preds = %conditional_rt_3_join_block
  store i256 0, ptr %stack_var_001, align 32
  store i256 0, ptr %stack_var_002, align 32
  %argument_034 = load i256, ptr %stack_var_002, align 32
  %argument_135 = load i256, ptr %stack_var_001, align 32
  call void @__revert(i256 %argument_034, i256 %argument_135, i256 0)
  unreachable

"block_rt_3/0":                                   ; preds = %conditional_rt_2_join_block
  store i256 4, ptr %stack_var_001, align 32
  store i256 5, ptr %stack_var_002, align 32
  br label %"block_rt_5/0"

"block_rt_4/0":                                   ; preds = %"block_rt_6/0"
  store i256 64, ptr %stack_var_002, align 32
  %argument_036 = load i256, ptr %stack_var_002, align 32
  %memory_load_pointer = inttoptr i256 %argument_036 to ptr addrspace(1)
  %memory_load_result = load i256, ptr addrspace(1) %memory_load_pointer, align 1
  store i256 %memory_load_result, ptr %stack_var_002, align 32
  %dup137 = load i256, ptr %stack_var_002, align 32
  store i256 %dup137, ptr %stack_var_003, align 32
  %dup3 = load i256, ptr %stack_var_001, align 32
  store i256 %dup3, ptr %stack_var_004, align 32
  %dup2 = load i256, ptr %stack_var_003, align 32
  store i256 %dup2, ptr %stack_var_005, align 32
  %argument_038 = load i256, ptr %stack_var_005, align 32
  %argument_139 = load i256, ptr %stack_var_004, align 32
  %memory_store_pointer40 = inttoptr i256 %argument_038 to ptr addrspace(1)
  store i256 %argument_139, ptr addrspace(1) %memory_store_pointer40, align 1
  store i256 32, ptr %stack_var_004, align 32
  %argument_041 = load i256, ptr %stack_var_004, align 32
  %argument_142 = load i256, ptr %stack_var_003, align 32
  %addition_result = add i256 %argument_041, %argument_142
  store i256 %addition_result, ptr %stack_var_003, align 32
  %swap2_top_value = load i256, ptr %stack_var_003, align 32
  %swap2_swap_value = load i256, ptr %stack_var_001, align 32
  store i256 %swap2_swap_value, ptr %stack_var_003, align 32
  store i256 %swap2_top_value, ptr %stack_var_001, align 32
  store i256 64, ptr %stack_var_002, align 32
  %argument_043 = load i256, ptr %stack_var_002, align 32
  %memory_load_pointer44 = inttoptr i256 %argument_043 to ptr addrspace(1)
  %memory_load_result45 = load i256, ptr addrspace(1) %memory_load_pointer44, align 1
  store i256 %memory_load_result45, ptr %stack_var_002, align 32
  %dup146 = load i256, ptr %stack_var_002, align 32
  store i256 %dup146, ptr %stack_var_003, align 32
  %swap2_top_value47 = load i256, ptr %stack_var_003, align 32
  %swap2_swap_value48 = load i256, ptr %stack_var_001, align 32
  store i256 %swap2_swap_value48, ptr %stack_var_003, align 32
  store i256 %swap2_top_value47, ptr %stack_var_001, align 32
  %argument_049 = load i256, ptr %stack_var_003, align 32
  %argument_150 = load i256, ptr %stack_var_002, align 32
  %subtraction_result = sub i256 %argument_049, %argument_150
  store i256 %subtraction_result, ptr %stack_var_002, align 32
  %swap1_top_value = load i256, ptr %stack_var_002, align 32
  %swap1_swap_value = load i256, ptr %stack_var_001, align 32
  store i256 %swap1_swap_value, ptr %stack_var_002, align 32
  store i256 %swap1_top_value, ptr %stack_var_001, align 32
  %argument_051 = load i256, ptr %stack_var_002, align 32
  %argument_152 = load i256, ptr %stack_var_001, align 32
  call void @__return(i256 %argument_051, i256 %argument_152, i256 0)
  unreachable

"block_rt_5/0":                                   ; preds = %"block_rt_3/0"
  store i256 0, ptr %stack_var_002, align 32
  store i256 42, ptr %stack_var_003, align 32
  %swap1_top_value53 = load i256, ptr %stack_var_003, align 32
  %swap1_swap_value54 = load i256, ptr %stack_var_002, align 32
  store i256 %swap1_swap_value54, ptr %stack_var_003, align 32
  store i256 %swap1_top_value53, ptr %stack_var_002, align 32
  %dup155 = load i256, ptr %stack_var_002, align 32
  store i256 %dup155, ptr %stack_var_003, align 32
  br label %"block_rt_6/0"

"block_rt_6/0":                                   ; preds = %"block_rt_5/0"
  %swap1_top_value56 = load i256, ptr %stack_var_002, align 32
  %swap1_swap_value57 = load i256, ptr %stack_var_001, align 32
  store i256 %swap1_swap_value57, ptr %stack_var_002, align 32
  store i256 %swap1_top_value56, ptr %stack_var_001, align 32
  br label %"block_rt_4/0"

conditional_dt_1_join_block:                      ; preds = %"block_dt_0/0"
  store i256 0, ptr %stack_var_001, align 32
  store i256 0, ptr %stack_var_002, align 32
  %argument_02 = load i256, ptr %stack_var_002, align 32
  %argument_13 = load i256, ptr %stack_var_001, align 32
  call void @__revert(i256 %argument_02, i256 %argument_13, i256 0)
  unreachable

conditional_rt_1_join_block:                      ; preds = %"block_rt_0/0"
  store i256 0, ptr %stack_var_001, align 32
  store i256 0, ptr %stack_var_002, align 32
  %argument_017 = load i256, ptr %stack_var_002, align 32
  %argument_118 = load i256, ptr %stack_var_001, align 32
  call void @__revert(i256 %argument_017, i256 %argument_118, i256 0)
  unreachable

conditional_rt_2_join_block:                      ; preds = %"block_rt_1/0"
  store i256 0, ptr %stack_var_000, align 32
  %argument_023 = load i256, ptr %stack_var_000, align 32
  %calldata_pointer24 = load ptr addrspace(3), ptr @ptr_calldata, align 32
  %calldata_pointer_with_offset = getelementptr i8, ptr addrspace(3) %calldata_pointer24, i256 %argument_023
  %calldata_value = load i256, ptr addrspace(3) %calldata_pointer_with_offset, align 32
  store i256 %calldata_value, ptr %stack_var_000, align 32
  store i256 224, ptr %stack_var_001, align 32
  %argument_025 = load i256, ptr %stack_var_001, align 32
  %argument_126 = load i256, ptr %stack_var_000, align 32
  %shr_call = call i256 @__shr(i256 %argument_025, i256 %argument_126)
  store i256 %shr_call, ptr %stack_var_000, align 32
  %dup127 = load i256, ptr %stack_var_000, align 32
  store i256 %dup127, ptr %stack_var_001, align 32
  store i256 3758009808, ptr %stack_var_002, align 32
  %argument_028 = load i256, ptr %stack_var_002, align 32
  %argument_129 = load i256, ptr %stack_var_001, align 32
  %comparison_result30 = icmp eq i256 %argument_028, %argument_129
  %comparison_result_extended31 = zext i1 %comparison_result30 to i256
  store i256 %comparison_result_extended31, ptr %stack_var_001, align 32
  store i256 3, ptr %stack_var_002, align 32
  %conditional_rt_3_condition = load i256, ptr %stack_var_001, align 32
  %conditional_rt_3_condition_compared = icmp ne i256 %conditional_rt_3_condition, 0
  br i1 %conditional_rt_3_condition_compared, label %"block_rt_3/0", label %conditional_rt_3_join_block

conditional_rt_3_join_block:                      ; preds = %conditional_rt_2_join_block
  store i256 2, ptr %stack_var_001, align 32
  br label %"block_rt_2/1"
}

attributes #0 = { cold noreturn nounwind }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #3 = { nounwind willreturn memory(inaccessiblemem: readwrite) }
attributes #4 = { nounwind }
attributes #5 = { nounwind willreturn memory(none) }
attributes #6 = { nomerge nounwind willreturn memory(inaccessiblemem: readwrite) }
attributes #7 = { nofree null_pointer_is_valid }

Optimized LLVM IR

LLVM removes the redundant stack loads and stores and inlines the helper functions into @__entry, resulting in the LLVM IR shown below.

; Function Attrs: nofree noreturn null_pointer_is_valid
define i256 @__entry(ptr addrspace(3) %0, i256 %1, i256 %2, i256 %3, i256 %4, i256 %5, i256 %6, i256 %7, i256 %8, i256 %9, i256 %10, i256 %11) local_unnamed_addr #1 personality ptr @__personality {
entry:
  %is_deploy_code_call_flag_truncated = and i256 %1, 1
  %is_deploy_code_call_flag.not = icmp eq i256 %is_deploy_code_call_flag_truncated, 0
  store i256 128, ptr addrspace(1) inttoptr (i256 64 to ptr addrspace(1)), align 64
  %get_u128_value.i.i4 = tail call i256 @llvm.eravm.getu128()
  br i1 %is_deploy_code_call_flag.not, label %runtime_code_call_block, label %deploy_code_call_block

deploy_code_call_block:                           ; preds = %entry
  %comparison_result.i.i = icmp eq i256 %get_u128_value.i.i4, 0
  br i1 %comparison_result.i.i, label %"block_dt_1/0.i.i", label %"block_rt_2/0.i.i"

"block_dt_1/0.i.i":                               ; preds = %deploy_code_call_block
  store i256 32, ptr addrspace(2) inttoptr (i256 256 to ptr addrspace(2)), align 256
  store i256 0, ptr addrspace(2) inttoptr (i256 288 to ptr addrspace(2)), align 32
  tail call void @llvm.eravm.return(i256 53919893334301279589334030174039261352344891250716429051063678533632)
  unreachable

"block_rt_2/0.i.i":                               ; preds = %runtime_code_call_block, %conditional_rt_2_join_block.i.i, %deploy_code_call_block
  tail call void @llvm.eravm.revert(i256 0)
  unreachable

runtime_code_call_block:                          ; preds = %entry
  %abi_pointer_value = ptrtoint ptr addrspace(3) %0 to i256
  %comparison_result.i.i5 = icmp ne i256 %get_u128_value.i.i4, 0
  %12 = and i256 %abi_pointer_value, 340282366604025813406317257057592410112
  %comparison_result21.i.i = icmp eq i256 %12, 0
  %or.cond.i = select i1 %comparison_result.i.i5, i1 true, i1 %comparison_result21.i.i
  br i1 %or.cond.i, label %"block_rt_2/0.i.i", label %conditional_rt_2_join_block.i.i

"block_rt_3/0.i.i":                               ; preds = %conditional_rt_2_join_block.i.i
  store i256 42, ptr addrspace(1) inttoptr (i256 128 to ptr addrspace(1)), align 128
  tail call void @llvm.eravm.return(i256 2535301202817642044428229017600)
  unreachable

conditional_rt_2_join_block.i.i:                  ; preds = %runtime_code_call_block
  %calldata_value.i.i = load i256, ptr addrspace(3) %0, align 32
  %shift_res.i.mask.i.i = and i256 %calldata_value.i.i, -26959946667150639794667015087019630673637144422540572481103610249216
  %comparison_result30.i.i = icmp eq i256 %shift_res.i.mask.i.i, -14476345239007179661737236217584162293203948892620596377535322027150077329408
  br i1 %comparison_result30.i.i, label %"block_rt_3/0.i.i", label %"block_rt_2/0.i.i"
}

; Function Attrs: noreturn nounwind
declare void @llvm.eravm.revert(i256) #2

; Function Attrs: noreturn nounwind
declare void @llvm.eravm.return(i256) #2

attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(none) }
attributes #1 = { nofree noreturn null_pointer_is_valid }
attributes #2 = { noreturn nounwind }
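
The large integer constants in the IR above are easier to read once split into 32-bit fields. The sketch below is only an illustration: it assumes that bits 64..95 of the ABI value hold a start offset and bits 96..127 hold a length, which this section does not state, and the helper name `field` is hypothetical.

```python
# Constants copied from the optimized LLVM IR above.
RETURN_ABI = 2535301202817642044428229017600             # argument of llvm.eravm.return in block_rt_3/0
CALLDATA_MASK = 340282366604025813406317257057592410112  # mask applied to %abi_pointer_value


def field(value: int, shift: int) -> int:
    """Extract a 32-bit field starting at bit `shift` (assumed layout)."""
    return (value >> shift) & 0xFFFFFFFF


return_offset = field(RETURN_ABI, 64)   # assumed: start offset on the heap
return_length = field(RETURN_ABI, 96)   # assumed: length in bytes
length_mask = field(CALLDATA_MASK, 96)  # bits of the length field kept by the mask

print(return_offset, return_length, hex(length_mask))
```

Under that assumption, the runtime code returns 32 bytes starting at heap offset 128, which is where the value 42 was stored, and the mask is zero exactly when the calldata length is below 4 bytes, that is, too short to hold a selector.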

EraVM Assembly

The optimized LLVM IR is then compiled into EraVM assembly, resulting in a bytecode size comparable to that produced via the Yul pipeline.

        .text
        .file   "test.sol:Example"
        .globl  __entry
__entry:
.func_begin0:
        add     128, r0, r3
        stm.h   64, r3
        ldvl    r3
        and!    1, r2, r0
        jump.ne @.BB0_1
        sub!    r3, r0, r0
        jump.ne @.BB0_2
        and!    code[@CPI0_1], r1, r0
        jump.eq @.BB0_2
        ldp     r1, r1
        and     code[@CPI0_2], r1, r1
        sub.s!  code[@CPI0_3], r1, r0
        jump.ne @.BB0_2
        add     42, r0, r1
        stm.h   128, r1
        add     code[@CPI0_4], r0, r1
        retl    @DEFAULT_FAR_RETURN
.BB0_1:
        sub!    r3, r0, r0
        jump.ne @.BB0_2
        add     32, r0, r1
        stm.ah  256, r1
        stm.ah  288, r0
        add     code[@CPI0_0], r0, r1
        retl    @DEFAULT_FAR_RETURN
.BB0_2:
        add     r0, r0, r1
        revl    @DEFAULT_FAR_REVERT
.func_end0:

        .rodata
CPI0_0:
        .cell   53919893334301279589334030174039261352344891250716429051063678533632
CPI0_1:
        .cell   340282366604025813406317257057592410112
CPI0_2:
        .cell   -26959946667150639794667015087019630673637144422540572481103610249216
CPI0_3:
        .cell   -14476345239007179661737236217584162293203948892620596377535322027150077329408
CPI0_4:
        .cell   2535301202817642044428229017600
        .text
DEFAULT_UNWIND:
        pncl    @DEFAULT_UNWIND
DEFAULT_FAR_RETURN:
        retl    @DEFAULT_FAR_RETURN
DEFAULT_FAR_REVERT:
        revl    @DEFAULT_FAR_REVERT

For comparison, the Yul pipeline of solc v0.8.28 generates the following EraVM assembly:

        .text
        .file   "test.sol:Example"
        .globl  __entry
__entry:
.func_begin0:
        add     128, r0, r3
        stm.h   64, r3
        and!    1, r2, r0
        jump.ne @.BB0_1
        and!    code[@CPI0_1], r1, r0
        jump.eq @.BB0_7
        ldp     r1, r1
        and     code[@CPI0_2], r1, r1
        sub.s!  code[@CPI0_3], r1, r0
        jump.ne @.BB0_7
        ldvl    r1
        sub!    r1, r0, r0
        jump.ne @.BB0_7
        add     42, r0, r1
        stm.h   128, r1
        add     code[@CPI0_4], r0, r1
        retl    @DEFAULT_FAR_RETURN
.BB0_1:
        ldvl    r1
        sub!    r1, r0, r0
        jump.ne @.BB0_7
        add     32, r0, r1
        stm.ah  256, r1
        stm.ah  288, r0
        add     code[@CPI0_0], r0, r1
        retl    @DEFAULT_FAR_RETURN
.BB0_7:
        add     r0, r0, r1
        revl    @DEFAULT_FAR_REVERT
.func_end0:

        .rodata
CPI0_0:
        .cell   53919893334301279589334030174039261352344891250716429051063678533632
CPI0_1:
        .cell   340282366604025813406317257057592410112
CPI0_2:
        .cell   -26959946667150639794667015087019630673637144422540572481103610249216
CPI0_3:
        .cell   -14476345239007179661737236217584162293203948892620596377535322027150077329408
CPI0_4:
        .cell   2535301202817642044428229017600
        .text
DEFAULT_UNWIND:
        pncl    @DEFAULT_UNWIND
DEFAULT_FAR_RETURN:
        retl    @DEFAULT_FAR_RETURN
DEFAULT_FAR_REVERT:
        revl    @DEFAULT_FAR_REVERT

System Contracts

Many EVM instructions require special handling by the System Contracts. The full list of such instructions is provided in the EVM instructions reference.

There are several types of System Contracts from the perspective of how they are handled by zksolc:

  1. Environmental data storage.
  2. KECCAK256 hash function.
  3. Contract deployer.
  4. Ether value simulator.
  5. Simulator of immutables.
  6. Event handler.

Environmental Data Storage

Such storage contracts are accessed with static calls in order to retrieve values for the block, transaction, and other environmental entities: CHAINID, DIFFICULTY, BLOCKHASH, etc.

One good example of such a contract is SystemContext, which provides the majority of the environmental data.

Since the EVM does not use external calls for these instructions, zksolc must use the auxiliary heap for their calldata.

Steps to handle such instructions:

  1. Store the calldata for the System Contract call on the auxiliary heap.
  2. Call the System Contract with a static call.
  3. Check the return status code of the call.
  4. Revert or throw if the status code is zero.
  5. Read the ABI data and extract the result. All such System Contracts return a single 256-bit value.
  6. Return the value as the result of the EVM instruction.
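
The six steps above can be sketched as a small Python model. Everything here is hypothetical: the system contract is a plain callable returning a status flag and return data, the auxiliary heap is a `bytearray`, and the value 324 is just sample data. It only mirrors the control flow, not the real ABI.

```python
from typing import Callable, Tuple


class SystemCallFailed(Exception):
    """Raised when the modelled static call reports a zero status code."""


def read_environment_value(
    system_contract: Callable[[bytes], Tuple[bool, bytes]],
    calldata: bytes,
    aux_heap: bytearray,
) -> int:
    # Step 1: store the calldata on the auxiliary heap.
    aux_heap[0:len(calldata)] = calldata
    # Step 2: perform the (modelled) static call.
    success, return_data = system_contract(bytes(aux_heap[:len(calldata)]))
    # Steps 3-4: check the status code and fail if it is zero.
    if not success:
        raise SystemCallFailed("system contract call failed")
    # Steps 5-6: decode the single 256-bit result and return it.
    return int.from_bytes(return_data[:32], "big")


def fake_system_context(calldata: bytes) -> Tuple[bool, bytes]:
    # Hypothetical stand-in that always answers with the sample value 324.
    return True, (324).to_bytes(32, "big")


chain_id = read_environment_value(fake_system_context, b"\x00" * 4, bytearray(64))
print(chain_id)
```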

KECCAK256 Hash Function

Handling of this function is similar to Environmental Data Storage with one key difference: because the EVM also uses heap memory to store the calldata for KECCAK256, the IR generator allocates the required memory chunk, so zksolc does not need to use the auxiliary heap.

Contract Deployer

See the sections on handling CREATE and dependency code substitution instructions in the ZKsync Era documentation.

Ether Value Simulator

EraVM does not support passing Ether natively, so this feature is provided by a special System Contract called MsgValueSimulator.

An external call is redirected through this simulator if:

  1. The call is ordinary, that is neither static nor delegate.
  2. Its Ether value is non-zero.
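
The two redirect conditions can be written as a single predicate. This is a minimal sketch with hypothetical names (`CallKind`, `is_redirected_to_simulator`), not the compiler's actual code.

```python
from enum import Enum


class CallKind(Enum):
    CALL = "call"
    STATIC = "staticcall"
    DELEGATE = "delegatecall"


def is_redirected_to_simulator(kind: CallKind, value: int) -> bool:
    """True when a call would go through MsgValueSimulator per the two rules above."""
    is_ordinary = kind is CallKind.CALL  # neither static nor delegate
    return is_ordinary and value != 0    # and carries non-zero Ether


print(is_redirected_to_simulator(CallKind.CALL, 1))
```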

Calls to the simulator require additional data passed via ABI using registers:

  1. Ether value.
  2. The address of the contract to call.
  3. The system call bit, set only when redirecting a call to the ContractDeployer, that is, when CREATE or CREATE2 is called with non-zero Ether.

To pass Ether in EraVM, the compiler uses:

  1. The special 128-bit register context_u128 which is a part of the EraVM transient state.
  2. An immutable value of context_u128 captured in the stack frame at the moment of the call.

Details on setting and capturing this value are covered in the Context Register section of the EraVM specification.

Simulator of Immutables

Refer to the handling immutables documentation in ZKsync Era.

Event Handler

Event payloads are sent to a special System Contract called EventWriter. As with EVM, the payload consists of topics and data:

  1. Topics with a length prefix are passed via ABI using registers.
  2. Data is passed through the default heap, just like in EVM.

Auxiliary Heap

zksolc works on the IR level. Because of this, it cannot directly manage the heap memory allocator; that responsibility remains with the high-level source code compilers that emit IRs.

However, there are scenarios in which EraVM must allocate memory on the heap while EVM does not, leading to the introduction of the auxiliary heap. The auxiliary heap is used for:

  1. Returning immutables from the constructor.
  2. Allocating calldata and return data for calls to System Contracts.

While the ordinary heap contains calldata and return data for calls to user contracts, the auxiliary heap holds calldata and return data for calls to System Contracts. This preserves EVM compatibility by preventing System Contract calls from affecting calldata or return data, thereby avoiding conflicts with the heap layout that contract developers expect.

For more details on heaps, refer to the EraVM specification, which describes types of heaps, their connections to stack frames and memory growth, and their role in contract-to-contract communication.

Exception Handling

This page highlights specific nuances of exception handling (EH) in the EraVM architecture.

In essence, EraVM uses two EH mechanisms: contract-level and function-level. The former is inherited from the EVM architecture, while the latter aligns more closely with general-purpose languages.

|             | Contract Level  | Function Level            |
| Yul Example | revert(0, 0)    | verbatim("throw")         |
| Native to   | EVM             | General-purpose languages |
| Handled by  | EraVM           | Compiler                  |
| Caught by   | Caller contract | Caller function           |
| Efficiency  | High            | Low                       |

Contract Level

This type of exception is inherited from the EVM architecture. In EVM, instructions like REVERT and INVALID immediately terminate the contract’s execution and return control to the caller. It is impossible to catch them within the contract; only the caller can detect them by checking the call status code.

// callee
revert(0, 0)

// caller
let success = call(...)
if iszero(success) {
    // option 1: rethrow on the contract level
    returndatacopy(...)
    revert(...)

    // option 2: rethrow on the function level
    verbatim("throw") // only available in the Yul mode
}

EraVM’s behavior is fully equivalent: the VM unwinds the call stack all the way to the contract’s top-level function frame, leaving no possibility to intercept or handle the exception along the way.

These types of exceptions are more efficient, as you can revert at any point of the execution without propagating the exception all the way up.

Implementation

In EraVM, contracts invoke one another via the far_call instruction, which includes the exception handler’s address among its arguments.

Function Level

This type of exception handling is common in general-purpose languages such as C++. As a result, it integrates naturally into LLVM, even though it is not supported by the smart contract languages our compilers handle. This is also why the two EH mechanisms are treated separately and do not interact within high-level code.

In general-purpose languages, a range of EH operators (e.g. try, throw, and catch) typically indicates which code sections can throw exceptions and how they should be handled. These tools are absent in Solidity and its EVM Yul dialect, so we introduced extensions to the EraVM Yul dialect supported by zksolc.

If the contract does not define an EH function named ZKSYNC_CATCH_NEAR_CALL, there is no need to generate catch blocks. EraVM will simply propagate panics to the caller contract without any extra overhead.

Several constraints arise from Yul’s structure and the nature of smart contracts:

  1. Any function beginning with ZKSYNC_NEAR_CALL is implicitly wrapped with try. If there is an exception handler defined, the following will happen:
    • A panic will be caught by the caller of such function.
    • Control then transfers to the EH function.
    • After the EH function finishes, control returns to the caller of ZKSYNC_NEAR_CALL.
  2. Every operation can be considered a throw.
    • Any instruction may panic due to out-of-gas, so all instructions can potentially throw.
    • This reduces optimization opportunities.
  3. The catch block is represented by the ZKSYNC_CATCH_NEAR_CALL function in Yul.
    • A panic in ZKSYNC_NEAR_CALL makes its caller catch the exception and call the EH function.
    • Once the EH function completes, control returns to the caller of ZKSYNC_NEAR_CALL.
  4. Only one EH function is allowed, and it must be named ZKSYNC_CATCH_NEAR_CALL.
    • This approach is not very efficient because every function must include an LLVM IR catch block to capture and propagate exceptions to the EH function.
// Follow the numbers for the order of execution. The call order is:
//     1. caller
//     2. ZKSYNC_NEAR_CALL_callee
//     3. callee_even_deeper
//     4. ZKSYNC_CATCH_NEAR_CALL
//     5. caller

function ZKSYNC_NEAR_CALL_callee() -> value {    // 03
    value := callee_even_deeper()                // 04
}

function callee_even_deeper() -> value {         // 05
    verbatim("throw")                            // 06
}

// Each LLVM IR function automatically includes an implicit 'catch' block,
// which performs the following actions:
//     1. If a return value is expected, keep it zero-initialized ('zero').
//     2. Call the EH function ('ZKSYNC_CATCH_NEAR_CALL').
//     3. Resume execution with the next instruction (e.g., 'value := 42').
function caller() -> value {                      // 01
    let zero := ZKSYNC_NEAR_CALL_callee()         // 02
    value := 42                                   // 09
}

// This handler can also revert execution. Reverts in EH functions cannot be caught,
// so they immediately terminate the execution and return control to the caller contract.
function ZKSYNC_CATCH_NEAR_CALL() {               // 07
    log0(...)                                     // 08
}
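
For readers more familiar with general-purpose languages, the control flow of the Yul example above can be approximated with Python exceptions. This is only an analogy under stated assumptions: `Panic`, `near_call`, and the `trace` list are hypothetical names, and the implicit catch block is modelled as an explicit wrapper.

```python
trace = []


class Panic(Exception):
    """Stands in for a function-level exception such as verbatim("throw")."""


def catch_near_call() -> None:
    # Plays the role of ZKSYNC_CATCH_NEAR_CALL.
    trace.append("ZKSYNC_CATCH_NEAR_CALL")


def callee_even_deeper() -> int:
    trace.append("callee_even_deeper")
    raise Panic()


def near_call_callee() -> int:
    trace.append("ZKSYNC_NEAR_CALL_callee")
    return callee_even_deeper()


def near_call(function) -> int:
    # Models the implicit 'try' around a ZKSYNC_NEAR_CALL function:
    # on a panic, run the EH function and leave the result zero-initialized.
    try:
        return function()
    except Panic:
        catch_near_call()
        return 0


def caller() -> int:
    trace.append("caller")
    zero = near_call(near_call_callee)
    assert zero == 0
    return 42  # execution resumes after the near call


result = caller()
print(result, trace)
```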

Reference

In this specification, instructions are categorized according to their relevance to the EVM instruction set:

Most native EVM instructions are represented in both Yul and EVM assembly IRs. When they are not, this is explicitly noted in the instruction description.

Native EVM Instructions

EVM instructions are grouped into categories based on the official reference:

EraVM Assembly

The assembly generated for LLVM standard library functions depends on available optimizations, which vary by version. If you do not see an assembly example for a particular instruction, try compiling a reproducing contract using the latest zksolc.

For a comprehensive list of instructions, see the EraVM specification, which provides them in its table of contents.

Arithmetic

ADD

Original EVM instruction.

LLVM IR

%addition_result = add i256 %value1, %value2

LLVM IR instruction documentation

EraVM Assembly

add     r1, r2, r1

For more detail, see the EraVM specification reference

MUL

Original EVM instruction.

Differences from EVM

  1. The carry is written to the 2nd output register

LLVM IR

%multiplication_result = mul i256 %value1, %value2

EraVM can output the carry of the multiplication operation. In this case, the result is a tuple of two values: the multiplication result and the carry. The carry is written to the 2nd output register. The snippet below returns the carry value.

%value1_extended = zext i256 %value1 to i512
%value2_extended = zext i256 %value2 to i512
%result_extended = mul nuw i512 %value1_extended, %value2_extended
%result_shifted = lshr i512 %result_extended, 256
%result = trunc i512 %result_shifted to i256
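
The widening multiplication above can be cross-checked with arbitrary-precision integers. The Python sketch below is illustrative only, and the function name `mul_with_carry` is hypothetical.

```python
MASK256 = (1 << 256) - 1


def mul_with_carry(value1: int, value2: int) -> tuple:
    """Return (low 256 bits, high 256 bits) of the full 512-bit product."""
    product = value1 * value2        # corresponds to the i512 multiplication
    low = product & MASK256          # the ordinary MUL result
    carry = product >> 256           # lshr by 256, then trunc to i256
    return low, carry


print(mul_with_carry(1 << 255, 4))
```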

LLVM IR instruction documentation

EraVM Assembly

mul     r1, r2, r1, r2

For more detail, see the EraVM specification reference

SUB

Original EVM instruction.

LLVM IR

%subtraction_result = sub i256 %value1, %value2

LLVM IR instruction documentation

EraVM Assembly

sub     r1, r2, r1

For more detail, see the EraVM specification reference

DIV

Original EVM instruction.

Differences from EVM

  1. The remainder is written to the 2nd output register

LLVM IR

define i256 @__div(i256 %arg1, i256 %arg2) #0 {
entry:
  %is_divider_zero = icmp eq i256 %arg2, 0
  br i1 %is_divider_zero, label %return, label %division

division:
  %div_res = udiv i256 %arg1, %arg2
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %div_res, %division ]
  ret i256 %res
}

LLVM IR instruction documentation

For more detail, see the EraVM specification reference

SDIV

Original EVM instruction.

LLVM IR

define i256 @__sdiv(i256 %arg1, i256 %arg2) #0 {
entry:
  %is_divider_zero = icmp eq i256 %arg2, 0
  br i1 %is_divider_zero, label %return, label %division_overflow

division_overflow:
  %is_divided_int_min = icmp eq i256 %arg1, -57896044618658097711785492504343953926634992332820282019728792003956564819968
  %is_minus_one = icmp eq i256 %arg2, -1
  %is_overflow = and i1 %is_divided_int_min, %is_minus_one
  br i1 %is_overflow, label %return, label %division

division:
  %div_res = sdiv i256 %arg1, %arg2
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %arg1, %division_overflow ], [ %div_res, %division ]
  ret i256 %res
}
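
A reference model of the three cases handled by `__sdiv` (zero divisor, the INT_MIN / -1 overflow, and ordinary division truncating toward zero) can be sketched in Python. The helper names are hypothetical. Note that Python's `//` rounds toward negative infinity, so the sketch divides absolute values and restores the sign.

```python
MASK256 = (1 << 256) - 1
INT_MIN = -(1 << 255)


def to_signed(value: int) -> int:
    """Interpret a 256-bit word as a two's complement integer."""
    return value - (1 << 256) if value >> 255 else value


def to_unsigned(value: int) -> int:
    return value & MASK256


def evm_sdiv(dividend: int, divisor: int) -> int:
    if divisor == 0:
        return 0                                   # phi value from %entry
    signed_dividend, signed_divisor = to_signed(dividend), to_signed(divisor)
    if signed_dividend == INT_MIN and signed_divisor == -1:
        return dividend                            # phi value from %division_overflow
    quotient = abs(signed_dividend) // abs(signed_divisor)
    if (signed_dividend < 0) != (signed_divisor < 0):
        quotient = -quotient                       # truncate toward zero, like sdiv
    return to_unsigned(quotient)


print(to_signed(evm_sdiv(to_unsigned(-7), 2)))
```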

LLVM IR instruction documentation

EraVM does not have a similar instruction.

MOD

Original EVM instruction.

Differences from EVM

  1. The remainder is written to the 2nd output register

LLVM IR

define i256 @__mod(i256 %arg1, i256 %arg2) #0 {
entry:
  %is_divider_zero = icmp eq i256 %arg2, 0
  br i1 %is_divider_zero, label %return, label %remainder

remainder:
  %rem_res = urem i256 %arg1, %arg2
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %rem_res, %remainder ]
  ret i256 %res
}

LLVM IR instruction documentation

For more detail, see the EraVM specification reference

SMOD

Original EVM instruction.

LLVM IR

define i256 @__smod(i256 %arg1, i256 %arg2) #0 {
entry:
  %is_divider_zero = icmp eq i256 %arg2, 0
  br i1 %is_divider_zero, label %return, label %division_overflow

division_overflow:
  %is_divided_int_min = icmp eq i256 %arg1, -57896044618658097711785492504343953926634992332820282019728792003956564819968
  %is_minus_one = icmp eq i256 %arg2, -1
  %is_overflow = and i1 %is_divided_int_min, %is_minus_one
  br i1 %is_overflow, label %return, label %remainder

remainder:
  %rem_res = srem i256 %arg1, %arg2
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ 0, %division_overflow ], [ %rem_res, %remainder ]
  ret i256 %res
}
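
The same kind of reference model works for `__smod`. The sketch below uses hypothetical helper names and relies on the fact that the result of `srem` takes the sign of the dividend, whereas Python's `%` takes the sign of the divisor.

```python
MASK256 = (1 << 256) - 1
INT_MIN = -(1 << 255)


def to_signed(value: int) -> int:
    return value - (1 << 256) if value >> 255 else value


def evm_smod(dividend: int, divisor: int) -> int:
    if divisor == 0:
        return 0                                   # phi value from %entry
    signed_dividend, signed_divisor = to_signed(dividend), to_signed(divisor)
    if signed_dividend == INT_MIN and signed_divisor == -1:
        return 0                                   # phi value from %division_overflow
    remainder = abs(signed_dividend) % abs(signed_divisor)
    if signed_dividend < 0:
        remainder = -remainder                     # sign follows the dividend
    return remainder & MASK256


print(to_signed(evm_smod((-7) & MASK256, 2)))
```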

LLVM IR instruction documentation

EraVM does not have a similar instruction.

ADDMOD

Original EVM instruction.

LLVM IR

define i256 @__addmod(i256 %arg1, i256 %arg2, i256 %modulo) #0 {
entry:
  %is_zero = icmp eq i256 %modulo, 0
  br i1 %is_zero, label %return, label %addmod

addmod:
  %arg1m = urem i256 %arg1, %modulo
  %arg2m = urem i256 %arg2, %modulo
  %res = call {i256, i1} @llvm.uadd.with.overflow.i256(i256 %arg1m, i256 %arg2m)
  %sum = extractvalue {i256, i1} %res, 0
  %obit = extractvalue {i256, i1} %res, 1
  %sum.mod = urem i256 %sum, %modulo
  br i1 %obit, label %overflow, label %return

overflow:
  %mod.inv = xor i256 %modulo, -1
  %sum1 = add i256 %sum, %mod.inv
  %sum.ovf = add i256 %sum1, 1
  br label %return

return:
  %value = phi i256 [0, %entry], [%sum.mod, %addmod], [%sum.ovf, %overflow]
  ret i256 %value
}
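
The overflow branch of `__addmod` is the least obvious part: when the 256-bit addition wraps, the true sum is `sum + 2^256`, and subtracting the modulus is done as `sum + ~modulo + 1`. The Python sketch below mirrors the IR step by step with explicit wrapping (the function name is hypothetical) so the result can be compared against big-integer arithmetic.

```python
TWO_256 = 1 << 256
MASK256 = TWO_256 - 1


def addmod_wrapping(arg1: int, arg2: int, modulo: int) -> int:
    if modulo == 0:
        return 0
    arg1m, arg2m = arg1 % modulo, arg2 % modulo
    total = arg1m + arg2m
    wrapped_sum = total & MASK256          # what a 256-bit add produces
    overflow = total >= TWO_256            # the i1 flag of uadd.with.overflow
    if not overflow:
        return wrapped_sum % modulo
    mod_inverted = modulo ^ MASK256        # xor with -1
    return (wrapped_sum + mod_inverted + 1) & MASK256


a = b = TWO_256 - 2
m = TWO_256 - 1
print(addmod_wrapping(a, b, m) == (a + b) % m)
```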

EraVM does not have a similar instruction.

MULMOD

Original EVM instruction.

LLVM IR

define i256 @__mulmod(i256 %arg1, i256 %arg2, i256 %modulo) #0 {
entry:
  %cccond = icmp eq i256 %modulo, 0
  br i1 %cccond, label %ccret, label %entrycont

ccret:
  ret i256 0

entrycont:
  %arg1m = urem i256 %arg1, %modulo
  %arg2m = urem i256 %arg2, %modulo
  %less_then_2_128 = icmp ult i256 %modulo, 340282366920938463463374607431768211456
  br i1 %less_then_2_128, label %fast, label %slow

fast:
  %prod = mul i256 %arg1m, %arg2m
  %prodm = urem i256 %prod, %modulo
  ret i256 %prodm

slow:
  %arg1e = zext i256 %arg1m to i512
  %arg2e = zext i256 %arg2m to i512
  %prode = mul i512 %arg1e, %arg2e
  %prodl = trunc i512 %prode to i256
  %prodeh = lshr i512 %prode, 256
  %prodh = trunc i512 %prodeh to i256
  %res = call i256 @__ulongrem(i256 %prodl, i256 %prodh, i256 %modulo)
  ret i256 %res
}
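
The split into a fast and a slow path in `__mulmod` follows from operand sizes: if the modulus is below 2^128, both reduced operands are too, so their product fits into 256 bits. The Python sketch below models both paths; the function name is hypothetical, and `__ulongrem` is replaced by plain big-integer remainder, which is an assumption about its meaning.

```python
MASK256 = (1 << 256) - 1


def mulmod_model(arg1: int, arg2: int, modulo: int) -> int:
    if modulo == 0:
        return 0
    arg1m, arg2m = arg1 % modulo, arg2 % modulo
    if modulo < (1 << 128):
        # Fast path: the product of two values below 2^128 fits into 256 bits.
        product = arg1m * arg2m
        assert product <= MASK256
        return product % modulo
    # Slow path: 512-bit product split into low and high 256-bit halves.
    product = arg1m * arg2m
    low = product & MASK256
    high = product >> 256
    # Stand-in for __ulongrem(low, high, modulo).
    return ((high << 256) | low) % modulo


print(mulmod_model(MASK256, MASK256 - 1, MASK256 - 2))
```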

EraVM does not have a similar instruction.

EXP

Original EVM instruction.

LLVM IR

define i256 @__exp(i256 %value, i256 %exp) "noinline-oz" #0 {
entry:
  %exp_is_non_zero = icmp eq i256 %exp, 0
  br i1 %exp_is_non_zero, label %return, label %exponent_loop_body

return:
  %exp_res = phi i256 [ 1, %entry ], [ %exp_res.1, %exponent_loop_body ]
  ret i256 %exp_res

exponent_loop_body:
  %exp_res.2 = phi i256 [ %exp_res.1, %exponent_loop_body ], [ 1, %entry ]
  %exp_val = phi i256 [ %exp_val_halved, %exponent_loop_body ], [ %exp, %entry ]
  %val_squared.1 = phi i256 [ %val_squared, %exponent_loop_body ], [ %value, %entry ]
  %odd_test = and i256 %exp_val, 1
  %is_exp_odd = icmp eq i256 %odd_test, 0
  %exp_res.1.interm = select i1 %is_exp_odd, i256 1, i256 %val_squared.1
  %exp_res.1 = mul i256 %exp_res.1.interm, %exp_res.2
  %val_squared = mul i256 %val_squared.1, %val_squared.1
  %exp_val_halved = lshr i256 %exp_val, 1
  %exp_val_is_less_2 = icmp ult i256 %exp_val, 2
  br i1 %exp_val_is_less_2, label %return, label %exponent_loop_body
}
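
The loop in `__exp` is binary exponentiation (square-and-multiply) with 256-bit wrapping. A Python sketch of the same loop shape follows; the function name is hypothetical, and the result can be checked against the built-in three-argument `pow`.

```python
MASK256 = (1 << 256) - 1


def exp_wrapping(value: int, exponent: int) -> int:
    if exponent == 0:
        return 1
    result = 1
    square = value
    while True:
        if exponent & 1:                       # odd bit set: multiply into the result
            result = (result * square) & MASK256
        square = (square * square) & MASK256   # square for the next bit
        is_last = exponent < 2                 # checked before halving, as in the IR
        exponent >>= 1
        if is_last:
            return result


print(exp_wrapping(3, 5))
```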

EraVM does not have a similar instruction.

SIGNEXTEND

Original EVM instruction.

LLVM IR

define i256 @__signextend(i256 %numbyte, i256 %value) #0 {
entry:
  %is_overflow = icmp uge i256 %numbyte, 31
  br i1 %is_overflow, label %return, label %signextend

signextend:
  %numbit_byte = mul nuw nsw i256 %numbyte, 8
  %numbit = add nsw nuw i256 %numbit_byte, 7
  %numbit_inv = sub i256 256, %numbit
  %signmask = shl i256 1, %numbit
  %valmask = lshr i256 -1, %numbit_inv
  %ext1 = shl i256 -1, %numbit
  %signv = and i256 %signmask, %value
  %sign = icmp ne i256 %signv, 0
  %valclean = and i256 %value, %valmask
  %sext = select i1 %sign, i256 %ext1, i256 0
  %result = or i256 %sext, %valclean
  br label %return

return:
  %signext_res = phi i256 [%value, %entry], [%result, %signextend]
  ret i256 %signext_res
}
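
The mask construction in `__signextend` can be followed more easily with concrete values. The Python sketch below mirrors the IR variable names where practical; the function name is hypothetical.

```python
MASK256 = (1 << 256) - 1


def signextend(numbyte: int, value: int) -> int:
    if numbyte >= 31:
        return value                           # nothing to extend
    numbit = numbyte * 8 + 7                   # index of the sign bit
    valmask = MASK256 >> (256 - numbit)        # bits below the sign bit
    ext1 = (MASK256 << numbit) & MASK256       # sign bit and everything above it
    sign = (value & (1 << numbit)) != 0
    sext = ext1 if sign else 0
    return sext | (value & valmask)


print(hex(signextend(0, 0x7F)))
```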

EraVM does not have a similar instruction.

Bitwise

AND

Original EVM instruction.

LLVM IR

%and_result = and i256 %value1, %value2

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
and     r1, r2, r1
st.1    128, r1

EraVM instruction: and

OR

Original EVM instruction.

LLVM IR

%or_result = or i256 %value1, %value2

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
or      r1, r2, r1
st.1    128, r1

EraVM instruction: or

XOR

Original EVM instruction.

LLVM IR

%xor_result = xor i256 %value1, %value2

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
xor     r1, r2, r1
st.1    128, r1

EraVM instruction: xor

NOT

Original EVM instruction.

LLVM IR

%xor_result = xor i256 %value, -1

EraVM Assembly

ptr.add stack[@ptr_calldata], r1, r1
ld      r1, r1
sub.s   1, r0, r2
xor     r1, r2, r1
st.1    128, r1

EraVM instruction: xor

BYTE

Original EVM instruction.

LLVM IR

define i256 @__byte(i256 %index, i256 %value) #0 {
entry:
  %is_overflow = icmp ugt i256 %index, 31
  br i1 %is_overflow, label %return, label %extract_byte

extract_byte:
  %bits_offset = shl i256 %index, 3
  %value_shifted_left = shl i256 %value, %bits_offset
  %value_shifted_right = lshr i256 %value_shifted_left, 248
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %value_shifted_right, %extract_byte ]
  ret i256 %res
}
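
The two shifts in `__byte` first move the requested byte to the most significant position and then down to the least significant one. A Python sketch with explicit 256-bit truncation (the function name is hypothetical):

```python
MASK256 = (1 << 256) - 1


def evm_byte(index: int, value: int) -> int:
    if index > 31:
        return 0
    bits_offset = index << 3                          # index * 8
    shifted_left = (value << bits_offset) & MASK256   # byte moves to the top
    return shifted_left >> 248                        # byte moves to the bottom


print(hex(evm_byte(31, 0xAB)))
```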

SHL

Original EVM instruction.

LLVM IR

define i256 @__shl(i256 %shift, i256 %value) #0 {
entry:
  %is_overflow = icmp ugt i256 %shift, 255
  br i1 %is_overflow, label %return, label %shift_value

shift_value:
  %shift_res = shl i256 %value, %shift
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %shift_res, %shift_value ]
  ret i256 %res
}

SHR

Original EVM instruction.

LLVM IR

define i256 @__shr(i256 %shift, i256 %value) #0 {
entry:
  %is_overflow = icmp ugt i256 %shift, 255
  br i1 %is_overflow, label %return, label %shift_value

shift_value:
  %shift_res = lshr i256 %value, %shift
  br label %return

return:
  %res = phi i256 [ 0, %entry ], [ %shift_res, %shift_value ]
  ret i256 %res
}

EraVM instruction: shr

SAR

Original EVM instruction.

LLVM IR

define i256 @__sar(i256 %shift, i256 %value) #0 {
entry:
  %is_overflow = icmp ugt i256 %shift, 255
  br i1 %is_overflow, label %arith_overflow, label %shift_value

arith_overflow:
  %is_val_positive = icmp sge i256 %value, 0
  %res_overflow = select i1 %is_val_positive, i256 0, i256 -1
  br label %return

shift_value:
  %shift_res = ashr i256 %value, %shift
  br label %return

return:
  %res = phi i256 [ %res_overflow, %arith_overflow ], [ %shift_res, %shift_value ]
  ret i256 %res
}
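
Of the three shift helpers, `__sar` is the only one whose out-of-range result depends on the operand: it saturates to 0 for non-negative values and to all ones for negative values. A Python sketch with hypothetical names:

```python
MASK256 = (1 << 256) - 1


def to_signed(value: int) -> int:
    return value - (1 << 256) if value >> 255 else value


def evm_sar(shift: int, value: int) -> int:
    signed_value = to_signed(value)
    if shift > 255:
        # arith_overflow: only the sign survives.
        return 0 if signed_value >= 0 else MASK256
    # Python's >> on a negative int is already an arithmetic shift.
    return (signed_value >> shift) & MASK256


print(to_signed(evm_sar(1, (-4) & MASK256)))
```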

Block

This information is requested from a System Contract called SystemContext.

For details on how the contract is called, see the relevant section.

BLOCKHASH

Original EVM instruction.

COINBASE

Original EVM instruction.

TIMESTAMP

Original EVM instruction.

NUMBER

Original EVM instruction.

PREVRANDAO

Original EVM instruction.

DIFFICULTY

Original EVM instruction.

GASLIMIT

Original EVM instruction.

CHAINID

Original EVM instruction.

SELFBALANCE

Original EVM instruction.

Implemented as BALANCE with an ADDRESS as its argument.

BASEFEE

Original EVM instruction.

Calls

All EVM call instructions follow a similar handling approach. The call type is encoded at the assembly level, so we will focus on the common workflow and note any differences where they arise.

For more information, see the ZKsync Era documentation.

CALL

Original EVM instruction.

The code checks whether the call is non-static and whether the Ether value is non-zero. If both conditions are met, the call is redirected to the MsgValueSimulator.

DELEGATECALL

Original EVM instruction.

EraVM instruction: far_call

STATICCALL

Original EVM instruction.

EraVM instruction: far_call

CREATE

The EVM CREATE instructions are handled similarly.

For more information, see the ZKsync Era documentation.

CREATE

Original EVM instruction.

CREATE2

Original EVM instruction.

Environment

This information is requested from a System Contract called SystemContext.

For details on how the contract is called, see the relevant section.

ADDRESS

Original EVM instruction.

This value is fetched with a native EraVM instruction: context.this.

BALANCE

Original EVM instruction.

ORIGIN

Original EVM instruction.

CALLER

Original EVM instruction.

This value is fetched with a native EraVM instruction: context.caller.

CALLVALUE

Original EVM instruction.

This value is fetched with a native EraVM instruction: context.get_context_u128.

CALLDATALOAD

Original EVM instruction.

Calldata is accessed using a generic memory access instruction, but the memory chunk itself references the caller’s heap. A “fat pointer” to the parent contract is passed via ABI through registers.

LLVM IR

@ptr_calldata = private unnamed_addr global ptr addrspace(3) null                   ; global variable declaration
...
store ptr addrspace(3) %0, ptr @ptr_calldata, align 32                              ; saving the pointer from `r1` to the global variable
...
%calldata_pointer = load ptr addrspace(3), ptr @ptr_calldata, align 32              ; loading the pointer from the global variable to `calldata_pointer`
%calldata_value = load i256, ptr addrspace(3) %calldata_pointer, align 32           ; loading the value from the calldata pointer

EraVM Assembly

ptr.add r1, r0, stack[@ptr_calldata]                                                ; saving the pointer from `r1` to the global variable
...
ptr.add stack[@ptr_calldata], r0, r1                                                ; loading the pointer from the global variable to `r1`
ld      r1, r1                                                                      ; loading the value to `r1`

CALLDATASIZE

Original EVM instruction.

The calldata size is stored in the fat pointer from the parent contract (see CALLDATALOAD), and it can be extracted using bitwise operations, as demonstrated below.

LLVM IR

@calldatasize = private unnamed_addr global i256 0                                  ; global variable declaration
...
%abi_pointer_value = ptrtoint ptr addrspace(3) %0 to i256                           ; converting the pointer to an integer
%abi_pointer_value_shifted = lshr i256 %abi_pointer_value, 96                       ; shifting the integer right 96 bits
%abi_length_value = and i256 %abi_pointer_value_shifted, 4294967295                 ; keeping the lowest 32 bits of the integer
store i256 %abi_length_value, ptr @calldatasize, align 32                           ; saving the value to the global variable

EraVM Assembly

ptr.add r1, r0, stack[@ptr_calldata]                                                ; saving the pointer from `r1` to the global variable
shr.s   96, r1, r1                                                                  ; shifting the integer right 96 bits
and     @CPI0_0[0], r1, stack[@calldatasize]                                        ; keeping the lowest 32 bits of the integer, saving the value to the global variable
...
CPI0_0:
    .cell 4294967295
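
The shift-and-mask sequence above can be tried on a synthetic pointer value. In the Python sketch below, only the position of the length field (bits 96..127) comes from the code above; the positions of the other fields and the helper names are assumptions made for the sake of a complete example.

```python
FIELD_MASK = 4294967295  # 0xFFFFFFFF, same constant as CPI0_0


def make_fat_pointer(offset: int, page: int, start: int, length: int) -> int:
    # Assumed layout: offset | page << 32 | start << 64 | length << 96.
    return offset | (page << 32) | (start << 64) | (length << 96)


def calldata_size(abi_pointer_value: int) -> int:
    shifted = abi_pointer_value >> 96      # lshr ..., 96
    return shifted & FIELD_MASK            # keep the lowest 32 bits


pointer = make_fat_pointer(offset=0, page=7, start=128, length=68)
print(calldata_size(pointer))
```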

CALLDATACOPY

Original EVM instruction.

Unlike EVM, EraVM employs a simple loop over memory operations on 256-bit values.

LLVM IR

; loading the pointer from the global variable to `calldata_pointer`
%calldata_pointer = load ptr addrspace(3), ptr @ptr_calldata, align 32
; shifting the pointer by 122 bytes
%calldata_source_pointer = getelementptr i8, ptr addrspace(3) %calldata_pointer, i256 122
; copying 64 bytes from calldata at offset 122 to the heap at offset 128
call void @llvm.memcpy.p1.p3.i256(ptr addrspace(1) align 1 inttoptr (i256 128 to ptr addrspace(1)), ptr addrspace(3) align 1 %calldata_source_pointer, i256 64, i1 false)

EraVM Assembly

.BB0_3:
    shl.s   5, r2, r3           ; multiplying the word index by 32 (shift left by 5)
    ptr.add r1, r3, r4          ; adding the offset to the calldata pointer
    ld      r4, r4              ; reading the calldata value
    add     128, r3, r3         ; adding the offset to the heap pointer
    st.1    r3, r4              ; writing the calldata value to the heap
    add     1, r2, r2           ; incrementing the offset
    sub.s!  2, r2, r3           ; checking the bounds
    jump.lt @.BB0_3             ; loop continuation branching
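
The assembly loop above copies two 32-byte words. A rough Python model of the same word-by-word copy is shown below; the names are hypothetical, the heap is a plain `bytearray`, and zero-filling reads past the end of the source is an assumption. The assembly tests its bound after each iteration, which is equivalent here because at least one word is copied.

```python
WORD = 32


def copy_words(source: bytes, heap: bytearray, heap_offset: int, word_count: int) -> None:
    for index in range(word_count):
        byte_offset = index * WORD                       # shl.s 5: index * 32
        word = source[byte_offset:byte_offset + WORD]    # ld: read one word
        word = word.ljust(WORD, b"\x00")                 # assumed zero fill past the end
        destination = heap_offset + byte_offset          # add 128, ...
        heap[destination:destination + WORD] = word      # st.1: write one word


calldata = bytes(range(64))
heap = bytearray(256)
copy_words(calldata, heap, heap_offset=128, word_count=2)
print(heap[128:136].hex())
```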

CODECOPY

Original EVM instruction.

See the EraVM docs.

CODESIZE

Original EVM instruction.

See the EraVM docs.

GASPRICE

Original EVM instruction.

EXTCODESIZE

Original EVM instruction.

EXTCODECOPY

Original EVM instruction.

Not supported. Triggers a compile-time error.

RETURNDATASIZE

Original EVM instruction.

Similarly to CALLDATASIZE, return data size is read from the fat pointer that the child contract returns. It can also be extracted with bitwise operations.

LLVM IR

%contract_call_external = tail call { ptr addrspace(3), i1 } @__farcall(i256 0, i256 0, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef)
%contract_call_external_result_abi_data = extractvalue { ptr addrspace(3), i1 } %contract_call_external, 0
%contract_call_memcpy_from_child_pointer_casted = ptrtoint ptr addrspace(3) %contract_call_external_result_abi_data to i256
%contract_call_memcpy_from_child_return_data_size_shifted = lshr i256 %contract_call_memcpy_from_child_pointer_casted, 96
%contract_call_memcpy_from_child_return_data_size_truncated = and i256 %contract_call_memcpy_from_child_return_data_size_shifted, 4294967295

EraVM Assembly

near_call       r0, @__farcall, @DEFAULT_UNWIND                 ; calling a child contract
shr.s   96, r1, r1                                              ; shifting the pointer value right 96 bits
and     @CPI0_1[0], r1, r1                                      ; keeping the lowest 32 bits of the pointer value
...
CPI0_1:
    .cell 4294967295

EraVM instruction: call

RETURNDATACOPY

Original EVM instruction.

Unlike EVM, EraVM employs a simple loop over memory operations on 256-bit values.

LLVM IR

; loading the pointer from the global variable to `return_data_pointer`
%return_data_pointer = load ptr addrspace(3), ptr @ptr_return_data, align 32
; shifting the pointer by 122 bytes
%return_data_source_pointer = getelementptr i8, ptr addrspace(3) %return_data_pointer, i256 122
; copying 64 bytes from return data at offset 122 to the heap at offset 128
call void @llvm.memcpy.p1.p3.i256(ptr addrspace(1) align 1 inttoptr (i256 128 to ptr addrspace(1)), ptr addrspace(3) align 1 %return_data_source_pointer, i256 64, i1 false)

EraVM Assembly

.BB0_3:
    shl.s   5, r2, r3           ; multiplying the word index by 32 (shift left by 5)
    ptr.add r1, r3, r4          ; adding the offset to the return data pointer
    ld      r4, r4              ; reading the return data value
    add     128, r3, r3         ; adding the offset to the heap pointer
    st.1    r3, r4              ; writing the return data value to the heap
    add     1, r2, r2           ; incrementing the offset
    sub.s!  2, r2, r3           ; checking the bounds
    jump.lt @.BB0_3             ; loop continuation branching
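
The loop in the assembly listing can be modeled in Python to make the control flow easier to follow. This is a sketch under stated assumptions: the name `copy_words` is hypothetical, the heap is represented as a `bytearray`, and the caller is assumed to supply at least `words * 32` bytes of return data.

```python
WORD = 32  # bytes per 256-bit word


def copy_words(return_data: bytes, heap: bytearray, heap_offset: int, words: int) -> None:
    """Copy `words` 32-byte words from return data to the heap, one per iteration."""
    index = 0
    while True:                        # the assembly checks the bound after the body
        byte_offset = index << 5       # shl.s 5: multiply the word index by 32
        value = return_data[byte_offset:byte_offset + WORD]   # ptr.add + ld
        start = heap_offset + byte_offset                      # add 128
        heap[start:start + WORD] = value                       # st.1
        index += 1                     # add 1
        if not index < words:          # sub.s! + jump.lt
            break


heap = bytearray(256)
copy_words(bytes(range(64)), heap, 128, 2)  # two words, as in the listing
```

The check at the end of the body mirrors the `sub.s!` / `jump.lt` pair, so the body always runs at least once.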

EXTCODEHASH

Original EVM instruction.

Logging

Events

The EraVM event instructions operate at a lower level. Each LOG-like instruction is expanded into a loop, with each iteration writing two 256-bit words in the following order:

  1. The initializer cell, which describes the number of indexed words (e.g. I) and the size of non-indexed data in bytes (e.g. D).
  2. I indexed 32-byte words.
  3. D bytes of data.

If only one word remains to be written, the second input is zero.

For a detailed reference, see EraVM instruction: log.event
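
The pairing rule described above can be sketched as follows. This is an illustration only: the function name `event_writes` is hypothetical, the encoding of the initializer cell is not modeled, and the D bytes of data are assumed to be already split into 32-byte words.

```python
def event_writes(initializer: int, indexed: list[int], data_words: list[int]) -> list[tuple[int, int]]:
    """Group the event words into the pairs written by each loop iteration."""
    words = [initializer, *indexed, *data_words]
    if len(words) % 2:
        words.append(0)  # only one word remains: the second input is zero
    return [(words[i], words[i + 1]) for i in range(0, len(words), 2)]


# One indexed word and one data word: three words in total, so the last pair is padded.
print(event_writes(7, [1], [2]))  # [(7, 1), (2, 0)]
```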

LOG0 - LOG4

System Contract

This information is requested from a System Contract called EventWriter.

On how the contract is called, see the relevant section.

Logical

LT

Original EVM instruction.

LLVM IR

%comparison_result = icmp ult i256 %value1, %value2
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
sub!    r1, r2, r1
add     0, r0, r1
add.lt  1, r0, r1
st.1    128, r1

GT

Original EVM instruction.

LLVM IR

%comparison_result = icmp ugt i256 %value1, %value2
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
sub!    r1, r2, r1
add     0, r0, r1
add.gt  1, r0, r1
st.1    128, r1

SLT

Original EVM instruction.

LLVM IR

%comparison_result = icmp slt i256 %value1, %value2
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
add     @CPI0_4[0], r0, r3
sub!    r1, r2, r4
add     r0, r0, r4
add.lt  r3, r0, r4
and     @CPI0_4[0], r2, r2
and     @CPI0_4[0], r1, r1
sub!    r1, r2, r5
add.le  r0, r0, r3
xor     r1, r2, r1
sub.s!  @CPI0_4[0], r1, r1
add     r4, r0, r1
add.eq  r3, r0, r1
sub!    r1, r0, r1
add     0, r0, r1
add.ne  1, r0, r1
st.1    128, r1
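
Because EraVM compares values as unsigned, the SLT sequence above is considerably longer than the one for LT. The intended result of the `icmp slt` / `zext` pair can be modeled in Python as follows. This is a sketch of the semantics only, not of the emitted instruction sequence; the helper names are hypothetical.

```python
MASK = (1 << 256) - 1
SIGN_BIT = 1 << 255


def to_signed(value: int) -> int:
    """Interpret a 256-bit word as a two's complement signed integer."""
    value &= MASK
    return value - (1 << 256) if value & SIGN_BIT else value


def slt(value1: int, value2: int) -> int:
    """Model of `icmp slt` followed by `zext i1 ... to i256`."""
    return int(to_signed(value1) < to_signed(value2))


print(slt(MASK, 0))  # 1, because MASK encodes -1
```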

SGT

Original EVM instruction.

LLVM IR

%comparison_result = icmp sgt i256 %value1, %value2
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
add     @CPI0_4[0], r0, r3
sub!    r1, r2, r4
add     r0, r0, r4
add.gt  r3, r0, r4
and     @CPI0_4[0], r2, r2
and     @CPI0_4[0], r1, r1
sub!    r1, r2, r5
add.ge  r0, r0, r3
xor     r1, r2, r1
sub.s!  @CPI0_4[0], r1, r1
add     r4, r0, r1
add.eq  r3, r0, r1
sub!    r1, r0, r1
add     0, r0, r1
add.ne  1, r0, r1
st.1    128, r1

EQ

Original EVM instruction.

LLVM IR

%comparison_result = icmp eq i256 %value1, %value2
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r0, r1
ptr.add.s       36, r1, r2
ld      r2, r2
ptr.add.s       4, r1, r1
ld      r1, r1
sub!    r1, r2, r1
add     0, r0, r1
add.eq  1, r0, r1
st.1    128, r1

ISZERO

Original EVM instruction.

LLVM IR

%comparison_result = icmp eq i256 %value, 0
%comparison_result_extended = zext i1 %comparison_result to i256

LLVM IR instruction documentation

EraVM Assembly

ptr.add stack[@ptr_calldata], r1, r1
ld      r1, r1
sub!    r1, r0, r1
add     0, r0, r1
add.eq  1, r0, r1
st.1    128, r1

Memory

MLOAD

Original EVM instruction.

Heap memory load operation is modeled with a native EraVM instruction.

LLVM IR

%value = load i256, ptr addrspace(1) %pointer, align 1

LLVM IR instruction documentation

EraVM Assembly

ld.1    r1, r2

See EraVM instruction: ld.1

MSTORE

Original EVM instruction.

Heap memory store operation is modeled with a native EraVM instruction.

LLVM IR

store i256 128, ptr addrspace(1) inttoptr (i256 64 to ptr addrspace(1)), align 1

LLVM IR instruction documentation

EraVM Assembly

st.1    r1, r2

See EraVM instruction: st.1

MSTORE8

Original EVM instruction.

LLVM IR

define void @__mstore8(i256 addrspace(1)* nocapture nofree noundef dereferenceable(32) %addr, i256 %val) #2 {
entry:
  %orig_value = load i256, i256 addrspace(1)* %addr, align 1
  %orig_value_shifted_left = shl i256 %orig_value, 8
  %orig_value_shifted_right = lshr i256 %orig_value_shifted_left, 8
  %byte_value_shifted = shl i256 %val, 248
  %store_result = or i256 %orig_value_shifted_right, %byte_value_shifted
  store i256 %store_result, i256 addrspace(1)* %addr, align 1
  ret void
}
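
The read-modify-write pattern of `__mstore8` can be reproduced in Python. This is a minimal sketch: the heap is modeled as a `bytearray` holding big-endian words, and the function name `mstore8` is chosen here for illustration.

```python
MASK = (1 << 256) - 1


def mstore8(heap: bytearray, addr: int, val: int) -> None:
    """Overwrite the byte at `addr` by rewriting the 32-byte word that starts there."""
    orig_value = int.from_bytes(heap[addr:addr + 32], "big")  # load i256
    cleared = ((orig_value << 8) & MASK) >> 8                 # shl 8, then lshr 8
    byte_value_shifted = (val << 248) & MASK                  # shl 248 keeps the low byte
    heap[addr:addr + 32] = (cleared | byte_value_shifted).to_bytes(32, "big")


heap = bytearray(b"\xAA" * 64)
mstore8(heap, 3, 0x1FF)  # only the lowest byte (0xFF) of the value is stored
```

The left shift followed by the right shift clears the most significant byte of the loaded word, which is the byte located at `addr` in big-endian order.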

Return

STOP

Original EVM instruction.

This instruction is a RETURN with an empty data payload.

LLVM IR

The same as for RETURN.

RETURN

Original EVM instruction.

This instruction works differently in deploy code. For more information, see the ZKsync Era documentation.

LLVM IR

define void @__return(i256 %0, i256 %1, i256 %2) "noinline-oz" #5 personality i32()* @__personality {
entry:
  %abi = call i256 @__aux_pack_abi(i256 %0, i256 %1, i256 %2)
  tail call void @llvm.syncvm.return(i256 %abi)
  unreachable
}

REVERT

Original EVM instruction.

LLVM IR

define void @__revert(i256 %0, i256 %1, i256 %2) "noinline-oz" #5 personality i32()* @__personality {
entry:
  %abi = call i256 @__aux_pack_abi(i256 %0, i256 %1, i256 %2)
  tail call void @llvm.syncvm.revert(i256 %abi)
  unreachable
}

EraVM

See also EraVM instruction revert: when returning from near calls and when returning from far calls.

INVALID

Original EVM instruction.

This instruction is a REVERT with an empty data payload, but it also burns all available gas.

LLVM IR

The same as for REVERT.

EraVM

See also EraVM instruction revert: when returning from near calls and when returning from far calls.

SHA3

Original EVM instruction.

LLVM IR

define i256 @__sha3(i8 addrspace(1)* nocapture nofree noundef %0, i256 %1, i1 %throw_at_failure) "noinline-oz" #1 personality i32()* @__personality {
entry:
  %addr_int = ptrtoint i8 addrspace(1)* %0 to i256
  %2 = tail call i256 @llvm.umin.i256(i256 %addr_int, i256 4294967295)
  %3 = tail call i256 @llvm.umin.i256(i256 %1, i256 4294967295)
  %gas_left = tail call i256 @llvm.syncvm.gasleft()
  %4 = tail call i256 @llvm.umin.i256(i256 %gas_left, i256 4294967295)
  %abi_data_input_offset_shifted = shl nuw nsw i256 %2, 64
  %abi_data_input_length_shifted = shl nuw nsw i256 %3, 96
  %abi_data_gas_shifted = shl nuw nsw i256 %4, 192
  %abi_data_offset_and_length = add i256 %abi_data_input_length_shifted, %abi_data_input_offset_shifted
  %abi_data_add_gas = add i256 %abi_data_gas_shifted, %abi_data_offset_and_length
  %abi_data_add_system_call_marker = add i256 %abi_data_add_gas, 904625697166532776746648320380374280103671755200316906558262375061821325312
  %call_external = tail call { i8 addrspace(3)*, i1 } @__staticcall(i256 %abi_data_add_system_call_marker, i256 32784, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef, i256 undef)
  %status_code = extractvalue { i8 addrspace(3)*, i1 } %call_external, 1
  br i1 %status_code, label %success_block, label %failure_block

success_block:
  %abi_data_pointer = extractvalue { i8 addrspace(3)*, i1 } %call_external, 0
  %data_pointer = bitcast i8 addrspace(3)* %abi_data_pointer to i256 addrspace(3)*
  %keccak256_child_data = load i256, i256 addrspace(3)* %data_pointer, align 1
  ret i256 %keccak256_child_data

failure_block:
  br i1 %throw_at_failure, label %throw_block, label %revert_block

revert_block:
  call void @__revert(i256 0, i256 0, i256 0)
  unreachable

throw_block:
  call void @__cxa_throw(i8* noalias nocapture nofree align 32 null, i8* noalias nocapture nofree align 32 undef, i8* noalias nocapture nofree align 32 undef)
  unreachable
}
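
The integer arithmetic that builds the ABI data word in the listing above can be reproduced as follows. This is a minimal sketch: the function name `pack_keccak_abi` is hypothetical, and the shifts, the 32-bit clamping, and the marker constant are copied from the LLVM IR rather than from any specification.

```python
U32_MAX = 4294967295  # clamp applied by the llvm.umin.i256 calls
# Constant added last in the listing (%abi_data_add_system_call_marker).
SYSTEM_CALL_MARKER = 904625697166532776746648320380374280103671755200316906558262375061821325312


def pack_keccak_abi(input_offset: int, input_length: int, gas_left: int) -> int:
    """Pack offset, length, and gas into a single 256-bit ABI data word."""
    offset_shifted = min(input_offset, U32_MAX) << 64
    length_shifted = min(input_length, U32_MAX) << 96
    gas_shifted = min(gas_left, U32_MAX) << 192
    return offset_shifted + length_shifted + gas_shifted + SYSTEM_CALL_MARKER


abi_data = pack_keccak_abi(input_offset=128, input_length=64, gas_left=100_000)
```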

Stack

POP

Original EVM instruction.

In Yul, it is only used for marking unused values, and is omitted in LLVM IR.

pop(staticcall(gas(), address(), 0, 64, 0, 32))

For EVMLA, see EVM Legacy Assembly Translator.

JUMPDEST

Original EVM instruction.

Unavailable in Yul.

Ignored in EVMLA. See EVM Legacy Assembly Translator for more information.

PUSH - PUSH32

Original EVM instructions.

Unavailable in Yul.

For EVMLA, see EVM Legacy Assembly Translator.

DUP1 - DUP16

Original EVM instructions.

Unavailable in Yul.

For EVMLA, see EVM Legacy Assembly Translator.

SWAP1 - SWAP16

Original EVM instructions.

Unavailable in Yul.

For EVMLA, see EVM Legacy Assembly Translator.

EVM Assembly Auxiliary Instructions

These instructions do not have a direct representation in EVM or EraVM instruction sets. Instead, they perform auxiliary operations required for generating the target bytecode.

PUSH [$]

The same as datasize.

PUSH #[$]

The same as dataoffset.

ASSIGNIMMUTABLE

The same as setimmutable.

For more information, see differences with Ethereum.

PUSHIMMUTABLE

The same as loadimmutable.

For more information, see differences with Ethereum.

PUSHLIB

The same as linkersymbol.

PUSHDEPLOYADDRESS

Returns the address the contract is deployed to.

PUSHSIZE

Can only be found in deploy code. On EVM, returns the total size of the runtime code and constructor arguments.

On EraVM, it is always 0, since EraVM does not operate on runtime code in deploy code.

PUSH data

Pushes a data chunk onto the stack. Data chunks are resolved during the processing of input assembly JSON.

PUSH [tag]

Pushes an EVM Legacy Assembly destination block identifier onto the stack.

Tag

Starts a new EVM Legacy Assembly block. Tags are processed during the translation of EVM Legacy Assembly into EthIR.

Yul Auxiliary Instructions

These instructions do not map directly to EVM or EraVM but instead perform auxiliary operations necessary for generating the target bytecode.

datasize

Original Yul auxiliary instruction.

Unlike on EVM, on EraVM this instruction returns only the size of the header part of the calldata sent to the ContractDeployer.

For more information, see CREATE.

dataoffset

Original Yul auxiliary instruction.

Unlike on EVM, this instruction does not relate to offsets. Instead, it returns the bytecode hash of the contract referenced by the Yul object identifier.

For more information, see CREATE.

datacopy

Original Yul auxiliary instruction.

Unlike on EVM, on EraVM this instruction copies the bytecode hash passed as dataoffset to the destination. Because our compiler translates instructions without analyzing the surrounding context, there is no other way to obtain the bytecode hash within datacopy.

For more information, see CREATE.

setimmutable

Original Yul auxiliary instruction.

Writes immutables to the auxiliary heap.

For more information, see the Differences with Ethereum.

loadimmutable

Original Yul auxiliary instruction.

Reads immutables from the ImmutableSimulator in runtime code, or from temporary values on the auxiliary heap in deploy code.

For more information, see the Differences with Ethereum.

linkersymbol

Original Yul auxiliary instruction.

Sets the placeholder of a deployable library. The address must be passed to zksolc with the --libraries option, either in compiler or linker mode.

memoryguard

Original Yul auxiliary instruction.

It is a Yul optimizer hint ignored by zksolc.

verbatim

Original Yul auxiliary instruction.

Unlike on EVM, on EraVM this instruction has nothing to do with insertions of EVM bytecode. Instead, it is used to implement ZKsync EraVM Yul Extensions. In order to compile a Yul contract with extensions, both Yul mode and EraVM extensions must be enabled.

Extensions

EraVM extensions are a set of additional instructions that can be expressed in Solidity and Yul, but can only be compiled to EraVM bytecode.

There are two ways of using EraVM extensions with zksolc:

  1. Call simulations in Solidity.
  2. verbatim function in Yul mode.

Call simulations

Because zksolc could only operate on Yul received from solc, it was not possible to add EraVM-specific functionality to Solidity and Yul directly. Instead, zksolc introduced a workaround: external call instructions that are replaced with EraVM-specific instructions when LLVM IR is emitted. In such external call instructions, the address argument denotes the instruction type, and the remaining arguments are used as instruction arguments.

Call simulations are the only way to use EraVM extensions in Solidity.

verbatim

In Yul mode, there is a special instruction called verbatim that allows emitting EraVM-specific instructions directly from Yul. This instruction is more robust than call simulations, as it allows passing more arguments to the instruction and is not affected by the solc optimizer. Unfortunately, verbatim is only available in Yul mode and cannot be used in Solidity.

It is recommended to only use verbatim in Yul mode, as it is more robust and less error-prone than call simulations in Solidity.

Call Types

In addition to EVM-like call, staticcall and delegatecall, EraVM introduces a few more call types:

  1. Mimic call
  2. System call
  3. Raw call

Each of the call types above has a by-reference (by-ref) variant, which allows passing pointers to ABI data instead of the data itself.

Mimic Call

Mimic call is a call type that allows the caller to execute a call to a contract, but with the ability to specify the address of the contract that will be used as the caller. This is useful for EraVM System Contracts that need to call other contracts on behalf of the user. Essentially, it is a more complete version of DELEGATECALL.

For a deeper dive into the Mimic Call, visit the EraVM formal specification.

System Call

System call allows passing more arguments to the callee contract using EraVM registers. This is useful for System Contracts that often require auxiliary data that cannot be passed via calldata.

There are also system mimic calls, which combine both: the caller address is specified explicitly, and auxiliary arguments can be passed via EraVM registers.

Raw Call

Raw calls are similar to EVM's CALL, STATICCALL, and DELEGATECALL, but they do not encode the ABI data. Instead, the ABI data is passed as an argument to the instruction. This is useful for EraVM System Contracts that need to call other contracts with specific ABI data that cannot be encoded in the calldata.

Active Pointers

Active pointers are a set of calldata and return data pointers stored in global LLVM IR variables. They are not accessible directly from Yul, but they can be used to forward call and return data between contracts.

The number of active pointers is fixed at 10, and they are numbered from 0 to 9. Some instructions can only use the 0th pointer due to the lack of spare arguments to specify the pointer number. In order to use pointers other than the 0th, use the swap instruction.

Instructions that use active pointers have a reference to this section.

Constant Arrays

Constant arrays are a set of global arrays that can be used to store constant values. They are not accessible directly from Yul, but they can be used to store constant values that are used in multiple places in the contract.

Instruction Reference

The sections below have the following structure:

  1. EraVM instruction name and substituted address.
  2. Instruction description.
  3. Pseudo-code illustrating the behavior under the hood.
  4. Solidity call simulation usage example.
  5. Yul verbatim usage example.

For instance:

Example (0xXXXX)

Executes an EraVM instruction.

Pseudo-code:

return_value = instruction(arg1, arg2, arg3)

Solidity usage:

assembly {
    let return_value := call(arg1, 0xXXXX, arg2, arg3, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let return_value := verbatim_3i_1o("instruction", arg1, arg2, arg3)
}

Full list of instructions:

Notes:

  1. The input_length parameter is always set to 0xFFFF or another non-zero value. This prevents the solc optimizer from optimizing the call out.
  2. Instructions that do not modify state use staticcall instead of call.
  3. Instructions such as raw calls preserve the call type, so they act as modifiers of call, staticcall, and delegatecall.
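
The address-based substitution used by call simulations can be pictured as a lookup table keyed by the address argument. The sketch below is illustrative only: it is not how zksolc is implemented, the function name `simulated_instruction` is hypothetical, and the table lists just a few of the addresses and verbatim identifiers documented on this page.

```python
from typing import Optional

# Substituted address -> verbatim identifier, taken from the reference sections of this page.
SIMULATIONS = {
    0xFFFF: "to_l1",
    0xFFFD: "precompile",
    0xFFDD: "decommit",
    0xFFF3: "set_context_u128",
    0xFFF2: "set_pubdata_price",
    0xFFF1: "increment_tx_counter",
    0xFFFE: "code_source",
    0xFFFC: "meta",
}


def simulated_instruction(address: int) -> Optional[str]:
    """Return the simulated instruction for an address, or None for an ordinary call."""
    return SIMULATIONS.get(address)


print(simulated_instruction(0xFFFF))  # to_l1
```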

To L1 (0xFFFF)

Sends a message to L1.

Pseudo-code:

to_l1(is_first, value_1, value_2)

Solidity usage:

assembly {
    let _ := call(is_first, 0xFFFF, value_1, value_2, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_3i_0o("to_l1", is_first, value_1, value_2)
}

Precompile (0xFFFD)

Calls an EraVM precompile.

Pseudo-code:

return_value = precompile(input_data, ergs)

Solidity usage:

assembly {
    let return_value := staticcall(input_data, 0xFFFD, ergs, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let return_value := verbatim_2i_1o("precompile", input_data, ergs)
}

Decommit (0xFFDD)

Calls the EraVM decommit.

Pseudo-code:

return_value = decommit(versioned_hash, ergs)

Solidity usage:

assembly {
    let return_value := staticcall(versioned_hash, 0xFFDD, ergs, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let return_value := verbatim_2i_1o("decommit", versioned_hash, ergs)
}

Set Context Value (0xFFF3)

Sets the 128-bit context value. Usually the value is used to pass Ether to the callee contract.

Pseudo-code:

set_context_value(value)

Solidity usage:

assembly {
    let _ := call(0, 0xFFF3, value, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_1i_0o("set_context_u128", value)
}

Set Pubdata Price (0xFFF2)

Sets the public data price.

Pseudo-code:

set_pubdata_price(value)

Solidity usage:

assembly {
    let _ := call(value, 0xFFF2, 0, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_1i_0o("set_pubdata_price", value)
}

Increment TX Counter (0xFFF1)

Increments the EraVM transaction counter.

Pseudo-code:

increment_tx_counter()

Solidity usage:

assembly {
    let _ := call(0, 0xFFF1, 0, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("increment_tx_counter")
}

Code Source (0xFFFE)

Returns the address where the contract is actually deployed, even if it is called with a delegate call. Mostly used in EraVM System Contracts.

Pseudo-code:

code_source = code_source()

Solidity usage:

assembly {
    let code_source := staticcall(0, 0xFFFE, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let code_source := verbatim_0i_1o("code_source")
}

Meta (0xFFFC)

Returns a part of the internal EraVM state.

Pseudo-code:

meta = meta()

Solidity usage:

assembly {
    let meta := staticcall(0, 0xFFFC, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let meta := verbatim_0i_1o("meta")
}

Get Calldata Pointer (0xFFF0)

Returns the ABI-encoded calldata pointer as an integer.

Pseudo-code:

pointer = get_calldata_pointer()

Solidity usage:

assembly {
    let pointer := staticcall(0, 0xFFF0, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let pointer := verbatim_0i_1o("get_global::ptr_calldata")
}

Get Call Flags (0xFFEF)

Returns the call flags encoded as a 256-bit integer.

Pseudo-code:

flags = get_call_flags()

Solidity usage:

assembly {
    let flags := staticcall(0, 0xFFEF, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let flags := verbatim_0i_1o("get_global::call_flags")
}

Get Return Data Pointer (0xFFEE)

Returns the ABI-encoded return data pointer as an integer.

Pseudo-code:

pointer = get_return_data_pointer()

Solidity usage:

assembly {
    let pointer := staticcall(0, 0xFFEE, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let pointer := verbatim_0i_1o("get_global::ptr_return_data")
}

Get Extra ABI Data (0xFFE5)

Returns the N-th extra ABI data value passed via registers r3-r12.

Pseudo-code:

value = get_extra_abi_data(index)

Solidity usage:

assembly {
    let value := staticcall(index, 0xFFE5, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let value := verbatim_0i_1o("get_global::extra_abi_data_0")
    let value := verbatim_0i_1o("get_global::extra_abi_data_1")
    ...
    let value := verbatim_0i_1o("get_global::extra_abi_data_9")
}

Multiplication with Overflow (0xFFE6)

Performs a multiplication with overflow, returning the higher register.

Pseudo-code:

higher_register = mul_high(a, b)

Solidity usage:

assembly {
    let higher_register := staticcall(a, 0xFFE6, b, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let higher_register := verbatim_2i_1o("mul_high", a, b)
}
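
As a rough model of this instruction, the sketch below assumes that "the higher register" means the upper 256 bits of the full 512-bit product. That reading is an assumption made for illustration, and the Python function simply reuses the name from the pseudo-code.

```python
MASK = (1 << 256) - 1


def mul_high(a: int, b: int) -> int:
    """Return the upper 256 bits of the 512-bit product of two 256-bit words."""
    return ((a & MASK) * (b & MASK)) >> 256


print(mul_high(1 << 255, 2))  # 1
```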

Event Initialize (0xFFED)

Initializes a new EVM-like event.

Pseudo-code:

event_initialize(value_1, value_2)

Solidity usage:

assembly {
    let _ := call(value_1, 0xFFED, value_2, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_2i_0o("event_initialize", value_1, value_2)
}

Event Write (0xFFEC)

Writes more data to the previously initialized EVM-like event.

Pseudo-code:

event_write(value_1, value_2)

Solidity usage:

assembly {
    let _ := call(value_1, 0xFFEC, value_2, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_2i_0o("event_write", value_1, value_2)
}

Mimic Call (0xFFFB)

Executes an EraVM mimic call.

Pseudo-code:

status = mimic_call(callee_address, mimic_address, abi_data)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFFB, 0, abi_data, mimic_address, 0, 0)
}

Yul usage:

assembly {
    let status := verbatim_3i_1o("mimic_call", callee_address, mimic_address, abi_data)
}

Mimic Call by Reference (0xFFF9)

Executes an EraVM mimic call, passing the 0th active pointer instead of ABI data.

Pseudo-code:

status = mimic_call_byref(callee_address, mimic_address)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF9, 0, 0, mimic_address, 0, 0)
}

Yul usage:

assembly {
    let status := verbatim_2i_1o("mimic_call_byref", callee_address, mimic_address)
}

System Mimic Call (0xFFFA)

Executes an EraVM mimic call with additional arguments for System Contracts.

Pseudo-code:

status = system_mimic_call(callee_address, mimic_address, abi_data, r3_value, r4_value, [r5_value, r6_value])

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFFA, 0, abi_data, mimic_address, r3_value, r4_value)
}

Yul usage:

assembly {
    let status := verbatim_7i_1o("system_mimic_call", callee_address, mimic_address, abi_data, r3_value, r4_value, r5_value, r6_value)
}

Yul's verbatim allows passing two extra arguments, as it is not limited by the semantics of the call instruction.

System Mimic Call by Reference (0xFFF8)

Executes an EraVM mimic call with additional arguments for System Contracts, passing the 0th active pointer instead of ABI data.

Pseudo-code:

status = system_mimic_call_byref(callee_address, mimic_address, r3_value, r4_value, [r5_value, r6_value])

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF8, 0, 0, mimic_address, r3_value, r4_value)
}

Yul usage:

assembly {
    let status := verbatim_6i_1o("system_mimic_call_byref", callee_address, mimic_address, r3_value, r4_value, r5_value, r6_value)
}

Yul's verbatim allows passing two extra arguments, as it is not limited by the semantics of the call instruction.

Raw Call (0xFFF7)

Executes an EraVM raw call.

Pseudo-code:

status = raw_call(callee_address, abi_data, output_offset, output_length)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF7, 0, 0, abi_data, output_offset, output_length)
    let status := staticcall(callee_address, 0xFFF7, 0, abi_data, output_offset, output_length)
    let status := delegatecall(callee_address, 0xFFF7, 0, abi_data, output_offset, output_length)
}

Yul usage:

assembly {
    let status := verbatim_4i_1o("raw_call", callee_address, abi_data, output_offset, output_length)
    let status := verbatim_4i_1o("raw_static_call", callee_address, abi_data, output_offset, output_length)
    let status := verbatim_4i_1o("raw_delegate_call", callee_address, abi_data, output_offset, output_length)
}

Raw Call by Reference (0xFFF6)

Executes an EraVM raw call, passing the 0th active pointer instead of ABI data.

Pseudo-code:

status = raw_call_byref(callee_address, output_offset, output_length)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF6, 0, 0, 0, output_offset, output_length)
    let status := staticcall(callee_address, 0xFFF6, 0, 0, output_offset, output_length)
    let status := delegatecall(callee_address, 0xFFF6, 0, 0, output_offset, output_length)
}

Yul usage:

assembly {
    let status := verbatim_3i_1o("raw_call_byref", callee_address, output_offset, output_length)
    let status := verbatim_3i_1o("raw_static_call_byref", callee_address, output_offset, output_length)
    let status := verbatim_3i_1o("raw_delegate_call_byref", callee_address, output_offset, output_length)
}

System Call (0xFFF5)

Executes an EraVM system call.

Pseudo-code:

status = system_call(callee_address, r3_value, r4_value, abi_data, r5_value, r6_value)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF5, r3_value, r4_value, abi_data, r5_value, r6_value)
}

Yul usage:

assembly {
    let status := verbatim_6i_1o("system_call", callee_address, abi_data, r3_value, r4_value, r5_value, r6_value)
    let status := verbatim_6i_1o("system_static_call", callee_address, abi_data, r3_value, r4_value, r5_value, r6_value)
    let status := verbatim_6i_1o("system_delegate_call", callee_address, abi_data, r3_value, r4_value, r5_value, r6_value)
}

Static and delegate system calls are only available in Yul as verbatim.

System Call by Reference (0xFFF4)

Executes an EraVM system call, passing the 0th active pointer instead of ABI data.

Pseudo-code:

status = system_call_byref(callee_address, r3_value, r4_value, r5_value, r6_value)

Solidity usage:

assembly {
    let status := call(callee_address, 0xFFF4, r3_value, r4_value, 0xFFFF, r5_value, r6_value)
}

Yul usage:

assembly {
    let status := verbatim_5i_1o("system_call_byref", callee_address, r3_value, r4_value, r5_value, r6_value)
    let status := verbatim_5i_1o("system_static_call_byref", callee_address, r3_value, r4_value, r5_value, r6_value)
    let status := verbatim_5i_1o("system_delegate_call_byref", callee_address, r3_value, r4_value, r5_value, r6_value)
}

Static and delegate system calls are only available in Yul as verbatim.

Active Pointer: Load Calldata (0xFFEB)

Loads the calldata pointer to the 0th active pointer.

Pseudo-code:

active_ptr_load_calldata()

Solidity usage:

assembly {
    let _ := staticcall(0, 0xFFEB, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("calldata_ptr_to_active")
}

Active Pointer: Load Return Data (0xFFEA)

Loads the return data pointer to the 0th active pointer.

Pseudo-code:

active_ptr_load_return_data()

Solidity usage:

assembly {
    let _ := staticcall(0, 0xFFEA, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("return_data_ptr_to_active")
}

Active Pointer: Load Decommit (0xFFDC)

Loads the decommit pointer to the 0th active pointer.

Pseudo-code:

active_ptr_load_decommit()

Solidity usage:

assembly {
    let _ := staticcall(0, 0xFFDC, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("decommit_ptr_to_active")
}

Active Pointer: Increment (0xFFE9)

Increments the offset of the 0th active pointer.

Pseudo-code:

active_ptr_add(value)

Solidity usage:

assembly {
    let _ := staticcall(value, 0xFFE9, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_1i_0o("active_ptr_add_assign", value)
}

Active Pointer: Shrink (0xFFE8)

Decrements the slice length of the 0th active pointer.

Pseudo-code:

active_ptr_shrink(value)

Solidity usage:

assembly {
    let _ := staticcall(value, 0xFFE8, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_1i_0o("active_ptr_shrink_assign", value)
}

Active Pointer: Pack (0xFFE7)

Writes the upper 128 bits to the 0th active pointer.

Pseudo-code:

active_ptr_pack(value)

Solidity usage:

assembly {
    let _ := staticcall(value, 0xFFE7, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_1i_0o("active_ptr_pack_assign", value)
}

Active Pointer: Load (0xFFE4)

Loads a value from the 0th active pointer at the specified offset, similarly to EVM's CALLDATALOAD.

Pseudo-code:

value = active_ptr_load(offset)

Solidity usage:

assembly {
    let value := staticcall(offset, 0xFFE4, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let value := verbatim_1i_1o("active_ptr_data_load", offset)
}

Active Pointer: Copy (0xFFE3)

Copies a slice from the 0th active pointer to the heap, similarly to EVM's CALLDATACOPY and RETURNDATACOPY.

Pseudo-code:

active_ptr_copy(destination, source, size)

Solidity usage:

assembly {
    let _ := staticcall(destination, 0xFFE3, source, 0xFFFF, size, 0)
}

Yul usage:

assembly {
    let _ := verbatim_3i_0o("active_ptr_data_copy", destination, source, size)
}

Active Pointer: Size (0xFFE2)

Returns the length of the slice referenced by the 0th active pointer, similarly to EVM's CALLDATASIZE and RETURNDATASIZE.

Pseudo-code:

size = active_ptr_size()

Solidity usage:

assembly {
    let size := staticcall(0, 0xFFE2, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let size := verbatim_0i_1o("active_ptr_data_size")
}

Active Pointer: Swap (0xFFD9)

Swaps the Nth and Mth active pointers. Swapping allows the active pointer instructions to use pointers other than the 0th.

Pseudo-code:

active_ptr_swap(N, M)

Solidity usage:

assembly {
    let _ := staticcall(N, 0xFFD9, M, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_2i_0o("active_ptr_swap", N, M)
}
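
The way swapping lets 0th-only instructions reach other pointers can be shown with a toy model. This is a sketch, not the compiler's data structure: the class `ActivePointers` and its method names are hypothetical, and pointers are represented as opaque Python values.

```python
class ActivePointers:
    """Toy model of the ten active pointer slots."""

    COUNT = 10

    def __init__(self) -> None:
        self.slots = [None] * self.COUNT

    def load(self, pointer) -> None:
        """Model of the load instructions, which can only write the 0th slot."""
        self.slots[0] = pointer

    def swap(self, n: int, m: int) -> None:
        """Model of active_ptr_swap(N, M)."""
        if not (0 <= n < self.COUNT and 0 <= m < self.COUNT):
            raise IndexError("active pointers are numbered from 0 to 9")
        self.slots[n], self.slots[m] = self.slots[m], self.slots[n]

    def current(self):
        """The pointer seen by instructions that only use the 0th slot."""
        return self.slots[0]


pointers = ActivePointers()
pointers.load("calldata")     # slot 0 holds the calldata pointer
pointers.swap(0, 1)           # park it in slot 1
pointers.load("return_data")  # slot 0 now holds the return data pointer
pointers.swap(0, 1)           # bring the calldata pointer back to slot 0
```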

Active Pointer: Return (0xFFDB)

Returns from the contract, using the 0th active pointer as the return data.

Pseudo-code:

active_ptr_return()

Solidity usage:

assembly {
    let _ := staticcall(0, 0xFFDB, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("active_ptr_return_forward")
}

Active Pointer: Revert (0xFFDA)

Reverts from the contract, using the 0th active pointer as the return data.

Pseudo-code:

active_ptr_revert()

Solidity usage:

assembly {
    let _ := staticcall(0, 0xFFDA, 0, 0xFFFF, 0, 0)
}

Yul usage:

assembly {
    let _ := verbatim_0i_0o("active_ptr_revert_forward")
}

Constant Array: Declare (0xFFE1)

Declares a new global array of constants. After the array is declared, it must immediately be filled with values using the set instruction and then declared final using the finalization instruction.

Index must be an 8-bit constant value in the range [0; 255].

Size must be a 16-bit constant value in the range [0; 65535].

Pseudo-code:

const_array_declare(index, size)

Solidity usage:

assembly {
    let _ := staticcall(index, 0xFFE1, size, 0xFFFF, 0, 0)
}

This instruction is not available in Yul as verbatim.

Constant Array: Set (0xFFE0)

Sets a value in a global array of constants.

Index must be an 8-bit constant value in the range [0; 255].

Size must be a 16-bit constant value in the range [0; 65535].

Value must be a 256-bit constant value.

Pseudo-code:

const_array_set(index, size, value)

Solidity usage:

assembly {
    let _ := staticcall(index, 0xFFE0, size, 0xFFFF, value, 0)
}

This instruction is not available in Yul as verbatim.

Constant Array: Finalize (0xFFDF)

Finalizes a global array of constants.

Index must be an 8-bit constant value in the range [0; 255].

Pseudo-code:

const_array_finalize(index)

Solidity usage:

assembly {
    let _ := staticcall(index, 0xFFDF, 0, 0xFFFF, 0, 0)
}

This instruction is not available in Yul as verbatim.

Constant Array: Get (0xFFDE)

Gets a value from a global array of constants.

Index must be an 8-bit constant value in the range [0; 255].

Offset must be a 16-bit constant value in the range [0; 65535].

Pseudo-code:

value = const_array_get(index, offset)

Solidity usage:

assembly {
    let value := staticcall(index, 0xFFDE, offset, 0xFFFF, 0, 0)
}

This instruction is not available in Yul as verbatim.
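The four constant array instructions above form a small lifecycle: declare, set, finalize, get. The Python sketch below models that lifecycle to make the ordering and the range constraints concrete. It is only an illustrative model, not the compiler's implementation: the class ConstArrays and its method names are hypothetical, the 16-bit operand of set is assumed to select the element being written, and rejecting a get before finalization is also an assumption.

```python
class ConstArrays:
    """Toy model of the global constant arrays (hypothetical, for illustration)."""

    def __init__(self):
        self.arrays = {}        # index -> list of 256-bit values
        self.finalized = set()  # indices that were declared final

    def declare(self, index, size):
        assert 0 <= index <= 0xFF, "index must fit in 8 bits"
        assert 0 <= size <= 0xFFFF, "size must fit in 16 bits"
        self.arrays[index] = [0] * size

    def set(self, index, offset, value):
        # Assumption: the 16-bit operand selects the element to write.
        assert index not in self.finalized, "array is already finalized"
        assert 0 <= value < 2**256, "value must fit in 256 bits"
        self.arrays[index][offset] = value

    def finalize(self, index):
        assert index in self.arrays, "array was not declared"
        self.finalized.add(index)

    def get(self, index, offset):
        # Assumption: reads are only meaningful after finalization.
        assert index in self.finalized, "array is not finalized"
        return self.arrays[index][offset]


arrays = ConstArrays()
arrays.declare(0, 2)
arrays.set(0, 0, 42)
arrays.set(0, 1, 2**256 - 1)
arrays.finalize(0)
```

After these calls, arrays.get(0, 0) reads back the first stored constant.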

Return Deployed (verbatim-only)

Returns heap data from the constructor.

Since EraVM constructors always return immutables via auxiliary heap, it is not possible to use them for EVM-like scenarios, such as EVM emulators.

Pseudo-code:

return_deployed(offset, length)

Yul usage:

assembly {
    let _ := verbatim_2i_0o("return_deployed", offset, length)
}

Throw (verbatim-only)

Throws a function-level exception.

For a deeper dive into EraVM exceptions, see this page.

Pseudo-code:

throw()

Yul usage:

assembly {
    let _ := verbatim_0i_0o("throw")
}

EraVM Binary Layout

This page describes what an assembler listing looks like and how it is transformed into bytecode that can be deployed to the chain.

Definitions

  • A directive is a command issued to the assembler that is not translated into an executable bytecode instruction. Directive names start with a period, for example, .cell. Directives are used to regulate the translation process.
  • An instruction constitutes the smallest executable segment of bytecode. In EraVM, each instruction is exactly eight bytes long.
  • A word is a 256-bit unsigned integer in a big-endian format.

Structure of Assembly File

This section describes the structure of an EraVM assembly file, a text file typically with the extension .zasm.

Data Types

  • U256 word, a 256-bit unsigned integer number, big-endian.
  • U16 16-bit unsigned integer number, big-endian.

Sections

The source code within an EraVM assembly is organized into several sections. The start of a section is denoted by one of the following directives:

  • .rodata: constant, read-only data.
  • .data: global mutable data.
  • .text: executable code.

The description of any section may be spread across the file:

.rodata
    .cell 0
.text
    <some instruction>
.rodata
    .cell 1

In this example, multiple .rodata sections appear, but in the resulting binary file they will be merged into a single contiguous region of memory. The same principle applies to other sections.
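The merging of scattered sections can be pictured with a short Python sketch. It is not the assembler's code: the function name merge_sections is hypothetical, and the sketch only groups lines by the most recent section directive, keeping their order of appearance.

```python
def merge_sections(source):
    """Group the lines of each section, preserving their order of appearance."""
    sections = {".rodata": [], ".data": [], ".text": []}
    current = None
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped in sections:
            current = stripped      # a directive switches the active section
        elif current is not None:
            sections[current].append(stripped)
    return sections


example = """
.rodata
    .cell 0
.text
    <some instruction>
.rodata
    .cell 1
"""
merged = merge_sections(example)
```

Here merged[".rodata"] holds both cells next to each other, although they were written in two separate .rodata blocks.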

Defining Data

The .cell directive defines data:

.rodata
    .cell -1
    .cell 23090
.data
    .cell 1213

Notes:

  • Using .cell in the .data section is deprecated and will not be supported in future versions of the assembly.
  • The value of cell is provided as a signed 256-bit decimal number.
  • Negative numbers will be encoded as 256-bit 2's complement, e.g. -1 is encoded as 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff.
  • An optional + sign before positive numbers is allowed, e.g. .cell +123.
  • Hexadecimal integer literals are not supported.
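The encoding rules in the notes above can be sketched in Python as follows. The helper name encode_cell is hypothetical, and the accepted range (from -2^255 up to 2^256 - 1) is an assumption based on the phrase "signed or unsigned constant" in the grammar later on this page.

```python
import re

def encode_cell(literal):
    """Encode a decimal .cell literal as a 32-byte big-endian word."""
    # Only an optional sign followed by decimal digits is accepted.
    if not re.fullmatch(r"[+-]?[0-9]+", literal):
        raise ValueError("only decimal literals are accepted")
    value = int(literal)
    # Assumption: any value representable in 256 bits, signed or unsigned.
    if not -(2**255) <= value < 2**256:
        raise ValueError("value does not fit in 256 bits")
    # Python's modulo maps negative numbers to their two's complement.
    return (value % 2**256).to_bytes(32, "big")
```

For example, encode_cell("-1") yields 32 bytes of 0xff, while encode_cell("0x10") is rejected because hexadecimal literals are not supported.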

Symbols (label names) are supported, for example:

.text

f:
   add r0, r0, r0

g:
   add r0, r0, r0

.rodata

my_cells:
    .cell @f
    .cell @g
    .cell -1

A single @ prefixes the label name.

Each .cell is 256 bits wide, even though an address such as @f or @g is only 16 bits wide. Addresses are padded with zeroes to fit in the word.

Overall Structure

The structure of an assembly file is described as follows:

<file> ::= <section>*

<section> ::=
    | ".rodata" <eol> <const-element>*
    | ".data" <eol> <data-element>*
    | ".text" <eol> <code-element>*

<const-element> ::= <label> | <cell>
<label> ::= [a-zA-Z_.@][0-9a-zA-Z_.@]*

<data-element> ::= <label> | <cell>
<cell> ::=
    ".cell" <256-bit signed or unsigned constant>

<comment> ::= ";" .*
<labels> ::= (<label> ":")*
<code-element> ::= <labels> <instruction> <operand-list> <comment>? EOL
  • EOL stands for “end of line”.
  • <instruction>, <operand-list> depend on the specific instruction. See the EraVM specification.

Execution model

This section describes some elements of the execution environment, the Era Virtual Machine. The full execution model is described in the EraVM specification.

Registers

EraVM has 16 general-purpose registers and 2 special registers:

  • PC is a 16-bit program counter register; it holds the address of the next instruction to be executed.
  • SP is a 16-bit stack pointer register. It points to the address following the top of the stack.

Memory

EraVM memory is divided into pages. When a contract is launched, EraVM assigns several pages to it:

  • Code page.

    • Immutable.

    • Contains 2^{16} words.

    • Used to store both instructions and the constants of type U256.

    • Each word may contain 4 instructions or one constant.

    • Instructions and constants are indistinguishable.

    • Code page is addressable in two ways:

      • When EraVM fetches instruction from this page using PC, it addresses 8-byte chunks.

      • When EraVM fetches constants from this page, it addresses 32-byte (word-sized) chunks.

        For example, reading a constant at address 0 yields a word composed of the binary encodings of instructions 0, 1, 2 and 3; reading a constant at address 1 yields the binary encodings of instructions 4, 5, 6 and 7, and so on.

  • Heap page.

    • Contains 2^{32} bytes and is byte-addressable.
    • However, it is only possible to read words from the heap, not individual bytes.
  • Data stack page.

    • Contains 2^{16} words.
    • Grows towards higher addresses, so every push-like instruction advances SP by at least one.
    • Reserving space on stack is therefore incrementing the value of SP.
    • Each word has an additional tag. If the tag is set, the word contains a pointer to a heap page, either of this contract or belonging to a different contract.
    • Data on stack page can be addressed by their absolute addresses, or relative to SP.
    • Global mutable variables are allocated on stack.
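The two ways of addressing the code page described above can be illustrated with a small Python sketch. The function names are hypothetical and the page contents are dummy bytes, not real instruction encodings.

```python
INSTRUCTION_SIZE = 8   # bytes per instruction
WORD_SIZE = 32         # bytes per word

def fetch_instruction(code_page, pc):
    """Return the 8-byte instruction addressed by the program counter."""
    start = pc * INSTRUCTION_SIZE
    return code_page[start:start + INSTRUCTION_SIZE]

def fetch_constant(code_page, address):
    """Return the 32-byte word addressed as a constant."""
    start = address * WORD_SIZE
    return code_page[start:start + WORD_SIZE]

# Eight dummy instructions; instruction i consists of eight bytes equal to i.
code_page = b"".join(bytes([i]) * INSTRUCTION_SIZE for i in range(8))
```

With this page, fetch_constant(code_page, 1) returns the same bytes as instructions 4 to 7 fetched one by one, which is why instructions and constants are indistinguishable.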

Callstack

EraVM has a separate call stack, a utility data structure that holds information about call frames. There are two kinds of call frames in the EraVM, corresponding to near and far calls:

  • Far call frame corresponds to a call to a different contract.
  • Near call frame corresponds to a near call to the code inside the same contract. Near calls are a low-level mechanism that is used mostly in system contracts.

The call stack is separate from the data stack page described in the Memory section.

Binary layout

The binary file published on chain and passed to EraVM has no structure. It is an image loaded at the beginning of the code page (with offset 0).

The initial value of PC is zero, therefore the execution will start at the first instruction on the code page. Instructions or functions in .text section are not reordered, so the first instruction appearing in the assembly file will be executed first, regardless of labels.

The length of the binary should be an odd number of words, that is, 32 * (2N+1) bytes.

The last word in the binary file is the metadata hash, see section Metadata Hash.

Symbols

There are three default predefined symbols:

  1. DEFAULT_UNWIND: default exception handler / stack unroller for near call instruction call.
  2. DEFAULT_FAR_RETURN: default stack unroller for returns (see Landing Pads).
  3. DEFAULT_FAR_REVERT: default stack unroller for reverts (see Landing Pads).

If the user does not define one of these labels, the assembler will define it and emit a corresponding landing pad (see Landing Pads).

Linking and loading

This section details how the assembly file structure is flattened into a loadable image.

The binary file is divided into three regions:

  1. Initializer.
  2. Instructions.
  3. Constant pool.

The following subsections describe these regions.

Initializer region

Mutable global variables are allocated in the beginning of the stack page, not in code. The stack page supports absolute addressing, therefore the global variables can be accessed directly by their addresses.

If the assembly file defines global variables, the assembler emits special initializer code at the beginning of the program; otherwise, the initializer region is skipped and the binary starts directly with the code region.

The first instruction of the initializer region is incsp <number of globals>. It allocates one word on the data stack per global mutable variable.

For each global that is initialized with a non-zero value, the assembler does the following:

  • Copies its initializer to .rodata, which will be loaded to the code page.
  • Emits an instruction:
add code[INIT], r0, stack[IDX]

where:

  • INIT is the address of the initializer in the .rodata.
  • IDX is the index of the global variable.

For example, the following program:

.text

some_label:
  sub!   r0, r0, r0
  jump @some_label

.data
    my_globals:
    .cell 32

.rodata
    .cell 0

will be translated as if it were written this way:

.text
init_globals:
    incsp 1
    add code[@global_init_0], r0, stack[0]

some_label:
    sub! r0, r0, r0
    jump @some_label

.rodata
    .cell 0
    global_init_0:
    .cell 32
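The translation shown above can be sketched in Python. This is not the assembler's implementation: emit_initializer is a hypothetical helper, and the label scheme global_init_<index> for more than one global is an assumption extrapolated from the single label in the example.

```python
def emit_initializer(globals_init):
    """Return (instructions, rodata_lines) for a list of initial values."""
    if not globals_init:
        return [], []                      # no globals: the region is skipped
    # Reserve one stack word per global mutable variable.
    instructions = [f"incsp {len(globals_init)}"]
    rodata = []
    for idx, value in enumerate(globals_init):
        if value == 0:
            continue                       # zero-initialized: no copy emitted
        label = f"global_init_{idx}"       # assumed naming scheme
        rodata += [f"{label}:", f".cell {value}"]
        instructions.append(f"add code[@{label}], r0, stack[{idx}]")
    return instructions, rodata
```

Calling emit_initializer([32]) reproduces the two initializer instructions and the extra .rodata cell from the example.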

Code region

The .text section is emitted after the initializer region or, if there are no globals, right at the start of the binary file. It is followed by the landing pads and the padding, before the start of the constant pool region.

Landing Pads

After emitting the instructions provided in the .text section of the assembly file, the assembler may emit the landing pads for near calls, returns and reverts. This happens for three predefined symbols: DEFAULT_UNWIND, DEFAULT_FAR_RETURN and DEFAULT_FAR_REVERT.

For example, if the symbol DEFAULT_FAR_RETURN is not explicitly defined, it will be defined automatically and the following landing pad will be appended to the executable code:

;; landing pad for returns
DEFAULT_FAR_RETURN:
    retl @DEFAULT_FAR_RETURN

If the contract executes an instruction retl @DEFAULT_FAR_RETURN, the control is passed to the address DEFAULT_FAR_RETURN, which hosts the same instruction. This starts a loop, popping all near call frames from the callstack. The last retl will perform a far return from the contract. This allows emitting retl @DEFAULT_FAR_RETURN to return from any place inside the contract, no matter how many near calls are currently active.
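The unwinding loop can be modeled in a few lines of Python. This is a simplified picture under stated assumptions: the call stack is a plain list whose last element is the current frame, frames are just the strings "near" and "far", and far_return is a hypothetical name.

```python
def far_return(callstack):
    """Count executions of the return instruction until the far return."""
    executed = 0
    while True:
        executed += 1
        frame = callstack.pop()        # each return pops the current frame
        if frame == "far":
            return executed            # popping the far frame leaves the contract
        # Otherwise a near frame was popped and control lands on the same
        # instruction again, now running in the caller's frame.
```

With three active near calls on top of the contract's far frame, the instruction runs four times: three times to pop the near frames and once for the far return itself.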

If none of the predefined symbols DEFAULT_UNWIND, DEFAULT_FAR_RETURN and DEFAULT_FAR_REVERT is defined explicitly, the following code will be emitted after the .text section:

;; landing pad for near calls
DEFAULT_UNWIND:
    ret.panic.to_label r0, @DEFAULT_UNWIND

;; landing pad for returns
DEFAULT_FAR_RETURN:
    ret.ok.to_label r1, @DEFAULT_FAR_RETURN

;; landing pad for reverts
DEFAULT_FAR_REVERT:
    ret.revert.to_label r1, @DEFAULT_FAR_REVERT

Code padding

The code section starts at 0, if we count the initializing code as its part. Therefore, it is aligned on a 32 byte boundary. If the total number of instructions, with the landing pads, is not divisible by 4, the assembler emits 1 to 3 INVALID instructions as a padding. This way, the instructions will fill a certain number of words completely, and the following region (constant pool region) is aligned on a 32-byte boundary as well.
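As a worked example of this rule, the number of padding instructions can be computed as below. The helper name invalid_padding is hypothetical.

```python
def invalid_padding(instruction_count):
    """Number of INVALID instructions needed to fill the last word."""
    # Four 8-byte instructions fit in one 32-byte word.
    return -instruction_count % 4
```

A code region of 5 instructions therefore receives 3 INVALID instructions and occupies exactly two words, while a region of 8 instructions needs no padding.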

Constant pool region

The constant pool region is aligned on a 32 byte boundary. It is placed immediately after the code region and contains:

  • Constants defined in .rodata section.
  • Initializers for mutable globals.
  • Padding: nothing or a zero word, to ensure that the total length of the binary file, including the following hash, equals an odd number of words.
  • Optionally, metadata hash.
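The padding decision for the constant pool can be sketched as a parity check. The function name needs_zero_word is hypothetical, and the sketch assumes that the metadata hash, when present, occupies exactly one word.

```python
def needs_zero_word(code_words, constant_words, hash_words=1):
    """True if a zero word must be added to make the total word count odd."""
    # Assumption: the metadata hash, if emitted, takes one 32-byte word.
    total = code_words + constant_words + hash_words
    return total % 2 == 0
```

For instance, 2 code words, 1 constant and the hash add up to 4 words, so one zero word is inserted and the binary ends up 5 words (160 bytes) long.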

Metadata Hash

An optional, implementation-defined hash of the contract metadata, which may include its source. Depending on the initial layer where the compilation starts (a Solidity contract, its Yul code, assembly), the hash value may be different.

Building ZKsync compiler with sanitizers

This is a guide on how to build the ZKsync Solidity compiler with sanitizers enabled.

Introduction

Sanitizers are tools that help find bugs in code. They are used to detect memory corruption, leaks, and undefined behavior. The most common sanitizers are AddressSanitizer, MemorySanitizer, and ThreadSanitizer.

If you are not familiar with sanitizers, see the official documentation.

Who is this guide for?

This guide is for developers who want to debug issues with ZKsync compilers.

Prerequisites

For the sanitizer build to work, the host LLVM compiler used to build LLVM MUST have the same version as the LLVM compiler used internally by rustc to build the ZKsync compiler.

You can check the LLVM version used by rustc by running the following command: rustc --version --verbose.
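If the check needs to be scripted, the version can be extracted from that output with a short Python helper. The name llvm_version is hypothetical, and the sample text below is a made-up illustration of the output format, not output captured from a specific toolchain.

```python
import re

def llvm_version(verbose_output):
    """Extract the LLVM version from `rustc --version --verbose` output."""
    match = re.search(r"^LLVM version:\s*(\S+)", verbose_output, re.MULTILINE)
    return match.group(1) if match else None

# Usage with the real compiler (requires rustc on PATH):
#   import subprocess
#   out = subprocess.run(["rustc", "--version", "--verbose"],
#                        capture_output=True, text=True, check=True).stdout
#   print(llvm_version(out))

# Made-up sample that only mimics the output format.
sample = "rustc 1.0.0-example\nhost: aarch64-apple-darwin\nLLVM version: 18.1.7\n"
```

The returned string can then be compared with the version reported by the host clang before starting the LLVM build.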

Build steps

The general steps for a sanitizer-enabled build are:

  1. Build the LLVM framework with the required sanitizer enabled.
  2. Build zksolc with the LLVM build from the previous step.

Please follow the common installation instructions until the zksync-llvm build step.

This guide assumes the build with AddressSanitizer enabled.

Build LLVM with sanitizer enabled

When building LLVM, use the --sanitizer <sanitizer> option and set the build type to RelWithDebInfo:

zksync-llvm build --sanitizer=Address --build-type=RelWithDebInfo

Please note that the default Apple Clang compiler is not compatible with Rust. You need to install LLVM using Homebrew and specify the path to the LLVM compiler in the --extra-args option. For example:

zksync-llvm build --sanitizer=Address \
  --extra-args '\-DCMAKE_C_COMPILER=/opt/homebrew/opt/llvm/bin/clang' \
               '\-DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm/bin/clang++'

Build zksolc with sanitizer enabled

To build the ZKsync compiler with a sanitizer enabled, set the RUSTFLAGS environment variable to -Z sanitizer=address and run the cargo build command. Sanitizer builds are available only with the nightly Rust compiler, so it is recommended to set the RUSTC_BOOTSTRAP=1 environment variable before the build.

It is also mandatory to use the --target option to specify the target architecture. Otherwise, the build will fail. Please check the table below to find the correct target for your platform.

Platform       LLVM Target Triple
Linux arm64    aarch64-unknown-linux-gnu
Linux x86      x86_64-unknown-linux-gnu
macOS arm64    aarch64-apple-darwin
macOS x86      x86_64-apple-darwin
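The table above can also be expressed as a lookup keyed by the values Python reports for the host. This is a convenience sketch only: target_triple is a hypothetical helper, and the key spellings ("aarch64" on Linux, "arm64" on macOS) are the values platform.machine() commonly returns, which may differ on some systems.

```python
import platform

# (operating system, machine) -> LLVM target triple, following the table.
TARGETS = {
    ("Linux", "aarch64"): "aarch64-unknown-linux-gnu",
    ("Linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("Darwin", "arm64"): "aarch64-apple-darwin",
    ("Darwin", "x86_64"): "x86_64-apple-darwin",
}

def target_triple(system=None, machine=None):
    """Return the target triple for the given or the current host, if listed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return TARGETS.get((system, machine))
```

Calling target_triple() without arguments inspects the current host and returns None for platforms that are not in the table.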

Additionally, for proper report symbolization, it is recommended to set the ASAN_SYMBOLIZER_PATH environment variable. For more information, see the symbolizing reports section of the LLVM documentation.

For example, to build the ZKsync compiler for macOS arm64 with AddressSanitizer enabled, run the following command:

export RUSTC_BOOTSTRAP=1
export ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer) # check the path to llvm-symbolizer
TARGET=aarch64-apple-darwin # Change to your target
RUSTFLAGS="-Z sanitizer=address" cargo test --target=${TARGET}

Congratulations! You have successfully built the ZKsync compiler with sanitizers enabled.

Please refer to the official documentation for more information on how to use sanitizers and their types.