Prover subsystem introduction

From protocol version v27 onwards Old Prover Stack (Witness Vector Generator & Prover Fri) have been removed in favor of the new Prover Stack (Circuit Prover)

The prover subsystem consists of several binaries that perform different steps of the batch proof generation process, as follows:

Prover gateway: interface between core and prover subsystems, fetches batch jobs from core, and sends batch proofs back to core.
Witness generator: component that takes batch information (tx execution/state diffs/computation results) and constructs witness for proof generation.
Circuit prover: component that generates a circuit proof (GPU accelerated).
Proof compressor: component that “wraps” the generated proof so that it can be sent to L1 (GPU accelerated).

While not technically a part of the prover workspace, the following components are essential for it:

Proof data handler: API on the core side which Prover gateway interacts with.
Prover Job Monitor: Metrics exporter and job rescheduler. In it’s absence, jobs would not be rescheduled and metrics used for autoscaling would not exist, rendering internal autoscaling infrastructure useless.

Finally, the prover workspace has several CLI tools:

Circuit key generator: CLI used to generate keys required for proving.
Prover CLI: CLI for observing and maintaining the production proving infrastructure.

There are core components that also participate in the proof generation process by preparing the input data, such as metadata calculator, commitment generator, basic witness input producer, and protective reads writer. We won’t cover them much in these docs, but it’s better to know that they exist and are important for the prover subsystem as well.

We’ll cover how the components work further in documentation.

How it runs

Proof generation is a multi-stage process, where the initial jobs are created by the Prover gateway, and then moved by the House Keeper until the proof is generated.

The real-life deployment of prover subsystem looks as follows:

1x prover gateway
1x house keeper
Many witness generators
Many witness vector generators
Many circuit provers
1+ proof compressors

Currently, the proving subsystem is designed to run in GCP. In theory, it’s mostly environment-agnostic, and all of the components can be launched locally, but more work is needed to run a production system in a distributed mode outside of GCP.

Witness generators, witness vector generators, and provers are spawned on demand based on the current system load via an autoscaler (WIP, so not released publicly yet). They can be spawned in multiple clusters among different zones, based on the availability of machines with required specs.

How to develop

Different parts of the subsystem have different hardware requirement, but the aggregated summary to be able to run everything on a single machine is as follows:

CPU with 16+ physical cores.
GPU with CUDA support and at least 24 GB of VRAM.
At least 64GB of RAM.
200+ GB of disk space. 400+ GB is recommended for development, as /target directory can get quite large.

Given that the requirements are quite high, it’s often more convenient developing the prover in a GCP VM rather than on a local machine. Setting up a VM is covered further in docs.

Creating a GCP VM

In this section we will cover the creation of a VM suitable for prover development. We assume that you already have access to the GCP cluster.

When you need a VM

Generally, you don’t always need a VM to work on prover. You typically need it to either modify the code under cfg(feature = "gpu") flag, or when you need to run some tests. Moreover, VMs are shared, e.g. many people have access to them, and you can’t store sensitive data (like SSH keys) there, so they can’t be used as primary workstations. Finally, the VMs with GPU aren’t cheap, so we expect you to use them when you really need them.

A typical workflow so far is to instantiate a new VM when you need it, and remove once you’re done. Remember: even if the VM is stopped, the SSD is persisted, so it’s not free.

Create a VM

Open Google cloud console and choose “Compute Engine”.

On the “Compute Engine” page choose the cluster suitable for creating VMs with GPU, and then click on “Create instance”.

We will need an GPU L4 instance, so find the zone that is close to you geographically and has such instances. At the time of writing, europe-west2 is one of the possible options. L4 is recommended as the cheapest option, but you may use a beefier machine if you need it.

When you choose the region, set the following options:

Name: A descriptive name that contains your name, e.g. john-doe-prover-dev-machine.
Region and zone: Values you’ve found above.
Machine configuration: “GPUs”, then:
- GPU Type: NVIDIA L4
- Number of GPUs: 1
- Machine type: Preset, g2-standard-16
Availability policies: Choose standard provisioning. Spot instances can be preempted while you work on them, which will disrupt your flow.
Then click on “VM provisioning model advanced settings” and
- Click on “Set a time limit for the VM”
- Set the limit to 8 hours
On VM termination: Stop
Boot disk: Click on “Change”, then:
- Operating system: Ubuntu
- Version: Ubuntu 22.04 LTS (x86/64)
- Boot disk type: SSD persistent disk
- Size: 300GB

Leave the remaining options as is and click on “Create”.

You will have to wait a bit and then your instance will be created. Once you see that the machine is running, click on an arrow near “SSH” in the list of options, and choose “Open in browser window”.

You should successfully connect to your machine now.

Warning

Don’t forget to remove the VM once you’ve finished your scope of work. It’s OK to keep the machine if you expect to work with it on the next working day, but otherwise it’s better to remove and create a new one when needed.

Adding your own ssh key (on local machine)

Using browser to connect to the machine may not be the most convenient option. Instead, we can add an SSH key to be able to connect there.

It is highly recommended to generate a new SSH key specifically for this VM, for example:

ssh-keygen -t rsa -f ~/.ssh/gcp_vm -C <YOUR WORK EMAIL> -b 2048

…where “your work email” is the same email you use to access GCP.

Check the contents of the public key:

cat ~/.ssh/gcp_vm.pub

Click on your machine name, then click on “Edit”. Scroll down until you see “SSH Keys” section and add the generated public key there. Then save.

Get back to the list of VMs and find the external IP of your VM. Now you should be able to connect to the VM via ssh. Assuming that your work email is abc@example.com and the external IP is 35.35.35.35:

ssh -i ~/.ssh/gcp_vm abc@35.35.35.35

Make the VM cozy

If you intend to use the VM somewhat regularly, install all the tools you would normally install on your own machine, like zsh and nvim.

It is also highly recommended to install tmux, as you will have to run multiple binaries and observe their output. If you don’t know what is it or why should you care, watch this video.

Native tmux may be hard to use, so you may also want to install some configuration for it, e.g.

oh-my-tmux or
tmux-sensible.

Finally, it is recommended to choose a different terminal theme or prompt than what you use locally, so that you can easily see whether you’re running in the VM or locally.

Connecting via VS Code

VS Code can connect to VMs via SSH, so you can have the comfort of using your own IDE while still running everything on a remote machine.

If you’re using WSL, note that VS Code will have to look up the keys in Windows, so you will have to copy your keys there as well, e.g.:

cp ~/.ssh/gcp_vm* /mnt/c/Users/User/.ssh

Then, when you open a fresh VS Code window, in the “Start” section:

Choose “Connect to Host”
Click on “Configure Hosts”
Create a host entry.

Host entry looks as follows:

Host <host_name>
  HostName <external IP>
  IdentityFile <path to private SSH key>
  User <your user name in VM>

E.g. for the command we’ve used as an example before: ssh -i ~/.ssh/gcp_vm abc@35.35.35.35, the file will be:

Host gcp_vm
  HostName 35.35.35.35
  IdentityFile ~/.ssh/gcp_vm
  User abc

Once you’ve configured the host, you can click on “Connect to” again, then “Connect to Host”, and your VM should be listed there. On the first connect you’ll have to confirm that you want to connect to it, and then choose the operating system (Linux).

On security

Do not store SSH keys, tokens, or other private information on GCP VMs. Do not use SSH keys forwarding either. These VMs are shared, and every person has root access to all the VMs by default.

You may, however, use tools like rsync or sshfs.

Development environment setup

In this section, we cover installing prerequisites for running prover subsystem. We assume that you have a prepared machine in place, e.g. a compatible local machine or a prepared GCP VM.

ZKsync repo setup

If you haven’t already, you need to initialize the ZKsync repository first. Follow this guide for that.

Before proceeding, make sure that you can run the server and integration tests pass.

Prover-specific prerequisites

Cmake 3.24 or higher

Use Kitware APT repository.

CUDA runtime

If you’re using a local machine, make sure that you have up-to-date GPU driver.

Use Official CUDA downloads.

Choose: OS -> Linux -> x86_64 -> Ubuntu (For WSL2 choose WSL-Ubuntu) -> 22.04 -> deb (network).

Install both the base and driver (kernel module flavor).

Setup environment variables: add the following to your configuration file (.bashrc/.zshrc):

# CUDA
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin

Reboot for the drivers to kick-in.

Bellman-CUDA

Bellman-CUDA is a library required for GPU proof compressor.

Navigate to some directory where you want to store the code, and then do the following:

git clone git@github.com:matter-labs/era-bellman-cuda.git
cmake -Bera-bellman-cuda/build -Sera-bellman-cuda/ -DCMAKE_BUILD_TYPE=Release
cmake --build era-bellman-cuda/build/

After that add the following environment variable to your config (.bashrc/.zshrc):

export BELLMAN_CUDA_DIR=<PATH_TO>/era-bellman-cuda

Don’t forget to reload it (e.g. source ~/.zshrc).

Running provers

Preparing

First, create a new chain with prover mode GPU:

zkstack chain create --prover-mode gpu

It will create a config similar to era, but with:

Proof sending mode set to OnlyRealProofs
Prover mode set to Local instead of GCS.

Key generation

This operation should only be done once; if you already generated keys, you can skip it.

The following command will generate the required keys:

zkstack prover setup-keys

With that, you should be ready to run the prover.

Running

Important! Generating a proof takes a lot of time, so if you just want to see whether you can generate a proof, do it against clean sequencer state (e.g. right after zkstack chain init).

We will be running a bunch of binaries, it’s recommended to run each in a separate terminal.

Server

zkstack server --components=api,tree,eth,state_keeper,housekeeper,commitment_generator,da_dispatcher,proof_data_handler,vm_runner_protective_reads,vm_runner_bwip

Prover gateway

zkstack prover run --component=gateway

Then wait until the first job is picked up. Prover gateway has to insert protocol information into the database, and until it happens, witness generators will panic and won’t be able to start.

Witness generator

Once a job is created, start witness generators:

zkstack prover run --component=witness-generator --round=all-rounds

--all_rounds means that witness generator will produce witnesses of all kinds. You can run a witness generator for each round separately, but it’s mostly useful in production environments.

Circuit Prover

zkstack prover run --component=circuit-prover --threads 32

Circuit prover takes outputs from witness generators and produces proofs out of it. As part of the process, there’s vector generation and GPU proving. Vector Generation is single-threaded time-consuming operation. You may run multiple jobs by changing --threads parameter, it should correspond to the number of CPU cores. Alternatively you can use advanced mode by setting --light_wvg_count and --heavy_wvg_count parameters instead of --threads. The exact amount depends strictly on your CPU/GPU specs, but a ballpark estimate (useful for local development) is 15 light & 1 heavy.

Note

The light threads typically uses approximately 2GB of RAM, with heavy ~10GB of RAM.

Prover job monitor

You can start the prover job monitor by specifying its component as follows.

zkstack prover run --component=prover-job-monitor

Insert protocol version in prover database

Before running the prover, you can insert the protocol version in the prover database by executing the following command:

zkstack dev prover insert-version --version <VERSION> --snark-wrapper=<SNARK_WRAPPER>

To query this information, use the following command:

zkstack dev prover info

Proof compressor

Warning

Both prover and proof compressor require 24GB of VRAM, and currently it’s not possible to make them use different GPU. So unless you have a GPU with 48GB of VRAM, you won’t be able to run both at the same time.

You should wait until the proof is generated, and once you see in the server logs that it tries to find available compressor, you can shut the prover down, and run the proof compressor:

zkstack prover run --component=compressor

Once the proof is compressed, proof gateway will see that and will send the generated proof back to core.

Prover flow

In this section, we’re going to learn what stages does the proof generation process have. It’s a complex process, so we’ll be looking at it from four perspectives:

Core<->Prover subsystem interactions.
Core side of workflow.
Prover pipeline.
Batch proof generation.
Infrastructure distribution.

After that, we will touch on how this flow is mapped on the actual production infrastructure.

Core <-> Prover subsystem interactions

Core and prover subsystem are built in such a way that they are mostly isolated from each other. Each side has its own database and GCS buckets, and both have “gateway” components they use for interaction.

The only exception here is the house_keeper: it’s a component that exists as a part of the server, it’s main purpose is to manage jobs (and emit metrics for job management) in the prover workspace, but at the same time it has access to both core and prover databases. The component will probably be split in the future and most of it will be moved to the prover workspace.

Otherwise, the interaction between subsystems can be expressed as follows:

sequenceDiagram
  participant C as Core
  participant P as Prover

  loop In parallel, for each batch
    P-->>+C: Get a job to prove
    C->>-P: Unproven batch
    P->>P: Calculate proof
    P->>C: Submit proof
  end

Core exposes an API, and Prover repeatedly polls this API, fetching new batch proof jobs and submitting batch proofs.

Core side of workflow

Despite the fact that the prover is isolated from the core, the core has multiple components specifically designed to prepare inputs for proving.

The following diagram shows what happens under the hood when the prover subsystem requests a new job:

sequenceDiagram
  box Core
  participant Ob as GCS
  participant DB as Core database
  participant API as Proof data handler
  end
  participant P as Prover
  P-->>+API: Get a job
  API-->>DB: Lock a suitable job
  DB->>API: Job is marked as "picked_up"
  API-->>Ob: Fetch BWIP data
  Ob->>API: Return BWIP data
  API-->>Ob: Fetch Merkle Tree data
  Ob->>API: Return Merkle Tree data
  API-->>DB: Fetch batch metadata
  DB->>API: Return batch metadata
  API->>-P: Return a job

First of all, proof_data_handler will check if all the data required for the proof generation is already prepared by the core. If so, it will lock the job so that it’s not assigned twice, and will fetch required information from multiple sources. Then this data is given to the prover together with the batch number.

Prover pipeline

Once job is received by the prover, it has to go through several different stages. Consider this a mental model of the pipeline, since in reality some stages happen in parallel, and some have different degree of sequencing.

sequenceDiagram
participant C as Core
box Prover
participant PG as Gateway
participant BPG as Basic WG+Proving
participant LPG as Leaf WG+Proving
participant NPG as Node WG+Proving
participant RTPG as Recursion tip WG+Proving
participant SPG as Scheduler WG+Proving
participant CP as Compressor
end
C-->>PG: Job
PG->>BPG: Batch data
BPG->>LPG: Basic proofs
LPG->>NPG: Aggregated proofs (round 1)
NPG->>NPG: Internal aggregation to get 1 proof per circuit type
NPG->>RTPG: Aggregated proofs (round 2)
RTPG->>SPG: Aggregated proofs (round 3)
SPG->>CP: Aggregated proof (round 4)
CP->>PG: SNARK proof
PG-->>C: Proof

When we process the initial job (during basic witness generation) we create many sub-jobs for basic proof generation. Once they are processed, we start to aggregate generated proofs, and we do it in “levels”. With each aggregation level, we reduce the number of jobs.

Aggregation levels are commonly referred by numbers in the prover workspace, from 0 to 4. So if someone mentions “aggregation round 2”, they refer to the “node” stage, and round 4 corresponds to the “scheduler” stage. Proof compression is considered separate operation, and doesn’t have a numeric value.

Jobs within the aggregation round may also have different types, but this will be covered later.

The actual numbers may vary, but just for example there might exist a batch, so that it initially creates 10000 jobs, which are processed as follows:

On round 0, we also emit 10000 jobs. We aren’t doing “actual” aggregation here.
On round 1, we’re turning 10000 jobs into 100.
On round 2, we should turn these 100 jobs into at most 16. Depending on the batch parameters, it may required additional “iterations” of the stage. For example, after we processed the initial 100 jobs, we may get 35 proofs. Then, additional node level jobs will be created, until we reduce the number to at most 16.
On round 3, we’re turning 16 jobs into 1.
On round 4, we already have just 1 job, and we produce a single aggregated proof.
Finally, the proof is processed by the proof compressor and sent back to the core.

Once again, these numbers are just for example, and don’t necessarily represent the actual state of affairs. The exact number of jobs depend on number of txs in a batch (and what’s done inside those txs) while the aggregation split (mapping of N circuits of level X to M circuits of level X + 1) is determined by the config geometry.

Actual proof generation

Every “job” we mentioned has several sub-stages. More precisely, it receives some kind of input, which is followed by witness generation, witness vector generation, and circuit proving. The output of circuit proving is passed as an input for the next “job” in the pipeline.

For each aggregation level mentioned above the steps are the same, though the inputs and outputs are different.

sequenceDiagram
participant Ob as Prover GCS
participant DB as Prover DB
participant WG as Witness Generator
participant CP as Circuit Prover
WG-->>DB: Get WG job
DB->>WG: Job
WG-->>Ob: Get job data
Ob->>WG: Data for witness generation
WG->>WG: Build witness
WG->>Ob: Save witness
WG->>DB: Create prover job
CP-->>DB: Get prover job
DB->>CP: Prover job
CP->>CP: Build witness vector
CP->>CP: Generate a proof
CP->>Ob: Store proof
CP->>DB: Mark proof as stored

Circuits

Finally, even within the same level, there may be different circuit types. Under the hood, they prove the correctness of different parts of computations. From a purely applied point of view, it mostly means that initially we receive X jobs of N types, which cause Y jobs of M types, and so on.

So, in addition to the aggregation layer, we also have a circuit ID. A tuple of aggregation round and circuit ID form an unique job identifier, which allows us to understand which inputs we should receive, what processing logic we should run, and which outputs we should produce.

As of Jul 2024, we have 35 circuit types mapped to 5 aggregation layers.

Note

The specifics of each circuit type and aggregation layers are out of scope for this document, but you can find more information on that in the further reading section.

Protocol versions

Finally, ZKsync has protocol versions, and it has upgrades from time to time. Each protocol version upgrade is defined on L1, and the version follows SemVer convention, e.g. each version is defined as 0.x.y. During the protocol version upgrade, one of three things can change:

Protocol behavior. For example, we add new functionality and our VM starts working differently.
Circuits implementation. For example, VM behavior doesn’t change, but we add more constraints to the circuits.
Contracts changes. For example, we add a new method to the contract, which doesn’t affect neither VM or circuits.

For the first two cases, there will be changes in circuits, and there will be new verification keys. It means, that the proving process will be different. The latter has no implications for L2 behavior.

As a result, after upgrade, we may need to generate different proofs. But given that upgrades happen asynchronously, we cannot guarantee that all the “old” batched will be proven at the time of upgrade.

Because of that, prover is protocol version aware. Each binary that participates in proving is designed to only generate proofs for a single protocol version. Once the upgrade happens, “old” provers continue working on the “old” unproven batches, and simultaneously we start spawning “new” provers for the batches generated with the new protocol version. Once all the “old” batches are proven, no “old” provers will be spawned anymore.

Recap

That’s a quite sophisticated infrastructure, and it may be hard to understand it in one go. Here’s a quick recap of this page:

Main components of the prover subsystem are house keeper, prover gateway, witness generator, circuit prover, and proof compressor.
House keeper and prover gateway don’t perform any significant computations, and there is just one instance of each.
Witness generator, witness vector generator, and GPU prover work together as a “sub-pipeline”.
As of Jul 2024, the pipeline consists of 5 aggregation rounds, which are further split into 35 (aggregation_round, circuit_id) pairs, followed by the proof compression.
Provers are versioned according to the L1 protocol version. There may be provers with different versions running at the same time.

Proving a batch

If you got to this section, then most likely you are wondering how to prove and verify the batch by yourself. After releases prover-v15.1.0 and core-v24.9.0 prover subsystem doesn’t need access to core database anymore, which means you can run only prover subsystem and prove batches without running the whole core system. This guide will help you with that.

Requirements

Hardware

Setup for running the whole process should be the same as described here, except you need 48 GB of GPU, which requires an NVIDIA A100 80GB GPU.

Prerequisites

First of all, you need to install CUDA drivers, all other things will be dealt with by zkstack and prover_cli tools. For that, check the following guide(you can skip bellman-cuda step).

Install the prerequisites, which you can find here. Note, that if you are not using Google VM instance, you also need to install gcloud.

Now, you can use zkstack and prover_cli tools for setting up the env and running prover subsystem.

First, install zkstackup with:

curl -L https://raw.githubusercontent.com/matter-labs/zksync-era/main/zkstack_cli/zkstackup/install | bash

Then install the most recent version of zkstack with:

zkstackup

Initializing system

After you have installed the tool, you can create ecosystem(you need to run only if you are outside of zksync-era) by running:

zkstack ecosystem create --l1-network=localhost --prover-mode=gpu --wallet-creation=localhost --l1-batch-commit-data-generator-mode=rollup --start-containers=true

The command will create the ecosystem and all the necessary components for the prover subsystem. You can leave default values for all the prompts you will see Now, you need to initialize the prover subsystem by running:

zkstack prover init --setup-database=true --use-default=true --dont-drop=false

For prompts you can leave default values as well.

Proving the batch

Getting data needed for proving

At this step, we need to get the witness inputs data for the batch you want to prove. Database information now lives in input file, called witness_inputs_<batch>.bin generated by different core components).

If batch was produced by your system, the file is stored by prover gateway in GCS (or your choice of object storage – check config). At the point of getting it, most likely there is no artifacts directory created. If you have cloned the zksync-era repo, then it is in the root of ecosystem directory. Create artifacts directory by running:
```
mkdir -p <path/to/era/prover/artifacts/witness_inputs>
```
To access it from GCS (assuming you have access to the bucket), run:
```
gsutil cp gs://your_bucket/witness_inputs/witness_inputs_<batch>.bin <path/to/era/prover/artifacts/witness_inputs>
```
If you want to prove the batch produced by zkSync, you can get the data from the ExternalProofIntegrationAPI using {address}/proof_generation_data endpoint. You need to replace {address} with the address of the API and provide the batch number as a query data to get the data for specific batch, otherwise, you will receive latest data for the batch, that was already proven. Example:
```
wget --content-disposition {address}/proof_generation_data
```
or
```
wget --content-disposition {address}/proof_generation_data/{l1_batch_number}
```

Preparing database

After you have the data, you need to prepare the system to run the batch. So, database needs to know about the batch and the protocol version it should use. You can do that with running

zkstack dev prover info

Example output:

===============================

Current prover setup information:

Protocol version: 0.24.2

Snark wrapper: 0x14f97b81e54b35fe673d8708cc1a19e1ea5b5e348e12d31e39824ed4f42bbca2

Database URL: postgres://postgres:notsecurepassword@localhost:5432/zksync_prover_localhost_era

===============================

This command will provide you with the information about the semantic protocol version(you need to know only minor and patch versions) and snark wrapper value. In the example, MINOR_VERSION is 24, PATCH_VERSION is 2, and SNARK_WRAPPER is 0x14f97b81e54b35fe673d8708cc1a19e1ea5b5e348e12d31e39824ed4f42bbca2.

Now, with the use of prover_cli tool, you can insert the data about the batch and protocol version into the database:

First, get the database URL(you can find it in <ecosystem_dir>/chains/<chain_name>/configs/secrets.yaml - it is the prover_url value) Now, insert the information about protocol version in the database:

prover_cli <DATABASE_URL> insert-version --version=<MINOR_VERSION> --patch=<PATCH_VERSION> --snark-wrapper=<SNARK_WRAPPER>

And finally, provide the data about the batch:

prover_cli <DATABASE_URL> insert-batch --number=<BATCH_NUMBER> --version=<MINOR_VERSION> --patch=<PATCH_VERSION>

Also, provers need to know which setup keys they should use. It may take some time, but you can generate them with:

zkstack prover generate-sk

Running prover subsystem

At this step, all the data is prepared and you can run the prover subsystem. To do that, run the following commands:

zkstack prover run --component=prover
zkstack prover run --component=witness-generator --round=all-rounds
zkstack prover run --component=witness-vector-generator --threads=10
zkstack prover run --component=compressor
zkstack prover run --component=prover-job-monitor

And you are good to go! The prover subsystem will prove the batch and you can check the results in the database.

Verifying zkSync batch

Now, assuming the proof is already generated, you can verify using ExternalProofIntegrationAPI. Usually proof is stored in GCS bucket(for which you can use the same steps as for getting the witness inputs data here, but locally you can find it in /artifacts/proofs_fri directory). Now, simply send the data to the endpoint {address}/verify_batch/{batch_number}.

Example:

curl -v  -F proof=@{path_to_proof_binary} {address_of_API}/verify_proof/{l1_batch_number}

API will respond with status 200 if the proof is valid and with the error message otherwise.

ZKsync Prover Documentation