EVM Bytecode

EVM bytecode is the machine language that the Ethereum Virtual Machine (EVM) understands and executes. Just as a CPU can only execute binary machine code, the EVM does not directly understand smart contract source code written in high-level languages like Solidity or Vyper. Instead, it executes bytecode: a low-level instruction sequence generated by a compiler.

The Problem It Solves

Ethereum aims to be a "world computer" that allows developers to write smart contracts with arbitrary logic. To achieve this goal, the following challenges need to be addressed:

  1. Platform Independence: Smart contracts need to run identically on thousands of heterogeneous nodes worldwide (different operating systems, hardware architectures).
  2. Resource Control: Since blockchain resources are limited, the cost of each computational step (Gas) must be precisely calculated to prevent abuse and infinite loops.
  3. Code Compactness: On-chain storage is extremely expensive, so execution code needs to be as compact as possible to reduce storage and deployment costs.

EVM bytecode solves the above problems by defining a standardized, hardware-independent instruction set.

Implementation Mechanisms and Principles

Compilation Process

After a developer writes a Solidity contract, the compiler (such as solc) converts it into a hexadecimal bytecode string (e.g., 0x60806040...).

Bytecode Structure

A typical contract bytecode contains two main parts:

  1. Creation Code (Init Code): This part of the code is executed only once during contract deployment. It is responsible for initializing contract state (such as constructor logic) and returning the runtime code.
  2. Runtime Code (Deployed Code): This is the code actually stored on the blockchain. When an external account or another contract calls this contract, it is this code that gets executed.
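The split between the two parts can be sketched in Python. The bytes below are a hand-assembled illustration, not solc output: a 12-byte initializer that copies the 10 bytes following it into memory and returns them. The names INIT, RUNTIME, and run_init are hypothetical, and run_init only mimics this one initializer with slicing; it is not an EVM.

```python
# Hand-assembled initializer (12 bytes). Each pair of hex digits is one byte.
INIT = bytes.fromhex(
    "600a"  # PUSH1 0x0a  size of the runtime code (10 bytes)
    "600c"  # PUSH1 0x0c  offset of the runtime code inside the creation code
    "6000"  # PUSH1 0x00  destination offset in memory
    "39"    # CODECOPY    memory[0:10] = code[12:22]
    "600a"  # PUSH1 0x0a  size of the data to return
    "6000"  # PUSH1 0x00  memory offset of the data to return
    "f3"    # RETURN      hand the runtime code back to the EVM
)

# Hand-assembled runtime code (10 bytes): store 42 in memory and return it.
RUNTIME = bytes.fromhex(
    "602a"  # PUSH1 0x2a
    "6000"  # PUSH1 0x00
    "52"    # MSTORE
    "6020"  # PUSH1 0x20
    "6000"  # PUSH1 0x00
    "f3"    # RETURN
)

CREATION_CODE = INIT + RUNTIME


def run_init(creation_code: bytes) -> bytes:
    """Mimic what this specific initializer does (assumes the layout above)."""
    size = creation_code[1]    # operand of the first PUSH1
    offset = creation_code[3]  # operand of the second PUSH1
    memory = bytearray(size)
    memory[0:size] = creation_code[offset:offset + size]  # effect of CODECOPY
    return bytes(memory[0:size])                          # effect of RETURN


deployed = run_init(CREATION_CODE)
print(len(CREATION_CODE), len(deployed))
```

Only the bytes returned by the initializer end up stored as the contract's code; the initializer itself is discarded after deployment.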

Instruction Execution

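Bytecode consists of a series of opcodes, each one byte (8 bits) long and representing a specific instruction (such as PUSH, ADD, SSTORE). PUSH instructions are additionally followed by the bytes of the value they push. The EVM processes the bytecode in a loop:

  • Fetch: The EVM program counter (PC) reads the current byte.
  • Decode: The byte is parsed into the corresponding operation instruction.
  • Execute: The EVM operates on the stack, memory, or storage according to the instruction.
  • Metering: Each instruction has a defined Gas cost, which the EVM deducts from the Gas provided by the transaction. Simple opcodes such as ADD have a fixed cost, while others such as SSTORE have a cost that depends on the current state. If the remaining Gas is insufficient, execution halts with an out-of-gas error.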
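The fetch, decode, execute, and metering steps can be sketched as a toy interpreter in Python. This is a minimal illustration that supports only four opcodes (STOP, ADD, MUL, PUSH1) with their fixed Gas costs; the names run and OutOfGas are hypothetical, and memory, storage, jumps, and dynamic Gas costs are left out.

```python
STOP, ADD, MUL, PUSH1 = 0x00, 0x01, 0x02, 0x60

# Fixed Gas costs of the four opcodes this sketch supports.
GAS_COST = {STOP: 0, ADD: 3, MUL: 5, PUSH1: 3}

WORD = 2 ** 256  # EVM stack items are 256-bit words; arithmetic wraps around.


class OutOfGas(Exception):
    """Raised when the remaining gas cannot pay for the next instruction."""


def run(code: bytes, gas: int):
    """Execute `code` and return (final stack, remaining gas)."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]                          # fetch
        if op not in GAS_COST:                 # decode
            raise ValueError(f"unsupported opcode 0x{op:02x} at pc={pc}")
        if gas < GAS_COST[op]:                 # metering
            raise OutOfGas(f"out of gas at pc={pc}")
        gas -= GAS_COST[op]

        if op == STOP:                         # execute
            break
        if op == PUSH1:
            # The byte after the opcode is the value to push (0 if missing).
            value = code[pc + 1] if pc + 1 < len(code) else 0
            stack.append(value)
            pc += 2
            continue
        a, b = stack.pop(), stack.pop()
        if op == ADD:
            stack.append((a + b) % WORD)
        elif op == MUL:
            stack.append((a * b) % WORD)
        pc += 1
    return stack, gas


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP  computes (2 + 3) * 4
program = bytes.fromhex("600260030160040200")
print(run(program, gas=100))  # ([20], 83)
```

The program spends 3 + 3 + 3 + 3 + 5 + 0 = 17 Gas, so 83 of the 100 units remain. Calling run(program, gas=5) raises OutOfGas at the second PUSH1, which mirrors how a transaction with too little Gas is halted.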

Key Features

  • Stack-Based: The EVM is a stack machine. Most instructions take parameters from the top of the stack and push results back onto it. This design simplifies the virtual machine implementation.
  • Turing Complete: Supports control flow instructions such as JUMP and JUMPI, so it can in theory express any computation. In practice, every execution is bounded by the Gas supplied.
  • Sandboxed Environment: Bytecode runs in an isolated EVM environment with no direct access to the network, filesystem, or other processes, ensuring security.
  • Gas Metering: Every bytecode instruction has a precisely defined Gas consumption, enforcing the economic model of computational resources at the bytecode level.
  • ABI (Application Binary Interface): While bytecode defines the logic, ABI defines how to interact with the bytecode (how to encode function calls and parameters).
  • Disassembler: A tool that converts hexadecimal bytecode back into readable opcode mnemonics.
  • Decompiler: A tool that attempts to reconstruct high-level language logic from bytecode, used for analyzing closed-source contracts.
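As a rough illustration of what a disassembler does, the Python sketch below walks a bytecode string and prints one mnemonic per instruction. The opcode table is deliberately partial and the function name disassemble is hypothetical; real tools cover the full instruction set. The sample input is the five-byte prefix 0x6080604052 commonly seen at the start of Solidity-compiled contracts.

```python
# Partial opcode table; anything missing is reported as UNKNOWN.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x39: "CODECOPY",
    0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5b: "JUMPDEST", 0xf3: "RETURN",
}


def disassemble(hex_code: str):
    """Return a list of 'offset mnemonic [operand]' lines for a hex string."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)

    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            n = op - 0x5F
            data = code[pc + 1:pc + 1 + n]
            lines.append(f"{pc:04x} PUSH{n} 0x{data.hex()}")
            pc += 1 + n
        else:
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
            lines.append(f"{pc:04x} {name}")
            pc += 1
    return lines


for line in disassemble("0x6080604052"):
    print(line)
# 0000 PUSH1 0x80
# 0002 PUSH1 0x40
# 0004 MSTORE
```

Skipping the immediate bytes after each PUSH is the important detail: treating them as opcodes would misalign every instruction that follows.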