Cloudbreak

Cloudbreak Overview

Cloudbreak is Solana's horizontally scaled accounts database and the core component behind its high-performance storage. Traditional blockchains typically keep all state in a single database, and performance degrades sharply as the number of accounts grows. Cloudbreak instead stores account data in memory-mapped files and relies on the fast random access of SSDs, so reads and writes can be spread across many files and threads. This design is what lets Solana support billions of accounts while keeping read and write latency low.

Official Website: https://solana.com/

Core Features

1. Memory-Mapped Architecture

Revolutionary storage design:

  • Memory-Mapped Files: Leverages the operating system's mmap mechanism
  • Lazy Loading: Only loads data that is actually accessed
  • Zero-Copy: Reads directly from file mappings without copying
  • Automatic Paging: The OS manages data movement between memory and disk
  • Transparent Caching: The OS automatically caches hot data

2. Horizontal Scaling Capability

Breaking through traditional database limitations:

  • Multi-File Sharding: Accounts are distributed across multiple files
  • Parallel Access: Multi-threaded parallel read/write to different shards
  • Independent Growth: Each shard scales independently
  • No Central Bottleneck: No single index limitation
  • TB-Scale Capacity: Supports billions of accounts
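
The multi-file sharding and parallel access described above can be sketched roughly as follows. This is a toy model, not Cloudbreak's actual code: shards are in-memory vectors rather than files, and the hash-based `shard_for` selector, `write_sharded`, and the 32-byte key type are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

/// Hypothetical shard selector: hash the key and reduce modulo the shard count.
fn shard_for(pubkey: &[u8; 32], shard_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    pubkey.hash(&mut hasher);
    (hasher.finish() as usize) % shard_count
}

/// Append every key to its shard using several writer threads.
/// Each shard has its own lock, so writers only contend when they
/// target the same shard; there is no single global lock.
fn write_sharded(keys: &[[u8; 32]], shard_count: usize, writers: usize) -> Vec<Vec<[u8; 32]>> {
    assert!(shard_count > 0 && writers > 0);
    let shards: Vec<Mutex<Vec<[u8; 32]>>> =
        (0..shard_count).map(|_| Mutex::new(Vec::new())).collect();
    // Split the input into roughly equal chunks, one per writer thread.
    let chunk_len = ((keys.len() + writers - 1) / writers).max(1);
    thread::scope(|scope| {
        for chunk in keys.chunks(chunk_len) {
            let shards = &shards;
            scope.spawn(move || {
                for key in chunk {
                    let shard = shard_for(key, shards.len());
                    shards[shard].lock().unwrap().push(*key);
                }
            });
        }
    });
    shards.into_iter().map(|m| m.into_inner().unwrap()).collect()
}
```

Calling `write_sharded(&keys, 4, 8)` distributes the keys over four shards with up to eight writer threads; every key ends up in exactly one shard.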

3. SSD Optimization

Fully utilizing modern storage:

  • Random Access: Optimized for SSD random read/write
  • Sequential Writing: Batch writes maximize throughput
  • Wear Leveling: Distributed writes extend SSD lifespan
  • Compression Optimization: Optional compression saves space
  • Read-Ahead Optimization: Predictive pre-reading of related data

How It Works

1. Account Storage Structure

Cloudbreak Memory-Mapped Structure:

+----------------------------------+
| Memory-Mapped File (Append Vec)  |
+----------------------------------+
| Account 1 | Meta | Data          |
| Account 2 | Meta | Data          |
| Account 3 | Meta | Data          |
| ...                              |
| Account N | Meta | Data          |
+----------------------------------+
                 |
         Physical SSD File

2. Account Data Format

Storage structure for each account:

pub struct StoredAccount {
    pub meta: StoredMeta,
    pub account: Account,
}

pub struct StoredMeta {
    pub write_version: u64,  // Write version number
    pub pubkey: Pubkey,      // Account public key
    pub data_len: u64,       // Data length
}

pub struct Account {
    pub lamports: u64,       // SOL balance
    pub data: Vec<u8>,       // Account data
    pub owner: Pubkey,       // Owner program
    pub executable: bool,    // Whether executable
    pub rent_epoch: u64,     // Rent epoch
}
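
To make the record layout concrete, here is a minimal sketch that writes the fields above into a flat byte buffer and reads them back. The field order, the little-endian encoding, the 97-byte header, and the 8-byte padding are assumptions chosen for this example and are not Solana's actual on-disk format. `Record`, `encode`, and `decode` are hypothetical names.

```rust
use std::convert::{TryFrom, TryInto};

/// Assumed fixed header: write_version (8) + pubkey (32) + data_len (8)
/// + lamports (8) + owner (32) + executable (1) + rent_epoch (8) = 97 bytes.
const HEADER_LEN: usize = 97;

/// Bytes one record occupies: header plus data, padded up to a multiple of 8.
fn stored_size(data_len: usize) -> usize {
    (HEADER_LEN + data_len + 7) & !7
}

/// A decoded record; `data` borrows from the buffer instead of copying it.
struct Record<'a> {
    write_version: u64,
    pubkey: [u8; 32],
    lamports: u64,
    owner: [u8; 32],
    executable: bool,
    rent_epoch: u64,
    data: &'a [u8],
}

/// Append a record to `out` and return the offset where it starts.
fn encode(rec: &Record, out: &mut Vec<u8>) -> usize {
    let start = out.len();
    out.extend_from_slice(&rec.write_version.to_le_bytes());
    out.extend_from_slice(&rec.pubkey);
    out.extend_from_slice(&(rec.data.len() as u64).to_le_bytes());
    out.extend_from_slice(&rec.lamports.to_le_bytes());
    out.extend_from_slice(&rec.owner);
    out.push(rec.executable as u8);
    out.extend_from_slice(&rec.rent_epoch.to_le_bytes());
    out.extend_from_slice(rec.data);
    // Zero-pad so the next record starts on an 8-byte boundary.
    out.resize(start + stored_size(rec.data.len()), 0);
    start
}

/// Decode the record at `offset`, or return None if it would run past the buffer.
fn decode(buf: &[u8], offset: usize) -> Option<Record<'_>> {
    let header = buf.get(offset..offset.checked_add(HEADER_LEN)?)?;
    let u64_at = |i: usize| u64::from_le_bytes(header[i..i + 8].try_into().unwrap());
    let data_len = usize::try_from(u64_at(40)).ok()?;
    let data_start = offset + HEADER_LEN;
    let data = buf.get(data_start..data_start.checked_add(data_len)?)?;
    Some(Record {
        write_version: u64_at(0),
        pubkey: header[8..40].try_into().unwrap(),
        lamports: u64_at(48),
        owner: header[56..88].try_into().unwrap(),
        executable: header[88] != 0,
        rent_epoch: u64_at(89),
        data,
    })
}
```

Storing `data_len` in the header is what allows a reader to find the end of a variable-length record, and therefore the start of the next one, without a separate table.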

3. Read/Write Flow

Reading an Account:

// 1. Look up the index to locate the account
let location = account_index.get(&pubkey)?;

// 2. Access the file via mmap
let storage = get_storage(location.slot, location.store_id);

// 3. Zero-copy read of account data
let account = storage.get_account(location.offset)?;

// The OS handles automatically:
// - Page cache hit -> Read directly from memory (nanosecond-level)
// - Page cache miss -> Load from SSD to memory (microsecond-level)

Writing an Account:

// 1. Append-write to the current append vec
let storage = get_current_storage()?;
let offset = storage.append_account(&account)?;

// 2. Update the index
account_index.insert(pubkey, AccountLocation {
    slot,
    store_id,
    offset,
});

// 3. Batch flush to disk
// The OS asynchronously flushes in the background without blocking writes
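
The read and write flows above can be combined into one small in-memory model. This is only a sketch under simplifying assumptions: a `Vec<u8>` stands in for the memory-mapped file, a `HashMap` stands in for the account index, and `ToyAccountsDb` and `Location` are hypothetical names.

```rust
use std::collections::HashMap;

/// Where the newest version of an account lives (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Location {
    slot: u64,
    offset: usize,
    len: usize,
}

/// Toy accounts store: an append-only byte buffer plus an index.
#[derive(Default)]
struct ToyAccountsDb {
    storage: Vec<u8>,
    index: HashMap<[u8; 32], Location>,
}

impl ToyAccountsDb {
    /// Append the data, then point the index at the new location.
    /// Older bytes for the same key are left in place, never overwritten.
    fn write(&mut self, slot: u64, pubkey: [u8; 32], data: &[u8]) -> Location {
        let loc = Location { slot, offset: self.storage.len(), len: data.len() };
        self.storage.extend_from_slice(data);
        self.index.insert(pubkey, loc);
        loc
    }

    /// Look up the index, then borrow the bytes directly from storage.
    fn read(&self, pubkey: &[u8; 32]) -> Option<&[u8]> {
        let loc = self.index.get(pubkey)?;
        self.storage.get(loc.offset..loc.offset + loc.len)
    }
}
```

Writing the same key twice grows the storage by both payloads, while `read` returns only the newer one. The stale bytes stay behind until garbage collection reclaims them.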

Practical Applications

1. Account Queries

High-performance account access:

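use solana_program::{
    account_info::AccountInfo,
    entrypoint::ProgramResult,
    program_error::ProgramError,
    pubkey::Pubkey,
};

// Accounts are loaded from the accounts database before the program runs
fn process_instruction(
    _program_id: &Pubkey,
    accounts: &[AccountInfo],
    _instruction_data: &[u8],
) -> ProgramResult {
    let user_account = accounts.first().ok_or(ProgramError::NotEnoughAccountKeys)?;

    // Read account data; the scope releases the shared borrow before the write
    {
        let data = user_account.try_borrow_data()?;
        if data.is_empty() {
            return Err(ProgramError::AccountDataTooSmall);
        }
    }

    // Update account data (a mutable borrow fails while a shared borrow is live)
    let mut data_mut = user_account.try_borrow_mut_data()?;
    data_mut[0] = 42;

    // The validator persists the modified account once the transaction succeeds
    Ok(())
}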

2. Batch Queries

Leveraging parallel access:

import { Connection, PublicKey } from '@solana/web3.js'

const connection = new Connection('https://api.mainnet-beta.solana.com')

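// Placeholder keys; replace with real base58 account addresses
const pubkeys = [
  new PublicKey('Account1...'),
  new PublicKey('Account2...'),
  new PublicKey('Account3...'),
  // ... thousands of accounts
]

// The getMultipleAccounts RPC method accepts at most 100 keys per request,
// so split larger lists into chunks and send the requests concurrently
const chunks = []
for (let i = 0; i < pubkeys.length; i += 100) {
  chunks.push(pubkeys.slice(i, i + 100))
}
const accounts = (
  await Promise.all(chunks.map((chunk) => connection.getMultipleAccountsInfo(chunk)))
).flat()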

console.log('Query complete:', accounts.length)

3. State Snapshots

Creating account snapshots:

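# Illustrative commands only: flag and subcommand names differ between
# validator releases, so confirm each one with --help before use

# Set how often (in slots) the validator writes a full state snapshot
solana-validator --snapshot-interval-slots 1000

# View snapshot information
solana-ledger-tool snapshot list

# Restore from snapshot
solana-validator --snapshot /path/to/snapshot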

Architecture Design

1. Append Vec

Append-only storage:

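use std::mem;
use std::path::PathBuf;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

use memmap2::MmapMut;

// Simplified sketch: assumes StoredAccount is a flat, fixed-layout record
// (no heap pointers such as Vec), so its bytes can be copied directly
pub struct AppendVec {
    path: PathBuf,           // File path
    map: MmapMut,            // Memory mapping
    current_len: AtomicUsize, // Current length
    file_size: u64,          // File size
}

impl AppendVec {
    // Append an account
    pub fn append_account(&mut self, account: &StoredAccount) -> Result<usize> {
        let offset = self.current_len.load(Ordering::Acquire);
        let size = account.stored_size();

        // Check space
        if (offset + size) as u64 > self.file_size {
            return Err(AppendVecError::NoSpace);
        }

        // Copy the record into the mapping (one copy, no intermediate buffer)
        unsafe {
            let dst = self.map.as_mut_ptr().add(offset);
            ptr::copy_nonoverlapping(account as *const _ as *const u8, dst, size);
        }

        // Publish the new length so readers can see the record
        self.current_len.store(offset + size, Ordering::Release);

        Ok(offset)
    }

    // Read an account
    pub fn get_account(&self, offset: usize) -> Result<&StoredAccount> {
        // Reject offsets that would read past the written region
        // (OutOfBounds is a hypothetical error variant)
        let end = offset + mem::size_of::<StoredAccount>();
        if end > self.current_len.load(Ordering::Acquire) {
            return Err(AppendVecError::OutOfBounds);
        }

        // Borrow directly from the mmap (zero-copy)
        unsafe {
            let ptr = self.map.as_ptr().add(offset) as *const StoredAccount;
            Ok(&*ptr)
        }
    }
}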

2. Index Structure

Efficient account indexing:

pub struct AccountsIndex {
    // Pubkey -> per-slot list of account locations
    map: DashMap<Pubkey, RwLock<AccountMapEntry>>,
}

pub struct AccountMapEntry {
    slot_list: Vec<(Slot, AccountInfo)>,
    ref_count: AtomicU64,
}

pub struct AccountInfo {
    store_id: AppendVecId,  // Storage ID
    offset: usize,          // Offset
    lamports: u64,          // Balance snapshot
}
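
A simplified model of how a per-account slot list can serve lookups on different forks is sketched below. It uses a plain `HashMap` instead of `DashMap`, and the `ToyIndex` and `Entry` types and the `ancestors` set are assumptions for illustration rather than the validator's real API.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

/// Location of one stored version of an account (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    store_id: u32,
    offset: usize,
}

/// Toy index: each key maps to the list of slots in which it was written.
#[derive(Default)]
struct ToyIndex {
    map: HashMap<[u8; 32], Vec<(Slot, Entry)>>,
}

impl ToyIndex {
    /// Record a version; a second write in the same slot replaces the first.
    fn upsert(&mut self, pubkey: [u8; 32], slot: Slot, entry: Entry) {
        let list = self.map.entry(pubkey).or_default();
        match list.iter_mut().find(|(s, _)| *s == slot) {
            Some(existing) => existing.1 = entry,
            None => list.push((slot, entry)),
        }
    }

    /// Newest version whose slot belongs to the caller's fork (`ancestors`).
    fn get(&self, pubkey: &[u8; 32], ancestors: &HashSet<Slot>) -> Option<(Slot, Entry)> {
        self.map
            .get(pubkey)?
            .iter()
            .filter(|(s, _)| ancestors.contains(s))
            .max_by_key(|(s, _)| *s)
            .copied()
    }
}
```

Keeping several `(slot, location)` pairs per account is what lets two competing forks each see their own version of the same account until one of them is discarded.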

3. Sharding Strategy

Intelligent data sharding:

  • Time-based Sharding: Storage files split by slot
  • Capacity-based Sharding: Each file has a fixed size (e.g., 4GB)
  • Automatic Switching: New files are automatically created when a file is full
  • Parallel Writing: Different threads write to different files
  • Independent Management: Each shard has an independent lifecycle
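
The slot-based and capacity-based rules in this list can be illustrated with a small sketch that opens a new storage segment whenever the slot changes or the current segment is full. `Segment` and `SegmentedStore` are hypothetical names, segments are in-memory buffers rather than files, and the tiny capacity in the usage note is only there to make the rollover easy to see.

```rust
/// One storage file in this toy model (hypothetical type).
struct Segment {
    id: u32,
    slot: u64,
    data: Vec<u8>,
}

/// A sequence of fixed-capacity segments; only the newest accepts writes.
struct SegmentedStore {
    capacity: usize,
    segments: Vec<Segment>,
}

impl SegmentedStore {
    fn new(capacity: usize) -> Self {
        SegmentedStore { capacity, segments: Vec::new() }
    }

    /// Append `bytes` for `slot` and return (segment id, offset).
    /// Returns None if the payload can never fit in a single segment.
    fn append(&mut self, slot: u64, bytes: &[u8]) -> Option<(u32, usize)> {
        if bytes.len() > self.capacity {
            return None;
        }
        // Roll over when the slot changes or the current segment is full.
        let needs_new = match self.segments.last() {
            Some(seg) => seg.slot != slot || seg.data.len() + bytes.len() > self.capacity,
            None => true,
        };
        if needs_new {
            let id = self.segments.len() as u32;
            self.segments.push(Segment { id, slot, data: Vec::new() });
        }
        let seg = self.segments.last_mut()?;
        let offset = seg.data.len();
        seg.data.extend_from_slice(bytes);
        Some((seg.id, offset))
    }
}
```

With `SegmentedStore::new(8)`, two 5-byte appends in the same slot land in segments 0 and 1, because the second write would overflow the first segment.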

Coordination with Other Components

1. Coordination with Sealevel

Supporting parallel execution:

  • Lock-free Reads: Multi-threaded concurrent account reads
  • Write Isolation: Writes to different accounts do not conflict
  • Batch Loading: Sealevel batch-preloads accounts
  • Zero-Copy: Directly passes mmap pointers
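
Write isolation depends on knowing which accounts each transaction reads and writes. The sketch below shows one way such access sets could be used to group transactions into batches that are safe to run in parallel. It is an illustrative scheduler, not Sealevel's actual algorithm, and `TxAccess`, `conflicts`, and `schedule` are hypothetical names.

```rust
use std::collections::HashSet;

/// Accounts a transaction declares up front (names stand in for pubkeys).
struct TxAccess {
    reads: HashSet<&'static str>,
    writes: HashSet<&'static str>,
}

/// Two transactions conflict if either writes an account the other touches.
fn conflicts(a: &TxAccess, b: &TxAccess) -> bool {
    a.writes.iter().any(|k| b.writes.contains(k) || b.reads.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

/// Group transaction indices into batches. Transactions inside a batch never
/// conflict, and a transaction always lands after every earlier one it
/// conflicts with, so the original order of conflicting work is preserved.
fn schedule(txs: &[TxAccess]) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    for (i, tx) in txs.iter().enumerate() {
        // Find the last batch holding a conflicting transaction.
        let target = batches
            .iter()
            .rposition(|batch| batch.iter().any(|&j| conflicts(tx, &txs[j])))
            .map_or(0, |b| b + 1);
        if target == batches.len() {
            batches.push(Vec::new());
        }
        batches[target].push(i);
    }
    batches
}
```

Read-only access to the same account never creates a conflict, which is why concurrent reads need no locking in this model.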

2. Coordination with Gulf Stream

Optimizing pre-execution:

  • Pre-loading: Gulf Stream predicts needed accounts
  • Prefetch Optimization: Cloudbreak preloads into memory
  • Cache Hits: Accounts are already in memory at execution time
  • Reduced Latency: Minimizes disk I/O wait time

3. Coordination with PoH

Ensuring consistency:

  • Version Control: Slot-based version management
  • Rollback Support: Can roll back to any slot
  • Fork Handling: Supports multi-fork account states
  • Finality: Persisted after PoH confirmation

Performance Optimization

1. Caching Strategy

Multi-tier caching:

  • L1 Cache: In-memory cache for recently accessed accounts
  • L2 Cache: OS page cache
  • Warm-up: Preloads hot accounts at startup
  • LRU Eviction: Automatically evicts cold data
  • Compressed Cache: Compresses infrequently used accounts to save memory
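
As a rough illustration of the LRU eviction item, here is a minimal least-recently-used cache. It is a sketch under simple assumptions (integer keys, a linear-time recency update) rather than the cache the validator actually uses, and `LruCache` is a hypothetical type.

```rust
use std::collections::{HashMap, VecDeque};

/// Tiny LRU cache: `order` holds keys from least to most recently used.
struct LruCache {
    capacity: usize,
    entries: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position.
    fn touch(&mut self, key: u64) {
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
    }

    /// Look up a value and mark it as recently used.
    fn get(&mut self, key: u64) -> Option<&Vec<u8>> {
        if self.entries.contains_key(&key) {
            self.touch(key);
        }
        self.entries.get(&key)
    }

    /// Insert a value; returns the evicted key if the cache overflowed.
    fn put(&mut self, key: u64, value: Vec<u8>) -> Option<u64> {
        self.entries.insert(key, value);
        self.touch(key);
        if self.entries.len() > self.capacity {
            let evicted = self.order.pop_front()?;
            self.entries.remove(&evicted);
            return Some(evicted);
        }
        None
    }
}
```

A production cache would normally pair the map with a linked list to make the recency update constant-time. The linear scan here keeps the example short.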

2. Garbage Collection

Automatic cleanup of expired data:

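// Drop old account versions, always keeping the newest one
fn mark_old_accounts(current_slot: Slot) {
    // saturating_sub avoids integer underflow during the first 1000 slots
    let cutoff = current_slot.saturating_sub(1000);
    for (pubkey, entry) in accounts_index.iter() {
        let newest = entry.slot_list.iter().map(|(slot, _)| *slot).max();
        entry.slot_list.retain(|(slot, _)| {
            // Keep recent (possibly unconfirmed) versions and the latest version
            *slot >= cutoff || Some(*slot) == newest
        });
    }
}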

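// Reclaim space
fn shrink_storage(store_id: AppendVecId) {
    // 1. Create a new, smaller file sized for the live accounts
    let mut new_vec = AppendVec::new(calc_shrunk_size(store_id));

    // 2. Copy live accounts; the new file is sized to fit them
    for account in get_alive_accounts(store_id) {
        new_vec
            .append_account(account)
            .expect("shrunk file is sized for all live accounts");
    }

    // 3. Update the index before deleting, so no entry points at a removed file
    update_index_to_new_vec(new_vec);

    // 4. Delete the old file
    remove_old_vec(store_id);
}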
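
The shrink procedure above can be exercised on a toy model in which storage is a list of records and the index maps each key to the position of its live version. `Rec`, `alive_ratio`, and `shrink` are hypothetical names, and deciding whether to shrink from the ratio of live bytes is an assumption made for this sketch.

```rust
use std::collections::HashMap;

/// One stored version of an account (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
struct Rec {
    key: u32,
    data: Vec<u8>,
}

/// Fraction of stored bytes that the index still points at.
fn alive_ratio(storage: &[Rec], index: &HashMap<u32, usize>) -> f64 {
    let total: usize = storage.iter().map(|r| r.data.len()).sum();
    let alive: usize = storage
        .iter()
        .enumerate()
        .filter(|(pos, r)| index.get(&r.key) == Some(pos))
        .map(|(_, r)| r.data.len())
        .sum();
    if total == 0 { 1.0 } else { alive as f64 / total as f64 }
}

/// Copy only live records into a new storage and repoint the index at them.
fn shrink(storage: &[Rec], index: &mut HashMap<u32, usize>) -> Vec<Rec> {
    let mut compacted = Vec::new();
    for (pos, rec) in storage.iter().enumerate() {
        // A record is live only if the index points at this exact position.
        if index.get(&rec.key) == Some(&pos) {
            index.insert(rec.key, compacted.len());
            compacted.push(rec.clone());
        }
    }
    compacted
}
```

A caller might check `alive_ratio` first and call `shrink` only when it drops below some threshold, since rewriting a file that is almost entirely live wastes I/O.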

3. I/O Optimization

Maximizing storage performance:

  • Batch Flushing: Aggregates small writes
  • Asynchronous I/O: Non-blocking I/O operations
  • Direct I/O: Bypasses page cache (in specific scenarios)
  • Read-ahead: Sequential pre-reading of adjacent data
  • Write Coalescing: Merges consecutive writes
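
Related Concepts

  • Sealevel: Solana's parallel transaction runtime
  • Gulf Stream: Mempool-less transaction forwarding
  • PoH (Proof of History): Solana's cryptographic clock for ordering events
  • mmap: System call that maps a file into a process's address space
  • Append-only Log: A log in which records are only added at the end and never modified in place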

Summary

Cloudbreak gives Solana high-performance account storage by combining memory-mapped files with a horizontally scaled layout. It relies on the operating system's virtual memory management and the random access characteristics of modern SSDs to achieve microsecond-level read latency and high write throughput. Integrated with Sealevel's parallel execution and Gulf Stream's transaction forwarding, Cloudbreak is a key part of Solana's high-performance infrastructure. Append-only storage, automatic garbage collection, and caching keep the system stable over long periods of operation. For developers, Cloudbreak is transparent: programs and clients use the standard Solana SDK and benefit from it without extra work. As storage technology evolves (NVMe, persistent memory), Cloudbreak can continue to be optimized to support Solana's large-scale applications.