Cloudbreak

Cloudbreak Overview

Cloudbreak is Solana's horizontally scaled accounts database, the storage layer that holds on-chain account state. Many blockchains keep all state in a single database, and read/write performance degrades as the number of accounts grows. Cloudbreak instead stores account data in memory-mapped files and relies on the fast random reads of SSDs, so account storage can be spread across many files and accessed in parallel. The design goal is to keep read and write latency low even when the number of accounts becomes very large.

Official Website: https://solana.com/

Core Features

1. Memory-Mapped Architecture

Storage design:

Memory-Mapped Files: Leverages the operating system's mmap mechanism
Lazy Loading: Only loads data that is actually accessed
Zero-Copy: Reads directly from file mappings without copying
Automatic Paging: The OS manages data movement between memory and disk
Transparent Caching: The OS automatically caches hot data

2. Horizontal Scaling Capability

Avoiding the limits of a single database:

Multi-File Sharding: Accounts are distributed across multiple files
Parallel Access: Multi-threaded parallel read/write to different shards
Independent Growth: Each shard scales independently
No Central Bottleneck: Account data is not funneled through a single file
TB-Scale Capacity: Supports billions of accounts
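
The sharding idea in the list above can be sketched with the standard library alone. This is a minimal illustration, not Solana's implementation: `ShardedStore`, `append`, and `parallel_fill` are hypothetical names, and in-memory byte vectors stand in for the storage files. Each shard has its own lock, so writers that target different shards do not block each other.

```rust
use std::sync::Mutex;
use std::thread;

/// A set of independent append-only shards (hypothetical type).
/// In-memory buffers stand in for storage files.
pub struct ShardedStore {
    shards: Vec<Mutex<Vec<u8>>>,
}

impl ShardedStore {
    pub fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0);
        Self {
            shards: (0..num_shards).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    /// Append `bytes` to one shard and return the offset they were written at.
    pub fn append(&self, shard: usize, bytes: &[u8]) -> usize {
        let mut buf = self.shards[shard].lock().unwrap();
        let offset = buf.len();
        buf.extend_from_slice(bytes);
        offset
    }

    pub fn shard_len(&self, shard: usize) -> usize {
        self.shards[shard].lock().unwrap().len()
    }
}

/// Spawn one writer thread per shard; each appends `writes` copies of `record`.
pub fn parallel_fill(store: &ShardedStore, writes: usize, record: &[u8]) {
    thread::scope(|scope| {
        for shard in 0..store.shards.len() {
            scope.spawn(move || {
                for _ in 0..writes {
                    store.append(shard, record);
                }
            });
        }
    });
}
```

Because every shard grows independently, adding shards adds write capacity without touching the existing ones.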

3. SSD Optimization

Making use of modern storage:

Random Reads: Account lookups rely on the low random-read latency of SSDs
Sequential Writing: Appends and batch writes maximize write throughput
Wear Distribution: Spreading writes across files avoids concentrating wear on one region
Compression Optimization: Optional compression saves space
Read-Ahead Optimization: Predictive pre-reading of related data

How It Works

1. Account Storage Structure

Cloudbreak Memory-Mapped Structure:

+----------------------------------+
| Memory-Mapped File (Append Vec)  |
+----------------------------------+
| Account 1 | Meta | Data          |
| Account 2 | Meta | Data          |
| Account 3 | Meta | Data          |
| ...                              |
| Account N | Meta | Data          |
+----------------------------------+
                 |
         Physical SSD File

2. Account Data Format

Storage structure for each account:

pub struct StoredAccount {
    pub meta: StoredMeta,
    pub account: Account,
}

pub struct StoredMeta {
    pub write_version: u64,  // Write version number
    pub pubkey: Pubkey,      // Account public key
    pub data_len: u64,       // Data length
}

pub struct Account {
    pub lamports: u64,       // Balance in lamports (1 SOL = 10^9 lamports)
    pub data: Vec<u8>,       // Account data
    pub owner: Pubkey,       // Owner program
    pub executable: bool,    // Whether executable
    pub rent_epoch: u64,     // Rent epoch
}
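
To make "zero-copy" concrete, here is a minimal sketch of a flat on-disk record and a borrowed view over it. The byte layout, `encode_record`, `view_record`, and `AccountView` are assumptions made for illustration and do not reproduce Solana's actual format. The point is that the view's `data` field is a slice into the source buffer (which would be the memory mapping), so reading an account allocates nothing.

```rust
// Hypothetical flat layout, chosen for illustration only:
// [write_version: u64][data_len: u64][lamports: u64][pubkey: 32 bytes][data]
const HEADER_LEN: usize = 8 + 8 + 8 + 32;

/// Borrowed view of a stored record; `pubkey` and `data` point into the buffer.
#[derive(Debug, PartialEq)]
pub struct AccountView<'a> {
    pub write_version: u64,
    pub lamports: u64,
    pub pubkey: &'a [u8],
    pub data: &'a [u8],
}

/// Serialize one record into a contiguous byte vector.
pub fn encode_record(write_version: u64, lamports: u64, pubkey: &[u8; 32], data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + data.len());
    out.extend_from_slice(&write_version.to_le_bytes());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(&lamports.to_le_bytes());
    out.extend_from_slice(pubkey);
    out.extend_from_slice(data);
    out
}

fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(raw)
}

/// Parse the record at `offset` without copying its payload.
/// Returns `None` if the buffer is too short for the header or the data.
pub fn view_record(buf: &[u8], offset: usize) -> Option<AccountView<'_>> {
    let header_end = offset.checked_add(HEADER_LEN)?;
    if header_end > buf.len() {
        return None;
    }
    let data_len = read_u64(buf, offset + 8) as usize;
    let data_end = header_end.checked_add(data_len)?;
    if data_end > buf.len() {
        return None;
    }
    Some(AccountView {
        write_version: read_u64(buf, offset),
        lamports: read_u64(buf, offset + 16),
        pubkey: &buf[offset + 24..header_end],
        data: &buf[header_end..data_end],
    })
}
```

Storing the variable-length data inline, right after fixed-size metadata, is what lets a reader find everything from a single offset.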

3. Read/Write Flow

Reading an Account:

// 1. Look up the index to locate the account
let location = account_index.get(&pubkey)?;

// 2. Access the file via mmap
let storage = get_storage(location.slot, location.store_id);

// 3. Zero-copy read of account data
let account = storage.get_account(location.offset)?;

// The OS handles automatically:
// - Page cache hit -> Read directly from memory (nanosecond-level)
// - Page cache miss -> Load from SSD to memory (microsecond-level)

Writing an Account:

// 1. Append-write to the current append vec
let storage = get_current_storage()?;
let offset = storage.append_account(&account)?;

// 2. Update the index
account_index.insert(pubkey, AccountLocation {
    slot,
    store_id,
    offset,
});

// 3. Batch flush to disk
// The OS asynchronously flushes in the background without blocking writes
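
The read and write flows above can be combined into one small runnable model. This is a sketch under simplifying assumptions: `AccountsDb`, `store`, and `load` are hypothetical names, a `HashMap` stands in for the account index, and byte vectors stand in for the memory-mapped files. A write appends the new version and repoints the index; a read is an index lookup followed by a borrowed slice.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for the real Pubkey type

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AccountLocation {
    pub store_id: usize,
    pub offset: usize,
    pub len: usize,
}

#[derive(Default)]
pub struct AccountsDb {
    storages: Vec<Vec<u8>>, // stand-in for memory-mapped files
    index: HashMap<Pubkey, AccountLocation>,
}

impl AccountsDb {
    /// Append the new version to the current storage, then update the index.
    pub fn store(&mut self, pubkey: Pubkey, data: &[u8]) -> AccountLocation {
        if self.storages.is_empty() {
            self.storages.push(Vec::new());
        }
        let store_id = self.storages.len() - 1;
        let storage = &mut self.storages[store_id];
        let location = AccountLocation {
            store_id,
            offset: storage.len(),
            len: data.len(),
        };
        storage.extend_from_slice(data);
        // The previous version stays in the file as dead bytes until cleanup.
        self.index.insert(pubkey, location);
        location
    }

    /// Index lookup followed by a borrowed (non-copying) slice of the storage.
    pub fn load(&self, pubkey: &Pubkey) -> Option<&[u8]> {
        let loc = self.index.get(pubkey)?;
        self.storages
            .get(loc.store_id)?
            .get(loc.offset..loc.offset + loc.len)
    }
}
```

Note that an update never overwrites bytes in place, which is why old versions accumulate and later need to be reclaimed.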

Practical Applications

1. Account Queries

High-performance account access:

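use solana_program::{
    account_info::AccountInfo,
    entrypoint::ProgramResult,
    program_error::ProgramError,
    pubkey::Pubkey,
};

// The runtime loads accounts from Cloudbreak before the program runs
fn process_instruction(
    _program_id: &Pubkey,
    accounts: &[AccountInfo],
    _instruction_data: &[u8],
) -> ProgramResult {
    let user_account = accounts.first().ok_or(ProgramError::NotEnoughAccountKeys)?;

    // Read account data. The read borrow must end before a mutable borrow is
    // taken; otherwise try_borrow_mut_data() fails at runtime.
    {
        let data = user_account.try_borrow_data()?;
        if data.is_empty() {
            return Err(ProgramError::AccountDataTooSmall);
        }
    }

    // Update account data
    let mut data_mut = user_account.try_borrow_mut_data()?;
    data_mut[0] = 42;

    // The runtime writes the modified account back to Cloudbreak after execution
    Ok(())
}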

2. Batch Queries

Leveraging parallel access:

import { Connection, PublicKey } from '@solana/web3.js'

const connection = new Connection('https://api.mainnet-beta.solana.com')

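// Placeholder addresses; replace with real base58-encoded public keys
const pubkeys = [
  new PublicKey('Account1...'),
  new PublicKey('Account2...'),
  new PublicKey('Account3...'),
  // ... thousands of accounts
]

// getMultipleAccounts accepts at most 100 keys per RPC request, so query in chunks
const CHUNK_SIZE = 100
const accounts = []
for (let i = 0; i < pubkeys.length; i += CHUNK_SIZE) {
  const chunk = pubkeys.slice(i, i + CHUNK_SIZE)
  accounts.push(...(await connection.getMultipleAccountsInfo(chunk)))
}

console.log('Query complete:', accounts.length)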

3. State Snapshots

Creating account snapshots:

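# Take a full state snapshot every 1000 slots
# (flag names differ between validator releases; check `solana-validator --help`)
solana-validator --snapshot-interval-slots 1000

# View snapshot information
# (available subcommands vary by release; confirm with `solana-ledger-tool --help`)
solana-ledger-tool snapshot list

# Restore from snapshot: on startup the validator loads the newest snapshot
# found in its snapshot directory
solana-validator --snapshots /path/to/snapshots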

Architecture Design

1. Append Vec

Append-only storage:

pub struct AppendVec {
    path: PathBuf,           // File path
    map: MmapMut,            // Memory mapping
    current_len: AtomicUsize, // Current length
    file_size: u64,          // File size
}

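impl AppendVec {
    // Append an account and return the offset at which it was written.
    // Simplified sketch: the raw byte copy below is only valid if StoredAccount
    // is a flat, fixed-layout record. A struct holding a Vec<u8> cannot be
    // byte-copied this way; its data bytes must be written inline after the
    // metadata.
    pub fn append_account(&mut self, account: &StoredAccount) -> Result<usize> {
        let offset = self.current_len.load(Ordering::Acquire);
        let size = account.stored_size();

        // Check space (file_size is u64, offsets are usize)
        if (offset + size) as u64 > self.file_size {
            return Err(AppendVecError::NoSpace);
        }

        // Write data (one copy into the mapping, no intermediate buffer)
        unsafe {
            let dst = self.map.as_mut_ptr().add(offset);
            ptr::copy_nonoverlapping(account as *const _ as *const u8, dst, size);
        }

        // Publish the new length so readers only see fully written records
        self.current_len.fetch_add(size, Ordering::Release);

        Ok(offset)
    }

    // Read an account
    pub fn get_account(&self, offset: usize) -> Result<&StoredAccount> {
        // Reject offsets beyond the written region
        // (InvalidOffset is a hypothetical error variant)
        let end = offset + mem::size_of::<StoredAccount>();
        if end > self.current_len.load(Ordering::Acquire) {
            return Err(AppendVecError::InvalidOffset);
        }

        // Read directly from mmap (zero-copy)
        unsafe {
            let ptr = self.map.as_ptr().add(offset) as *const StoredAccount;
            Ok(&*ptr)
        }
    }
}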

2. Index Structure

Efficient account indexing:

pub struct AccountsIndex {
    // Pubkey -> AccountLocation mapping
    map: DashMap<Pubkey, RwLock<AccountMapEntry>>,
}

pub struct AccountMapEntry {
    slot_list: Vec<(Slot, AccountInfo)>,
    ref_count: AtomicU64,
}

pub struct AccountInfo {
    store_id: AppendVecId,  // Storage ID
    offset: usize,          // Offset
    lamports: u64,          // Balance snapshot
}
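
Each index entry keeps a list of (slot, location) pairs because the same account can have different versions on different forks. The following sketch shows one way a lookup could pick the right version; `latest_visible`, the `ancestors` set, and the `root` parameter are assumptions for illustration, not the signatures used by the validator.

```rust
use std::collections::HashSet;

type Slot = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AccountInfo {
    pub store_id: u32,
    pub offset: usize,
}

/// Pick the newest entry in `slot_list` that is visible from the current fork:
/// either written at or below the root slot, or written in one of the fork's
/// ancestor slots.
pub fn latest_visible(
    slot_list: &[(Slot, AccountInfo)],
    ancestors: &HashSet<Slot>,
    root: Slot,
) -> Option<(Slot, AccountInfo)> {
    slot_list
        .iter()
        .filter(|(slot, _)| *slot <= root || ancestors.contains(slot))
        .max_by_key(|(slot, _)| *slot)
        .copied()
}
```

Two forks that both descend from the root see the same rooted version until one of them writes the account, at which point each fork resolves to its own entry.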

3. Sharding Strategy

How accounts are split across files:

Time-based Sharding: Storage files split by slot
Capacity-based Sharding: Each file has a fixed size (e.g., 4GB)
Automatic Switching: New files are automatically created when a file is full
Parallel Writing: Different threads write to different files
Independent Management: Each shard has an independent lifecycle
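
Capacity-based sharding with automatic switching can be sketched as follows. The types `FixedStore` and `RollingStores` are hypothetical, and in-memory buffers stand in for files: appends go to the newest store, and a new store is opened when the record does not fit.

```rust
/// A fixed-capacity append-only buffer standing in for one storage file.
pub struct FixedStore {
    capacity: usize,
    bytes: Vec<u8>,
}

impl FixedStore {
    fn new(capacity: usize) -> Self {
        Self { capacity, bytes: Vec::with_capacity(capacity) }
    }

    /// Returns the write offset, or `None` when the record does not fit.
    fn try_append(&mut self, record: &[u8]) -> Option<usize> {
        if self.bytes.len() + record.len() > self.capacity {
            return None;
        }
        let offset = self.bytes.len();
        self.bytes.extend_from_slice(record);
        Some(offset)
    }
}

/// Appends to the newest store and opens a new one when it is full.
pub struct RollingStores {
    capacity: usize,
    stores: Vec<FixedStore>,
}

impl RollingStores {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, stores: vec![FixedStore::new(capacity)] }
    }

    /// Returns `(store_id, offset)`, or `None` if the record is larger than
    /// the capacity of an empty store.
    pub fn append(&mut self, record: &[u8]) -> Option<(usize, usize)> {
        if record.len() > self.capacity {
            return None;
        }
        if let Some(offset) = self.stores.last_mut()?.try_append(record) {
            return Some((self.stores.len() - 1, offset));
        }
        // Current store is full: switch to a fresh one.
        self.stores.push(FixedStore::new(self.capacity));
        let offset = self.stores.last_mut()?.try_append(record)?;
        Some((self.stores.len() - 1, offset))
    }

    pub fn store_count(&self) -> usize {
        self.stores.len()
    }
}
```

Older stores are never written again after the switch, which gives each one an independent lifecycle for cleanup.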

Coordination with Other Components

1. Coordination with Sealevel

Supporting parallel execution:

Lock-free Reads: Multi-threaded concurrent account reads
Write Isolation: Writes to different accounts do not conflict
Batch Loading: Sealevel batch-preloads accounts
Zero-Copy: Directly passes mmap pointers

2. Coordination with Gulf Stream

Optimizing pre-execution:

Pre-loading: Gulf Stream predicts needed accounts
Prefetch Optimization: Cloudbreak preloads into memory
Cache Hits: Accounts are already in memory at execution time
Reduced Latency: Minimizes disk I/O wait time

3. Coordination with PoH

Ensuring consistency:

Version Control: Slot-based version management
Rollback Support: Account versions written on an abandoned fork can be discarded
Fork Handling: Supports multi-fork account states
Finality: Account state is persisted once its slot is finalized

Performance Optimization

1. Caching Strategy

Multi-tier caching:

L1 Cache: In-memory cache for recently accessed accounts
L2 Cache: OS page cache
Warm-up: Preloads hot accounts at startup
LRU Eviction: Automatically evicts cold data
Compressed Cache: Compresses infrequently used accounts to save memory
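
LRU eviction, as named in the list above, can be illustrated with a short standard-library sketch. `LruCache` here is a hypothetical teaching type, not the cache used by the validator, and it uses a linear scan to track recency for brevity.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// A small LRU cache keyed by account address (any hashable key works).
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>, // front = least recently used, back = most recent
}

impl<K: Clone + Eq + Hash, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position.
    fn touch(&mut self, key: &K) {
        // O(n) scan keeps the sketch short; a production cache would use a
        // linked structure for O(1) updates.
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    pub fn put(&mut self, key: K, value: V) {
        self.touch(&key);
        self.map.insert(key, value);
        if self.map.len() > self.capacity {
            // Evict the coldest entry.
            if let Some(coldest) = self.order.pop_front() {
                self.map.remove(&coldest);
            }
        }
    }

    pub fn contains(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }
}
```

Reading an entry counts as a use, so frequently read accounts stay cached even if they are rarely written.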

2. Garbage Collection

Automatic cleanup of expired data:

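// Drop old account versions
fn mark_old_accounts(current_slot: Slot) {
    // saturating_sub avoids underflow during the first 1000 slots
    let cutoff = current_slot.saturating_sub(1000);
    for (_pubkey, entry) in accounts_index.iter_mut() {
        let newest = entry.slot_list.iter().map(|(slot, _)| *slot).max();
        entry.slot_list.retain(|(slot, _)| {
            // Keep recent (possibly unconfirmed) versions, and always keep the
            // newest version so rarely updated accounts are not lost
            *slot >= cutoff || Some(*slot) == newest
        });
    }
}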

// Reclaim space
fn shrink_storage(store_id: AppendVecId) {
    // 1. Create a new, smaller file
    let mut new_vec = AppendVec::new(calc_shrunk_size(store_id));

    // 2. Copy live accounts
    for account in get_alive_accounts(store_id) {
        new_vec
            .append_account(account)
            .expect("shrunk file is sized to hold all live accounts");
    }

    // 3. Update the index
    update_index_to_new_vec(new_vec);

    // 4. Delete the old file only after the index no longer points to it
    remove_old_vec(store_id);
}
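
The shrink step can also be shown as a self-contained model. This is a sketch under stated assumptions: `shrink` and `Location` are hypothetical, a byte slice stands in for the old file, and a `HashMap` stands in for the index. Only records still referenced by the index are copied, and their offsets are rewritten to point into the compacted buffer.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for the real Pubkey type

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Location {
    pub offset: usize,
    pub len: usize,
}

/// Copy only the records still referenced by `index` into a fresh buffer and
/// rewrite their offsets. Bytes of superseded versions are left behind.
pub fn shrink(old: &[u8], index: &mut HashMap<Pubkey, Location>) -> Vec<u8> {
    let live_bytes: usize = index.values().map(|loc| loc.len).sum();
    let mut compacted = Vec::with_capacity(live_bytes);
    for loc in index.values_mut() {
        let new_offset = compacted.len();
        compacted.extend_from_slice(&old[loc.offset..loc.offset + loc.len]);
        loc.offset = new_offset;
    }
    compacted
}
```

The order of live records in the new buffer is unspecified here; correctness depends only on the index pointing at the right bytes.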

3. I/O Optimization

Maximizing storage performance:

Batch Flushing: Aggregates small writes
Asynchronous I/O: Non-blocking I/O operations
Direct I/O: Bypasses page cache (in specific scenarios)
Read-ahead: Sequential pre-reading of adjacent data
Write Coalescing: Merges consecutive writes
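
Write coalescing, the last item above, is easy to show in isolation. The `Write` type and `coalesce` function are hypothetical; the sketch merges pending writes whose byte ranges are exactly adjacent, so the storage device sees fewer, larger writes.

```rust
/// A pending write: destination offset plus payload.
#[derive(Debug, PartialEq)]
pub struct Write {
    pub offset: usize,
    pub bytes: Vec<u8>,
}

/// Merge writes whose byte ranges are exactly adjacent into larger writes.
/// Input may be in any order; output is sorted by offset.
pub fn coalesce(mut writes: Vec<Write>) -> Vec<Write> {
    writes.sort_by_key(|w| w.offset);
    let mut merged: Vec<Write> = Vec::new();
    for w in writes {
        // Adjacent means the previous write ends exactly where this one starts.
        let adjacent = merged
            .last()
            .map_or(false, |prev| prev.offset + prev.bytes.len() == w.offset);
        if adjacent {
            if let Some(prev) = merged.last_mut() {
                prev.bytes.extend_from_slice(&w.bytes);
            }
        } else {
            merged.push(w);
        }
    }
    merged
}
```

Overlapping writes are deliberately left unmerged in this sketch, since resolving them requires knowing which write is newer.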

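Related Concepts

Sealevel: Solana's parallel transaction runtime
Gulf Stream: Mempool-less transaction forwarding
PoH (Proof of History): Verifiable clock used to order events
mmap: System call that maps a file into a process's address space
Append-only Log: Storage structure in which new records are only added at the end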

Summary

Cloudbreak provides Solana's account storage through memory-mapped files and a horizontally scaled design. It relies on the operating system's virtual memory management and the random-access characteristics of modern SSDs to keep read latency low and write throughput high. It works alongside Sealevel's parallel execution and Gulf Stream's transaction forwarding as one of the core components of Solana's performance architecture. Append-only storage, garbage collection, and caching keep the store manageable over long-running operation. For developers, Cloudbreak is transparent: programs and clients use the standard Solana SDK and never interact with the storage layer directly. As storage hardware improves (NVMe, persistent memory), the same design stands to benefit from lower device latency.