Cloudbreak
Cloudbreak Overview
Cloudbreak is Solana's horizontally scaled account database and the core of its storage layer. Many blockchains keep all state in a single database, so read and write performance degrades as the number of accounts grows. Cloudbreak instead stores account data in memory-mapped files spread across multiple storage files and relies on the fast random access of SSDs, which lets reads and writes proceed in parallel. The design targets billions of accounts while keeping read and write latency low.
Official Website: https://solana.com/
Core Features
1. Memory-Mapped Architecture
Storage design built on memory mapping:
- Memory-Mapped Files: Leverages the operating system's mmap mechanism
- Lazy Loading: Only loads data that is actually accessed
- Zero-Copy: Reads directly from file mappings without copying
- Automatic Paging: The OS manages data movement between memory and disk
- Transparent Caching: The OS automatically caches hot data
2. Horizontal Scaling Capability
Breaking through traditional database limitations:
- Multi-File Sharding: Accounts are distributed across multiple files
- Parallel Access: Multi-threaded parallel read/write to different shards
- Independent Growth: Each shard scales independently
- No Central Bottleneck: No single index limitation
- TB-Scale Capacity: Supports billions of accounts
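The sharding and parallel-access points above can be illustrated with a minimal sketch. It is not Solana's implementation: `ShardedStore` and `write_in_parallel` are hypothetical names, and each shard is an in-memory buffer standing in for one storage file. One writer thread is assigned to each shard, so writers never wait on each other.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical sharded store: each shard is an independent append-only buffer
/// standing in for one storage file.
struct ShardedStore {
    shards: Vec<Mutex<Vec<u8>>>,
}

impl ShardedStore {
    fn new(shard_count: usize) -> Self {
        let shards = (0..shard_count).map(|_| Mutex::new(Vec::new())).collect();
        ShardedStore { shards }
    }

    /// Append a record to one shard and return its offset within that shard.
    fn append(&self, shard: usize, record: &[u8]) -> usize {
        let mut buf = self.shards[shard].lock().unwrap();
        let offset = buf.len();
        buf.extend_from_slice(record);
        offset
    }

    /// Number of bytes currently stored in one shard.
    fn shard_len(&self, shard: usize) -> usize {
        self.shards[shard].lock().unwrap().len()
    }
}

/// Spawn one writer per shard. Each thread touches only its own shard,
/// so the per-shard locks are never contended.
fn write_in_parallel(store: &ShardedStore, records_per_shard: usize) {
    thread::scope(|scope| {
        for shard in 0..store.shards.len() {
            scope.spawn(move || {
                for i in 0..records_per_shard {
                    store.append(shard, &[i as u8; 8]);
                }
            });
        }
    });
}
```

Because every shard has its own lock and its own length, adding shards adds write capacity without introducing a shared bottleneck.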
3. SSD Optimization
Fully utilizing modern storage:
- Random Access: Optimized for SSD random read/write
- Sequential Writing: Batch writes maximize throughput
- Wear Leveling: Distributed writes extend SSD lifespan
- Compression Optimization: Optional compression saves space
- Read-Ahead Optimization: Predictive pre-reading of related data
How It Works
1. Account Storage Structure
Cloudbreak Memory-Mapped Structure:
+---------------------------------+
| Memory-Mapped File (Append Vec) |
+---------------------------------+
| Account 1 | Meta | Data         |
| Account 2 | Meta | Data         |
| Account 3 | Meta | Data         |
| ...                             |
| Account N | Meta | Data         |
+---------------------------------+
                 |
         Physical SSD File
2. Account Data Format
Storage structure for each account:
pub struct StoredAccount {
    pub meta: StoredMeta,
    pub account: Account,
}
pub struct StoredMeta {
    pub write_version: u64, // Write version number
    pub pubkey: Pubkey,     // Account public key
    pub data_len: u64,      // Data length in bytes
}
pub struct Account {
    pub lamports: u64,    // Balance in lamports
    pub data: Vec<u8>,    // Account data
    pub owner: Pubkey,    // Owner program
    pub executable: bool, // Whether executable
    pub rent_epoch: u64,  // Rent epoch
}
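To show how such a record could be laid out as flat bytes in an append-only file, here is a minimal sketch. The field order, the little-endian encoding, the 8-byte alignment, and the names `HEADER_LEN`, `align8`, `stored_size`, and `encode` are all assumptions for illustration, not the actual on-disk format. `Pubkey` is modeled as 32 raw bytes.

```rust
/// Simplified stand-ins for the structs above.
type Pubkey = [u8; 32];

struct StoredMeta {
    write_version: u64,
    pubkey: Pubkey,
    data_len: u64,
}

struct Account {
    lamports: u64,
    data: Vec<u8>,
    owner: Pubkey,
    executable: bool,
    rent_epoch: u64,
}

/// Fixed-size part of a record in this assumed layout:
/// write_version + pubkey + data_len + lamports + owner + executable + rent_epoch.
const HEADER_LEN: usize = 8 + 32 + 8 + 8 + 32 + 1 + 8;

/// Round up to the next multiple of 8 so the following record starts aligned.
fn align8(n: usize) -> usize {
    (n + 7) & !7
}

/// Total bytes a record occupies, including padding.
fn stored_size(data_len: usize) -> usize {
    align8(HEADER_LEN + data_len)
}

/// Encode the header fields, then the variable-length data, then zero padding.
fn encode(meta: &StoredMeta, account: &Account) -> Vec<u8> {
    let total = stored_size(account.data.len());
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(&meta.write_version.to_le_bytes());
    out.extend_from_slice(&meta.pubkey);
    out.extend_from_slice(&meta.data_len.to_le_bytes());
    out.extend_from_slice(&account.lamports.to_le_bytes());
    out.extend_from_slice(&account.owner);
    out.push(account.executable as u8);
    out.extend_from_slice(&account.rent_epoch.to_le_bytes());
    out.extend_from_slice(&account.data);
    out.resize(total, 0);
    out
}
```

Storing the data length in the header is what lets a reader find the end of one record and the start of the next without a separate table.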
3. Read/Write Flow
Reading an Account:
// 1. Look up the index to locate the account
let location = account_index.get(&pubkey)?;
// 2. Access the file via mmap
let storage = get_storage(location.slot, location.store_id);
// 3. Zero-copy read of account data
let account = storage.get_account(location.offset)?;
// The OS handles automatically:
// - Page cache hit -> Read directly from memory (nanosecond-level)
// - Page cache miss -> Load from SSD to memory (microsecond-level)
Writing an Account:
// 1. Append-write to the current append vec
let storage = get_current_storage()?;
let offset = storage.append_account(&account)?;
// 2. Update the index
account_index.insert(pubkey, AccountLocation {
    slot,
    store_id,
    offset,
});
// 3. Batch flush to disk
// The OS asynchronously flushes in the background without blocking writes
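The two flows above can be combined into one small runnable sketch. It makes several simplifying assumptions: `AccountsDb` is a hypothetical name, a `Vec<u8>` stands in for each memory-mapped file, the index is a plain `HashMap`, and the location carries an explicit `len` so that no record header is needed.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

#[derive(Clone, Copy, Debug, PartialEq)]
struct AccountLocation {
    slot: u64,
    store_id: usize,
    offset: usize,
    len: usize,
}

struct AccountsDb {
    index: HashMap<Pubkey, AccountLocation>,
    stores: Vec<Vec<u8>>, // each Vec<u8> stands in for one memory-mapped file
}

impl AccountsDb {
    fn new() -> Self {
        AccountsDb { index: HashMap::new(), stores: vec![Vec::new()] }
    }

    /// Write path: append to the current store, then point the index at the new copy.
    /// The previous copy stays in the file until garbage collection removes it.
    fn store(&mut self, slot: u64, pubkey: Pubkey, data: &[u8]) -> AccountLocation {
        let store_id = self.stores.len() - 1;
        let store = &mut self.stores[store_id];
        let offset = store.len();
        store.extend_from_slice(data);
        let location = AccountLocation { slot, store_id, offset, len: data.len() };
        self.index.insert(pubkey, location);
        location
    }

    /// Read path: look up the location, then borrow the bytes in place.
    fn load(&self, pubkey: &Pubkey) -> Option<&[u8]> {
        let loc = self.index.get(pubkey)?;
        self.stores[loc.store_id].get(loc.offset..loc.offset + loc.len)
    }
}
```

Note that an update never overwrites the old bytes. It appends a new version and repoints the index, which is why a separate cleanup step is needed.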
Practical Applications
1. Account Queries
High-performance account access:
use solana_program::{
    account_info::AccountInfo,
    entrypoint::ProgramResult,
    pubkey::Pubkey,
};
// Cloudbreak provides extremely low-latency account access
fn process_instruction(
    _program_id: &Pubkey,
    accounts: &[AccountInfo],
    _instruction_data: &[u8],
) -> ProgramResult {
    let user_account = &accounts[0];
    // Read account data (already loaded by the runtime)
    let data = user_account.try_borrow_data()?;
    let _current = data[0];
    // Release the read borrow first; a mutable borrow taken while it is alive fails at runtime
    drop(data);
    // Update account data
    let mut data_mut = user_account.try_borrow_mut_data()?;
    data_mut[0] = 42;
    // Cloudbreak automatically handles persistence
    Ok(())
}
2. Batch Queries
Leveraging parallel access:
import { Connection, PublicKey } from '@solana/web3.js'
const connection = new Connection('https://api.mainnet-beta.solana.com')
// Cloudbreak supports efficient batch queries
const pubkeys = [
  new PublicKey('Account1...'),
  new PublicKey('Account2...'),
  new PublicKey('Account3...'),
  // ... thousands of accounts
]
// The getMultipleAccounts RPC method accepts at most 100 keys per request, so query in chunks
const accounts = []
for (let i = 0; i < pubkeys.length; i += 100) {
  const chunk = pubkeys.slice(i, i + 100)
  accounts.push(...(await connection.getMultipleAccountsInfo(chunk)))
}
console.log('Query complete:', accounts.length)
3. State Snapshots
Creating account snapshots:
# Generate a state snapshot every 1000 slots (other validator flags omitted)
solana-validator --snapshot-interval-slots 1000
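# View snapshot information (illustrative; subcommand names vary by release, check solana-ledger-tool --help)
solana-ledger-tool snapshot list
# Restore from snapshot (illustrative; check solana-validator --help for the snapshot directory flag)
solana-validator --snapshot /path/to/snapshot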
Architecture Design
1. Append Vec
Append-only storage:
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
// MmapMut comes from a memory-mapping crate such as memmap2

pub struct AppendVec {
    path: PathBuf,            // File path
    map: MmapMut,             // Memory mapping
    current_len: AtomicUsize, // Current length
    file_size: u64,           // File size
}
impl AppendVec {
    // Append an account
    pub fn append_account(&mut self, account: &StoredAccount) -> Result<usize> {
        // Serialize to a flat record first. Copying the struct's raw memory would
        // store the Vec's heap pointer, not the account data.
        // `to_bytes` is a hypothetical helper that is not shown here.
        let bytes = account.to_bytes();
        let offset = self.current_len.load(Ordering::Acquire);
        let end = offset + bytes.len();
        // Check space
        if end as u64 > self.file_size {
            return Err(AppendVecError::NoSpace);
        }
        // Write data into the mapping
        self.map[offset..end].copy_from_slice(&bytes);
        // Update length
        self.current_len.store(end, Ordering::Release);
        Ok(offset)
    }
    // Read an account
    pub fn get_account(&self, offset: usize) -> Result<&[u8]> {
        // Borrow the record bytes directly from the mmap (zero-copy); the caller
        // decodes the record header to find where this account ends.
        // `InvalidOffset` is a hypothetical error variant.
        let len = self.current_len.load(Ordering::Acquire);
        self.map.get(offset..len).ok_or(AppendVecError::InvalidOffset)
    }
}
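To make the zero-copy read concrete, the following is a minimal sketch rather than Solana's implementation. A byte buffer stands in for the memory mapping, and the record layout (lamports, data length, data) and the names `AccountView`, `write_record`, and `read_record` are assumptions made for illustration. The reader decodes the small fixed header and returns the account data as a slice that still points into the buffer.

```rust
/// Size of the assumed header: lamports (u64) followed by data length (u64).
const HEADER: usize = 16;

/// Borrowed view of one record: the header is decoded, the data is left in place.
struct AccountView<'a> {
    lamports: u64,
    data: &'a [u8],
}

/// Decode a little-endian u64 from an 8-byte slice.
fn read_u64(bytes: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    u64::from_le_bytes(raw)
}

/// Append one record and return the offset at which it starts.
fn write_record(buf: &mut Vec<u8>, lamports: u64, data: &[u8]) -> usize {
    let offset = buf.len();
    buf.extend_from_slice(&lamports.to_le_bytes());
    buf.extend_from_slice(&(data.len() as u64).to_le_bytes());
    buf.extend_from_slice(data);
    offset
}

/// Read the record at `offset`, returning None if it does not fit in the buffer.
fn read_record<'a>(buf: &'a [u8], offset: usize) -> Option<AccountView<'a>> {
    let data_start = offset.checked_add(HEADER)?;
    let header = buf.get(offset..data_start)?;
    let lamports = read_u64(&header[0..8]);
    let data_len = read_u64(&header[8..16]) as usize;
    let data = buf.get(data_start..data_start.checked_add(data_len)?)?;
    Some(AccountView { lamports, data })
}
```

Every range is bounds-checked, so a stale or corrupt offset yields `None` instead of reading past the end of the mapping.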
2. Index Structure
Efficient account indexing:
pub struct AccountsIndex {
    // Pubkey -> AccountLocation mapping
    map: DashMap<Pubkey, RwLock<AccountMapEntry>>,
}
pub struct AccountMapEntry {
    slot_list: Vec<(Slot, AccountInfo)>,
    ref_count: AtomicU64,
}
pub struct AccountInfo {
    store_id: AppendVecId, // Storage ID
    offset: usize,         // Offset
    lamports: u64,         // Balance snapshot
}
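A minimal sketch of how a per-account slot list might be maintained and queried follows. It assumes a single-threaded `HashMap` in place of the concurrent map and omits the reference count. The method names `upsert` and `latest_at` are hypothetical.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];
type Slot = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
struct AccountInfo {
    store_id: usize,
    offset: usize,
    lamports: u64,
}

#[derive(Default)]
struct AccountsIndex {
    map: HashMap<Pubkey, Vec<(Slot, AccountInfo)>>,
}

impl AccountsIndex {
    /// Record a new version. A second write in the same slot replaces the first,
    /// so the list holds at most one entry per slot.
    fn upsert(&mut self, pubkey: Pubkey, slot: Slot, info: AccountInfo) {
        let list = self.map.entry(pubkey).or_default();
        if let Some(pos) = list.iter().position(|entry| entry.0 == slot) {
            list[pos].1 = info;
        } else {
            list.push((slot, info));
        }
    }

    /// Newest version written at or before `max_slot`, if any.
    fn latest_at(&self, pubkey: &Pubkey, max_slot: Slot) -> Option<AccountInfo> {
        self.map
            .get(pubkey)?
            .iter()
            .filter(|entry| entry.0 <= max_slot)
            .max_by_key(|entry| entry.0)
            .map(|entry| entry.1)
    }
}
```

Keeping several versions per account is what allows a reader at an older slot to see the state as of that slot while newer writes continue.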
3. Sharding Strategy
Intelligent data sharding:
- Time-based Sharding: Storage files split by slot
- Capacity-based Sharding: Each file has a fixed size (e.g., 4GB)
- Automatic Switching: New files are automatically created when a file is full
- Parallel Writing: Different threads write to different files
- Independent Management: Each shard has an independent lifecycle
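The slot-based and capacity-based rules above can be sketched as a small rollover routine. The names `StorageFile` and `StorageSet` are hypothetical, the capacity is a parameter rather than a fixed production value, and in-memory buffers stand in for files.

```rust
/// One storage file, tagged with the slot whose writes it holds.
struct StorageFile {
    slot: u64,
    bytes: Vec<u8>,
}

struct StorageSet {
    capacity: usize, // maximum bytes per file
    files: Vec<StorageFile>,
}

impl StorageSet {
    fn new(capacity: usize) -> Self {
        StorageSet { capacity, files: Vec::new() }
    }

    /// Append a record, opening a new file when the slot changes or the current
    /// file cannot hold the record. Returns (file id, offset within the file),
    /// or None if the record can never fit in a single file.
    fn append(&mut self, slot: u64, record: &[u8]) -> Option<(usize, usize)> {
        if record.len() > self.capacity {
            return None;
        }
        let needs_new_file = match self.files.last() {
            Some(file) => {
                file.slot != slot || file.bytes.len() + record.len() > self.capacity
            }
            None => true,
        };
        if needs_new_file {
            self.files.push(StorageFile { slot, bytes: Vec::new() });
        }
        let id = self.files.len() - 1;
        let file = &mut self.files[id];
        let offset = file.bytes.len();
        file.bytes.extend_from_slice(record);
        Some((id, offset))
    }
}
```

Since each file belongs to exactly one slot, a whole file can be dropped when its slot is no longer needed, without touching any other file.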
Coordination with Other Components
1. Coordination with Sealevel
Supporting parallel execution:
- Lock-free Reads: Multi-threaded concurrent account reads
- Write Isolation: Writes to different accounts do not conflict
- Batch Loading: Sealevel batch-preloads accounts
- Zero-Copy: Directly passes mmap pointers
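Write isolation depends on knowing in advance which accounts each transaction reads and writes. The sketch below shows one simple way to pick a conflict-free batch under that assumption. It is a greedy illustration, not Sealevel's scheduler. `Tx` and `select_batch` are hypothetical names, and accounts are identified by small integers for brevity.

```rust
use std::collections::HashSet;

/// A transaction's declared account access, with accounts as small integer ids.
struct Tx {
    reads: Vec<u32>,
    writes: Vec<u32>,
}

/// Greedily select transactions whose locks do not conflict with those already
/// selected. Many readers may share an account; a writer needs it exclusively.
fn select_batch(txs: &[Tx]) -> Vec<usize> {
    let mut read_locked: HashSet<u32> = HashSet::new();
    let mut write_locked: HashSet<u32> = HashSet::new();
    let mut batch = Vec::new();
    for (i, tx) in txs.iter().enumerate() {
        let write_conflict = tx
            .writes
            .iter()
            .any(|k| write_locked.contains(k) || read_locked.contains(k));
        let read_conflict = tx.reads.iter().any(|k| write_locked.contains(k));
        if write_conflict || read_conflict {
            continue; // left for a later batch
        }
        write_locked.extend(tx.writes.iter().copied());
        read_locked.extend(tx.reads.iter().copied());
        batch.push(i);
    }
    batch
}
```

Transactions in the returned batch touch disjoint writable accounts, so their account writes can be appended to storage from different threads without coordination.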
2. Coordination with Gulf Stream
Optimizing pre-execution:
- Pre-loading: Gulf Stream predicts needed accounts
- Prefetch Optimization: Cloudbreak preloads into memory
- Cache Hits: Accounts are already in memory at execution time
- Reduced Latency: Minimizes disk I/O wait time
3. Coordination with PoH
Ensuring consistency:
- Version Control: Slot-based version management
- Rollback Support: Versions written on unrooted slots can be discarded when their fork is abandoned
- Fork Handling: Supports multi-fork account states
- Finality: Persisted after PoH confirmation
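Fork handling can be illustrated with a short sketch. It assumes each account keeps a list of `(slot, version)` pairs and that a fork is described by the set of slots it descends from. The function names `visible_version` and `purge_fork` are hypothetical.

```rust
use std::collections::HashSet;

type Slot = u64;

/// Newest version whose slot lies on the current fork, i.e. is in `ancestors`.
fn visible_version<'a, T>(
    slot_list: &'a [(Slot, T)],
    ancestors: &HashSet<Slot>,
) -> Option<&'a T> {
    slot_list
        .iter()
        .filter(|entry| ancestors.contains(&entry.0))
        .max_by_key(|entry| entry.0)
        .map(|entry| &entry.1)
}

/// Drop versions that were written on slots of an abandoned fork.
fn purge_fork<T>(slot_list: &mut Vec<(Slot, T)>, dead_slots: &HashSet<Slot>) {
    slot_list.retain(|entry| !dead_slots.contains(&entry.0));
}
```

Two forks that both descend from the same slot share every version up to that slot and diverge only in the versions written afterwards.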
Performance Optimization
1. Caching Strategy
Multi-tier caching:
- L1 Cache: In-memory cache for recently accessed accounts
- L2 Cache: OS page cache
- Warm-up: Preloads hot accounts at startup
- LRU Eviction: Automatically evicts cold data
- Compressed Cache: Compresses infrequently used accounts to save memory
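As an illustration of the LRU eviction mentioned above, here is a deliberately simple cache. It is a sketch under stated assumptions: `LruCache` is a hypothetical type, keys are plain integers, the capacity is at least 1, and recency is tracked with a queue that is scanned on every access, which is fine for a demonstration but too slow for a real cache.

```rust
use std::collections::{HashMap, VecDeque};

/// Small least-recently-used cache. The front of `order` is the coldest key.
struct LruCache {
    capacity: usize, // assumed to be at least 1
    entries: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position.
    fn touch(&mut self, key: u64) {
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
    }

    /// Look up a key and mark it as recently used.
    fn get(&mut self, key: u64) -> Option<&[u8]> {
        if !self.entries.contains_key(&key) {
            return None;
        }
        self.touch(key);
        self.entries.get(&key).map(|v| v.as_slice())
    }

    /// Insert or replace a value, evicting the coldest entry when full.
    fn put(&mut self, key: u64, value: Vec<u8>) {
        if !self.entries.contains_key(&key) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key, value);
        self.touch(key);
    }
}
```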
2. Garbage Collection
Automatic cleanup of expired data:
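// Mark old account versions
fn mark_old_accounts(current_slot: Slot) {
    // saturating_sub avoids underflow during the first 1000 slots
    let cutoff = current_slot.saturating_sub(1000);
    for (_pubkey, entry) in accounts_index.iter_mut() {
        // Keep the latest version and unconfirmed versions.
        // The latest version is kept even if it is older than the cutoff,
        // otherwise an idle account would disappear from the index.
        let latest = entry.slot_list.iter().map(|(slot, _)| *slot).max();
        entry.slot_list.retain(|(slot, _)| Some(*slot) == latest || *slot >= cutoff);
    }
}
// Reclaim space
fn shrink_storage(store_id: AppendVecId) -> Result<()> {
    // 1. Create a new file
    let mut new_vec = AppendVec::new(calc_shrunk_size(store_id));
    // 2. Copy live accounts
    for account in get_alive_accounts(store_id) {
        new_vec.append_account(account)?;
    }
    // 3. Update the index
    update_index_to_new_vec(new_vec);
    // 4. Delete the old file
    remove_old_vec(store_id);
    Ok(())
}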
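The shrink step can be made runnable with a few simplifying assumptions: a byte slice stands in for the old file, `Record` (a hypothetical type) lists what the file contains, and the index maps an account key directly to the offset of its live version. A record counts as alive only if the index still points at it.

```rust
use std::collections::HashMap;

/// Describes one record in the old file.
struct Record {
    key: u64,
    offset: usize,
    len: usize,
}

/// Copy live records into a compact buffer and repoint the index at them.
fn shrink(old: &[u8], records: &[Record], index: &mut HashMap<u64, usize>) -> Vec<u8> {
    let mut compacted = Vec::new();
    let mut moved = Vec::new();
    for record in records {
        // Skip superseded versions: the index points somewhere else.
        if index.get(&record.key) != Some(&record.offset) {
            continue;
        }
        moved.push((record.key, compacted.len()));
        compacted.extend_from_slice(&old[record.offset..record.offset + record.len]);
    }
    // Apply index updates only after the scan, so liveness checks above
    // always compare against offsets in the old file.
    for (key, new_offset) in moved {
        index.insert(key, new_offset);
    }
    compacted
}
```

The old file can be deleted once the index no longer refers to it, which is the point at which the space is actually reclaimed.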
3. I/O Optimization
Maximizing storage performance:
- Batch Flushing: Aggregates small writes
- Asynchronous I/O: Non-blocking I/O operations
- Direct I/O: Bypasses page cache (in specific scenarios)
- Read-ahead: Sequential pre-reading of adjacent data
- Write Coalescing: Merges consecutive writes
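Write coalescing can be shown with a short sketch that merges pending writes, given as `(offset, length)` pairs, whenever they touch or overlap. The function name `coalesce` and the representation of a write as a byte range are assumptions for illustration.

```rust
/// Merge writes that touch or overlap into single contiguous ranges.
/// Each write is an (offset, length) pair.
fn coalesce(mut writes: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    writes.sort_by_key(|w| w.0);
    let mut merged: Vec<(usize, usize)> = Vec::new();
    for (offset, len) in writes {
        if let Some(last) = merged.last_mut() {
            let last_end = last.0 + last.1;
            if offset <= last_end {
                // Extend the previous range to cover this write.
                let end = (offset + len).max(last_end);
                last.1 = end - last.0;
                continue;
            }
        }
        merged.push((offset, len));
    }
    merged
}
```

For example, writes at `(0, 4)` and `(4, 4)` become one write at `(0, 8)`, turning two small I/O requests into one larger sequential request.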
Related Concepts and Technologies
- Sealevel: Parallel runtime
- Gulf Stream: Mempool-less forwarding
- PoH (Proof of History): Verifiable clock that orders events before consensus
- mmap: Memory mapping system call
- Append-only Log: Storage pattern in which records are appended and never modified in place
Summary
Cloudbreak gives Solana a high-throughput account store built on memory-mapped files and horizontal scaling. It relies on the operating system's virtual memory management and the random access characteristics of modern SSDs: reads served from the page cache complete in nanoseconds, reads that go to the SSD complete in microseconds, and writes are appended sequentially. Cloudbreak works closely with Sealevel's parallel execution and Gulf Stream's transaction forwarding. Append-only storage, garbage collection, and caching keep storage usage and latency stable over time. For developers, Cloudbreak is transparent: programs and clients use the standard Solana SDK and never address the storage layer directly. As storage hardware improves (NVMe, persistent memory), the same design stands to benefit.