mirror of
https://github.com/facebook/rocksdb.git
synced 2026-03-20 06:24:22 +00:00
Tuning RocksDB on Spinning Disks
Siying Dong edited this page 2020-12-30 16:02:39 -08:00
Table of Contents
- Memory / persistent storage ratio is usually much lower on spinning disks
- Spinning disks provide much lower random read throughput than flash
- The random vs. sequential read throughput gap is much higher on spinning disks
- Spinning disks are much larger than flash
Spinning disks differ from flash in ways that matter for RocksDB, for a few main reasons:

Memory / persistent storage ratio is usually much lower for databases on spinning disks. Since the ratio of data to RAM is large, you need to reduce the memory required to keep performance-critical data in RAM. Suggestions:
- Use relatively large block sizes to reduce index block size. Use at least a 64KB block size; consider 256KB or even 512KB. The downside of large blocks is that more RAM is wasted in the block cache.
- Turn on BlockBasedTableOptions.cache_index_and_filter_blocks=true, since you very likely cannot fit all index and bloom filter blocks in memory. Even if you can, setting it is safer.
- Enable options.optimize_filters_for_hits to reduce total bloom filter size.
- Be careful whether you have enough memory to keep all bloom filters. If you don't, bloom filters may hurt performance.
- Encode keys as compactly as possible. Shorter keys reduce index block size.
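As a sketch, the memory-reduction suggestions above map onto the following options (option names are from the RocksDB public API; the specific values are illustrative, not prescriptive):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options MakeSpinningDiskOptions() {
  rocksdb::Options options;

  rocksdb::BlockBasedTableOptions table_options;
  // Larger blocks shrink the index block, at the cost of coarser caching.
  table_options.block_size = 256 * 1024;  // 256KB; 64KB is the suggested minimum
  // Serve index and filter blocks from the block cache instead of
  // holding them all in heap memory.
  table_options.cache_index_and_filter_blocks = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  // Reduce total bloom filter size.
  options.optimize_filters_for_hits = true;
  return options;
}
```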
Spinning disks usually provide much lower random read throughput than flash. Suggestions:
- Set options.skip_stats_update_on_db_open=true to speed up DB open time.
- A somewhat controversial suggestion: use level-based compaction, since it tends to issue fewer reads from disk.
- If you use level-based compaction, set options.level_compaction_dynamic_level_bytes=true.
- Set options.max_file_opening_threads to a value larger than 1 if the server has multiple disks.
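These suggestions can be sketched together in C++ (the thread count is illustrative):

```cpp
#include <rocksdb/options.h>

void TuneForSlowRandomReads(rocksdb::Options& options) {
  // Avoid scanning every SST file to refresh stats at DB open time.
  options.skip_stats_update_on_db_open = true;
  // Level-based compaction with dynamic level sizes keeps most data in
  // the last level, which tends to reduce reads from disk.
  options.compaction_style = rocksdb::kCompactionStyleLevel;
  options.level_compaction_dynamic_level_bytes = true;
  // On a server with multiple disks, open files in parallel.
  options.max_file_opening_threads = 4;  // illustrative; use >1 for multi-disk servers
}
```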
Throughput gap between random read vs. sequential read is much higher in spinning disks. Suggestions:
- Enable RocksDB-level readahead for compaction inputs: set options.compaction_readahead_size together with options.new_table_reader_for_compaction_inputs=true.
- Use relatively large file sizes. We suggest at least 256MB.
- Use relatively large block sizes.
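A sketch of these sequential-read-friendly settings, assuming a RocksDB version contemporary with this page that still exposes new_table_reader_for_compaction_inputs (the readahead size here is illustrative):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

void TuneForSequentialReads(rocksdb::Options& options) {
  // Read compaction inputs in large sequential chunks.
  options.compaction_readahead_size = 2 * 1024 * 1024;  // 2MB, illustrative
  options.new_table_reader_for_compaction_inputs = true;
  // Relatively large SST files.
  options.target_file_size_base = 256 * 1024 * 1024;  // at least 256MB

  // Relatively large data blocks.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_size = 256 * 1024;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
}
```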
Spinning disks are much larger than flash:
- To avoid too many file descriptors, use larger files. We suggest a file size of at least 256MB.
- If you use the universal compaction style, don't make a single DB too large, because full compactions will take a long time and impact performance. You can shard data across more DBs instead, keeping each DB smaller than 500GB.
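The capacity-related advice above can be sketched as follows; note that options.max_open_files is an option this page does not mention explicitly, added here as a hedged illustration of capping file descriptors:

```cpp
#include <rocksdb/options.h>

void TuneForLargeCapacity(rocksdb::Options& options) {
  // Larger files keep the total file (and file-descriptor) count manageable.
  options.target_file_size_base = 256 * 1024 * 1024;  // at least 256MB
  // Cap the number of files RocksDB keeps open (-1 would keep all open).
  options.max_open_files = 5000;  // illustrative cap, not from this page
}
```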