Berkeley DB Java Edition 4.1 Improvements

The new release of Berkeley DB Java Edition (JE), Release 4.1, includes several new features that dramatically improve out-of-cache performance. When a JE application's data set does not fit entirely in cache, and there is no particular working set that does, the application sees the best performance when as many of the internal btree nodes (the index) as possible are kept in cache. Release 4.1 makes JE's cache management more efficient and provides statistics to help determine cache efficiency.
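To illustrate, here is a minimal sketch of how an application might sample those statistics through the public API; what counts as a healthy miss rate is up to the application:

    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentStats;
    import com.sleepycat.je.StatsConfig;

    public class CacheStatsSample {
        // Print a few cache-efficiency numbers from an open Environment.
        // A rising miss count under a steady workload suggests the index
        // no longer fits in the JE cache.
        static void printCacheStats(Environment env) {
            EnvironmentStats stats = env.getStats(new StatsConfig());
            System.out.println("cache misses:      " + stats.getNCacheMiss());
            System.out.println("cache total bytes: " + stats.getCacheTotalBytes());
        }
    }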
It's worth giving a shout-out to the Sun ISV Engineering lab team, who were invaluable in this effort. They let us use their big-iron hardware for three months of intense tuning and performance analysis, all before the merger was completed.
The first important new feature is concurrent eviction. In past releases, cache eviction was carried out by JE daemon threads, by application threads calling JE operations, and by an optional single evictor thread. The eviction operation itself was serialized, which could create a bottleneck: many threads could be seen waiting on the method com.sleepycat.je.evictor.Evictor.doEvict().
In 4.1.6, cache eviction is no longer serialized and can execute concurrently. In addition, JE now has a dedicated, configurable thread pool that performs cache eviction when memory limits are reached. Eviction work is shared among this dedicated pool, the other JE daemon threads, and application threads, but JE skews the workload toward the pool and daemon threads in order to offload application threads.
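The pool is sized through environment configuration properties. The sketch below is a hedged example: the je.evictor.coreThreads and je.evictor.maxThreads property names are assumptions that should be verified against the EnvironmentConfig javadoc for your JE release:

    import com.sleepycat.je.EnvironmentConfig;

    public class EvictorPoolConfig {
        // Size the dedicated evictor pool via configuration properties.
        // The property names below are assumptions to be checked against
        // the EnvironmentConfig javadoc for your JE release.
        static EnvironmentConfig makeConfig() {
            EnvironmentConfig config = new EnvironmentConfig();
            config.setAllowCreate(true);
            config.setConfigParam("je.evictor.coreThreads", "1");
            config.setConfigParam("je.evictor.maxThreads", "4");
            return config;
        }
    }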
The second important feature is in-memory btree Internal Node (IN) compression, which is targeted at reducing the memory needed to hold these nodes in cache. One optimization reduces the in-memory footprint of an IN when only a small portion of it has been referenced, as is the case when data records are accessed in random order or when only a subset of the data is accessed. It does not help if the application is doing, for example, a database-wide cursor traversal. A second optimization in this area applies when a key's length is less than or equal to 16 bytes, which can be true either when the key is naturally small or when key prefixing is enabled through DatabaseConfig.setKeyPrefixing().
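Key prefixing is enabled per database at creation time. A minimal sketch:

    import com.sleepycat.je.DatabaseConfig;

    public class PrefixedDatabaseConfig {
        // Enable key prefixing so common leading key bytes are stored once
        // per IN; the shortened per-key remainder is then more likely to
        // fall under the 16-byte threshold described above.
        static DatabaseConfig makeConfig() {
            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setKeyPrefixing(true);
            return dbConfig;
        }
    }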
A user ran tests comparing JE 4.0.103 and JE 4.1 on a read-only workload and shared the results with us. When the database fits completely in the cache (4 GB of memory), performance is about the same. Dropping the cache to 2 GB, so that all INs still fit in memory, improves throughput and latency by 5%. When the cache is further reduced to values between 1 GB and 512 MB, so that only some of the INs fit in memory, performance improves by more than 3x.
One other interesting note about these tests is that the test machine has enough memory to hold the entire database in the file system cache, even though the JE cache was not sized to hold all of it. The net effect is that no "true" IO occurs; all IO is satisfied by the file system cache. By keeping the data in the file system cache rather than on the Java heap (and therefore in the JE cache), GC overhead is reduced while "in-memory" performance is maintained.
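In configuration terms, that posture amounts to capping the JE cache well below the database size and letting the OS file system cache hold the rest. A sketch, where the 1 GB figure is illustrative only and not a value taken from the user's tests:

    import com.sleepycat.je.EnvironmentConfig;

    public class SmallHeapCacheConfig {
        // Cap the JE cache (on the Java heap) and let the OS file system
        // cache absorb the rest of the database. The 1 GB figure below is
        // illustrative only, not a tested recommendation.
        static EnvironmentConfig makeConfig() {
            EnvironmentConfig config = new EnvironmentConfig();
            config.setAllowCreate(true);
            config.setCacheSize(1024L * 1024 * 1024); // 1 GB JE cache
            return config;
        }
    }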
What is also interesting is that the long-standing JE tuning adage that out-of-cache scenarios should adjust the je.evictor.lruOnly and je.evictor.nodesPerScan parameters is changing. By varying these values in 4.1 away from the recommended norms (false and 100, respectively), the user was able to achieve even better performance. We will of course update our FAQ entries to state the new recommended values.
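For reference, the old recommendation looks like this in code; under 4.1, treat these literals as a starting point to benchmark rather than fixed advice:

    import com.sleepycat.je.EnvironmentConfig;

    public class EvictorTuning {
        // The long-standing out-of-cache recommendation, expressed in code.
        // Under 4.1, other values may perform better; benchmark rather than
        // treating these literals as fixed advice.
        static EnvironmentConfig makeConfig() {
            EnvironmentConfig config = new EnvironmentConfig();
            config.setAllowCreate(true);
            config.setConfigParam("je.evictor.lruOnly", "false");
            config.setConfigParam("je.evictor.nodesPerScan", "100");
            return config;
        }
    }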
Naturally we're very excited about these results and want to share them with you. Stay tuned for more news when we have the results of read/write workloads.
