A Few Thoughts on Benchmarks

If you have spent any time with computer benchmarks, you have probably heard the expression “Lies, damned lies, and benchmarks!” Why do people feel that way about benchmarks? Because so many benchmark results have been presented with little or no context about what they really meant. “What was the hardware configuration?” “Was it running special firmware?” “How was the software tuned and tweaked?” “Was the test run entirely out of cache?” “Does this result have any relationship to what I need to do?”

To have any meaning, benchmark results should address all of those questions and more. Someone studying the claims should be able to re-create the hardware and software and have a reasonable expectation of coming close to the same results. A benchmark may not reflect the performance that any particular real-world application would enjoy, but it should be possible, more or less, to re-create its results in that particular context.

IOmeter
IOmeter is a common storage benchmarking tool, originally developed by Intel but maintained as an open source project for more than a decade now. IOmeter works with one or more usage profiles. Each profile specifies one or more block sizes to read or write, a percentage of read vs. write I/Os, and a percentage of random vs. sequential I/Os. The test also specifies how many I/Os may be queued for asynchronous operation, to better model the results an application might see. Simple applications issue a single read or write and then wait, synchronously, for it to complete before moving on to the next operation. More sophisticated applications can queue a number of I/O requests before they need to start waiting for results. This allows the operating system, the RAID controller, and the controller within the disk or SSD to optimize their performance by moving more data at a time and, for mechanical disks, by performing seeks in an order that minimizes head movement.
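As an illustration, here is a minimal Python sketch of such a usage profile. The class, field names, and next_io helper are hypothetical; real IOmeter profiles are built in its interface or saved configuration files, not in code like this.

    import random
    from dataclasses import dataclass

    @dataclass
    class AccessSpec:
        block_size: int   # bytes per I/O, e.g. 4096
        read_pct: int     # 0-100: share of reads vs. writes
        random_pct: int   # 0-100: share of random vs. sequential I/Os
        queue_depth: int  # how many async I/Os may be outstanding at once

    def next_io(spec, last_offset, device_size):
        # Choose read or write according to the profile's mix.
        kind = "read" if random.randrange(100) < spec.read_pct else "write"
        # Choose a random block-aligned offset, or the next sequential one.
        if random.randrange(100) < spec.random_pct:
            offset = random.randrange(device_size // spec.block_size) * spec.block_size
        else:
            offset = (last_offset + spec.block_size) % device_size
        return kind, offset

    # Example: 4 KiB blocks, 70% reads, all random, 32 outstanding I/Os.
    spec = AccessSpec(block_size=4096, read_pct=70, random_pct=100, queue_depth=32)
    print(next_io(spec, last_offset=0, device_size=1 << 30))

With queue_depth=1 this behaves like the simple, synchronous application described above; higher values give the layers below room to reorder and batch work.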

There are several important measurements in IOmeter results. Bandwidth is the one many reports focus on, but bandwidth is most interesting for long sequential reads and writes. For smaller, random operations, the number of operations per second, IOPS, is usually much more important. Finally, latency, the time from issuing an operation to its completion, can be critical for many applications.
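These three numbers are tied together: bandwidth is IOPS times the block size, and, by Little's law, the number of outstanding I/Os is roughly IOPS times the average latency. A small sketch with made-up, illustrative figures:

    block_size = 4096                  # bytes per random I/O
    iops = 50_000                      # operations per second (illustrative, not measured)
    bandwidth = iops * block_size      # 50,000 x 4 KiB = 204.8 MB/s

    queue_depth = 32                   # outstanding I/Os; Little's law: QD = IOPS * latency
    avg_latency = queue_depth / iops   # 32 / 50,000 = 0.64 ms per I/O

    print(f"{bandwidth / 1e6:.1f} MB/s, {avg_latency * 1e3:.2f} ms average latency")

This is also why a headline IOPS figure means little without the queue depth it was measured at: raising the queue depth may lift IOPS only slightly while making every individual operation wait longer.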

ION’s Benchmarks
ION’s benchmark summaries on the web show each of these variables. Each summary also includes a link to spreadsheets with all of the details and context, including run time. Running a benchmark for a long period is important to make sure that the benefits of caching are the same as they would be in full-time operation of a real application. Long runs are especially important for SSD-based systems, because the performance of an SSD, if not properly provisioned, can change over time. ION’s IOmeter tests run each individual benchmark for one hour or more and include a significant ramp time performing that same operation before statistics are kept.
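A minimal sketch of why the ramp matters, assuming a hypothetical list of (timestamp, latency) samples; this is not ION's actual pipeline, just an illustration of excluding the warm-up window so early warm-cache or freshly-erased-flash behavior cannot skew the reported average.

    def steady_state_avg_latency(samples, ramp_seconds=300):
        """Average latency over a run, ignoring the ramp-up window.

        samples: list of (seconds_since_start, latency_seconds) tuples.
        """
        steady = [lat for t, lat in samples if t >= ramp_seconds]
        return sum(steady) / len(steady)

    # Fabricated numbers: fast during the ramp, slower at steady state.
    samples = [(t, 0.0002 if t < 300 else 0.0008) for t in range(0, 3600, 10)]
    print(f"{steady_state_avg_latency(samples) * 1e3:.2f} ms")  # 0.80 ms, the steady value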

Real World
Synthetic benchmarks like IOmeter can do a good job of setting expectations for the performance of real applications when there is a good understanding of the real application’s I/O patterns, compute demands, and memory usage. In the end, though, the only real way to know how an application or service will perform is to test with the hardware and software being considered. To be complete, that testing should use data and conditions known to truly stress the system under test.
