NVMe SSDs connect storage directly to PCI Express, avoiding the overhead and latency of interfaces that assume storage is rotating media when it is really an SSD. Individually, NVMe SSDs offer a significant performance advantage over SATA SSDs. Over the last year, ION has invested considerable time and energy in benchmarking the performance of NVMe SSDs in servers. In practice, however, their applications in servers are still limited.
ION started with the assumption that all primary storage in a server must be redundant. SSDs are much more reliable than spinning disks as a repository for data, but failures are still possible. If any data important enough to keep on a server is too important to put at risk, RAID is a requirement. At present, RAID with NVMe SSDs requires software RAID, provided either by the operating system or by Intel Rapid Storage Technology enterprise (RSTe).
For benchmarking, ION converted (8) bays in an SR-71mach5 SpeedServer to support NVMe SSDs. This required replacing the standard risers with risers that include PCIe 3.0 x16 slots, plus bridge cards that connect (4) NVMe SSDs with (4) PCIe lanes each. The resulting configuration, dubbed the SR-71mach5+ SpeedServer, also included (16) 1.6TB Intel S3610 SATA SSDs attached through two SAS3 RAID controllers. Because of the SR-71mach5 slot topology, (4) NVMe SSDs were connected directly to the PCIe lanes of each of the system's two Intel Xeon processors. Running CentOS 7 Linux, the system used MD software RAID to configure (2) RAID 5 arrays, each using only the NVMe SSDs attached to one processor socket. The result was a system with four RAID 5 arrays: two with four NVMe SSDs each and two with eight SATA SSDs each.
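The socket-local array layout described above can be sketched as a small Python helper that groups NVMe devices by NUMA node and builds one mdadm RAID 5 command per node. This is a minimal sketch under stated assumptions, not ION's actual procedure: the device names and the device-to-node mapping are hypothetical, and the helper only prints the commands rather than running them. On Linux, the NUMA node of an NVMe controller can usually be read from /sys/class/nvme/<ctrl>/device/numa_node.

```python
from collections import defaultdict


def mdadm_create_per_node(device_nodes: dict[str, int], level: int = 5) -> list[list[str]]:
    """Group block devices by NUMA node and return one `mdadm --create` argv per node.

    `device_nodes` maps a device path to the NUMA node (CPU socket) whose PCIe
    lanes it hangs off. The mapping is supplied by the caller; this helper does
    not probe hardware and does not execute anything.
    """
    by_node: dict[int, list[str]] = defaultdict(list)
    for dev, node in sorted(device_nodes.items()):
        by_node[node].append(dev)

    commands = []
    for md_index, node in enumerate(sorted(by_node)):
        members = by_node[node]
        commands.append([
            "mdadm", "--create", f"/dev/md{md_index}",
            f"--level={level}",
            f"--raid-devices={len(members)}",
            *members,
        ])
    return commands


# Hypothetical layout: nvme0-nvme3 on socket 0, nvme4-nvme7 on socket 1.
layout = {f"/dev/nvme{i}n1": (0 if i < 4 else 1) for i in range(8)}
for cmd in mdadm_create_per_node(layout):
    print(" ".join(cmd))
```

Grouping by node keeps each array's member drives on the PCIe lanes of a single processor, matching the configuration described above.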
Performance on Linux was impressive: on-target fio results for 4kB random reads reached 3M IOPS, and 64kB random reads ran at almost 300k IOPS, or more than 18GBps. Details are available here. However, the ability to share that performance was limited by NFS. 64kB random reads could still be shared at almost 10GBps, but small random reads barely exceeded 10% of the 3M IOPS achievable on the system.
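The relationship between the IOPS and throughput figures quoted above is simply IOPS multiplied by block size. The helper below is a hypothetical sketch, not part of ION's tooling; it assumes "64kB" means 64 × 1024 bytes and reports both decimal (GB/s) and binary (GiB/s) results, since the article does not say which unit its GBps figures use.

```python
def iops_to_bandwidth(iops: float, block_size_bytes: int) -> tuple[float, float]:
    """Return throughput as (decimal GB/s, binary GiB/s) for a given IOPS and block size."""
    bytes_per_sec = iops * block_size_bytes
    return bytes_per_sec / 1e9, bytes_per_sec / 2**30


def bandwidth_to_iops(gb_per_sec: float, block_size_bytes: int) -> float:
    """Inverse conversion: IOPS needed to sustain a decimal GB/s rate at a given block size."""
    return gb_per_sec * 1e9 / block_size_bytes


KIB = 1024  # assumption: "kB" block sizes are binary (1kB = 1024 bytes)

for label, iops, bs in [("64kB random read", 300_000, 64 * KIB),
                        ("4kB random read", 3_000_000, 4 * KIB)]:
    gb, gib = iops_to_bandwidth(iops, bs)
    print(f"{label}: {gb:.2f} GB/s ({gib:.2f} GiB/s)")
# 64kB random read: 19.66 GB/s (18.31 GiB/s)
# 4kB random read: 12.29 GB/s (11.44 GiB/s)
```

At 300k IOPS, 64kB reads move roughly 19.7 GB/s (18.3 GiB/s), which is consistent with "more than 18GBps". The same arithmetic shows why small random reads are the harder case: 3M IOPS at 4kB moves less data than the 64kB test, but requires ten times as many I/O operations.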
Testing then moved to Windows Server 2012 R2 on the same hardware. In on-target IOmeter tests, performance was good at most block sizes, exceeding the results of the all-SATA SR-71mach5 SpeedServer at every block size larger than 4kB, but it fell further behind the Linux results as block size decreased. The 4kB random read result of 1.2M IOPS was lower than the all-SATA result and only 42% of comparable performance on Linux.
Testing Windows Server 2012 R2 as an SMB3 file server was even more disappointing. At 64kB blocks, random write and random read/write performance were 35% and 55% better, respectively, than all-SATA performance. Random reads at 32kB and 64kB block sizes were about 10% faster than the all-SATA results. All other tests performed slower than their all-SATA equivalents. ION observed that processor performance in the SR-71mach5+ server was the limiting factor: at smaller block sizes, especially as the number of workers and the number of outstanding I/Os increased, processor utilization in the server approached 100%.
Conversations with Intel suggested that manually tuning processor affinity for the storage and NIC drivers might offer significant performance benefits, but the practicality of doing so in a production environment seemed questionable.
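Affinity tuning generally comes down to telling the operating system which logical CPUs may service a given device's interrupts, usually expressed as a CPU bitmask. The sketch below builds one such mask per socket for a hypothetical two-socket topology. It is illustrative only: the Windows Server tests described here would be tuned through Windows-specific interrupt affinity settings, whereas on Linux a hexadecimal CPU mask of this kind is what /proc/irq/<n>/smp_affinity accepts.

```python
def affinity_mask(cores) -> int:
    """Return a bitmask with one bit set per logical CPU index in `cores`."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError(f"invalid CPU index: {core}")
        mask |= 1 << core
    return mask


# Hypothetical topology (not the SR-71mach5+ layout): logical CPUs 0-13 on
# socket 0 and 14-27 on socket 1. A driver's interrupts would be pinned to
# the mask of the socket its device is attached to.
SOCKET_CPUS = {0: range(0, 14), 1: range(14, 28)}

for socket, cpus in SOCKET_CPUS.items():
    print(f"socket {socket}: {affinity_mask(cpus):#x}")
# socket 0: 0x3fff
# socket 1: 0xfffc000
```

Each storage controller and NIC would need a mask chosen for the socket it is attached to, on every server, which illustrates why this kind of manual tuning is hard to maintain in production.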
2017 promises a number of technology “game changers” that will likely affect this situation. Much larger NVMe SSDs based on 3D NAND technology will become available. Even more interestingly, NVMe SSDs based on 3D XPoint technology, like Intel’s Optane products, will be much larger and much faster than the NAND products available in late 2016. Furthermore, Intel expects to deliver significant improvements to its RSTe software RAID solution. Finally, Broadcom’s Avago division, formerly LSI Logic, expects to deliver NVMe RAID hardware. That approach will add some latency to NVMe I/O, but by offloading most of the RAID work from the system processors, it should make an even higher-performance system attainable.
For now, it seems that the primary applications of NVMe SSDs in servers are as single-drive read caches or as RAID 1 read/write caches. Additionally, with 2TB NVMe SSDs available, a 2TB RAID 1 array of NVMe SSDs could deliver a reasonable amount of very high-performance primary storage.
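The capacity trade-off between the RAID levels discussed here is simple arithmetic: RAID 1 keeps a full copy of the data on every drive, while RAID 5 gives up one drive's worth of capacity to parity. A minimal sketch, assuming identical drives and ignoring filesystem and RAID metadata overhead; the function name is hypothetical:

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB of a RAID 1 or RAID 5 array of identical drives.

    Ignores metadata and filesystem overhead, so real arrays report slightly less.
    """
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb  # every member holds a full mirror copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of capacity holds parity
    raise ValueError(f"unsupported RAID level: {level}")


print(f"RAID 1, 2 x 2TB NVMe: {usable_capacity_tb(1, 2, 2.0):.1f} TB")
print(f"RAID 5, 8 x 1.6TB SATA: {usable_capacity_tb(5, 8, 1.6):.1f} TB")
# RAID 1, 2 x 2TB NVMe: 2.0 TB
# RAID 5, 8 x 1.6TB SATA: 11.2 TB
```

The RAID 1 case pays for redundancy with half the raw capacity, which is why the 2TB figure above depends on 2TB drives being available.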