SSDs in servers deliver performance, whether measured in IOPS, GB/s, or latency, that is just plain impossible with disks. SSD capacity is now greater than that of performance disks. Price is the factor that still keeps many organizations away from deploying servers with SSD storage when the improved performance is not a significant benefit. The better reliability of SSDs, though, might be the more important consideration.
We have written in this blog before about the performance benefits of SSD compared to rotating media. And we have even written about the reliability benefits. Technology continues to evolve, and the difference now is that the capacity of NAND Flash solid state storage has actually exceeded that of rotating disks in every category except perhaps the enterprise capacity product lines.
There is ample evidence that SSDs are more reliable than hard disk drives (HDDs); just do a quick search online. The reality, though, is that anything, HDD or SSD, can fail, so every application where losing the data would be a bad thing uses RAID. RAID, or ZFS, is a great thing when it comes to adding redundancy to your storage array. It is the standard way to store information for servers. RAID 0 actually has no redundancy, so the failure of one drive means losing all data in the array, but every other RAID level includes the capability to “rebuild” the lost data when a drive fails by using the redundant data on the remaining drives. Rebuild, however, reveals a couple of weaknesses of HDDs.
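To make the rebuild idea concrete, here is a minimal sketch of the parity math behind a RAID 5 stripe (an illustration only, not any particular controller's implementation): parity is the XOR of the data blocks, so XOR-ing the surviving blocks together reproduces whatever block was lost.

from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length byte blocks together: this is RAID 5 parity arithmetic.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

A real array repeats this for every stripe on the failed drive, which is why the entire surface of every surviving member has to be read.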
Rebuild time for a good-sized array of HDDs can take DAYS. Rebuild time for a similar array of SSDs is closer to an hour. Either way, the array can return itself to full redundancy. During the many hours it takes to rebuild an array of HDDs, however, performance is usually heavily impacted. The reason is the physics that HDDs are based on. To rebuild a replaced drive, all data from the remaining drives must be read from beginning to end in order to write the new drive from beginning to end. If the array has no other work to do, HDDs are pretty good at big, long sequential reads and writes. But as soon as a workload must be serviced, each request requires some or all of the disks to move their heads to another location on the platters. Once that request has been serviced, the rebuild may continue, moving the heads back to where they were. Each time the rebuild is interrupted to do real work, the seek delays incurred are many milliseconds long. Latency increases and overall performance for real work is heavily impacted. SSDs have no moving parts, so there is no seek penalty. Performance will indeed suffer during the rebuild of an SSD array, but not nearly as much, and only for the hour or so it takes to do the rebuild.
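A rough back-of-the-envelope calculation shows why those rebuild times diverge so sharply. The rates below are assumed ballpark figures, not measurements: an HDD rebuild that is constantly interrupted by real work may average only tens of MB/s, while an SSD rebuild with no seek penalty can sustain hundreds.

def rebuild_hours(capacity_tb, sustained_mb_per_s):
    # Hours needed to rewrite one replaced drive at a given sustained rate.
    capacity_mb = capacity_tb * 1_000_000   # 1 TB = 10^6 MB, decimal, as drives are sold
    return capacity_mb / sustained_mb_per_s / 3600

print(f"10 TB HDD at 30 MB/s effective: {rebuild_hours(10, 30):.0f} hours")      # roughly days
print(f"10 TB HDD at 150 MB/s, idle array: {rebuild_hours(10, 150):.0f} hours")
print(f"2 TB SSD at 500 MB/s: {rebuild_hours(2, 500):.1f} hours")                # roughly an hour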
Uncorrectable Bit Error Rate (UBER), also called Nonrecoverable Read Errors (NRE), is the other important factor that hurts HDDs in RAID environments. The UBER (or NRE) represents how often a drive will be unable to correctly return a requested sector, and it is specified in sectors lost per bits read. For enterprise capacity HDDs, UBER is 1 sector per 10^15 bits read; for enterprise performance HDDs, it improves to 1 sector per 10^16 bits read; Data Center SSDs are 10 times better than that at 1 sector per 10^17 bits read. With RAID 1 mirroring or RAID 5 striping with parity, hitting an unrecoverable read error during a rebuild leaves a hole in the rebuilt data on the new drive. The point of RAID 6, with its second set of redundancy data, is to minimize the likelihood of that. RAID 6 comes with costs as well: higher redundancy means less capacity for the same spend, and write performance suffers compared to RAID 5 because two parity blocks must be calculated and written instead of one.
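Those ratings are easier to picture in terms of data read. The quick conversion below is simple arithmetic, not a vendor specification: it shows roughly how much data each UBER class lets you read before, on average, one sector cannot be returned.

# Average data read per unrecoverable sector for each UBER class above.
for label, bits_per_error in [("enterprise capacity HDD, 1 per 10^15", 1e15),
                              ("enterprise performance HDD, 1 per 10^16", 1e16),
                              ("data center SSD, 1 per 10^17", 1e17)]:
    tb_read = bits_per_error / 8 / 1e12   # bits -> bytes -> decimal TB
    print(f"{label}: about {tb_read:,.0f} TB read per expected unreadable sector")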
Is UBER something that you actually need to worry about? Consider a modest array of (10) 2.4 TB performance HDDs: rebuilding a failed drive means reconstructing almost 2×10^13 bits, which against an UBER of 1 per 10^16 works out to a roughly 0.2% chance of an unrecoverable error during a single rebuild. That is not terrible, but it is not great, either. What is the cost of one lost RAID stripe somewhere in the data being recovered? Well, if that were in the middle of a database, for example, it likely means the corruption of the database. And even with less sensitive data, the file system will certainly be impacted by the missing data. If the array is instead (10) 10 TB enterprise capacity drives, that probability of data loss jumps to more than 7% and RAID 6 becomes a must. If the array is (10) 2 TB SSDs, the probability drops to 0.02%, which is much more tolerable. And by the way, if RAID 5 rebuild times with disks might be measured in days, RAID 6 rebuild times with disks might be measured in weeks.
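Those percentages can be reproduced with a few lines of arithmetic. The sketch below counts one replaced drive's worth of bits against the class UBER, which is the basis of the figures above; a RAID 5 rebuild actually has to read every surviving member, so the real exposure is several times higher, but the comparison between drive classes holds either way.

import math

def p_unrecoverable(capacity_tb, bits_per_error):
    # Probability of at least one unrecoverable sector while reading one
    # drive's worth of bits, given an UBER of 1 sector per bits_per_error bits.
    bits = capacity_tb * 1e12 * 8
    # 1 - (1 - 1/bits_per_error)**bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

print(f"2.4 TB performance HDD, 1 per 10^16: {p_unrecoverable(2.4, 1e16):.2%}")   # ~0.2%
print(f"10 TB capacity HDD, 1 per 10^15:     {p_unrecoverable(10, 1e15):.1%}")    # >7%
print(f"2 TB data center SSD, 1 per 10^17:   {p_unrecoverable(2, 1e17):.3%}")     # ~0.02%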
Does SSD still cost more? Yes. The price of a server with a given capacity of data center SSD storage is certainly going to be higher than designing the same capacity with hard disk drives. How much higher depends somewhat on understanding your data and the endurance demands it is going to place on the Flash storage. For some applications, the higher risk of losing data or the much greater performance degradation during a rebuild is probably acceptable. For many applications, the higher reliability of SSDs and the much lower impact of SSD RAID events make the decision an easy one. The fact that the SSD RAID is also going to deliver “disk-impossible” performance is a benefit that no user, developer, or CIO is going to complain about.
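One way to quantify those endurance demands is drive writes per day (DWPD): how many times per day the workload rewrites the usable Flash capacity. The sketch below uses hypothetical workload numbers and an assumed write-amplification factor; substitute your own measurements before sizing anything.

def required_dwpd(host_writes_tb_per_day, usable_capacity_tb, write_amplification=2.0):
    # DWPD the workload demands of the Flash, including an assumed
    # write-amplification factor for RAID parity and SSD internals.
    return host_writes_tb_per_day * write_amplification / usable_capacity_tb

# Hypothetical example: 5 TB of host writes per day landing on 16 TB of usable SSD.
print(f"required endurance: about {required_dwpd(5, 16):.2f} DWPD")
# Compare that against the drive's rated DWPD (read-intensive data center SSDs
# are often rated around 1 DWPD, write-intensive models 3 or more) to choose
# the right endurance class.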
Just found an interesting read at https://www.storagenewsletter.com/2019/03/14/how-reliable-are-ssds-backblaze/ discussing the relative reliability of SSDs and HDDs.
When configuring systems with SSDs, ion always recommends that the SSDs be “over-provisioned” (made to appear a bit smaller than their raw capacity) because of the big impact that has on both endurance and random write performance over the life of the drive.
The Intel white paper at https://www.intel.com/content/www/us/en/products/docs/memory-storage/solid-state-drives/data-center-ssds/over-provisioning-nand-based-ssds-better-endurance-whitepaper.html backs up that recommendation with both explanation and test data.
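For a feel for the numbers, over-provisioning is usually expressed as spare NAND relative to user capacity. The sketch below uses that common definition and hypothetical capacities, not any vendor's exact formula.

def over_provisioning_pct(raw_gb, user_gb):
    # Over-provisioning as commonly defined: spare NAND relative to user capacity.
    return (raw_gb - user_gb) / user_gb * 100

raw_gb = 1024 * 1.073741824   # 1024 GiB of raw NAND expressed in decimal GB
for user_gb in (1024, 800):
    print(f"formatted to {user_gb} GB: about {over_provisioning_pct(raw_gb, user_gb):.0f}% OP")
# Roughly 7% OP comes built in from the GiB/GB difference; formatting the drive
# smaller gives the controller far more spare area for wear leveling and garbage
# collection, which is what improves endurance and steady-state random writes.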