SSD Terminology Overview Part 2 - NAND Technologies

A brief overview of some important SSD terminology - SLC, MLC, TLC, QLC, lithography, and 3D-NAND.

Feb 17, 2021 - Benjamin Wachman
Tags: #Terminology #Overview #SSD #NAND #SLC #MLC #TLC #QLC #3D-NAND

Before I publish my first review of a QLC-based SSD, I want to take a step back and try to explain some of the SSD-related terminology that I’ve been casually throwing around in my articles. Even though I’m only trying to cover some basics and am glossing over a lot of finer detail, there’s a lot to cover here, so buckle up, adjust your eyeballs, and put your thinking caps on.

NAND Flash

The media on which SSDs store their data is known as NAND flash. At a super basic level, NAND is an array of transistors divided into units known as cells, which are used to store binary data. By creating immense blocks of cells, meaningful amounts of data can be stored. For example, a NAND die might have a capacity of 256Gb (gigabit), which works out to 32GB. You could then put 16 of these onto a PCB along with a controller and have either a 512GB, 500GB, or 480GB SSD depending on the amount of overprovisioning the drive had. Since the array is built entirely out of circuitry with no moving parts, all accesses, whether they be reads or writes, can occur very quickly with just an electrical impulse. Ideally, a controller will try to distribute reads and writes across as many flash chips as possible to maximize parallelization, further enhancing performance. Technically, the NAND is read in pages and erased (before either a modify or write) in blocks, which is also handled by the controller, but that’s far deeper down the well than I intend to go today.
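To make the capacity arithmetic concrete, here is a minimal sketch of the example above. The function names are my own, and I'm assuming overprovisioning is expressed as a share of the raw NAND (vendors often quote it relative to usable capacity instead) and ignoring the GB vs GiB distinction.

```python
# Rough capacity arithmetic for the 16-die example (illustrative only).
GBIT_PER_GBYTE = 8


def raw_capacity_gb(die_gbit: int, die_count: int) -> float:
    """Raw NAND capacity in gigabytes for a set of identical dies."""
    return die_gbit * die_count / GBIT_PER_GBYTE


def overprovisioning_pct(raw_gb: float, usable_gb: float) -> float:
    """Share of the raw NAND held back from the user, in percent.

    Assumption: measured against raw capacity, not usable capacity.
    """
    return (raw_gb - usable_gb) / raw_gb * 100


raw = raw_capacity_gb(die_gbit=256, die_count=16)
for usable in (512, 500, 480):
    held_back = overprovisioning_pct(raw, usable)
    print(f"{usable}GB usable of {raw:.0f}GB raw: {held_back:.2f}% held back")
```

The same 16 dies sit behind all three advertised capacities; the only thing that changes is how much of the raw NAND the controller keeps for itself.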

SLC

When NAND flash first appeared in SSDs, it was of the Single Level Cell (SLC) variety. Each cell in the array held 1 bit of data, either a 0 or a 1, represented by either a high voltage or a low voltage: 2^1 voltage states (it’ll make more sense in a second as to why I’m stating it this way). Due to the simplicity of reading or writing one of two voltage states to a cell, both reads and writes are quite fast. Also, since the voltage states are relatively far from each other, the two values can still be easily distinguished even if there’s a bit of voltage drift over time. That is at least part of the puzzle as to why SLC NAND has high endurance as well, on the order of 100,000 program/erase cycles per cell. This also means that the controller in charge of writing data to the NAND can be relatively simple, as it doesn’t have to include complex error correction since wearing out the flash isn’t going to happen quickly. The controllers do a lot more than just ECC, but that’s a topic for another day. The downside to this speed and endurance is density. Particularly due to the large lithography of the NAND at the time and the fact that 3D stacking of NAND into multiple layers was still many years away, SLC was very expensive.

MLC

Enter the Multi Level Cell (MLC). MLC improves on SLC by doubling how much data is stored per cell to two bits. Put another way, this represents a 100% increase in storage density. To keep track of two bits of data we now need 2^2 (aka 4) voltage states to track whether both the first bit and second bit are a 1 or a 0. As there are now more voltage states to account for, the controller needs to be a bit more careful when writing to each cell to ensure that the correct voltage level is written to the cell. This slows down write operations to some degree, and the lower amount of acceptable voltage drift before data starts getting corrupted reduces endurance to the neighborhood of 10,000 program/erase cycles. The huge decrease in cost of MLC and its relatively minor tradeoffs led to MLC being the technology that allowed SSDs to start penetrating the storage industry for enthusiasts or those in-the-know with deeper pockets. For what it is worth, back in 2009 my first SSD, an 80GB Intel X25-M G2, was an MLC SSD.

Tangent 1: Transistor Lithography and NAND Flash

This is the first of three tangents, which I’ll keep brief. Reducing transistor size generally increases how many cells can fit in a unit area of NAND. As SSDs became more popular, there was significant demand for drives at increasingly lower price points. Unlike most other computer parts we think about, where decreasing transistor lithography is mostly seen as a win-win with higher performance AND lower costs, with NAND it is a tradeoff. As NAND manufacturing matured, costs were incrementally reduced by shrinking the size of the transistors that made up the arrays of cells, at the cost of lower endurance (since the cells are physically smaller they’re less durable) and lower performance. Different NAND fabs used different lithographies, but Intel’s progression for MLC was from 50nm to 34nm to 25nm to 20nm to 16nm. By the end of that progression, the 16nm MLC had significantly less endurance than the original 50nm but was on the order of 10x as dense. Since manufacturing capacity for a given fab is relatively constrained to a fixed number of wafers, a 10x density increase drastically increases the amount of storage the fab can output in a given amount of time.

TLC

At the same time as smaller lithography was helping reduce the cost of flash, Triple Level Cell (TLC) NAND came to the market. As the name suggests, TLC stores 3 bits per cell, which represents a 50% increase in density compared to MLC. Now, 50% is still a considerable increase for TLC, but it is a far cry from the 100% increase between SLC and MLC. Since we’re basically just looking at ratios, we’ll see the rate of improvement continue to deteriorate from generation to generation. To store three bits per cell we now need to keep track of 8 (2^3) voltage states. As the number of states has doubled, the downsides to storing more bits per cell are also significantly more pronounced. TLC has significantly lower write performance and endurance (about 3,000 program/erase cycles) than MLC.
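The pattern across SLC, MLC, and TLC is just powers of two and ratios, which a few lines of Python can tabulate. This is only a sketch; the function and dictionary names are mine.

```python
# Bits per cell for each NAND type covered so far.
NAND_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def voltage_states(bits_per_cell: int) -> int:
    """Distinct voltage states a cell must resolve to hold this many bits."""
    return 2 ** bits_per_cell


def density_gain_pct(bits_per_cell: int) -> float:
    """Percent more data per cell than a cell storing one bit fewer."""
    previous = bits_per_cell - 1
    return (bits_per_cell - previous) / previous * 100


for name, bits in NAND_TYPES.items():
    line = f"{name}: {bits} bit(s)/cell, {voltage_states(bits)} voltage states"
    if bits > 1:
        line += f", {density_gain_pct(bits):.0f}% denser than the previous type"
    print(line)
```

Each extra bit doubles the number of states the controller has to tell apart, while the density payoff shrinks with every step.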

Tangent 2: SLC Caching

The decrease in performance associated with TLC led to the introduction of SLC caching. Essentially, the SSD’s controller treats a portion of the TLC array as SLC, only writing 1 bit per cell. This allows that portion of NAND to be written to much more quickly than if it were having 3 bits per cell written to it. As nifty as this is, there are some tradeoffs. You can’t run the whole drive as SLC, otherwise you’d end up with just 1/3 of the advertised capacity, so different mechanisms were used to determine how much TLC to treat as SLC. The details aren’t super important, but basically if the SLC cache runs out and the drive still needs to perform more writes, there are two fallback mechanisms. One is to simply stop writing to the SLC cache and just start writing directly to the TLC as TLC. This can cut write performance to a fraction of the cached speed, with some drives reaching 70MB/s or slower, though the speed is usually relatively constant until the drive has some idle time to write the data in SLC more permanently to TLC. The other mechanism can produce arguably worse results. Some drives will ONLY write to the SLC cache. Once the cache is full, the drive has to pause, flush some SLC to TLC, then accept more writes at SLC speeds. This causes performance to roller coaster up and down erratically and, depending on how much data needs to be written, can actually take significantly more time than just writing directly to the TLC. There’s also an endurance component involved in using SLC caches and there’s a lot more nuance to how controllers manage SLC cache than this, but I’ll skip over the rest of this for now.
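To get a feel for the two fallback behaviors, here is a deliberately crude model. Everything in it is an assumption for illustration: the 20GB cache, the 500MB/s SLC speed, the idea that flushing SLC to TLC runs at the same 70MB/s as writing TLC directly, and the idea that the whole cache is flushed in one go. Real controllers are far more nuanced than this.

```python
def time_fallthrough(total_gb, cache_gb, slc_mbps, tlc_mbps):
    """Fill the SLC cache once, then write the remainder straight to TLC."""
    cached_gb = min(total_gb, cache_gb)
    direct_gb = total_gb - cached_gb
    return cached_gb * 1000 / slc_mbps + direct_gb * 1000 / tlc_mbps


def time_slc_only(total_gb, cache_gb, slc_mbps, flush_mbps):
    """Only ever write to SLC; when the cache fills, stall and flush it to TLC."""
    seconds = 0.0
    remaining_gb = total_gb
    while remaining_gb > 0:
        chunk_gb = min(remaining_gb, cache_gb)
        seconds += chunk_gb * 1000 / slc_mbps        # burst at SLC speed
        remaining_gb -= chunk_gb
        if remaining_gb > 0:
            seconds += chunk_gb * 1000 / flush_mbps  # host waits for the flush
    return seconds


# Hypothetical drive: 20GB SLC cache, 500MB/s into SLC, 70MB/s into TLC.
for total_gb in (10, 50, 100):
    a = time_fallthrough(total_gb, 20, 500, 70)
    b = time_slc_only(total_gb, 20, 500, 70)
    print(f"{total_gb}GB write: fall-through {a:.0f}s, SLC-only {b:.0f}s")
```

Under these assumptions the two strategies are identical while the write fits in the cache, and the SLC-only drive falls behind once it doesn't, because every byte past the first cache-full is effectively written twice while the host waits.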

Tangent 3: 3D NAND

Until the advent of 3D NAND, which was shortly after TLC was introduced, all NAND was 2D, also called planar. The NAND cells were arranged in a 2D plane with only an X-axis and a Y-axis. 3D NAND, first brought to the consumer market by Samsung and branded as V-NAND, changed this by stacking multiple layers of NAND on top of each other and then interconnecting them. By adding a vertical component, more capacity could be manufactured per wafer WITHOUT further shrinking lithography or cramming more bits per cell. In fact, Samsung transitioned back up from ~16/19nm to 40nm when they introduced V-NAND to get back some of the performance and endurance they’d lost on their way down when they were shrinking transistor size. Now, lithography is mostly not discussed, and to the best of my knowledge it isn’t a dial that the NAND manufacturers are adjusting much. Instead, the race now is to increase the number of layers that can be stacked before issues like manufacturing difficulty and signal degradation become a problem. Unlike the mechanisms of decreasing transistor size or increasing bits-per-cell, there aren’t pronounced end-user facing detractors for having more layers. A potential indirect effect would be that as more layers are added, you need less wafer surface area to produce a die of the same capacity. This encourages making NAND dies of increasingly larger capacities. As you need fewer 512Gbit dies to make an SSD of the same capacity compared to an SSD using 256Gbit dies, this leads to decreased parallelization at a given capacity. After a certain point, there are so few dies that performance suffers, and usually that capacity is just dropped from the lineup. This is part of why 120-128GB SSDs are becoming less common and why 240-256GB SSDs may begin to become less common in the future.
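The die-count effect is easy to see with a quick sketch. The helper name is mine, and I'm ignoring overprovisioning and treating the number of dies as a stand-in for how much parallelism the controller has to work with.

```python
import math


def dies_needed(capacity_gb: int, die_gbit: int) -> int:
    """Number of identical dies needed to reach a given raw capacity."""
    die_gb = die_gbit / 8  # 8 gigabits per gigabyte
    return math.ceil(capacity_gb / die_gb)


for capacity_gb in (128, 256, 512, 1024):
    small = dies_needed(capacity_gb, 256)
    large = dies_needed(capacity_gb, 512)
    print(f"{capacity_gb}GB drive: {small} x 256Gbit dies or {large} x 512Gbit dies")
```

At 128GB, moving to 512Gbit dies leaves just two dies for the controller to spread work across, which is the kind of drop in parallelism that pushes small capacities out of a lineup.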

3D NAND breathed fresh life into TLC. It was effectively a reset for several of the less desirable attributes of the technology, allowing it to move up-market into higher-end and higher-performing drives.

QLC

At some point, the balance of benefit vs detriment tips when continuing down a trend without some sort of paradigm shift. At a certain point the term “advancement” may only apply loosely.

In many ways, this is the case for Quad Level Cell (QLC) NAND. Like MLC and TLC NAND before it, QLC NAND stores one more bit per cell compared to its immediate predecessor in exchange for significantly lower speeds and write endurance. Both of these compromises should be fine for most consumer workloads, but note that for some situations you’ll definitely want to avoid QLC. While TLC received a mid-life boost from going from 2D to 3D NAND, QLC NAND has started out as 3D-only, and out of the gate people are pretty underwhelmed by it.

Unfortunately, due to the interplay of the various tradeoffs, in a lot of cases QLC may not be worth the cost savings. QLC NAND holds 4 bits per cell as opposed to 3 bits per cell for TLC. This means that QLC holds a third (33%) more data per cell than TLC or, put the other way around, that the same capacity needs a quarter (25%) fewer cells. All other factors being equal, this means that drives replacing TLC with QLC have a maximum of 25% cost savings on the bill of materials (BOM) for just the NAND. Admittedly, the NAND is one of the larger material costs in producing an SSD, particularly as capacity increases, but this ceiling already limits the potential upside of drives that use QLC. In order to store 4 bits per cell, there must now be 16 (2^4) voltage states. This leaves exceedingly small margins between voltage levels, which takes time and care to implement correctly and avoid data corruption.
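The per-cell arithmetic is worth spelling out, since "more data per cell" and "fewer cells for the same capacity" are not the same percentage. A small sketch, with function names of my own choosing and cost assumed to scale purely with cell count:

```python
from fractions import Fraction


def density_gain(old_bits: int, new_bits: int) -> Fraction:
    """Extra data per cell, relative to the older cell type."""
    return Fraction(new_bits - old_bits, old_bits)


def cell_reduction(old_bits: int, new_bits: int) -> Fraction:
    """Share of cells no longer needed to store the same amount of data."""
    return 1 - Fraction(old_bits, new_bits)


gain = density_gain(3, 4)      # TLC -> QLC
saving = cell_reduction(3, 4)  # TLC -> QLC
print(f"QLC stores {float(gain):.0%} more per cell than TLC")
print(f"Same capacity needs {float(saving):.0%} fewer cells")
```

Going from 3 to 4 bits per cell is a 1/3 gain in density, which translates to needing 1/4 fewer cells, so the best-case saving on the NAND itself is the smaller of the two numbers.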

QLC is definitely here to stay despite the fuss the enthusiast community is making about it. One of the ways that drive manufacturers have been counteracting the negatives of QLC is by only making drives that have a TON of it. QLC may have slow write speeds, but an 8TB QLC-based drive is going to have a huge SLC cache that should allow most consumer write-based workloads to happen entirely in SLC. As for endurance, even if the drive is only warrantied for 225 complete drive fills, that’s still 1800TB of data written. Most casual consumers, even those with this much data, won’t be writing that much data within the expected lifetime of the drive. Then again, my example 8TB QLC drive costs an eye-watering $1300, so it doesn’t really fit with the kind of purchase an average consumer would make, but the hyperbole at least explains the point. For those looking for ultra-high capacity drives, QLC enables that use case. Prior to QLC, the largest SSDs you could get in the consumer space were 4TB drives. Now, if you’re willing to pay for it you can get 8TB drives, and at significantly lower costs than if trying to source gray market enterprise drives. In the end, QLC is the same as so many other things in tech: as long as the price is right and you’re using the product in a way that makes sense based on its capabilities, even less-than-perfect products can have great potential in the market.
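For anyone who wants to plug in their own numbers, the endurance math is a one-liner. The 100GB/day write rate below is purely a hypothetical heavy-ish consumer workload that I picked for illustration, and the function names are mine.

```python
def terabytes_written(capacity_tb: float, drive_fills: float) -> float:
    """Total data written if the whole drive is filled this many times."""
    return capacity_tb * drive_fills


def years_to_exhaust(tbw: float, gb_per_day: float) -> float:
    """Years until the rated writes are used up at a steady daily write rate."""
    return tbw * 1000 / gb_per_day / 365


tbw = terabytes_written(capacity_tb=8, drive_fills=225)
years = years_to_exhaust(tbw, gb_per_day=100)  # hypothetical workload
print(f"Rated writes: {tbw:.0f}TB, about {years:.0f} years at 100GB/day")
```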
