eMMC/SSD: A Brief Introduction
I find it useful to understand that, from a low-level point of view, an eMMC and an SSD are essentially
the same. An SSD is basically a huge eMMC in which the NAND chips are used in parallel, similar to
a RAID-0 configuration, but with an added DRAM cache buffer and a SATA interface operating at 5V.
So, apart from the more advanced microcontroller, wear-leveling etc. works in the same way.
The most important and relevant documents are those from JEDEC. However, AFAIK, our device conforms
to JESD84 v4.41, not v4.51.
- JEDEC: Embedded MultiMediaCard (eMMC) Product Standard...
- JEDEC: Embedded MultiMediaCard (eMMC) Electrical Standard
- "eMMC v4.41 and v4.5" (JEDEC presentation by Victor Tsai)
Datalight on Bad Block Management
"Bad block management (BBM) is a critical component of NAND flash drivers to
improve the reliability and endurance of the flash. NAND is shipped from the
factory with 'mostly good' cells, meaning there are some cells that are
non-functional even when the flash is new. Blocks can also go bad over time,
causing loss of data stored in the flash memory or even a bricked device."
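To make bad block management a bit more concrete, here is a minimal Python sketch of a bad-block table. It assumes a simple scheme in which logical blocks map onto the good physical blocks in order, with a few good blocks held back as spares. `BadBlockTable` and its methods are hypothetical; real controllers keep such a table in NAND metadata and also copy data out of a failing block, which is not modeled here.

```python
class BadBlockTable:
    """Hypothetical bad-block table: logical blocks map onto good physical blocks."""

    def __init__(self, num_physical, factory_bad, num_spares):
        self.bad = set(factory_bad)
        good = [pb for pb in range(num_physical) if pb not in self.bad]
        if len(good) <= num_spares:
            raise ValueError("not enough good blocks for the requested spares")
        # Hold back the last good blocks as replacements for blocks that fail later.
        self.spares = good[len(good) - num_spares:]
        # Logical block i -> i-th remaining good physical block.
        self.table = good[:len(good) - num_spares]

    def physical(self, logical):
        """Translate a logical block number to its physical block."""
        return self.table[logical]

    def retire(self, logical):
        """A block went bad in the field: mark it bad and remap to a spare."""
        if not self.spares:
            raise RuntimeError("out of spare blocks")
        self.bad.add(self.table[logical])
        self.table[logical] = self.spares.pop(0)
        return self.table[logical]


# 10 physical blocks, blocks 3 and 7 bad from the factory, 2 spares.
bbt = BadBlockTable(10, factory_bad=[3, 7], num_spares=2)
print(bbt.physical(3))  # 4: physical block 3 is skipped
print(bbt.retire(3))    # 8: block 4 failed later and was replaced by a spare
```

The logical address the host sees never changes; only the table entry behind it does.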
NAND Flash Longevity
"Flash life is limited to the number of erase cycles for which your part is
rated. By distributing write/erase cycles evenly throughout the flash, a
properly executed wear-leveling algorithm can more than double the life of
your product. FlashFX Pro uses both static and dynamic wear-leveling to
achieve 133% longer life than MSFlash, the flash manager found in Windows CE
and WindowsMobile. The charts below show a test comparison between a FlashFX
Pro disk and one using MSFlash. Flash disks read and write data in a grid of
erase blocks. Once a block reaches its maximum rated erase count, the flash is
at risk of lost or corrupted data, becoming a "broken" device. For this test,
we recorded the erase counts by block and applied a heat map ranging from
white (lowest use) to green (medium use), to black (highest use). As the
heatmap shows, the MSFlash disk contains many blocks that are well over their
rated lifespan, while other blocks are barely used. The FlashFX Pro disk shows
what happens when proper wear-leveling algorithms are employed. All blocks are
evenly worn and within a tight range of erase counts, making your handheld
last more than twice as long, and protecting the reputation for durability
you've worked hard for."
"Flash parts are commonly divided into partitions, which allows multiple
operations to occur simultaneously (erasing one partition while reading from
another). Partitions are further divided into blocks (commonly 64KB or 128KB
in size). The only Write operation permitted on a flash memory device is to
change a bit from a one to a zero. If the reverse operation is needed, then
the block must be erased (to reset all bits to the one state). NOR flash
memory can typically be programmed a byte at a time, whereas NAND flash memory
must be programmed in multi-byte bursts (typically, 512 bytes)."
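The program/erase asymmetry described above can be sketched in a few lines of Python. This is a toy model that assumes byte-granular programming for readability (real NAND programs whole pages); `erase` and `program` are hypothetical helper names, not a real driver API.

```python
ERASED = 0xFF  # an erased byte reads back as all ones


def erase(block):
    """Erase resets every bit in the block to 1."""
    for i in range(len(block)):
        block[i] = ERASED


def program(block, offset, data):
    """Programming can only clear bits (1 -> 0), i.e. it behaves like a bitwise AND."""
    for i, new in enumerate(data):
        old = block[offset + i]
        if new & ~old & 0xFF:  # some bit would have to go from 0 back to 1
            raise ValueError(f"byte {offset + i}: 0 -> 1 needs a block erase first")
        block[offset + i] = old & new


block = bytearray(4)
erase(block)
program(block, 0, b"\x0f")      # 0xFF -> 0x0F: only clears bits, allowed
program(block, 0, b"\x05")      # 0x0F -> 0x05: still only clears bits
try:
    program(block, 0, b"\xf0")  # would need 0 -> 1 transitions
except ValueError as err:
    print(err)
```

Getting 0xF0 into byte 0 now requires erasing the whole block first, which is exactly why flash rewrites are expensive.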
Basic Wear Leveling
MLC devices typically support fewer than 10,000 program/erase (P/E) cycles. So
if you erased and reprogrammed a block every minute, you would exceed the 10K
cycle limit in just 7 days!
1 [cycles/min] × 60 [min/hr] × 24 [hr/day] × 7 [days] = 10,080 [cycles/block]
So rather than cycling (re-programming) the same block, wear-leveling moves
data around to other blocks so that blocks are more evenly cycled.
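A minimal sketch of the simplest (dynamic) form of this idea, assuming a toy device with a single hot logical block: every rewrite goes to the free physical block with the lowest erase count, and the previous copy is erased and returned to the free pool. `WearLeveler` is a hypothetical name; static wear leveling, which also relocates rarely-changed (cold) data, is not modeled.

```python
import heapq


class WearLeveler:
    """Toy dynamic wear leveling over num_blocks physical erase blocks."""

    def __init__(self, num_blocks, num_logical):
        self.erase_count = [0] * num_blocks
        # Logical block -> physical block; start with an identity mapping.
        self.mapping = {lb: lb for lb in range(num_logical)}
        # Free physical blocks as a min-heap keyed on erase count.
        self.free = [(0, pb) for pb in range(num_logical, num_blocks)]
        heapq.heapify(self.free)

    def rewrite(self, logical):
        """Write new data for `logical` to the least-worn free block."""
        _, new_pb = heapq.heappop(self.free)
        old_pb = self.mapping[logical]
        self.mapping[logical] = new_pb
        self.erase_count[old_pb] += 1  # stale copy is erased before reuse
        heapq.heappush(self.free, (self.erase_count[old_pb], old_pb))
        return new_pb


wl = WearLeveler(num_blocks=8, num_logical=1)
for _ in range(800):
    wl.rewrite(0)
print(wl.erase_count)  # the 800 erases are spread evenly: 100 per block
```

Without the remapping, all 800 erases would have landed on one block.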
An 8GB eMMC MLC-based device
This device has 4,096 independent blocks. So if we took the previous example
and distributed the cycles over all 4,096 blocks, each block would have been
programmed fewer than three times (10,080 / 4,096 ≈ 2.46 [cycles/block/week]),
versus the 10,080 cycles when you cycle the same block.
So if we cycle some block once every minute, we have:
1 [cycles/min] × 60 [min/hr] × 24 [hr/day] × 365 [day/year] = 525,600 [cycles/year]
But with wear leveling spreading those cycles over all blocks, the device as a whole can absorb:
Max block-cycles =
4,096 [blocks] × 10,000 [cycles/block] = 40,960,000 [cycles]
So the total time to use up all cycles is:
40,960,000 [cycles] / 525,600 [cycles/year] ≈ 77.9 [years]
So with perfect wear leveling on a 4,096-block device, we could erase and
program a block every minute, every day, for almost 78 years.
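The arithmetic of this section can be reproduced with a few Python helpers, assuming, as the example does, perfect wear leveling (every block absorbs exactly its rated cycles) and no write amplification. The function names are hypothetical.

```python
MIN_PER_DAY = 60 * 24
MIN_PER_YEAR = MIN_PER_DAY * 365  # 525,600


def days_to_wear_out_one_block(pe_cycles, cycles_per_min=1):
    """Cycling the same block over and over: days until its rating is reached."""
    return pe_cycles / (cycles_per_min * MIN_PER_DAY)


def cycles_per_block_per_week(num_blocks, cycles_per_min=1):
    """One week of cycling spread evenly across all blocks."""
    return cycles_per_min * MIN_PER_DAY * 7 / num_blocks


def ideal_lifetime_years(num_blocks, pe_cycles, cycles_per_min=1):
    """Years until every block reaches its rating, with perfect wear leveling."""
    return num_blocks * pe_cycles / (cycles_per_min * MIN_PER_YEAR)


print(round(days_to_wear_out_one_block(10_000), 2))  # 6.94 days
print(round(cycles_per_block_per_week(4096), 2))     # 2.46
print(round(ideal_lifetime_years(4096, 10_000), 1))  # 77.9 years
```

The ideal lifetime scales linearly with both the block count and the P/E rating, so a lower rating shortens it proportionally.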
[Examples taken from Cooke's WinHEC presentation]
However, this is far from what can realistically be expected. For example, the guaranteed
cycle count may apply only to block zero (as is the case with TSOP NAND
devices). And according to Wikipedia, "MLC NAND flash used to be rated at about
5–10K cycles (Samsung K9G8G08U0M) but is now typically 1–3K cycles".
According to one very informative page, "34nm MLC NAND is good for 5,000
write cycles, while 25nm MLC NAND lasts for only 3,000 write cycles."
Then there is the possibility of "read disturb": the method used to read NAND
flash memory can cause nearby cells to change over time if the surrounding
cells of the block are not rewritten. This generally takes on the order of ~100K
reads without a rewrite of those cells. The error does not appear when reading
the original cell, but shows up when one of the surrounding cells is finally read.
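One way to guard against read disturb is to count reads per block since the last erase and refresh (rewrite) the block well before the count approaches the disturb limit. The sketch below assumes that approach; the ~100K figure comes from the text, while the 80% safety margin and the `ReadDisturbTracker` name are hypothetical, and nothing here says our controller works this way.

```python
READ_DISTURB_LIMIT = 100_000  # order of magnitude quoted above; device-specific


class ReadDisturbTracker:
    """Hypothetical per-block read counter that triggers a refresh early."""

    def __init__(self, num_blocks, limit=READ_DISTURB_LIMIT, margin=0.8):
        self.reads = [0] * num_blocks
        self.threshold = int(limit * margin)  # refresh well before the limit

    def on_read(self, block):
        """Count a read; return True when the block should be refreshed."""
        self.reads[block] += 1
        return self.reads[block] >= self.threshold

    def on_refresh(self, block):
        """Data was copied to a freshly erased block, so the count restarts."""
        self.reads[block] = 0


tracker = ReadDisturbTracker(num_blocks=4)
print(tracker.threshold)  # 80000
```

Note that a refresh is itself a P/E cycle, so read-heavy workloads still cost some endurance.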
Then there is Write Amplification (WA) [described for SSDs, but also applicable to us]:
"An undesirable phenomenon associated with flash memory and solid-state drives
(SSDs) where the actual amount of physical information written is a multiple
of the logical amount intended to be written. Because flash memory must be
erased before it can be rewritten, the process to perform these operations
results in moving (or rewriting) user data and metadata more than once. This
multiplying effect increases the number of writes required over the life of
the SSD which shortens the time it can reliably operate. The increased writes
also consume bandwidth to the flash memory which mainly reduces random write
performance to the SSD."
Write amplification is typically measured as the ratio of the writes going to
the flash memory to the writes coming from the host system. A lower write
amplification is more desirable, as it corresponds to a reduced number of P/E
cycles on the flash memory and thereby to a longer NAND life.
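As a small worked example of that ratio, here is a Python sketch with hypothetical counter values and a hypothetical 8 GiB of raw NAND rated at 3,000 P/E cycles. How a particular device reports its host and NAND write totals, if at all, is vendor-specific and not covered here.

```python
def write_amplification(nand_bytes_written, host_bytes_written):
    """WA = bytes written to the flash / bytes written by the host (1.0 is ideal)."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


def host_writes_before_wear_out(raw_bytes, pe_cycles, wa):
    """Host data that fits into the rated P/E budget, ignoring bad blocks."""
    return raw_bytes * pe_cycles / wa


GIB = 1024 ** 3
wa = write_amplification(nand_bytes_written=30 * GIB, host_bytes_written=10 * GIB)
print(wa)  # 3.0: every host byte cost three bytes of NAND programming
budget_ideal = host_writes_before_wear_out(8 * GIB, 3_000, wa=1.0)
budget_real = host_writes_before_wear_out(8 * GIB, 3_000, wa=wa)
print(budget_real / budget_ideal)  # about 0.333: WA of 3 cuts the write budget to a third
```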
Then there is Over-provisioning (OP), which is the difference between the
physical capacity of the flash memory and the logical capacity presented through
the operating system as available to the user. During garbage collection,
wear-leveling, and bad block mapping operations on the SSD, the additional space
from over-provisioning helps lower the write amplification when the controller
writes to the flash memory.
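Over-provisioning is commonly quoted as a percentage of the user-visible capacity. A minimal sketch with hypothetical capacities: a device with 8 GiB of raw NAND that exposes 8 GB (decimal) to the user.

```python
def over_provisioning_pct(physical_bytes, user_bytes):
    """OP = (physical capacity - user capacity) / user capacity, in percent."""
    return 100.0 * (physical_bytes - user_bytes) / user_bytes


raw = 8 * 1024 ** 3   # hypothetical: 8 GiB of NAND
user = 8 * 1000 ** 3  # hypothetical: 8 GB exported to the host
print(f"{over_provisioning_pct(raw, user):.2f}%")  # 7.37%
```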
MLC = Multi-Level Cell
: NAND stores four states per memory cell and enables two bits to be programmed/read per memory cell
SLC = Single-Level Cell
: NAND stores two states per memory cell and enables one bit to be programmed/read per memory cell
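The number of states a cell must distinguish is simply 2 raised to the number of bits it stores; a trivial sketch (the `CELL_TYPES` table is just for illustration):

```python
CELL_TYPES = {"SLC": 1, "MLC": 2}  # bits stored per memory cell


def states_per_cell(bits):
    """A cell holding n bits must distinguish 2**n charge (threshold) levels."""
    return 2 ** bits


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell -> {states_per_cell(bits)} states")
```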
What does all this mean?
Well, it means a lot! Here are just a few things:
- We have to use host-based disk encryption to ensure we don't leave private data on the eMMC/SSD.
(Re-formatting and erasure just don't work, since internal wear-leveling leaves old copies of the data
behind, unless secure erase is permanently enabled. But this is not yet supported in older JEDEC versions!)
- We should always choose the largest available memory device to maximize life.
- We should have the source code and eMMC specifications to verify the device specifications,
ensure proper handling, and quickly resolve future bugs.
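Why overwriting or re-formatting does not remove data can be sketched with a toy flash translation layer (FTL). It assumes the out-of-place update scheme that wear leveling implies: each host write of a logical page lands on a fresh physical page, and the old page is merely unmapped until garbage collection erases it (garbage collection is not modeled). `ToyFTL` is hypothetical. With host-based encryption, the stale copies left behind would only ever be ciphertext.

```python
class ToyFTL:
    """Hypothetical out-of-place FTL: overwrites never touch the old physical page."""

    def __init__(self, num_pages):
        self.nand = [None] * num_pages  # physical pages
        self.l2p = {}                   # logical page -> physical page
        self.next_free = 0

    def write(self, lpn, data):
        if self.next_free >= len(self.nand):
            raise RuntimeError("no free pages (garbage collection not modeled)")
        self.nand[self.next_free] = data
        self.l2p[lpn] = self.next_free  # the previous physical page is only unmapped
        self.next_free += 1

    def read(self, lpn):
        return self.nand[self.l2p[lpn]]


ftl = ToyFTL(num_pages=8)
ftl.write(0, b"secret")
ftl.write(0, b"\x00" * 6)     # the host "wipes" the data by overwriting it
print(ftl.read(0))            # the host now reads zeros...
print(b"secret" in ftl.nand)  # ...but True: the old copy is still in the NAND
```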