Storage System

Been quite some time since the last post, life has been busy, thanks to the NAND flash chips from Toshiba & Samsung. Ironic enough their seemingly naive data sheets introduce NAND as an angelic technology. Simple protocols, even more simple hardware interface. A totally reasonable requirement placed on driver to fix one bit errors and detect two bit errors (which is not supposed to happen but still for some unknown reason vendors mention this requirement too, would be ecstatic to know why). A touch of complexity is felt only when bad blocks are encountered, which is totally fair considering the cost effectiveness of NANDs.

My initial impression of NAND being a fairly simple fixed hassle free storage media was progressively crushed to shreds during the last one year of NAND torments. Have worked only on SLC NANDs from Toshiba & Samsung, they are extensively used on mobile handset platforms. So, MLC is an unknown inferno to me. Hopefully the below mentioned points might help the posterity from enduring the crisis. Always remember to religiously follow the Data sheet (henceforth referred to as “the book”) for NAND salvation.

Keep innovative operation sequences for hobby projects.

Do not try stuff like NAND reset command during NAND busy unless the book clearly explains the behavior of its effect on read, program and erase operation with a CLEAR timing diagram.
Do NOT use read back check to detect bad blocks unless that is mentioned as one of the methods in the book
MORAL : Follow ONLY what is written in the book, do not infer or even worse assume.

Read wear leveling cannot prevent bit errors nor can erase refresh solve bit errors.

I have managed to induce bit errors on Samsung NAND flash when partial page writes are executed beyond the maximum number specified for a page and also by executing multiple partial page reads. Interestingly, even after continuous block erases, the single bit read errors refused to disappear.
Any deviation from the strict protocol mentioned in the book can result in manifestation of strange symptoms.
BTW: A deterministic read wear count is a myth unless it is mentioned in the book.
MORAL : Symptoms and root causes never have 1:1 ratio.

Never go back and correct mistakes within a block

Samsung NAND flashes “prohibits” going back to a lesser numbered page in a block and reprogramming it (for Eg: Do not program page 10 after programming page 20 within a block) the effect of such an operation is not documented so you do not know the symptoms which can incarnate in any form.
Go ahead and question the logic of any file system which does random page programming in a block to mark dirty pages!.
MORAL:Do not question what the book says, just blindly follow it.

Embedded Sense

System Software

NAND flash musings