
eMMC sudden death research

53 posts
Thanks Meter: 262
 
By Oranav, Member on 12th January 2013, 06:26 PM
6th February 2013, 08:20 PM |#51  
OP Member
Thanks Meter: 262
 
Quote:
Originally Posted by Entropy512

SDS seems to be a case of some function potentially returning 0 (maybe due to integer overflow? The statistics of the issue and how it suddenly "spiked" after a number of months of usage screams overflow to me), and that 0 then being treated as data instead of an error, corrupting data structures right and left.

It doesn't seem like an integer overflow, at least not a straightforward one.
This is the function they patch:
Code:
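// Hex-Rays output; _DWORD and __fastcall are IDA annotations.
int __fastcall f_to_be_patched_function(_DWORD *out, int val)
{
  int ret;

  ret = 0;
  if ( *off_5FC60 == val )      // first FTL context matches
  {
    *out = off_5FC60;
    return 1;
  }
  if ( *off_5FC64 == val )      // second FTL context matches
  {
    *out = off_5FC64;
    return 1;
  }
  *out = 0;                     // no match: null context, return 0
  return ret;
}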
Both off_5FC60 and off_5FC64 point to some FTL-related contexts.
This is the wrapper function they write to the RAM:
Code:
void __fastcall f_new_function_by_patch(_DWORD *out, int val)
{
  if ( !f_to_be_patched_function(out, val) )
  {
    while ( 1 )                 // lookup failed: hang until reset
      ;
  }
}
The BL instruction that is being patched used to call the old function (f_to_be_patched_function), without checking its return value, hence the bug.
What's so strange about it is that "f_to_be_patched_function" is called from many other locations in the code, without checking the return value! So the bug exists in other locations as well.
Either the other locations don't cause internal metadata corruption, or they are just so rare that Samsung didn't even bother to patch them.
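For anyone wondering what "patching the BL" involves mechanically: the call site's branch offset is rewritten so it lands on the new wrapper instead of the old function. A minimal sketch, assuming the controller executes ARM (A32) code at that call site (Thumb BL uses a different 32-bit encoding). The function names and addresses are hypothetical, not taken from the firmware.

```c
#include <stdint.h>
#include <stdbool.h>

/* Encode an A32 "BL <to>" located at address `from` (condition AL).
   A32 branches are PC-relative with PC = from + 8, and the signed 24-bit
   immediate counts words, giving a reach of about +/-32 MB.
   Returns false if the target is misaligned or out of range. */
bool encode_a32_bl(uint32_t from, uint32_t to, uint32_t *insn)
{
    int64_t delta = (int64_t)to - ((int64_t)from + 8);
    if (delta % 4 != 0)
        return false;                          /* A32 targets are word-aligned */
    if (delta < -(1LL << 25) || delta >= (1LL << 25))
        return false;                          /* outside the +/-32 MB reach */
    *insn = 0xEB000000u | ((uint32_t)(delta / 4) & 0x00FFFFFFu);
    return true;
}

/* Recover the branch target of an A32 BL at `from` (to sanity-check a patch). */
uint32_t decode_a32_bl(uint32_t from, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    if (imm24 & 0x00800000u)
        imm24 |= 0xFF000000u;                  /* sign-extend to 32 bits */
    return from + 8u + (imm24 << 2);           /* wraps modulo 2^32 as intended */
}
```

encode_a32_bl(site, wrapper, &insn) yields the word to write over the original BL, and decode_a32_bl() is a quick check that the new word points where you think it does. If the wrapper sits more than about 32 MB away from the call site, a plain BL can't reach it and a veneer would be needed.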

Quote:
Originally Posted by Entropy512

Unfortunately, as we don't have a security-dropped IBL that is signed for Exynos 4210, there is no SDCard or USB recovery available for 4210 devices like there is for Exynos3 and for 4412 devices. If you kill the bootloaders, JTAG is it.

Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus?
If so, why isn't SDS fixable?
The Following 6 Users Say Thank You to Oranav For This Useful Post
7th February 2013, 02:27 AM |#52  
E:V:A
Inactive Recognized Developer
-∇ϕ
Thanks Meter: 2,217
Quote:
Originally Posted by Oranav

I think it is possible to update the firmware.
Except for CMD62, there are 2 more vendor-specific commands (CMD60 and CMD64). I think I saw somewhere a command which updates the firmware on the NAND; I'm not sure now, but I'll check it later.

There is no CMD64, because command indices only go from 0 to 63. CMDs 60-63 are "Reserved for Manufacturers" and belong to the reserved class 11.
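Why the index stops at 63: every command goes out on the CMD line as a 48-bit token (start bit 0, transmitter bit 1, a 6-bit command index, a 32-bit argument, a CRC7, end bit 1), so there is simply no room to encode CMD64. A minimal sketch that builds such a token, using the MMC/SD CRC7 polynomial x^7 + x^3 + 1. The function names are hypothetical, and what a vendor command like CMD62 does with its argument is not covered here.

```c
#include <stdint.h>
#include <stdbool.h>

/* CRC7 over `len` bytes, polynomial x^7 + x^3 + 1 (0x09), MSB first. */
static uint8_t crc7(const uint8_t *data, int len)
{
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint8_t feedback = ((data[i] >> bit) & 1u) ^ ((crc >> 6) & 1u);
            crc = (uint8_t)((crc << 1) & 0x7Fu);
            if (feedback)
                crc ^= 0x09u;
        }
    }
    return crc;
}

/* Build the 6-byte (48-bit) command token:
   [47] start = 0, [46] transmitter = 1, [45:40] index, [39:8] argument,
   [7:1] CRC7, [0] end = 1. Returns false for index > 63. */
bool build_cmd_token(unsigned index, uint32_t arg, uint8_t out[6])
{
    if (index > 63)
        return false;                 /* the index field is only 6 bits wide */
    out[0] = (uint8_t)(0x40u | index);
    out[1] = (uint8_t)(arg >> 24);
    out[2] = (uint8_t)(arg >> 16);
    out[3] = (uint8_t)(arg >> 8);
    out[4] = (uint8_t)arg;
    out[5] = (uint8_t)((crc7(out, 5) << 1) | 1u);
    return true;
}
```

For example, CMD0 with a zero argument comes out as 40 00 00 00 00 95, the familiar GO_IDLE_STATE token.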

But I agree, there has to be a way to update the eMMC firmware. Although Entropy may be right about factory programming, I don't think this "interface" would only be available at that time. I strongly believe it should be possible to update. We know all the eMMC pins, and we know the basic interface and the basic technology within, but we don't know the firmware! Samsung's SSD firmware can certainly be updated!

(We could look for the firmware in there.)


Quote:
Originally Posted by E:V:A

I'll just start filling in this myself...

a) what exact devices are having problems?
- GT-I9300/3 with 16 GB MoviNAND
b) What exact eMMC cards do they have? (Samsung part-no/name)
c) Is that leaked datasheet in OP, for any of those in (b)?
- <unknown>
d) What die size "technology" are these eMMC's using? (25 nm, 34 nm or other?)
e) Do you know anything about how people with eMMC problems use their devices?
f) The Linux kernel version for problematic devices...
Code:
Model:          Samsung GT-I9300
Chip:           KMVTU000LM-B503
Part No:        1108-000424 ?
Size:           eMMC(16GB)+MDDR(64MB) 
eMMC ID:        VTU00M 
eMMC FW Rev.:   0xF1
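The "eMMC ID" (VTU00M) is the 6-character product-name field of the 128-bit CID register, which Linux exposes as a 32-hex-digit string (e.g. /sys/block/mmcblk0/device/cid). A minimal decoder following the JEDEC eMMC CID layout; the names are my own, the sample CID in the test is made up, and whether the "FW Rev." above is the PRV byte or comes from somewhere else is an assumption, not something established here.

```c
#include <stdint.h>
#include <string.h>

/* Fields of the eMMC CID register (bit positions per JEDEC JESD84). */
typedef struct {
    uint8_t  mid;      /* [127:120] manufacturer ID             */
    uint8_t  cbx;      /* [113:112] device type (card/BGA/POP)  */
    uint8_t  oid;      /* [111:104] OEM/application ID          */
    char     pnm[7];   /* [103:56]  product name, 6 ASCII chars */
    uint8_t  prv;      /* [55:48]   product revision            */
    uint32_t psn;      /* [47:16]   product serial number       */
    uint8_t  mdt;      /* [15:8]    manufacturing date          */
} emmc_cid;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse a CID given as 32 hex digits, most significant byte first.
   Returns 0 on success, -1 on malformed input. */
int parse_emmc_cid(const char *hex, emmc_cid *cid)
{
    uint8_t b[16];
    if (strlen(hex) < 32)
        return -1;
    for (int i = 0; i < 16; i++) {
        int hi = hex_nibble(hex[2 * i]), lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        b[i] = (uint8_t)(hi << 4 | lo);
    }
    cid->mid = b[0];
    cid->cbx = b[1] & 0x03u;
    cid->oid = b[2];
    memcpy(cid->pnm, &b[3], 6);
    cid->pnm[6] = '\0';
    cid->prv = b[9];
    cid->psn = (uint32_t)b[10] << 24 | (uint32_t)b[11] << 16 |
               (uint32_t)b[12] << 8 | b[13];
    cid->mdt = b[14];
    return 0;
}
```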

Quote:
Originally Posted by Oranav

It doesn't seem like an integer overflow, at least not a straightforward one. This is the function they patch:
...
Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus? If so, why isn't SDS fixable?

Unless you can somehow provide something more substantial than that reversed pseudo-C stuff, I cannot help much. (Or if you can post that module so that we can look for ourselves!)

We can certainly unbrick anything supported by AdamOutler's/Rebellos'/Ralekdev's unbrickable mods. They also have the Boot from SD card mod. In theory we should be able to unbrick the I9100 in the same way, but no one wants to waste more energy on that PoS device! (I know, because I have one... with the VYL00M brick bug!)
The Following 3 Users Say Thank You to E:V:A For This Useful Post
7th February 2013, 03:00 AM |#53  
E:V:A
Inactive Recognized Developer
-∇ϕ
Thanks Meter: 2,217
In case someone else would like to join in on this, here are some eMMC basics for reference. (That I cut and pasted from various sources.)

Also, I found it useful to understand that, from a low-level point of view, an eMMC and an SSD are essentially the same. An SSD is basically a huge eMMC, but with the NAND chips used in parallel, plus a DRAM cache buffer and a SATA interface operating at 5V. So wear-leveling etc. works the same way, even though the microcontroller in an SSD is much more advanced. (E.g. a Samsung SSD 840 Pro has a 3-core Cortex-R4 running @ 300MHz!) Thus any problem you encounter in the FTL of an eMMC you will likely also have in an SSD using similar NANDs, and vice versa.


The most important and relevant documents are those of the JEDEC standard.
However, our device conforms to (JESD84) v4.41 and not v4.51, AFAIK.
"JEDEC: Embedded MultiMediaCard(eMMC) Product Standard..." (JESD84-A441)
"JEDEC: Embedded MultiMediaCard(eMMC) Electrical Standard" (JESD84-B451)
"eMMC v4.41 and v4.5" (JEDEC presentation by Victor Tsai)


2013-02-09: ORIGINAL POST MOVED!

As was pointed out in the subsequent post, this is somewhat OT,
so I decided that a better home for it would be HERE.
Attached Thumbnails: flash.jpg, flash file system arch.gif, a6cc11c498225275abcff1fefdd53de6.png, 1e20844b6c83913c2a9bb6c376e01995.png
The Following 11 Users Say Thank You to E:V:A For This Useful Post
7th February 2013, 03:02 AM |#54  
DualJoe
Senior Member
de
Thanks Meter: 869
Quote:
Originally Posted by Oranav

it's easy to find exactly where the firmware is stored on the NAND.

Correct me if I'm wrong... the firmware NAND you're talking about is the same as the eMMC (not a separate one), right?
If so, can you provide some signature bytes (maybe the first 32 bytes) and the firmware length, so we can dump the whole NAND with a Riffbox (AdamOutler?) and extract the firmware ourselves?
7th February 2013, 05:49 AM |#55  
AndreiLux
Senior Member
Thanks Meter: 14,750
Re: eMMC sudden death research
EVA, what exactly are you trying to achieve? Seems more off-topic than actual on-topic discussion to me.

And SSDs have absolutely nothing in common with eMMC chips, there's a wholly independent controller on SSDs which simply doesn't exist in embedded devices. The firmwares we're talking about here are not even in the same device category.
The Following 5 Users Say Thank You to AndreiLux For This Useful Post
7th February 2013, 02:24 PM |#56  
E:V:A
Inactive Recognized Developer
-∇ϕ
Thanks Meter: 2,217
@AndreiLux: Yes, you're right, that was a bit of an overambitious OT. But I'm also preventing more OT from people who will eventually post speculation about wear-leveling, by giving them the documents and links to research the topic themselves. Increasing public knowledge will hopefully raise the level and speed of this discussion.

Also, the above can help explain why there are often large "empty" (non-user) partitions on Samsung phones. It could be that these act as moving "holes" to improve eMMC life. Thus if we remove them or keep our eMMC maxed out, we'll get problems much sooner than someone who has lots of space left.

But more importantly, I'm showing you that a P/E rating of ~3000 could very well be reached by excessive writes, especially with eMMC firmware bugs. Also, I completely disagree that "SSDs have absolutely nothing in common with eMMC chips"; they certainly do have much in common, as I stated above. An SSD basically consists of an N x M RAID-0-like array of MLC NANDs, each of which conforms to the exact same criteria as our eMMC in question. At the low level, the individual wear-leveling must be the same or very similar. (Mind you, I'm ignoring the SATA "controller" + cache memory.)
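Some back-of-envelope numbers for the P/E argument: if wear-leveling spreads writes evenly over a pool of blocks, the host can write roughly pool_size x P/E / write_amplification bytes before the average block reaches its rating. A minimal sketch with the pool size and write amplification as free parameters. The idea that only free blocks rotate on a nearly full chip (dynamic-only wear-leveling) is an assumption about the firmware, not something known about this eMMC, and the 5 GB/day write rate below is purely illustrative.

```c
/* Host bytes writable before the average block in the wear-levelled pool
   reaches its rated P/E count. write_amp >= 1 models FTL overhead
   (garbage collection, metadata updates). */
double host_write_budget(double pool_bytes, double pe_cycles, double write_amp)
{
    return pool_bytes * pe_cycles / write_amp;
}

/* Days until that budget is used up at a constant host write rate. */
double days_until_worn(double pool_bytes, double pe_cycles,
                       double write_amp, double host_bytes_per_day)
{
    return host_write_budget(pool_bytes, pe_cycles, write_amp) / host_bytes_per_day;
}
```

With the whole 16 GB in the pool and no write amplification, 3000 cycles is 48 TB of host writes. If only ~1 GB of free blocks rotate and write amplification is 4, the budget drops to 0.75 TB, i.e. 150 days at 5 GB/day.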

I could of course be completely wrong, but then I suggest that you provide some backup of your statement...

---

We should make a comparison of the "Smart Reports" from a working and a problematic eMMC. If these are very different, we could learn more...

Could someone dump such a report?
The Following 4 Users Say Thank You to E:V:A For This Useful Post
9th February 2013, 09:12 PM |#57  
E:V:A
Inactive Recognized Developer
-∇ϕ
Thanks Meter: 2,217
Not sure if this helps, but if there is any dependence on kernel version, we might figure it out from this list of kernel eMMC patches...

Code:
2.6.36 
        • ERASE, SECURE ERASE, TRIM, and SECURE TRIM operations (JEDEC 4.4)
        • mmc_block: Discard and secure discard support
        • SD-combo (IO+mem) support
        • Performance tests
2.6.37 
        • New sdhci-pxa driver for Marvell SoCs
        • MMC 4.4 DDR support
        • sdhci-pltfm: Platform driver for imx35/51
        • USB SD host controller (USHC) driver
2.6.39 
        • mxs-mmc: MMC host driver for i.MX23/28
3.0 
        • MMC CMD+ACMD passthrough IOCTL reliable write support
        • MMC boot partition support
        • New VUB300 USB-to-SD/SDIO/MMC driver
        • SD: Support for signal voltage switch procedure
3.2 
        • Enabled HPI for MMC cards that support this feature
        • Cache control for e·MMC 4.5 devices
        • e·MMC hardware reset support
        • Random fault injection
        • General-purpose MMC partition support (JEDEC 4.4)
        • SDHCI: e·MMC hardware reset support
        • sdhci-pci: Runtime PM support
        • mmc-test: e·MMC hardware reset test
In the meantime I'm waiting with great expectations on the code for that kernel module...
9th February 2013, 09:32 PM |#58  
Senior Member
Nowhere
Thanks Meter: 42
 
I am sorry, yes, I know I am not supposed to post here as I am not a developer, but I thought I'd share my little finding about SDS.

Users with SDS always confirm that the red sensor LED stays on. I have been a bit worried about SDS (4.1.1, UK BTU, no update as of now). Every time I rebooted (or started from switched off), I could see the bootloader check the hardware: the red LED comes on for about 0.6 seconds and then the boot sequence continues. But I couldn't wait any longer for the BTU OTA update and have now updated to the N7100XXDMA6 N7100OJVDMA2 TURKEY ROM, and I can clearly see that during startup the red LED does not come on, so the hardware is possibly not being checked! I hope I don't sound daft! Could SDS be a malfunction of the sensor board?
10th February 2013, 02:30 PM |#59  
E:V:A
Inactive Recognized Developer
-∇ϕ
Thanks Meter: 2,217
The possibility of eMMC firmware updates is determined by the "Update_Disable" bit (bit 0) of the FW_CONFIG field, which is located in CSD-slice [169] of the Extended CSD (EXT_CSD) register.
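Checking that bit from an EXT_CSD dump is easy. A minimal sketch, assuming you already have the 512-byte EXT_CSD as a hex string with byte 0 first (Linux debugfs exposes one per card as an ext_csd file; the exact path varies by kernel). As far as I know FW_CONFIG only exists from eMMC 4.5 on, so the sketch also reads EXT_CSD_REV [192] and reports "unknown" for older parts such as a v4.41 device. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define EXT_CSD_SIZE       512
#define EXT_CSD_FW_CONFIG  169   /* bit 0: Update_Disable (eMMC 4.5+)   */
#define EXT_CSD_REV        192   /* 5 = v4.41, 6 = v4.5/v4.51           */

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse 1024 hex digits (byte 0 first) into a 512-byte EXT_CSD buffer.
   Returns 0 on success, -1 on malformed input. */
int parse_ext_csd(const char *hex, uint8_t ext_csd[EXT_CSD_SIZE])
{
    if (strlen(hex) < (size_t)(2 * EXT_CSD_SIZE))
        return -1;
    for (int i = 0; i < EXT_CSD_SIZE; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        ext_csd[i] = (uint8_t)(hi << 4 | lo);
    }
    return 0;
}

/* 1 = firmware updates permanently disabled, 0 = allowed,
   -1 = device predates FW_CONFIG (EXT_CSD_REV < 6), so the byte is not meaningful. */
int fw_update_disabled(const uint8_t ext_csd[EXT_CSD_SIZE])
{
    if (ext_csd[EXT_CSD_REV] < 6)
        return -1;
    return ext_csd[EXT_CSD_FW_CONFIG] & 0x01;
}
```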
11th February 2013, 04:49 PM |#60  
Senior Recognized Developer
Owego, NY
Thanks Meter: 25,477
Quote:
Originally Posted by Oranav

It doesn't seem like an integer overflow, at least not a straightforward one.
...
The BL instruction that is being patched used to call the old function (f_to_be_patched_function), without checking its return value, hence the bug.
What's so strange about it is that "f_to_be_patched_function" is called from many other locations in the code, without checking the return value! So the bug exists in other locations as well.
Either the other locations don't cause internal metadata corruption, or they are just so rare that Samsung didn't even bother to patch them.

So it sorta looks like in the original firmware, it's (bear with me, this is really fugly pseudocode)
Code:
if( !some_sanity_check_here())
{
  crater_the_chip();
}
(where, obviously, crater_the_chip() is not actually a function, but it is what happens if that sanity check ever fails when called from that part of the code...)

Now it's
Code:
if( !some_sanity_check_here())
{
  hang_chip_until_reset();
}
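To make the before/after concrete, here is a host-side model built on Oranav's decompiled lookup. Everything in it is hypothetical: the context struct, its fields and the IDs are made up, and the firmware's hang (while ( 1 ) ;) is modelled as a returned status so the sketch can run on a PC.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical FTL context; the decompiled code only shows that its first
   word is compared against the lookup key. */
typedef struct {
    uint32_t id;
    uint32_t meta;            /* stand-in for whatever metadata the FTL keeps */
} ftl_ctx;

static ftl_ctx ctx_a = { 0x10, 0 };   /* role of *off_5FC60 */
static ftl_ctx ctx_b = { 0x20, 0 };   /* role of *off_5FC64 */

/* Same contract as f_to_be_patched_function: 1 and a pointer on a hit,
   0 and a null pointer on a miss. */
static int lookup_ctx(ftl_ctx **out, uint32_t id)
{
    if (ctx_a.id == id) { *out = &ctx_a; return 1; }
    if (ctx_b.id == id) { *out = &ctx_b; return 1; }
    *out = NULL;
    return 0;
}

/* Old call site: the return value is ignored and *out is used as if valid.
   On the chip, whatever the caller then does with the null context is what
   corrupts the metadata; here we just hand the pointer back. */
static ftl_ctx *old_call_site(uint32_t id)
{
    ftl_ctx *ctx;
    lookup_ctx(&ctx, id);
    return ctx;
}

typedef enum { CTX_UPDATED, CHIP_HUNG } patch_result;

/* Patched call site: a miss stops everything (the firmware spins until reset)
   instead of updating metadata through a null context. */
static patch_result new_call_site(uint32_t id, uint32_t new_meta)
{
    ftl_ctx *ctx;
    if (!lookup_ctx(&ctx, id))
        return CHIP_HUNG;     /* firmware: while ( 1 ) ; */
    ctx->meta = new_meta;
    return CTX_UPDATED;
}
```

The design trade-off Samsung apparently made is visible here: a hung chip is recoverable by a reset, while a silently corrupted FTL is not.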
Quote:

Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus?
If so, why isn't SDS fixable?

It's possible to boot from the MMC1 bus.

SDS is still not fixable, because at that point the internal eMMC is hosed at a very low level, unless we can figure out how to do a full reset/wipe of the eMMC chip from the main eMMC interface. We know this is theoretically possible, because Ken Sumrall of Google had access to such a procedure, but he was not able to give us the details due to NDAs, and for the same reason we don't have any examples of performing it. That's also why Superbricked devices can't be repaired even with JTAG.

Some SDSed devices behaved similarly to how many Superbricked devices behaved - parts of the chip worked OK (including the bootloader), others were hosed. Quite a few people who suffered from SDS were able to boot into download mode but not write to any part of the chip.
The Following User Says Thank You to Entropy512 For This Useful Post
11th February 2013, 07:39 PM |#61  
Senior Member
Thanks Meter: 22
 
Hi, my S3 bricked and even a JTAG could not save it. Yes, the eMMC was bricked at the very low level.
Samsung replaced my board, and I checked that it is now running firmware revision 0xF7. The Samsung engineer also told me this is a safe firmware that is immune to that superbrick. After further questioning and hardcore probing, the engineer revealed that the 0xF1 eMMC firmware has a bug in its wear-leveling algorithm which damages the sector containing the BIOS, and that this firmware fixes it.
Will a dump of my firmware help you guys?
I tried df, dmesg, lsof and other commands and could not find the mount point for the eMMC, so if you can tell me how to dump the firmware, I will gladly do it, and hopefully it will help you.

PS: Other threads on XDA say Samsung replaced their boards with the same defective ones; it seems that in my country, Samsung actually replaces them with a newer revision.
The Following 3 Users Say Thank You to exge For This Useful Post