eMMC sudden death research


Oranav

Senior Member
Oct 9, 2010
53
265
As far as I can tell right now, it isn't caused by flash wear or anything like that. It seems to be caused by a bug that is triggered in a very specific case and then makes the device corrupt its internal structures or its firmware - I'm not sure which one yet.

The specific bug is that they don't check the return value of some function returning a pointer, which may be NULL. It then leads to a NULL pointer dereference which corrupts things.

So, as far as I can tell at the moment, using an unpatched kernel has no negative effect (except for the risk of the device suddenly dying, of course).


By the way, it's worth noting that the firmware actually resides on the flash itself. There is a very small boot ROM (which is probably a mask ROM) that loads the firmware out of the NAND device.
Why am I mentioning this? It means that a bug in the firmware may actually corrupt the firmware itself, bricking the device.

Sent from my GT-I9300 using xda app-developers app
 

liamR

Senior Member
Feb 14, 2007
854
146
That is awesome research. Assuming that Samsung just made a quick "fix" with the new kernels (and it does cause random freezes), do you think they can make a proper fix without side effects?

Assuming they have known all about it since the SGS2 and it still affects the SGS3, this makes Samsung a terrible company.
 

Rob2222

Senior Member
Feb 18, 2008
413
306
@Oranav:
Do you know whether the fix is applied in download mode, too?
I assume that download mode does _not_ load a kernel or recovery, so it would follow that the eMMC is not protected in download mode.
Could that be?

BR
Rob
 

Product F(RED)

Senior Member
Sep 6, 2010
9,883
2,105
Brooklyn, NY
You have to have a kernel. I'm sure it shares the recovery kernel since the recovery kernel is basically a backup/fail-safe kernel.
 

Rob2222

Senior Member
Feb 18, 2008
413
306
I am not sure about this. From my understanding, the (second?) bootloader already has eMMC and display drivers. So enough parts are already initialized to make the eMMC available for USB access. There is no real need to load the kernel for that.

If download mode needed the kernel/recovery, it would not be available after you flash a wrong kernel/recovery. And if I remember right, I've seen wrong kernel and wrong recovery flashes get repaired by just flashing the correct kernel/recovery, so download mode was still working.

BR
Rob
 

Product F(RED)

Senior Member
Sep 6, 2010
9,883
2,105
Brooklyn, NY
You could be right but I know that recovery mode has its own separate kernel. That's why I thought maybe download mode shared it.

Sent from my GT-I9300 using Tapatalk 2
 

Oranav

Senior Member
Oct 9, 2010
53
265
Download mode has nothing in common with the recovery partition. It is implemented in sboot (the device's bootloader).
It has its own implementation of hardware drivers. If it doesn't patch the eMMC RAM, then it isn't safe!

However, I haven't checked it enough yet to conclude whether it's safe or not. Right now, I'd recommend that everyone avoid flashing via download mode. Recovery and Mobile Odin (or just dd) are good enough.

 

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
That explains why they upgraded the bootloader with LLA, then. The increased modification detection would be just a side effect of a newer bootloader version, which already had heightened warranty enforcement on the 9305 and the Note 2.
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
That alone would be enough to upgrade bootloader...

I don't believe SBOOT is encrypted or compressed - just run strings on it. If you see lots of recognizable strings, but don't see the eMMC model number, you can be fairly certain the BL doesn't contain a fix.
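
For anyone without a `strings` binary at hand, the check suggested above can be approximated in a few lines of Python. This is only a sketch: the four-character minimum, the helper names, and the `sboot.bin` file name in the usage note are assumptions, the sample bytes are synthetic, and a missing model string only suggests (it does not prove) that the bootloader carries no fix.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def mentions_model(data: bytes, model: str) -> bool:
    """True if any extracted string contains the eMMC model name."""
    return any(model in s for s in printable_strings(data))

# Synthetic stand-in for a bootloader image (not real SBOOT contents).
sample = b"\x00\x01mmc_init\x00\xff\xfeVTU00M\x00ab\x00"
print(printable_strings(sample))
print(mentions_model(sample, "VTU00M"))
```

On a real image this would be called as `mentions_model(open("sboot.bin", "rb").read(), "VTU00M")`, with the path pointing at whatever bootloader dump you have.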
 

drakester09

Senior Member
Jan 10, 2012
1,292
876
I ran strings on the following BL versions:
- My current EMA1
- First changed release ELLA
- ICS BL ALEF

I'm attaching the results and a diff for each version. A lot of the content is gibberish, but there are some quite interesting differences, mostly around line 1100.

Maybe it can help us understand a bit more or determine if BL plays a role.
 

Attachments

  • Comparison_Strings_BL_ALEF_ELLA_EMA1.zip
    137 KB · Views: 158

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
There are some bootloader MMC changes, but whether they're related to SDS is still to be determined... There's no VTU00M string in there at least, but that still doesn't rule it out.

usb_write reg, val
Read the usb ic register
sdcard test command
+mmcdtest
They added that mmcdtest into the bootloader utility commands, I wonder what it does.
 

Oranav

Senior Member
Oct 9, 2010
53
265
I'm reversing sboot to see what has changed (the absence of a "VTU00M" string doesn't mean there's no fix).
It should be very easy since we have kernel sources (we know how to communicate with the eMMC controller - MMIO addresses etc.).


* If someone has a BinDiff license and wants to help, it'd be great! :)


As for what the new mmcdtest command does: it reads 0xFFC00 bytes from the eMMC boot partition and copies them to 0x50000000 (maybe this is an output buffer? I don't know yet).
I also think 0xFFC00 is the boot partition size, so it just reads it all...
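
A quick sanity check on that number, as plain arithmetic. The 512-byte sector size is an assumption, and this says nothing about whether 0xFFC00 really is the boot partition size; it only shows how the value relates to 1 MiB.

```python
SECTOR = 512          # assumed sector size in bytes
size = 0xFFC00        # byte count read by mmcdtest, per the post above

sectors, remainder = divmod(size, SECTOR)
shortfall = 0x100000 - size   # distance from a full 1 MiB

print(size, sectors, remainder, shortfall)
```

So the read is a whole number of sectors and falls 1 KiB short of 1 MiB.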
 

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
The U-Boot sources might interest you, to get a general idea of how the bootloader is structured. I don't know what the practical differences between U-Boot and S-Boot will be though, especially since the former will be outdated.
 

Rob2222

Senior Member
Feb 18, 2008
413
306
@Oranav/All:

We have some news regarding the freezes that occur on some S3s with the new firmware...
Someone posted that he waited until the freeze was over, and I asked how long it took. He said that after 10-15 minutes the phone was back to normal without a reboot.

So when my phone froze with the screen on, I waited, and I was really surprised that after a freeze of around 23 minutes the phone just continued to work as if it had never frozen.

Maybe the eMMC has a watchdog after all.

Maybe it is an interesting point that the phone is able to continue after such a long freeze.
Maybe we can get some information out of some log files?

For the last 2-3 days my S3 has had 5-20 freezes per day. I am on stock, unrooted XXELL5.

BR
Rob
 

Rob2222

Senior Member
Feb 18, 2008
413
306
Got a kernel log from just after such a freeze.

I was about to power on the screen, but nothing happened. Then I waited around 10 minutes, the screen finally came up, and I dumped the log.

Is this interesting? :D

Full log is attached.

Code:
U/ 4002.738352  c0 [keys]PWR 1
U/ 4002.983296  c0 [keys]PWR 0
...
U/ 4587.514100  c0 mshci: ===========================================
W/ 4587.514336  c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
....
V/ 4587.850296  c0 mmc0: recovering eMMC has been done
...
W/ 4587.850849  c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
W/ 4587.851982  c0 end_request: I/O error, dev mmcblk0, sector 3126872
W/ 4587.852174  c0 end_request: I/O error, dev mmcblk0, sector 3126880
W/ 4587.852330  c0 end_request: I/O error, dev mmcblk0, sector 3126888
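
The excerpt above can be checked against the "around 10 minutes" estimate with a small parser. This is a sketch under stated assumptions: the line format is inferred from this excerpt only, the elided `...` lines are simply skipped, and the sector spacing is reported in sectors without assuming what else was going on in the gaps.

```python
import re

LOG = """\
U/ 4002.738352  c0 [keys]PWR 1
U/ 4002.983296  c0 [keys]PWR 0
...
U/ 4587.514100  c0 mshci: ===========================================
W/ 4587.514336  c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
....
V/ 4587.850296  c0 mmc0: recovering eMMC has been done
...
W/ 4587.850849  c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
W/ 4587.851982  c0 end_request: I/O error, dev mmcblk0, sector 3126872
W/ 4587.852174  c0 end_request: I/O error, dev mmcblk0, sector 3126880
W/ 4587.852330  c0 end_request: I/O error, dev mmcblk0, sector 3126888
"""

# level, timestamp in seconds, CPU tag, message (format assumed from the excerpt)
LINE = re.compile(r"^(?P<level>\w)/\s+(?P<time>\d+\.\d+)\s+c\d+\s+(?P<msg>.*)$")

def parse(log):
    """Return (timestamp, message) pairs; lines that do not match are skipped."""
    entries = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            entries.append((float(m["time"]), m["msg"]))
    return entries

entries = parse(LOG)
press = next(t for t, msg in entries if "[keys]PWR 1" in msg)
recovered = next(t for t, msg in entries if "recovering eMMC has been done" in msg)
stall_seconds = recovered - press
failed_sectors = [int(msg.rsplit(" ", 1)[1]) for _, msg in entries if "I/O error" in msg]

print(f"power key to eMMC recovery: {stall_seconds:.1f} s")
print("failed sectors:", failed_sectors)
```

The gap between the power key press and the recovery message comes out at a bit under ten minutes, which matches the wait described above.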


EDIT: Added another log. Will add more, if I get more.


BR
Rob
 

Attachments

  • 2013-01-30_16-39-42.txt
    126.9 KB · Views: 149
  • 2013-01-31_12-47-45.txt
    126.7 KB · Views: 68

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
So I decided to do a small RAM dump after all.

Before the patch, 0x5C7EA reads FD F7 C2 FA, which is "BL 0x59D72".
As I thought, they replace a function call to the new one.

I will dump function 0x59D72 later this week.

Could you perhaps post the RAM dump? I might be able to hand it to someone with a copy of hex-rays decompiler. :)

How big is the RAM?

I'm thinking of maybe dumping VYL00M fwrev 0x19 while I'm at it, and maybe seeing if someone else can dump 0x25. :p
 

Oranav

Senior Member
Oct 9, 2010
53
265
I have a Hex-Rays license. I actually use it for most of my reversing; I posted assembly code because, for snippets this short, it's easier to understand (in my opinion).

I won't post a RAM dump since it contains (probably?) licensed code.
I can however post the memory map:
0x00000000 - 0x00020000 BootROM (I guess it's a mask ROM)
0x00040000 - 0x00060000 Firmware (resides in RAM, the BootROM reads it from the NAND chip itself so it's upgradable!)
0x00060000 - 0x00080000 Data (no dynamic memory there BTW)
0x20000000 - 0x20028000 eMMC interface MMIO
0x20080000 - 0x20080400 I don't know, maybe another eMMC interface MMIO?
0x40000000 - 0x40010000 NAND interface MMIO
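
To see where the addresses discussed in this thread fall in that map, here is a small lookup sketch. It assumes each range is start-inclusive and end-exclusive (the post does not say), and the function name is made up for illustration.

```python
# (start, end, name), copied from the memory map above; end assumed exclusive
REGIONS = [
    (0x00000000, 0x00020000, "BootROM"),
    (0x00040000, 0x00060000, "Firmware"),
    (0x00060000, 0x00080000, "Data"),
    (0x20000000, 0x20028000, "eMMC interface MMIO"),
    (0x20080000, 0x20080400, "unknown (maybe another eMMC interface MMIO)"),
    (0x40000000, 0x40010000, "NAND interface MMIO"),
]

def region_of(addr):
    """Return the name of the region containing addr, or None if unmapped."""
    for start, end, name in REGIONS:
        if start <= addr < end:
            return name
    return None

for addr in (0x40300, 0x59D72, 0x5C7EA, 0x30000):
    print(hex(addr), region_of(addr))

firmware_size = 0x00060000 - 0x00040000
print("firmware region size:", firmware_size // 1024, "KiB")
```

Under that assumption the patch location (0x40300), the patched call site (0x5C7EA) and the original function (0x59D72) all sit in the 128 KiB firmware region.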


I can send you my RAM dump over IRC if you'd like. Besides that, I'm considering posting a .ko which exports the RAM over a character device (this is how I dumped it).

And, yes, dumping the new firmwares to see what has changed is super-cool :p
 

Top Liked Posts

  • 53
    Update from Feb 17th:
    Samsung has started to upgrade eMMC firmware in the field - only for the GT-I9100 for now.
    See post #79 for additional details.

    Update from Feb 13th:
    If you want to dump the eMMC's RAM yourself, go ahead to post #72.
    I'm looking for a dump of firmware revision 0xf7 if you've got one.
    -----------------------


    Since it's very likely that the recent eMMC firmware patch by Samsung is their patch for the "sudden death" issue, it would be very nice to understand what is really going on there.

    According to a leaked moviNAND datasheet, it seems that MMC CMD62 is a vendor-specific command that moviNAND implements.
    If you issue CMD62(0xEFAC62EC), then CMD62(0xCCEE) - you can read a "Smart report". To exit this mode, issue CMD62(0xEFAC62EC), then CMD62(0xDECCEE).


    So what are they doing in their patch?

    1. Whenever an MMC is attached:
    a. If it is "VTU00M", revision 0xf1, they read a Smart report.
    b. The DWORD at Smart[324:328] represents a date (little-endian); if it is not 0x20120413, they don't patch the firmware. (Maybe only chips from 2012/04/13 are buggy?)
    2. If the chip is buggy, whenever an MMC is attached or the device is resumed:
    a. Issue CMD62(0xEFAC62EC) CMD62(0x10210000) to enter RAM write mode. Now you can write to RAM by issuing MMC_ERASE_GROUP_START(Address to write) MMC_ERASE_GROUP_END(Value to be written) MMC_ERASE(0).
    b. *(0x40300) = 10 B5 03 4A 90 47 00 28 00 D1 FE E7 10 BD 00 00 73 9D 05 00
    c. *(0x5C7EA) = E3 F7 89 FD
    d. Exit RAM write mode by issuing CMD62(0xEFAC62EC) CMD62(0xDECCEE).
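
The steps above can be written down as data, without talking to any hardware. This is a sketch only: the opcode numbers for the erase commands are the standard MMC ones, but the 512-byte report size, the one-32-bit-little-endian-word-per-write framing, and the function names are assumptions that the post does not confirm. Nothing here sends a command.

```python
import struct

CMD62 = 62                    # vendor command, per the leaked datasheet
MMC_ERASE_GROUP_START = 35    # standard MMC opcodes
MMC_ERASE_GROUP_END = 36
MMC_ERASE = 38

VENDOR_KEY = 0xEFAC62EC
ENTER_RAM_WRITE = 0x10210000
EXIT_VENDOR_MODE = 0xDECCEE
BUGGY_DATE = 0x20120413

def is_buggy(name, fwrev, smart):
    """Steps 1a/1b: model, revision, and the date DWORD at Smart[324:328]."""
    (date,) = struct.unpack_from("<I", smart, 324)
    return name == "VTU00M" and fwrev == 0xF1 and date == BUGGY_DATE

def ram_write_sequence(addr, data):
    """Steps 2a-2d as (opcode, argument) pairs, one assumed 32-bit word per write."""
    if len(data) % 4:
        raise ValueError("data must be a multiple of 4 bytes")
    seq = [(CMD62, VENDOR_KEY), (CMD62, ENTER_RAM_WRITE)]
    for off in range(0, len(data), 4):
        (word,) = struct.unpack_from("<I", data, off)
        seq += [(MMC_ERASE_GROUP_START, addr + off),
                (MMC_ERASE_GROUP_END, word),
                (MMC_ERASE, 0)]
    seq += [(CMD62, VENDOR_KEY), (CMD62, EXIT_VENDOR_MODE)]
    return seq

smart = bytearray(512)                      # synthetic report, not real data
struct.pack_into("<I", smart, 324, BUGGY_DATE)
print(is_buggy("VTU00M", 0xF1, bytes(smart)))

call_site_patch = ram_write_sequence(0x5C7EA, bytes.fromhex("E3F789FD"))
for opcode, arg in call_site_patch:
    print(f"CMD{opcode}(0x{arg:08X})")
```

Whether the real patch enters and exits RAM write mode once per word or once for the whole patch is not stated here, so the grouping in `ram_write_sequence` is a guess.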
    10 B5 looks like a common Thumb push (in ARM architecture). Disassembling the bytes that they write to 0x40300 yields the following code:
    Code:
    ROM:00040300                 PUSH    {R4,LR}
    ROM:00040302                 LDR     R2, =0x59D73
    ROM:00040304                 BLX     R2
    ROM:00040306                 CMP     R0, #0
    ROM:00040308                 BNE     locret_4030C
    ROM:0004030A
    ROM:0004030A loc_4030A                               ; CODE XREF: ROM:loc_4030Aj
    ROM:0004030A                 B       loc_4030A
    ROM:0004030C ; ---------------------------------------------------------------------------
    ROM:0004030C
    ROM:0004030C locret_4030C                            ; CODE XREF: ROM:00040308j
    ROM:0004030C                 POP     {R4,PC}
    ROM:0004030C ; ---------------------------------------------------------------------
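
As a cross-check of the listing above, the 20 patch bytes can be picked apart by hand. The sketch below resolves the PC-relative `LDR R2, =...` to its literal and models what the stub does with the return value; the Python function at the end is only an illustration of the control flow (call, compare R0 with zero, spin forever on zero), not recovered source.

```python
import struct

PATCH_BASE = 0x40300
PATCH = bytes.fromhex("10B5034A9047002800D1FEE710BD0000739D0500")

# Eight 16-bit Thumb halfwords, followed by one 32-bit literal.
halfwords = struct.unpack_from("<8H", PATCH, 0)

# LDR Rt, [PC, #imm8*4] is encoded as 01001 ttt iiiiiiii.
ldr = halfwords[1]
assert ldr >> 11 == 0b01001
ldr_addr = PATCH_BASE + 2
# PC reads as instruction address + 4, aligned down to a word boundary.
literal_addr = ((ldr_addr + 4) & ~3) + (ldr & 0xFF) * 4
(literal,) = struct.unpack_from("<I", PATCH, literal_addr - PATCH_BASE)

target = literal & ~1          # function address
is_thumb = bool(literal & 1)   # low bit set means BLX stays in Thumb state

print(hex(literal_addr), hex(literal), hex(target), is_thumb)

def patched_call(original):
    """Model of the stub: return the result, but never return a zero result."""
    result = original()
    while result == 0:
        pass                   # mirrors "B loc_4030A": loop forever
    return result
```

The literal is the address of the function at 0x59D72 with the Thumb bit set, so the stub wraps that function and hangs instead of returning when it yields zero.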
    Disassembling what they write to 0x5C7EA yields this:
    Code:
    ROM:0005C7EA                 BL      0x40300
    Looks like it is indeed Thumb code.
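
The branch target can also be verified by decoding the 32-bit Thumb BL encoding directly. The sketch below follows the standard encoding (S, imm10, J1, J2, imm11); the function name is made up. It is applied both to the bytes written by the patch and to the pre-patch bytes reported elsewhere in this thread (FD F7 C2 FA).

```python
import struct

def decode_thumb_bl(code: bytes, addr: int) -> int:
    """Return the target of a 32-bit Thumb BL instruction located at addr."""
    hw1, hw2 = struct.unpack("<HH", code)
    if hw1 >> 11 != 0b11110 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a Thumb BL encoding")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)          # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)          # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25      # sign-extend the 25-bit offset
    return addr + 4 + offset   # PC reads as instruction address + 4

patched = decode_thumb_bl(bytes.fromhex("E3F789FD"), 0x5C7EA)
original = decode_thumb_bl(bytes.fromhex("FDF7C2FA"), 0x5C7EA)
print(hex(patched), hex(original))
```

Both results agree with the disassembly: the patched call goes to 0x40300, and the original call went to 0x59D72.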
    If we could dump the eMMC RAM, we would understand what has been changed.


    By inspecting some code, it seems that we know how to dump the eMMC RAM:
    Look at the function mmc_set_wearlevel_page in line 206. It patches the RAM (using the method mentioned before), then it validates what it has written (in lines 255-290). It seems that the procedure to read the RAM is as follows:
    1. CMD62(0xEFAC62EC) CMD62(0x10210002) to enter RAM reading mode
    2. MMC_ERASE_GROUP_START(Address to read) MMC_ERASE_GROUP_END(Length to read) MMC_ERASE(0)
    3. MMC_READ_SINGLE_BLOCK to read the data
    4. CMD62(0xEFAC62EC) CMD62(0xDECCEE) to exit RAM reading mode
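
Written out as data, the read procedure above might look like the sketch below. It only builds a list of (opcode, argument) pairs and sends nothing. The 512-byte length per read, the zero argument to MMC_READ_SINGLE_BLOCK, entering and exiting the mode around every chunk, and the example address range in `dump_plan` are all assumptions for illustration.

```python
CMD62 = 62                    # vendor command, per the leaked datasheet
MMC_READ_SINGLE_BLOCK = 17    # standard MMC opcodes
MMC_ERASE_GROUP_START = 35
MMC_ERASE_GROUP_END = 36
MMC_ERASE = 38

VENDOR_KEY = 0xEFAC62EC
ENTER_RAM_READ = 0x10210002
EXIT_VENDOR_MODE = 0xDECCEE
BLOCK = 512                   # assumed amount returned by one block read

def ram_read_sequence(addr, length=BLOCK):
    """Steps 1-4 for one chunk, as (opcode, argument) pairs."""
    return [
        (CMD62, VENDOR_KEY), (CMD62, ENTER_RAM_READ),
        (MMC_ERASE_GROUP_START, addr),
        (MMC_ERASE_GROUP_END, length),
        (MMC_ERASE, 0),
        (MMC_READ_SINGLE_BLOCK, 0),   # argument is a guess
        (CMD62, VENDOR_KEY), (CMD62, EXIT_VENDOR_MODE),
    ]

def dump_plan(start, end, chunk=BLOCK):
    """Addresses to request in order to cover [start, end) chunk by chunk."""
    return list(range(start, end, chunk))

plan = dump_plan(0x40000, 0x60000)    # hypothetical range, for illustration
print(len(plan), "chunks")
for opcode, arg in ram_read_sequence(plan[0]):
    print(f"CMD{opcode}(0x{arg:08X})")
```

As the post says, actually issuing these commands is risky, so a list like this is best treated as a checklist for reading the kernel source rather than something to replay blindly.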


    I don't want to run this on my device, because I'm afraid - messing with the eMMC doesn't sound like a very good idea when I don't have a spare device.
    Does anyone have a development device they don't mind risking, and want to dump the eMMC firmware from it? :)
    28
    Okay, got a RAM dump :)
    I won't post it here (or anywhere else for that matter) because I don't want to get sued by Samsung.

    I might release a kernel which allows you to dump the RAM yourself if there's enough demand, but I don't want to right now, because:
    1. The code is ugly as hell, not implemented as a kernel module, not thread-safe etc.
    2. It is highly dangerous (messing with the eMMC chip - I really don't know how stable this thing is), so if you want to do it on your device, you should be an expert. In that case, you can write the code yourself (with little effort) :)


    Anyway, I hope the FTL is Whimory, since I'm familiar with it. Would be easier.
    I'll let you know if I find anything interesting.


    PS I've attached a little teaser. (Yes, this is the patched function. 0x40300 is red because I've opened a partial RAM dump.)



    EDIT - Some initial results:
    0. The CPU is a Cortex-M3.
    1. No strings at all :( Just some uninteresting release asserts ("REL_ASSERT")
    2. Found the Smart Report generator function -> found the MMC command handlers.
    3. Most MMC command handlers are stored in a function table. There are 3 special commands: MMC60, MMC62, MMC64. Depending on the arguments these special commands are given, they modify the function table (this is the so-called "vendor mode").
    4. There are a lot of possible arguments for MMC62, not only the ones we know.
    5. If you trace the function they patch all the way up the call stack, you get to the MMC24 and MMC25 handlers. These commands are MMC_WRITE_BLOCK and MMC_WRITE_MULTIPLE_BLOCK. Since the function they patch is deep down the call stack, it's very likely that it is part of the wear leveling code.

    Anyway, because of the lack of strings I guess it would be very hard to truly understand the SDS bug we're facing :(
    18
    Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

    It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
    The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.
    16
    So I decided to do a small RAM dump after all.

    Before the patch, 0x5C7EA reads FD F7 C2 FA, which is "BL 0x59D72".
    As I thought, they replace a function call to the new one.

    I will dump function 0x59D72 later this week.