eMMC sudden death research


exge

Senior Member
Apr 25, 2010
121
24
Hi, my S3 bricked and even JTAG could not save it. Yes, the eMMC was bricked at a very low level.
Samsung replaced my board, and I checked that it is now running the 0xf7 revision. The Sammy engineer also told me this is a safe fw, immune to that superbrick. After further questioning and hardcore probing, the engineer revealed that eMMC fw 0xf1 has a bug in its wear leveling algorithm which damages the sector containing the BIOS, and that this fw fixes it.
Will a dump of my firmware help you guys?
I tried df, dmesg, lsof and other commands and could not find the mount point for the eMMC, so if you can tell me how to dump the fw, I will gladly do it and hopefully it will help you.

PS: other threads on xda say samsung replaced their boards with the same defective ones; it seems that in my country, samsung actually replaces it with a newer revision.
 

Product F(RED)

Senior Member
Sep 6, 2010
9,883
2,105
Brooklyn, NY

Finally, a response from Samsung. Do you still have contact with the employee? Maybe he can guide us in the right direction for updating the firmware.
 

exge

It wasn't exactly an official response from Samsung, but because the repair guy and I are both computer engineers, I used that "brotherhood" to mine this data off him.
 

Product F(RED)

Gotcha. Do you think you could contact him at all, off the record? Tell him that we're trying to fix the issue on XDA (he'll definitely know about XDA) and we could use some [anonymous] help.
 

exge

well .. i can try .. but how do i go to the service center and find that guy lol..
for now lets not put too much hope on him :(
just try to use the clues he told me :)
 

Rebellos

Senior Recognized Developer
May 13, 2009
1,353
3,428
Gdańsk
Some phone repair guy from Samsung service won't know for sure what can help us; what you said is nothing we haven't known for quite a long time. ;P
Unless he can tell us whether Sammy does something with the returned chips back at the factory, or just reflows boards with new eMMCs or so. But even if they do repair them, I'm sure it's done outside his scope.
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
...SDS is still not fixable since at this point, the internal eMMC is hosed at a very low level - unless we can figure out how to do a full reset/wipe of the eMMC chip from the main eMMC interface ...
There are several ways to trigger a reset and at least one of the resets also triggers a full wipe... One is from JEDEC std CMD, another from bootloader, another (probably) from GPIO, another from mmc-tools...

Anyway, since we're not seeing any *.ko modules coming out of here, perhaps someone could provide some pseudo code for the fw dumper code? (We can compile & dump ourselves!)
 

Rebellos

There are several ways to trigger a reset and at least one of the resets also triggers a full wipe... One is from JEDEC std CMD, another from bootloader, another (probably) from GPIO, another from mmc-tools
You have misunderstood something here. There are no documented or known ways to format the Samsung eMMC's internal data, and this is what Entropy is talking about. Wiping user data from the eMMC is no help.
 

Oranav

Senior Member
Oct 9, 2010
53
265
So it sorta looks like in the original firmware, it's (bear with me, this is really fugly pseudocode)
...
Actually, to be more precise, it used to be:
Code:
void *val;
/* old: the success flag returned here is ignored */
set_val_and_return_whether_succeeded(&val);
crater_the_chip_if_val_is_null(val);
Now it is:
Code:
void *val;
/* new: hang here if setting val failed, instead of carrying on */
if (!set_val_and_return_whether_succeeded(&val))
    halt();
crater_the_chip_if_val_is_null(val);
It's possible to boot from the MMC1 bus.
SDS is still not fixable, since at this point the internal eMMC is hosed at a very low level, unless we can figure out how to do a full reset/wipe of the eMMC chip from the main eMMC interface. We know this is theoretically possible, as Ken Sumrall of Google had access to such a procedure, but NDAs kept him from sharing it, so we have no examples of performing it. This is the same reason Superbricked devices can't even be repaired using JTAG.

Some SDSed devices behaved similarly to how many Superbricked devices behaved - parts of the chip worked OK (including the bootloader), others were hosed. Quite a few people who suffered from SDS were able to boot into download mode but not write to any part of the chip.
The Movi's BootROM is small enough to reverse entirely. I also saw some eMMC command handling there, so I think we can somehow talk with the BootROM even on superbricked devices.
This can be very interesting!

Anyway, since we're not seeing any *.ko modules coming out of here, perhaps someone could provide some pseudo code for the fw dumper code? (We can compile & dump ourselves!)
Excuse my laziness. For now, I don't mind sending you my own RAM dump; I'll write that .ko when I have a little more time.
 

Rebellos

To give a little sum-up:
Theoretical solution for superbricked devices would cover disabling eMMC for a while to trigger MMC1 boot (tweezers/paperclip/pencil should do with some luck and steady hand)
Then code from external sd would do factory format and eventually permanent patch of eMMC.
For devices where eMMC is capable of booting up to the kernel we'd probably use some kexec.
 

Oranav

WARNING: The attached patch is dangerous as it sends low-level commands to your eMMC chip. Please, use it ONLY if you know what you're doing. I'm not responsible for anything!

I've attached a kernel patch which allows you to read the eMMC RAM.

Usage:
Code:
# cat /proc/devices | grep mmcram
248 mmcram
# mknod -m 0444 /dev/mmcram c 248 0
# dd if=/dev/mmcram of=/storage/extSdCard/mmcram.bin bs=4K count=128
128+0 records in
128+0 records out
524288 bytes (512.0KB) copied, 31.813499 seconds, 16.1KB/s
There is no lseek support, in case you wondered.
There's a memory hole at 0x20000-0x40000, so you'll get zeroes. Just ignore this part of the dump.
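For post-processing the dump on a PC, here is a minimal Python sketch. It assumes the 512 KB layout from the dd example above; `split_dump` and the constant names are hypothetical helpers and are not part of the attached patch. It drops the hole at 0x20000-0x40000 and reports whether that range really reads back as zeroes.

```python
HOLE_START = 0x20000   # start of the memory hole mentioned above
HOLE_END = 0x40000     # first address after the hole
DUMP_SIZE = 0x80000    # 128 blocks of 4 KiB, as in the dd example


def split_dump(data: bytes):
    """Split a full eMMC RAM dump into (low, high, hole_is_zero).

    low  = bytes below the hole, high = bytes above it.
    hole_is_zero is True when every byte inside the hole is 0x00.
    """
    if len(data) != DUMP_SIZE:
        raise ValueError(f"expected {DUMP_SIZE:#x} bytes, got {len(data):#x}")
    low = data[:HOLE_START]
    hole = data[HOLE_START:HOLE_END]
    high = data[HOLE_END:]
    return low, high, not any(hole)
```

Typical use would be `low, high, ok = split_dump(open("mmcram.bin", "rb").read())`; if `ok` is False, the dump probably does not match the layout assumed here.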

The patch also includes a rewrite of Samsung's patch to properly use the MMC quirks mechanism. I'll submit this part of the patch to CM someday.


* I'd like to get a dump of firmware 0xf7. If someone has a test device, you should be able to dump firmware 0xf7 if you move the "init_mmc_ram(card);" call to the top of the "mmc_movi_sds_add_quirk" function. If someone dumps firmware 0xf7, send it to me please. :)
 

Attachments

  • mmc.patch

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
Works here, extracted the 0xf1 firmware without issues. 0xf7 firmwares seem to be mostly prevalent in newer phones and non-white and non-blue models, so not many devs might have access. If somebody with 0xf7 wants to contribute, feel free to PM me for the kernel.

Oranav, mind if I keep the patch as is in the kernel with just the char device disabled for the public?
 

Oranav

AndreiLux - no problem.
However, note that:
1. I didn't test the "other" part (quirk mechanism) thoroughly. After I test it enough, I'll submit it to CM. There might be some changes, so maybe you'd like to pull it instead of using the one I've attached.
Besides that, I saw that the SGS2 brickbug fix was merged into mainline; maybe I'll submit it to mainline as well.
2. The chardev code is very ugly (no conventions at all) and somewhat buggy (e.g. g_page is not thread-safe). Maybe it's better to fix it before merging it into your kernel, even if the code is disabled...
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Could there be a connection between the SDS and the Samsung Laptop Bug:
http://mjg59.dreamwidth.org/22855.html
No, no connection at all, other than Samsung doesn't seem to care about breaking interfaces and leaving flaws open that can kill devices.

They will always point the finger at someone else, even though (as Alan Cox mentioned at https://plus.google.com/111104121194250082892/posts/6Fo89YW6zGW ) anything that can be triggered by open source software could also be triggered by malware!

To give a little sum-up:
Theoretical solution for superbricked devices would cover disabling eMMC for a while to trigger MMC1 boot (tweezers/paperclip/pencil should do with some luck and steady hand)
Then code from external sd would do factory format and eventually permanent patch of eMMC.
For devices where eMMC is capable of booting up to the kernel we'd probably use some kexec.
To be clear, this would be for devices bricked by SDS. We have no way to boot from MMC1 on Exynos 4210 (aka devices vulnerable to "classic Superbrick")

That said, it's starting to look like there might be a way to JTAG resurrect Superbricked devices.
 

E:V:A

As much as it could be useful, I don't think we need to dump the eMMC firmware, unless you are on a custom ROM or have problems with a non-Samsung device. The eMMC firmware is available as a long hex array in a firmware header file in the GT-I9100 Jelly Bean open source release, as part of the eMMC Field Firmware Update (FFU) module. You can download these sources from the Samsung Open Source Release Center. Unzip and extract the kernel and go to:
./drivers/mmc/ffu/fw.h

I haven't checked the FW from the I9300 sources, but it could be worth
comparing it, since the eMMC chip may be different.

The data structure consists of 0x203E0 (132064, about 129 KiB) bytes and looks like this:
Code:
const unsigned char movinand_fw[] = {
 0x00,0x00,0x08,0x00,0x01,0x04,0x04,0x00,0xd1,0xb3,0x04,0x00,0xd3,0xb3,0x04,0x00,
 ...
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
};
To convert this to a binary file, just pipe it through xxd like this:
Code:
cat fw.h |xxd -r -p >fw.bin
HERE are fw.h and fw.bin for the GT-I9100 for your convenience.
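If you would rather convert the header yourself, here is a hedged Python sketch of an alternative that only looks at the 0x.. tokens inside the array initializer, so identifiers and comments in the header cannot leak into the output. The function name `header_to_bytes` is hypothetical; the array name `movinand_fw` is taken from the snippet above.

```python
import re


def header_to_bytes(text: str, array_name: str = "movinand_fw") -> bytes:
    """Extract the bytes of a C 'unsigned char' array initializer."""
    # Find "<array_name>[...] = { ... }" and keep only what is inside the braces.
    pattern = re.escape(array_name) + r"\s*\[[^\]]*\]\s*=\s*\{(.*?)\}"
    match = re.search(pattern, text, re.S)
    if match is None:
        raise ValueError(f"array {array_name!r} not found")
    body = match.group(1)
    # Drop C comments so hex-looking text inside them is not picked up.
    body = re.sub(r"/\*.*?\*/|//[^\n]*", "", body, flags=re.S)
    # Convert every 0x.. token to one byte.
    return bytes(int(tok, 16) for tok in re.findall(r"0[xX][0-9a-fA-F]{1,2}\b", body))
```

A quick sanity check on the real file would be that the result is 0x203E0 bytes long, the size quoted above.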

I was not successful in trying to replicate Oranav's results from Post#12. Not sure what I did wrong. Also, it's not clear where this piece fits into the memory map of Post#40, if it does at all.
 

Rebellos

The hex you pasted does indeed look like the beginning of the Samsung eMMC FW. I hate downloading stuff from Samsung OSRC. Could you extract their whole drivers/mmc dir?
 

Oranav

E:V:A, this is just awesome; this code is rather new I suppose (I haven't seen any JB SGS2 source before).

Samsung actually field upgrades eMMC firmwares. It uses mostly vendor-specific and undocumented MMC commands, but we already knew much of this information thanks to the reversing process.

For those who are interested:
1. They validate that you've got a broken FW by reading a special extended Smart Report (reading address 0x1000 instead of 0 gets you an extended report).
2. If you do have a broken FW, they use CMD62(0xEFAC62EC) CMD62(0x10210001), which enters "SRAM writing mode". When you issue write commands, it just writes them to the SRAM (0x20000000).
3. They now write the whole buffer @ fw.h to the SRAM.
4. They then enter RAM reading mode by using CMD62(0xEFAC62EC) CMD62(0x10210002). Note that this is the exact command I used to dump the RAM on my device; you can read about it in the older posts in this thread :)
5. They validate the FW upgrade payload by reading the SRAM.
6. If all is okay, they enter "RAM execute mode" by issuing CMD62(0xEFAC62EC) CMD62(0x10210010). They then jump to 0x20020001, which is just the Thumb2 function @ 0x20020000.

This is why the payload in fw.h has more than 0x20000 bytes (which is the size of the firmware). First 0x20000 bytes are the actual firmware, the extra bytes are the FW upgrade payload.
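The layout described above can be sketched in a few lines of Python. This is only an illustration under the numbers given in this post (0x20000 bytes of firmware, SRAM at 0x20000000, jump to 0x20020001); `split_fw_image` and `thumb_entry` are hypothetical helper names.

```python
FW_SIZE = 0x20000        # size of the actual firmware, per the post above
SRAM_BASE = 0x20000000   # where the whole fw.h buffer is written


def split_fw_image(image: bytes):
    """Split the fw.h buffer into (firmware, upgrade_payload)."""
    if len(image) <= FW_SIZE:
        raise ValueError("image has no upgrade payload after the firmware")
    return image[:FW_SIZE], image[FW_SIZE:]


def thumb_entry(jump_target: int) -> int:
    """Clear bit 0 (the Thumb state marker) to get the real code address."""
    return jump_target & ~1
```

Note that SRAM_BASE + FW_SIZE is 0x20020000, so the payload that follows the firmware in the buffer lands exactly at the address they jump to.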


This is somewhat strange as they do have a built-in vendor specific MMC command to upgrade the FW, but they don't use it. I don't know why they do this. I guess time will tell.

I expect to see a similar FW upgrade for the SGS3 in the near future.
 

Entropy512

Maybe whoever did it didn't know that command was there. They're pretty disorganized.

It's interesting, as that appears to be an "upgrade without reformat". Perhaps the vendor-specific command will always full-reset the eMMC?

This is actually the third or fourth I9100 JB source drop - HK got it in December, China a bit later.
 

Top Liked Posts

  • 53
    Update from Feb 17th:
    Samsung has started to upgrade eMMC firmware in the field, for the GT-I9100 only for now.
    See post #79 for additional details.

    Update from Feb 13th:
    If you want to dump the eMMC's RAM yourself, head to post #72.
    I'm looking for a dump of firmware revision 0xf7 if you've got one.
    -----------------------


    Since it's very likely that the recent eMMC firmware patch by Samsung is their patch for the "sudden death" issue, it would be very nice to understand what is really going on there.

    According to a leaked moviNAND datasheet, it seems that MMC CMD62 is a vendor-specific command that moviNAND implements.
    If you issue CMD62(0xEFAC62EC), then CMD62(0xCCEE) - you can read a "Smart report". To exit this mode, issue CMD62(0xEFAC62EC), then CMD62(0xDECCEE).


    So what are they doing in their patch?

    1. Whenever an MMC is attached:
    a. If it is "VTU00M", revision 0xf1, they read a Smart report.
    b. The DWORD at Smart[324:328] represents a date (little-endian); if it is not 0x20120413, they don't patch the firmware. (Maybe only chips from 2012/04/13 are buggy?)
    2. If the chip is buggy, whenever an MMC is attached or the device is resumed:
    a. Issue CMD62(0xEFAC62EC) CMD62(0x10210000) to enter RAM write mode. Now you can write to RAM by issuing MMC_ERASE_GROUP_START(Address to write) MMC_ERASE_GROUP_END(Value to be written) MMC_ERASE(0).
    b. *(0x40300) = 10 B5 03 4A 90 47 00 28 00 D1 FE E7 10 BD 00 00 73 9D 05 00
    c. *(0x5C7EA) = E3 F7 89 FD
    d. Exit RAM write mode by issuing CMD62(0xEFAC62EC) CMD62(0xDECCEE).
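The checks and writes listed above can be modelled offline in Python. This is a sketch only: it touches no hardware, the helper names `is_affected` and `patch_words` are hypothetical, and the packing of the patch bytes into 32-bit little-endian values (one per erase command) is an assumption, not something stated in the list.

```python
import struct

BUGGY_NAME = "VTU00M"
BUGGY_REV = 0xF1
BUGGY_DATE = 0x20120413


def is_affected(name: str, rev: int, smart_report: bytes) -> bool:
    """Apply steps 1a/1b: model, revision, then the date DWORD at Smart[324:328]."""
    if name != BUGGY_NAME or rev != BUGGY_REV:
        return False
    (date,) = struct.unpack_from("<I", smart_report, 324)  # little-endian DWORD
    return date == BUGGY_DATE


def patch_words(address: int, data: bytes):
    """Split a byte patch into (address, value) pairs.

    Assumption: each write carries one 32-bit little-endian word.
    """
    if len(data) % 4:
        raise ValueError("patch length must be a multiple of 4")
    return [(address + off, int.from_bytes(data[off:off + 4], "little"))
            for off in range(0, len(data), 4)]


# Patch bytes exactly as listed in steps 2b and 2c.
PATCH_40300 = bytes.fromhex("10 B5 03 4A 90 47 00 28 00 D1 FE E7 10 BD 00 00 73 9D 05 00")
PATCH_5C7EA = bytes.fromhex("E3 F7 89 FD")
```

Under that assumption, the last word of the first patch comes out as 0x00059D73, which is a plain address constant rather than an instruction.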
    10 B5 looks like a common Thumb push (in ARM architecture). Disassembling the bytes that they write to 0x40300 yields the following code:
    Code:
    ROM:00040300                 PUSH    {R4,LR}
    ROM:00040302                 LDR     R2, =0x59D73
    ROM:00040304                 BLX     R2
    ROM:00040306                 CMP     R0, #0
    ROM:00040308                 BNE     locret_4030C
    ROM:0004030A
    ROM:0004030A loc_4030A                               ; CODE XREF: ROM:loc_4030Aj
    ROM:0004030A                 B       loc_4030A
    ROM:0004030C ; ---------------------------------------------------------------------------
    ROM:0004030C
    ROM:0004030C locret_4030C                            ; CODE XREF: ROM:00040308j
    ROM:0004030C                 POP     {R4,PC}
    ROM:0004030C ; ---------------------------------------------------------------------
    Disassembling what they write to 0x5C7EA yields this:
    Code:
    ROM:0005C7EA                 BL      0x40300
    Looks like it is indeed Thumb code.
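To double-check the BL target without a disassembler, here is a small Python sketch of the 32-bit Thumb BL decoding as I understand the encoding (S, J1, J2, imm10, imm11 fields). The function name `decode_thumb_bl` is hypothetical.

```python
def decode_thumb_bl(address: int, code: bytes) -> int:
    """Return the branch target of a 32-bit Thumb BL located at `address`."""
    hw1 = int.from_bytes(code[0:2], "little")   # first halfword
    hw2 = int.from_bytes(code[2:4], "little")   # second halfword
    # hw1 must start with 11110, hw2 must match 11x1 (BL, not BLX).
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a Thumb BL encoding")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)   # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)   # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25   # sign-extend the 25-bit offset
    return address + 4 + offset   # PC reads as instruction address + 4
```

For the bytes written to 0x5C7EA, `hex(decode_thumb_bl(0x5C7EA, bytes.fromhex("E3 F7 89 FD")))` gives `0x40300`, matching the listing above.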
    If we could dump the eMMC RAM, we would understand what has been changed.


    By inspecting some code, it seems that we know how to dump the eMMC RAM:
    Look at the function mmc_set_wearlevel_page in line 206. It patches the RAM (using the method mentioned before), then validates what it has written (in lines 255-290). It seems that the procedure to read the RAM is as follows:
    1. CMD62(0xEFAC62EC) CMD62(0x10210002) to enter RAM reading mode
    2. MMC_ERASE_GROUP_START(Address to read) MMC_ERASE_GROUP_END(Length to read) MMC_ERASE(0)
    3. MMC_READ_SINGLE_BLOCK to read the data
    4. CMD62(0xEFAC62EC) CMD62(0xDECCEE) to exit RAM reading mode
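The four steps above can be written down as a dry-run command plan in Python. It only builds a list of (command index, argument) tuples and never talks to a chip. Several things here are assumptions: that one enter/exit pair can wrap many reads, the 512-byte chunk size, and the 0 argument for the read command. `ram_read_plan` is a hypothetical name; the command indices are the standard MMC ones.

```python
CMD17_READ_SINGLE_BLOCK = 17
CMD35_ERASE_GROUP_START = 35
CMD36_ERASE_GROUP_END = 36
CMD38_ERASE = 38
CMD62_VENDOR = 62

VENDOR_KEY = 0xEFAC62EC
ENTER_RAM_READ = 0x10210002
EXIT_VENDOR_MODE = 0xDECCEE
BLOCK_SIZE = 512   # assumption: one block per read


def ram_read_plan(address: int, length: int):
    """Build the command list for reading `length` bytes of eMMC RAM."""
    plan = [(CMD62_VENDOR, VENDOR_KEY), (CMD62_VENDOR, ENTER_RAM_READ)]   # step 1
    for off in range(0, length, BLOCK_SIZE):
        chunk = min(BLOCK_SIZE, length - off)
        plan.append((CMD35_ERASE_GROUP_START, address + off))   # address to read
        plan.append((CMD36_ERASE_GROUP_END, chunk))             # length to read
        plan.append((CMD38_ERASE, 0))
        plan.append((CMD17_READ_SINGLE_BLOCK, 0))   # step 3; argument is a placeholder
    plan += [(CMD62_VENDOR, VENDOR_KEY), (CMD62_VENDOR, EXIT_VENDOR_MODE)]  # step 4
    return plan
```

Printing the plan for a small range is a cheap way to review the exact sequence before anyone turns it into kernel code.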


    I don't want to run this on my device, because I'm afraid: messing with the eMMC doesn't sound like a very good idea when I don't have a spare device.
    Does someone have a development device they don't mind risking, and want to dump the eMMC firmware from it? :)
    28
    Okay, got a RAM dump :)
    I won't post it here (or anywhere else for that matter) because I don't want to get sued by Samsung.

    I might release a kernel which allows you to dump the RAM yourself if there's enough demand, but I don't want to right now, because:
    1. The code is ugly as hell, not implemented as a kernel module, not thread-safe etc.
    2. It is highly dangerous (messing with the eMMC chip, and I really don't know how stable this thing is), so if you want to do it on your device, you should be an expert. In that case, you can write the code yourself (with little effort) :)


    Anyway, I hope the FTL is Whimory, since I'm familiar with it. Would be easier.
    I'll let you know if I find anything interesting.


    PS I've attached a little teaser. (Yes, this is the patched function. 0x40300 is red because I've opened a partial RAM dump.)



    EDIT - Some initial results:
    0. The CPU is a Cortex-M3.
    1. No strings at all :( Just some uninteresting release asserts ("REL_ASSERT")
    2. Found the Smart Report generator function -> found the MMC command handlers.
    3. Most MMC command handlers are stored in a function table. There are 3 special commands: MMC60, MMC62, MMC64. Depending on the arguments they are given, these special commands modify the function table (this is the so-called "vendor mode").
    4. There are a lot of possible arguments for MMC62, not only the ones we know.
    5. If you trace the function they patch all the way up the call stack, you get to the MMC24 and MMC25 handlers. These commands are MMC_WRITE_BLOCK and MMC_WRITE_MULTIPLE_BLOCK. Since the function they patch is deep down the call stack, it's very likely that it is the wear leveling code.

    Anyway, because of the lack of strings I guess it would be very hard to truly understand the SDS bug we're facing :(
    18
    Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

    It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
    The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.
    16
    So I decided to do a small RAM dump after all.

    Before the patch, 0x5C7EA reads FD F7 C2 FA, which is "BL 0x59D72".
    As I thought, they replace a function call to the new one.

    I will dump function 0x59D72 later this week.
    16
    Got a kernel log from just after such a freeze.

    I was about to power on the screen but nothing happened. Then I waited around 10 minutes, the screen finally came up, and I dumped the log.

    Is this interesting? :D

    Full log is attached.

    Code:
    U/ 4002.738352  c0 [keys]PWR 1
    U/ 4002.983296  c0 [keys]PWR 0
    ...
    U/ 4587.514100  c0 mshci: ===========================================
    W/ 4587.514336  c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
    ....
    V/ 4587.850296  c0 mmc0: recovering eMMC has been done
    ...
    W/ 4587.850849  c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
    W/ 4587.851982  c0 end_request: I/O error, dev mmcblk0, sector 3126872
    W/ 4587.852174  c0 end_request: I/O error, dev mmcblk0, sector 3126880
    W/ 4587.852330  c0 end_request: I/O error, dev mmcblk0, sector 3126888


    EDIT: Added another log. Will add more, if I get more.


    BR
    Rob