eMMC sudden death research


Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Oranav said:
I have a Hex-Rays license. I actually do most of my reversing with it; I posted assembly code since it's easier to understand for these short snippets (in my opinion).

I won't post a RAM dump since it contains (probably?) licensed code.
I can however post the memory map:
0x00000000 - 0x00020000 BootROM (I guess it's a mask ROM)
0x00040000 - 0x00060000 Firmware (resides in RAM, the BootROM reads it from the NAND chip itself so it's upgradable!)
0x00060000 - 0x00080000 Data (no dynamic memory there BTW)
0x20000000 - 0x20028000 eMMC interface MMIO
0x20080000 - 0x20080400 I don't know, maybe another eMMC interface MMIO?
0x40000000 - 0x40010000 NAND interface MMIO
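A minimal sketch of how the map above can be used to classify an address seen in a dump or a patch. The region names are shortened from the list, and treating the end addresses as exclusive is an assumption; `region_of` and `region_size_kib` are hypothetical helper names.

```python
# Regions transcribed from the memory map above.
# Assumption: each end address is exclusive.
MEMORY_MAP = [
    (0x00000000, 0x00020000, "BootROM"),
    (0x00040000, 0x00060000, "Firmware (RAM)"),
    (0x00060000, 0x00080000, "Data"),
    (0x20000000, 0x20028000, "eMMC interface MMIO"),
    (0x20080000, 0x20080400, "unknown MMIO"),
    (0x40000000, 0x40010000, "NAND interface MMIO"),
]


def region_of(address):
    """Return the name of the region containing address, or None for a gap."""
    for start, end, name in MEMORY_MAP:
        if start <= address < end:
            return name
    return None


def region_size_kib(name):
    """Return the size of the named region in KiB."""
    for start, end, region in MEMORY_MAP:
        if region == name:
            return (end - start) // 1024
    raise KeyError(name)


if __name__ == "__main__":
    # Addresses that Samsung's kernel patch writes to (from the first post).
    for addr in (0x40300, 0x5C7EA):
        print(hex(addr), region_of(addr))
```

Both patched addresses fall inside the firmware region, which fits the claim that the firmware runs from RAM and can be modified at runtime.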


I can send you my RAM dump over IRC if you'd like. Besides that, I contemplate posting a .ko which exports the RAM over a character device (this is how I dumped it).

And, yes, dumping the new firmwares to see what has changed is super-cool :p

A .ko (preferably with source :) ) would be great. It would save me a lot of time implementing the dumper myself. Ideally I'd like to get dumps of VYL00M or MAG4FA 0x19, along with 0x25.

We do know these chips are upgradable, however, Samsung claims that:
1) To upgrade the firmware, you must completely wipe the chip including all bootloaders. (Interestingly enough, this fullwipe will resurrect Superbricked devices) - I believe this
2) The process is so dangerous that it fails frequently in a way that makes the chip 100% unrepairable - I'm a bit skeptical about this one, at least the claims of an absurdly high failure rate.

#2 is why we have no way to repair Superbricked devices.
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
At first I didn't quite understand what was going on, but now when I see what is happening... Excellent!

I'd love to see some tool to come out of this, to read eMMC RAM. I can see several cross-platform applications for this! Something like "viewmem", but "viewemmc" instead. Would this be feasible?

@ Entropy512: Could this be incorporated into your "Got Brickbug?" app, or something similar?
 

Entropy512

E:V:A said:
At first I didn't quite understand what was going on, but now when I see what is happening... Excellent!

I'd love to see some tool to come out of this, to read eMMC RAM. I can see several cross-platform applications for this! Something like "viewmem", but "viewemmc" instead. Would this be feasible?

@ Entropy512: Could this be incorporated into your "Got Brickbug?" app, or something similar?

That's not my app. It also probably couldn't be integrated into any detection app, BUT it would be interesting to see what the differences are between MAG4FA 0x19 (BAD) and MAG4FA 0x25 (good other than a far less nasty "wear leveller randomly inserts 32kb of zeros" bug).

Maybe even if there were a way to write the RAM back to NAND, then the chip could be reset/wiped - we know this is possible but dangerous. It could only be researched by someone who has JTAG access to an affected device, since no affected device has any known way to boot from USB or SDCard.
 

Product F(RED)

Senior Member
Sep 6, 2010
9,883
2,105
Brooklyn, NY
What's sad about this is that you guys are probably doing more work than Samsung, trying to get to the bottom of the problem. But I guess that's also a good thing in a way; I still have faith in humanity.
 

E:V:A

Why doesn't someone just email whoever made the patch? Perhaps he/she could at least explain the reasoning behind, without giving out all Samsung "secrets".

...
In our local forum we have been getting reports of a rising number of lockups and restarts on S3s lately. Some are like my freeze.
It also seems that after a while these problems get better and even disappear completely.
Because of that, I wonder whether the fix locks the eMMC when it finds a bad data structure; that lock could then cause a phone freeze (as already stated), and at the same time it repairs the bad data structure in that block.
At least this would explain the rising number of freezes with the fix, and the fact that the freezes become less frequent over time...

...I have no clue how the algorithms work, but maybe it uses some sort of pseudo-random data to do whatever, with the same seed on all eMMCs... and thus all of them go through the same series of numbers. And now imagine the error condition is only triggered by a specific number or number set (say someone screwed up a boundary condition). Under this theory the error condition wouldn't appear randomly, but after a certain amount of write ops (or something).

I'm thinking the same thing. Just like the "bad sectors" on a good old HD, perhaps the bad "sectors" on an eMMC are being avoided or tagged as bad by the wear-leveling algorithm. These "tags" also have to be written somewhere, so if this function was screwed up somehow, I guess you'd get corruption even though most of the chip still functions. The "patch-fix" then slowly discovers these errors and avoids them, causing us to see a decrease in problems.

There must be something written somewhere about wear-leveling of eMMC's...


* If someone has a BinDiff license and wants to help, it'd be great!

Zynamics BinDiff seems very nice... but with(out) a price tag.
(Again, write the guys and ask for a 99% XDA Developer discount. After all, the company has been acquired by Google and we're working for Android!)

But you can also try some free hex-based ones:
VBinDiff
Another BinDiff
DiffNow (for source code/text and web based)

My bad. Sleepy eyes. (Chainfire's app)
 

Oranav

Senior Member
Oct 9, 2010
53
265
Entropy512 said:
That's not my app. It also probably couldn't be integrated into any detection app, BUT it would be interesting to see what the differences are between MAG4FA 0x19 (BAD) and MAG4FA 0x25 (good other than a far less nasty "wear leveller randomly inserts 32kb of zeros" bug).

Maybe even if there were a way to write the RAM back to NAND, then the chip could be reset/wiped - we know this is possible but dangerous. It could only be researched by someone who has JTAG access to an affected device, since no affected device has any known way to boot from USB or SDCard.

I think it is possible to update the firmware.
Except for CMD62, there are 2 more vendor specific commands (CMD60 and CMD64). I think I saw somewhere a command which updates the firmware on the NAND; I'm not sure now but I'll check it later. The BootROM is also very small so it's easy to find exactly where the firmware is stored on the NAND.
About the danger with this process, I think it's mostly due to the risk of having no bootloader or no Movi firmware. However, I think the Movi BootROM has a recovery mode, so if we're somehow able to boot the device from the mmc1 bus (SD card), we're okay.

Anyway, later this week I'll write that .ko (currently I just edited the Linux MMC subsystem code :eek:) and push to Github.
 

E:V:A

@Oranav: Can you PM me a memory dump?

I'd like to see the Smart Report for a failing device versus a working one...

I have a bad feeling that this problem can be much greater than what Samsung likes to admit. At least if this bug has anything to do with wear-leveling...

Also, can "someone" help me "fill in" the following:
a) what exact devices are having problems?
b) What exact eMMC cards do they have? (And size)
c) Is that leaked datasheet in OP, for any of those in (b)?
d) What die size "technology" are these eMMC's using? (25 nm, 34 nm or other?)
e) Do you know anything about how people with eMMC problems use their devices?
f) The Linux kernel version for problematic devices...
 

Entropy512

E:V:A said:
Why doesn't someone just email whoever made the patch? Perhaps he/she could at least explain the reasoning behind, without giving out all Samsung "secrets".

Not possible, since the patch was excised from a MASSIVE kernel update with thousands of lines of changes. There is zero commit history for the tarball drop. The tagging of Andrei's patch as "Samsung OSRC" was some custom hackery by him - he diffed two kernels, split up the commits, and set authorship to "Samsung OSRC".


E:V:A said:
I'm thinking the same thing. Just like the "bad sectors" on a good old HD, perhaps the bad "sectors" on an eMMC are being avoided or tagged as bad by the wear-leveling algorithm. These "tags" also have to be written somewhere, so if this function was screwed up somehow, I guess you'd get corruption even though most of the chip still functions. The "patch-fix" then slowly discovers these errors and avoids them, causing us to see a decrease in problems.

There must be something written somewhere about wear-leveling of eMMC's...

How an eMMC does internal wear levelling is up to the manufacturer - eMMC only defines the external interface.

Wear levelling algorithms are typically considered highly proprietary by the manufacturer.

So far, historically every "catastrophic" MMC failure we've dealt with on Samsung eMMCs has had nothing to do with bad/corrupt sectors - it has to do with bad/corrupt internal data. Think of it as a lower-level version of a corrupt ext4 filesystem... The underlying disk is fine, but the filesystem is useless without a reformat. Problem is, there's no documented way outside of a factory to completely reset an eMMC (e.g. "low level format").

In the case of Superbrick, a secure erase command issued to a region that contains sectors in a certain state (associated with a performance optimization, not with failure handling/recovery) would corrupt the wear leveller's internal data. Then, any time you tried to access nearby memory, the wear leveller would simply crash.

SDS seems to be a case of some function potentially returning 0 (maybe due to integer overflow? The statistics of the issue and how it suddenly "spiked" after a number of months of usage screams overflow to me), and that 0 then being treated as data instead of an error, corrupting data structures right and left.


Oranav said:
I think it is possible to update the firmware.
Except for CMD62, there are 2 more vendor specific commands (CMD60 and CMD64). I think I saw somewhere a command which updates the firmware on the NAND; I'm not sure now but I'll check it later. The BootROM is also very small so it's easy to find exactly where the firmware is stored on the NAND.
About the danger with this process, I think it's mostly due to the risk of having no bootloader or no Movi firmware. However, I think the Movi BootROM has a recovery mode, so if we're somehow able to boot the device from the mmc1 bus (SD card), we're okay.

Anyway, later this week I'll write that .ko (currently I just edited the Linux MMC subsystem code :eek:) and push to Github.

No internal firmware seems to be the risk that Samsung was most concerned about when they decided not to release Superbrick repair code - Supposedly if the firmware update doesn't go perfectly the chip is 100% toast. (However, language barrier and such could have really meant just that the device's bootloaders were hosed...)

There might also be some interface that allows the MMC to be programmed in the factory that isn't exposed once soldered to a board.

Unfortunately, as we don't have a security-dropped IBL that is signed for Exynos 4210, there is no SDCard or USB recovery available for 4210 devices like there is for Exynos3 and for 4412 devices. If you kill the bootloaders, JTAG is it.
 

Oranav

Entropy512 said:
SDS seems to be a case of some function potentially returning 0 (maybe due to integer overflow? The statistics of the issue and how it suddenly "spiked" after a number of months of usage screams overflow to me), and that 0 then being treated as data instead of an error, corrupting data structures right and left.

It doesn't seem like an integer overflow, at least not a straightforward one.
This is the function they patch:
Code:
int __fastcall f_to_be_patched_function(_DWORD *out, int val)
{
  int ret; // r2@1

  ret = 0;
  if ( *off_5FC60 == val )
  {
    *out = off_5FC60;
    return 1;
  }
  if ( *off_5FC64 == val )
  {
    *out = off_5FC64;
    return 1;
  }
  *out = 0;
  return ret;
}
Both off_5FC60 and off_5FC64 point to some FTL related contexts.
This is the wrapper function they write to the RAM:
Code:
void __fastcall f_new_function_by_patch(_DWORD *out, int val)
{
  if ( !f_to_be_patched_function(out, val) )
  {
    while ( 1 )
      ;
  }
}
The BL instruction that is being patched used to call the old function (f_to_be_patched_function), without checking its return value, hence the bug.
What's so strange about it is that "f_to_be_patched_function" is called from many other locations in the code, without checking the return value! So the bug exists in other locations as well.
Either the other locations don't cause internal metadata corruption, or they are just so rare that Samsung didn't even bother to patch them.
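The effect of the unchecked return value can be modelled in a few lines. This is only a sketch of the control flow in the decompilation above, not the firmware's real data structures: `FtlContext`, `lookup_context`, both `call_site_*` functions and the `FirmwareHang` exception (standing in for the `while (1);` loop) are hypothetical names.

```python
class FirmwareHang(Exception):
    """Stands in for the endless loop in the patched wrapper."""


class FtlContext:
    def __init__(self, key):
        # Models the first word of a context, which is compared against `val`.
        self.key = key


def lookup_context(contexts, val):
    """Model of f_to_be_patched_function: returns (found, context or None)."""
    for ctx in contexts:
        if ctx.key == val:
            return 1, ctx
    return 0, None


def call_site_unpatched(contexts, val):
    # Original call site: the return value is discarded, so a failed
    # lookup silently hands a null context to the code that follows.
    _found, ctx = lookup_context(contexts, val)
    return ctx


def call_site_patched(contexts, val):
    # Patched call site: a failed lookup stops execution instead of
    # letting the caller continue with a null context.
    found, ctx = lookup_context(contexts, val)
    if not found:
        raise FirmwareHang("lookup failed; the real firmware spins until reset")
    return ctx


if __name__ == "__main__":
    contexts = [FtlContext(7), FtlContext(9)]
    print(call_site_unpatched(contexts, 3))  # None: the caller carries on regardless
```

In the model, the unpatched path keeps running with `None`, which corresponds to the firmware continuing with `*out = 0`; the patched path trades that for a hang.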

Entropy512 said:
Unfortunately, as we don't have a security-dropped IBL that is signed for Exynos 4210, there is no SDCard or USB recovery available for 4210 devices like there is for Exynos3 and for 4412 devices. If you kill the bootloaders, JTAG is it.

Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus?
If so, why isn't SDS fixable?
 

E:V:A

Oranav said:
I think it is possible to update the firmware.
Except for CMD62, there are 2 more vendor specific commands (CMD60 and CMD64). I think I saw somewhere a command which updates the firmware on the NAND; I'm not sure now but I'll check it later.

There is no CMD64, because CMDs go from 0-63. CMDs 60-63 are "Reserved for Manufacturers" and belong to the reserved Class-11.
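The 0-63 limit comes from the 48-bit MMC command token, which carries the command index in a 6-bit field. A minimal sketch of building such a token follows: start bit 0, transmitter bit 1, 6-bit index, 32-bit argument, CRC7, end bit 1. The helper names `crc7` and `command_frame` are made up for illustration.

```python
def crc7(data):
    """CRC-7 with polynomial x^7 + x^3 + 1, as used for MMC/SD command tokens."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc


def command_frame(index, argument):
    """Build the 6-byte command token for a command index and 32-bit argument."""
    if not 0 <= index <= 63:
        raise ValueError("command index must fit in 6 bits (0-63)")
    if not 0 <= argument <= 0xFFFFFFFF:
        raise ValueError("argument must fit in 32 bits")
    # 0x40 sets the transmitter bit; the start bit above it stays 0.
    head = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    # The last byte is the 7-bit CRC followed by the end bit (1).
    return head + bytes([(crc7(head) << 1) | 1])


if __name__ == "__main__":
    print(command_frame(62, 0xEFAC62EC).hex())
    try:
        command_frame(64, 0)
    except ValueError as exc:
        print("CMD64:", exc)
```

An index of 64 cannot be encoded on the wire at all, which is the point being made above.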

But I agree, there has to be a way to update eMMC firmware. Although Entropy may be right about factory programming, I don't think this "interface" would only be available at that time. I have a strong belief that it should be possible to update. We know all the eMMC pins, and we know the basic interface and the basic technology within, but we don't know the firmware! Samsung's SSD firmwares can certainly be updated!

(We could look for the firmware in there.)


I'll just start filling in this myself...
a) what exact devices are having problems?
- GT-I9300/3 with 16 GB MoviNAND
b) What exact eMMC cards do they have? (Samsung part-no/name)
c) Is that leaked datasheet in OP, for any of those in (b)?
- <unknown>
d) What die size "technology" are these eMMC's using? (25 nm, 34 nm or other?)
e) Do you know anything about how people with eMMC problems use their devices?
f) The Linux kernel version for problematic devices...
Code:
Model:          Samsung GT-I9300
Chip:           KMVTU000LM-B503
Part No:        1108-000424 ?
Size:           eMMC(16GB)+MDDR(64MB) 
eMMC ID:        VTU00M 
eMMC FW Rev.:   0xF1

Oranav said:
It doesn't seem like an integer overflow, at least not a straightforward one. This is the function they patch:
...
Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus? If so, why isn't SDS fixable?

Unless you can somehow provide something more substantial than that reversed pseudo-C stuff, I cannot help much. (Or if you can post that module so that we can look for ourselves!)

We can certainly unbrick anything supported by AdamOutler's/Rebellos'/Ralekdev's unbrickable mods. They also have the boot-from-SD-card mod. In theory we should be able to unbrick the I9100 in the same way, but no one wants to waste more energy on that PoS device! (I know, because I have one... with the VYL00M brick bug!)
 

E:V:A

In case someone else would like to join in on this, here are some eMMC basics for
reference. (That I cut and pasted from various sources.)

Also, I found it useful to understand that, from the low-level point of view, an eMMC and an SSD are
essentially the same. An SSD is basically a huge eMMC, but where the NAND chips are used in
parallel with an added DRAM cache buffer and a SATA interface operating at 5V. So the
wear-leveling etc. works in the same way, even though the microcontroller in an SSD is much
more advanced. (E.g. for a Samsung SSD 840 Pro, there is a 3-core Cortex-R4 running @
300MHz!) Thus, any problem you encounter in the FTL of an eMMC, you will likely also have
in an SSD if using similar NANDs, and vice versa.


The most important and relevant documents are those of the JEDEC standard.
However, our device conforms to (JESD84) v4.41 and not v4.51, AFAIK.
"JEDEC: Embedded MultiMediaCard(eMMC) Product Standard..." (JESD84-A441)
"JEDEC: Embedded MultiMediaCard(eMMC) Electrical Standard" (JESD84-B451)
"eMMC v4.41 and v4.5" (JEDEC presentation by Victor Tsai)


2013-02-09: ORIGINAL POST MOVED!

As was pointed out in the subsequent post, this is somewhat OT,
so I decided that a better home for it would be HERE.
 

Attachments

  • flash.jpg (54 KB)
  • a6cc11c498225275abcff1fefdd53de6.png (2.1 KB)
  • 1e20844b6c83913c2a9bb6c376e01995.png (1.7 KB)

DualJoe

Senior Member
Oct 12, 2011
2,194
1,095
de
Oranav said:
it's easy to find exactly where the firmware is stored on the NAND.

Correct me if I'm wrong... the firmware NAND you're talking about is the same as the eMMC (not a separate one), right?
If so, can you provide some signature bytes (maybe the first 32 bytes) and the firmware length so we can dump the whole NAND with a Riffbox (AdamOutler?) and extract the firmware ourselves?
 

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
EVA, what exactly are you trying to achieve? Seems more off-topic than actual on-topic discussion to me.

And SSDs have absolutely nothing in common with eMMC chips, there's a wholly independent controller on SSDs which simply doesn't exist in embedded devices. The firmwares we're talking about here are not even in the same device category.
 

E:V:A

@AndreiLux: Yes, you're right, that was a bit over-ambitious OT. But I'm also preventing more OT by people who will eventually post speculations about wear-leveling, by giving them the documents and links to go research the topic by themselves. Increasing public knowledge will hopefully raise the level and speed of this discussion.

Also, the above can help explain why there are often large "empty" (non-user) partitions on Samsung phones. It could be that these act as moving "holes" to improve eMMC life. Thus if we remove them or keep our eMMC maxed out, we'll get problems much sooner than someone who has lots of space left.

But more importantly, I'm showing you that a P/E limit of ~3000 cycles could very well be reached easily by excessive writes, especially with eMMC firmware bugs. Also, I completely disagree that "SSDs have absolutely nothing in common with eMMC chips"; they certainly do have much in common, as I stated above. An SSD basically consists of an N x M RAID-0-like array of MLC NANDs, and each of those conforms to the exact same criteria as our eMMC in question. At the low level the individual wear-leveling must be the same or very similar. (Mind you, I'm ignoring the SATA "controller" + cache memory.)

I could of course be completely wrong, but then I suggest that you provide some backup of your statement...

---

We should make a comparison of the "Smart Reports" from a working and a problematic eMMC. If these are very different, we could learn more...

Could someone dump such a report?
 

E:V:A

Not sure if this helps, but if there is any dependence on kernel version, we might figure it from this list of kernel emmc patches...

Code:
2.6.36
        • ERASE, SECURE ERASE, TRIM, and SECURE TRIM operations (JEDEC 4.4)
        • mmc_block: Discard and secure discard support
        • SD-combo (IO+mem) support
        • Performance tests
2.6.37
        • New sdhci-pxa driver for Marvell SoCs
        • MMC 4.4 DDR support
        • sdhci-pltfm: Platform driver for imx35/51
        • USB SD host controller (USHC) driver
2.6.39
        • mxs-mmc: MMC host driver for i.MX23/28
3.0
        • MMC CMD+ACMD passthrough IOCTL reliable write support
        • MMC boot partition support
        • New VUB300 USB-to-SD/SDIO/MMC driver
        • SD: Support for signal voltage switch procedure
3.2
        • Enabled HPI for MMC cards that support this feature
        • Cache control for e·MMC 4.5 devices
        • e·MMC hardware reset support
        • Random fault injection
        • General-purpose MMC partition support (JEDEC 4.4)
        • SDHCI: e·MMC hardware reset support
        • sdhci-pci: Runtime PM support
        • mmc-test: e·MMC hardware reset test

In the meantime I'm waiting with great expectations on the code for that kernel module...
 

koalauk

Senior Member
Jan 23, 2006
343
42
Nowhere
I'm sorry; I know I'm not supposed to post here as I'm not a developer, but I thought I'd share my little finding about SDS.

Users who hit SDS always report that the red sensor LED stays on. I have been a bit worried about SDS (4.1.1 UK BTU, no update as of now). Every time I rebooted (or started from powered off), I could see the bootloader checking the hardware: the red LED comes on for about 0.6 seconds and the boot sequence continues. I couldn't wait any longer for the BTU OTA update, so I have now updated to the N7100XXDMA6 N7100OJVDMA2 Turkey ROM, and I can clearly see that during startup the red LED does not come on, so the hardware is possibly not checked! I hope I don't sound daft! Could SDS be a malfunction of the sensor board?
 

E:V:A

The possibility of eMMC firmware updates is determined by the "Update_Disable" bit (bit 0)
of the FW_CONFIG field, which is located at byte [169] of the EXT_CSD register.
 

Entropy512

Oranav said:
It doesn't seem like an integer overflow, at least not a straightforward one.
This is the function they patch:
Code:
int __fastcall f_to_be_patched_function(_DWORD *out, int val)
{
  int ret; // r2@1

  ret = 0;
  if ( *off_5FC60 == val )
  {
    *out = off_5FC60;
    return 1;
  }
  if ( *off_5FC64 == val )
  {
    *out = off_5FC64;
    return 1;
  }
  *out = 0;
  return ret;
}
Both off_5FC60 and off_5FC64 point to some FTL related contexts.
This is the wrapper function they write to the RAM:
Code:
void __fastcall f_new_function_by_patch(_DWORD *out, int val)
{
  if ( !f_to_be_patched_function(out, val) )
  {
    while ( 1 )
      ;
  }
}
The BL instruction that is being patched used to call the old function (f_to_be_patched_function), without checking its return value, hence the bug.
What's so strange about it is that "f_to_be_patched_function" is called from many other locations in the code, without checking the return value! So the bug exists in other locations as well.
Either the other locations don't cause internal metadata corruption, or they are just so rare that Samsung didn't even bother to patch them.

So it sorta looks like in the original firmware, it's (bear with me, this is really fugly pseudocode)
Code:
if( !some_sanity_check_here())
{
  crater_the_chip();
}
(where, obviously, crater_the_chip() is not actually a function, but it is what happens if that sanity check ever fails when called from that part of the code...)

Now it's
Code:
if( !some_sanity_check_here())
{
  hang_chip_until_reset();
}

Oranav said:
Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus?
If so, why isn't SDS fixable?

It's possible to boot from the MMC1 bus.

SDS is still not fixable since at this point, the internal eMMC is hosed at a very low level - unless we can figure out how to do a full reset/wipe of the eMMC chip from the main eMMC interface. We know that this is theoretically possible: Ken Sumrall of Google had access to such a procedure, but because of NDAs he could not share it, so we have no examples of it being performed. Same reason Superbricked devices can't even be repaired using JTAG.

Some SDSed devices behaved similarly to how many Superbricked devices behaved - parts of the chip worked OK (including the bootloader), others were hosed. Quite a few people who suffered from SDS were able to boot into download mode but not write to any part of the chip.
 

Top Liked Posts

  • 53
    Update from Feb 17th:
    Samsung has started to upgrade eMMC firmwares on the field - only for GT-I9100 for now.
    See post #79 for additional details.

    Update from Feb 13th:
    If you want to dump the eMMC's RAM yourself, go ahead to post #72.
    I'm looking for a dump of firmware revision 0xf7 if you've got one.
    -----------------------


    Since it's very likely that the recent eMMC firmware patch by Samsung is their patch for the "sudden death" issue, it would be very nice to understand what is really going on there.

    According to a leaked moviNAND datasheet, it seems that MMC CMD62 is a vendor-specific command that moviNAND implements.
    If you issue CMD62(0xEFAC62EC), then CMD62(0xCCEE) - you can read a "Smart report". To exit this mode, issue CMD62(0xEFAC62EC), then CMD62(0xDECCEE).


    So what are they doing in their patch?

    1. Whenever an MMC is attached:
    a. If it is "VTU00M", revision 0xf1, they read a Smart report.
    b. The DWORD at Smart[324:328] represents a date (little-endian); if it is not 0x20120413, they don't patch the firmware. (Maybe only chips from 2012/04/13 are buggy?)
    2. If the chip is buggy, whenever an MMC is attached or the device is resumed:
    a. Issue CMD62(0xEFAC62EC) CMD62(0x10210000) to enter RAM write mode. Now you can write to RAM by issuing MMC_ERASE_GROUP_START(Address to write) MMC_ERASE_GROUP_END(Value to be written) MMC_ERASE(0).
    b. *(0x40300) = 10 B5 03 4A 90 47 00 28 00 D1 FE E7 10 BD 00 00 73 9D 05 00
    c. *(0x5C7EA) = E3 F7 89 FD
    d. Exit RAM write mode by issuing CMD62(0xEFAC62EC) CMD62(0xDECCEE).
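The steps above can be sketched as a pure function that only builds the list of (opcode, argument) pairs and never touches hardware. The opcode numbers are the standard MMC ones (35, 36, 38) plus the vendor CMD62. Splitting the byte strings into 32-bit little-endian words, one word per write, is an assumption, and all function names here are hypothetical rather than taken from Samsung's kernel code.

```python
import struct

VENDOR_CMD = 62
MMC_ERASE_GROUP_START = 35
MMC_ERASE_GROUP_END = 36
MMC_ERASE = 38

PATCH_ADDR = 0x40300
PATCH_BYTES = bytes.fromhex("10B5034A9047002800D1FEE710BD0000739D0500")
CALL_SITE_ADDR = 0x5C7EA
CALL_SITE_BYTES = bytes.fromhex("E3F789FD")


def smart_report_date(report):
    """Read the little-endian DWORD at Smart[324:328]."""
    return struct.unpack_from("<I", report, 324)[0]


def needs_patch(name, fw_rev, report):
    """Step 1: only VTU00M rev 0xf1 chips with date 0x20120413 get patched."""
    return (name == "VTU00M" and fw_rev == 0xF1
            and smart_report_date(report) == 0x20120413)


def ram_write_commands(address, data):
    """Step 2a. Assumption: each write carries one 32-bit little-endian word."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    commands = []
    for offset in range(0, len(data), 4):
        (word,) = struct.unpack_from("<I", data, offset)
        commands += [(MMC_ERASE_GROUP_START, address + offset),
                     (MMC_ERASE_GROUP_END, word),
                     (MMC_ERASE, 0)]
    return commands


def patch_sequence():
    """Steps 2a-2d as one command list."""
    return ([(VENDOR_CMD, 0xEFAC62EC), (VENDOR_CMD, 0x10210000)]
            + ram_write_commands(PATCH_ADDR, PATCH_BYTES)
            + ram_write_commands(CALL_SITE_ADDR, CALL_SITE_BYTES)
            + [(VENDOR_CMD, 0xEFAC62EC), (VENDOR_CMD, 0xDECCEE)])


if __name__ == "__main__":
    for opcode, argument in patch_sequence():
        print(f"CMD{opcode}(0x{argument:08X})")
```

Keeping the sequence as plain data makes it easy to review before anything is sent to a chip, which matters given how risky these commands are.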
    10 B5 looks like a common Thumb push (in ARM architecture). Disassembling the bytes that they write to 0x40300 yields the following code:
    Code:
    ROM:00040300                 PUSH    {R4,LR}
    ROM:00040302                 LDR     R2, =0x59D73
    ROM:00040304                 BLX     R2
    ROM:00040306                 CMP     R0, #0
    ROM:00040308                 BNE     locret_4030C
    ROM:0004030A
    ROM:0004030A loc_4030A                               ; CODE XREF: ROM:loc_4030Aj
    ROM:0004030A                 B       loc_4030A
    ROM:0004030C ; ---------------------------------------------------------------------------
    ROM:0004030C
    ROM:0004030C locret_4030C                            ; CODE XREF: ROM:00040308j
    ROM:0004030C                 POP     {R4,PC}
    ROM:0004030C ; ---------------------------------------------------------------------
    Disassembling what they write to 0x5C7EA yields this:
    Code:
    ROM:0005C7EA                 BL      0x40300
    Looks like it is indeed Thumb code.
    If we could dump the eMMC RAM, we would understand what has been changed.
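The disassembly can be double-checked by hand. Below is a small sketch that decodes the two interesting instructions: the 32-bit Thumb BL written to 0x5C7EA and the PC-relative LDR at 0x40302. The bit layouts are the standard Thumb encodings; the function names are made up for illustration.

```python
import struct


def decode_thumb_bl(address, code):
    """Return the branch target of a 4-byte Thumb BL located at address."""
    hw1, hw2 = struct.unpack("<HH", code)
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a Thumb BL encoding")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25  # sign-extend the 25-bit offset
    return address + 4 + offset  # PC reads as instruction address + 4


def decode_thumb_ldr_literal(address, code):
    """Return (register, literal address) for a 2-byte 'LDR Rt, [PC, #imm]'."""
    (hw,) = struct.unpack("<H", code)
    if (hw & 0xF800) != 0x4800:
        raise ValueError("not a Thumb LDR (literal) encoding")
    rt = (hw >> 8) & 7
    imm8 = hw & 0xFF
    base = (address + 4) & ~3  # PC is word-aligned for literal loads
    return rt, base + imm8 * 4


if __name__ == "__main__":
    print(hex(decode_thumb_bl(0x5C7EA, bytes.fromhex("E3F789FD"))))  # 0x40300
    rt, literal = decode_thumb_ldr_literal(0x40302, bytes.fromhex("034A"))
    print(f"R{rt} <- [{literal:#x}]")  # R2 <- [0x40310]
```

The literal at 0x40310 is the trailing `73 9D 05 00` of the patch, i.e. 0x59D73. Bit 0 only selects Thumb state for `BLX`, so the function being wrapped starts at 0x59D72.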


    By inspecting some code, it seems that we know how to dump the eMMC RAM:
    Look at the function mmc_set_wearlevel_page in line 206. It patches the RAM (using the method mentioned before), then it validates what it has written (in lines 255-290). It seems that the procedure to read the RAM is as follows:
    1. CMD62(0xEFAC62EC) CMD62(0x10210002) to enter RAM reading mode
    2. MMC_ERASE_GROUP_START(Address to read) MMC_ERASE_GROUP_END(Length to read) MMC_ERASE(0)
    3. MMC_READ_SINGLE_BLOCK to read the data
    4. CMD62(0xEFAC62EC) CMD62(0xDECCEE) to exit RAM reading mode
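A sketch of how the four steps might be chained to plan a dump of a whole region. It only builds a command list and does not talk to hardware. Several details are assumptions that the post does not confirm: that the length is given in bytes, that one read returns at most 512 bytes, that the read argument is 0, and that the chip can stay in RAM reading mode across several reads. `ram_dump_commands` is a hypothetical name.

```python
VENDOR_CMD = 62
MMC_READ_SINGLE_BLOCK = 17
MMC_ERASE_GROUP_START = 35
MMC_ERASE_GROUP_END = 36
MMC_ERASE = 38

BLOCK_SIZE = 512  # assumption: one single-block read returns up to 512 bytes


def ram_dump_commands(start, end, chunk=BLOCK_SIZE):
    """Build the (opcode, argument) list for dumping RAM in [start, end)."""
    if start >= end:
        raise ValueError("empty address range")
    # Step 1: enter RAM reading mode.
    commands = [(VENDOR_CMD, 0xEFAC62EC), (VENDOR_CMD, 0x10210002)]
    for address in range(start, end, chunk):
        length = min(chunk, end - address)
        commands += [
            (MMC_ERASE_GROUP_START, address),  # step 2: address to read
            (MMC_ERASE_GROUP_END, length),     # step 2: length to read
            (MMC_ERASE, 0),
            (MMC_READ_SINGLE_BLOCK, 0),        # step 3; argument 0 is a guess
        ]
    # Step 4: exit RAM reading mode.
    commands += [(VENDOR_CMD, 0xEFAC62EC), (VENDOR_CMD, 0xDECCEE)]
    return commands


if __name__ == "__main__":
    plan = ram_dump_commands(0x40000, 0x60000)
    reads = sum(1 for opcode, _ in plan if opcode == MMC_READ_SINGLE_BLOCK)
    print(len(plan), "commands,", reads, "block reads")
```

If any of those assumptions turns out to be wrong on real hardware, the safer variant is to enter and exit the mode around every single read, exactly as the four steps are written.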


    I don't want to run this on my device, because I'm afraid - messing with the eMMC doesn't sound like a very good idea on my device (I don't have a spare one).
    Does someone have a development device that they don't mind risking, and want to dump the eMMC firmware from it? :)
    28
    Okay, got a RAM dump :)
    I won't post it here (or anywhere else for that matter) because I don't want to get sued by Samsung.

    I might release a kernel which allows you to dump the RAM yourself if there's enough demand, but I don't want to right now, because:
    1. The code is ugly as hell, not implemented as a kernel module, not thread-safe etc.
    2. It is highly dangerous (messing with the eMMC chip - I really don't know how stable this thing is), so if you want to do it on your device, you should be an expert. In that case, you can write the code yourself (with little effort) :)


    Anyway, I hope the FTL is Whimory, since I'm familiar with it. Would be easier.
    I'll let you know if I find anything interesting.


    PS I've attached a little teaser. (Yes, this is the patched function. 0x40300 is red because I've opened a partial RAM dump.)



    EDIT - Some initial results:
    0. The CPU is a Cortex-M3.
    1. No strings at all :( Just some uninteresting release asserts ("REL_ASSERT")
    2. Found the Smart Report generator function -> found the MMC command handlers.
    3. Most MMC command handlers are stored in a function table. There are 3 special commands: MMC60, MMC62, MMC64. Depending on the arguments these special commands are given, they modify the function table (this is the so-called "vendor mode").
    4. There are a lot of possible arguments for MMC62, not only the ones we know.
    5. If you trace back the function they patch all the way up the call stack, you get to the MMC24 and MMC25 handlers. These commands are MMC_WRITE_BLOCK and MMC_WRITE_MULTIPLE_BLOCK. Since the function they patch is deep down the call stack, it's very likely that it is part of the wear leveller.

    Anyway, because of the lack of strings I guess it would be very hard to truly understand the SDS bug we're facing :(
    18
    Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

    It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
    The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.
    16
    So I decided to do a small RAM dump after all.

    Before the patch, 0x5C7EA reads FD F7 C2 FA, which is "BL 0x59D72".
    As I thought, they replace a function call to the new one.

    I will dump function 0x59D72 later this week.
    16
    Got a kernel log from just after such a freeze.

    I was about to power on the screen but nothing happened. Then I waited around 10 minutes, the screen finally came up, and I dumped the log.

    Is this interesting? :D

    Full log is attached.

    Code:
    U/ 4002.738352  c0 [keys]PWR 1
    U/ 4002.983296  c0 [keys]PWR 0
    ...
    U/ 4587.514100  c0 mshci: ===========================================
    W/ 4587.514336  c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
    ....
    V/ 4587.850296  c0 mmc0: recovering eMMC has been done
    ...
    W/ 4587.850849  c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
    W/ 4587.851982  c0 end_request: I/O error, dev mmcblk0, sector 3126872
    W/ 4587.852174  c0 end_request: I/O error, dev mmcblk0, sector 3126880
    W/ 4587.852330  c0 end_request: I/O error, dev mmcblk0, sector 3126888


    EDIT: Added another log. Will add more, if I get more.


    BR
    Rob