You can dump all of your "live" RAM with the LiME forensic tool:
http://code.google.com/p/lime-forensics/
But it may be overkill...
A .ko (preferably with source) would be great. It would save me a lot of time implementing the dumper myself. Ideally I'd like to get dumps of VYL00M or MAG4FA 0x19, along with 0x25.
I have a Hex-Rays license. I actually reverse most of the time using it; I posted assembly code since it's easier to understand with these short snippets (in my opinion).
I won't post a RAM dump since it contains (probably?) licensed code.
I can however post the memory map:
0x00000000 - 0x00020000 BootROM (I guess it's a mask ROM)
0x00040000 - 0x00060000 Firmware (resides in RAM, the BootROM reads it from the NAND chip itself so it's upgradable!)
0x00060000 - 0x00080000 Data (no dynamic memory there BTW)
0x20000000 - 0x20028000 eMMC interface MMIO
0x20080000 - 0x20080400 I don't know, maybe another eMMC interface MMIO?
0x40000000 - 0x40010000 NAND interface MMIO
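To make it easier to tell which region an address from the disassembly falls into, here is a minimal sketch of the map above as a lookup table. It assumes each end address is exclusive; `mem_region`, `emmc_map` and `region_name` are names made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the posted memory map; `end` is assumed to be exclusive. */
struct mem_region {
    uint32_t start;
    uint32_t end;
    const char *name;
};

const struct mem_region emmc_map[] = {
    { 0x00000000u, 0x00020000u, "BootROM" },
    { 0x00040000u, 0x00060000u, "Firmware" },
    { 0x00060000u, 0x00080000u, "Data" },
    { 0x20000000u, 0x20028000u, "eMMC interface MMIO" },
    { 0x20080000u, 0x20080400u, "Unknown (possibly another eMMC interface MMIO)" },
    { 0x40000000u, 0x40010000u, "NAND interface MMIO" },
};

/* Return the name of the region containing addr, or NULL if it is unmapped. */
const char *region_name(uint32_t addr)
{
    size_t n = sizeof emmc_map / sizeof emmc_map[0];
    for (size_t i = 0; i < n; i++) {
        if (addr >= emmc_map[i].start && addr < emmc_map[i].end)
            return emmc_map[i].name;
    }
    return NULL;
}
```

For example, `region_name(0x40300)` returns the "Firmware" entry, while an address in the gap between BootROM and Firmware (such as 0x30000) returns NULL.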
I can send you my RAM dump over IRC if you'd like. Besides that, I contemplate posting a .ko which exports the RAM over a character device (this is how I dumped it).
And, yes, dumping the new firmwares to see what has changed is super-cool
At first I didn't quite understand what was going on, but now when I see what is happening... Excellent!
I'd love to see some tool come out of this, to read eMMC RAM. I can see several cross-platform applications for this! Something like "viewmem", but "viewemmc" instead. Would this be feasible?
@ Entropy512: Could this be incorporated into your "Got Brickbug?" app, or something similar?
...
On our local forum we've recently been getting a rising number of reports of freezes and restarts on S3s. Some are like my freeze.
It also seems that after a while these problems get better and even disappear completely.
Because of that I'm wondering whether the fix locks the eMMC when it finds a bad data structure. That lock could then cause a phone freeze (as already stated), and at the same time the fix repairs the bad data structure in that block.
At least this would explain the rising number of freezes with the fix, and the fact that the freezes become rarer over time...
...I have no clue how the algorithms work, but maybe it uses some sort of pseudo-random data to do whatever, with the same seed on all eMMCs... and thus all of them go through the same series of numbers. And now imagine the error condition is only triggered by a specific number or number set (say someone screwed up a boundary condition). Under this theory the error condition wouldn't appear randomly, but after a certain amount of write ops (or something).
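To illustrate that theory only (nothing here is taken from the actual firmware): a minimal sketch in which a fixed-seed generator stands in for whatever the wear leveller does, and the "error condition" is one specific output value. The LCG constants, `MAX_OPS`, `lcg_next` and `ops_until_trigger` are all assumptions made up for the example.

```c
#include <stdint.h>

#define MAX_OPS 1000000u  /* arbitrary cap so the loop always terminates */

/* Textbook LCG step; the constants are illustrative, not from any firmware. */
uint32_t lcg_next(uint32_t state)
{
    return state * 1103515245u + 12345u;
}

/*
 * Count simulated write operations until the generator emits `trigger`.
 * Returns MAX_OPS if the value has not shown up by then.
 */
uint32_t ops_until_trigger(uint32_t seed, uint32_t trigger)
{
    uint32_t state = seed;
    for (uint32_t ops = 1; ops <= MAX_OPS; ops++) {
        state = lcg_next(state);
        if (((state >> 16) & 0x7FFFu) == trigger)
            return ops;
    }
    return MAX_OPS;
}
```

Two "chips" started with the same seed always report the same count, which is the point of the theory: the failure would show up after a fixed amount of use rather than at random.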
* If someone has a BinDiff license and wants to help, it'd be great!
My bad. Sleepy eyes. (Chainfire's app)
That's not my app. It also probably couldn't be integrated into any detection app, BUT it would be interesting to see what the differences are between MAG4FA 0x19 (BAD) and MAG4FA 0x25 (good other than a far less nasty "wear leveller randomly inserts 32kb of zeros" bug).
Maybe even if there were a way to write the RAM back to NAND, then the chip could be reset/wiped - we know this is possible but dangerous. It could only be researched by someone who has JTAG access to an affected device, since no affected device has any known way to boot from USB or SDCard.
Not possible, since the patch was excised from a MASSIVE kernel update with thousands of lines of changes. There is zero commit history for the tarball drop. The tagging of Andrei's patch as "Samsung OSRC" was some custom hackery by him - he diffed two kernels, split up the commits, and set authorship to "Samsung OSRC".
Why doesn't someone just email whoever made the patch? Perhaps he/she could at least explain the reasoning behind it, without giving out all Samsung "secrets".
How an eMMC does internal wear levelling is up to the manufacturer - eMMC only defines the external interface.
I'm thinking the same thing. Just like the "bad sectors" on a good old HD, perhaps the bad "sectors" on an eMMC are avoided or tagged as bad by the wear leveling algorithm. These "tags" also have to be written somewhere, so if this function was screwed up somehow, I guess you'd get corruption although most of the chip would still be functioning. The "patch-fix" then slowly discovers these errors and avoids them, causing us to see a decrease in problems.
There must be something written somewhere about wear-leveling of eMMCs...
Ending up with no internal firmware seems to be the risk that Samsung was most concerned about when they decided not to release Superbrick repair code. Supposedly, if the firmware update doesn't go perfectly, the chip is 100% toast. (However, given the language barrier and such, it could really have meant just that the device's bootloaders were hosed...)
I think it is possible to update the firmware.
Except for CMD62, there are 2 more vendor specific commands (CMD60 and CMD64). I think I saw somewhere a command which updates the firmware on the NAND; I'm not sure now but I'll check it later. The BootROM is also very small so it's easy to find exactly where the firmware is stored on the NAND.
About the danger with this process, I think it's mostly due to the risk of having no bootloader or no Movi firmware. However, I think the Movi BootROM has a recovery mode, so if we're somehow able to boot the device from the mmc1 bus (SD card), we're okay.
Anyway, later this week I'll write that .ko (currently I've just edited the Linux MMC subsystem code) and push it to GitHub.
It doesn't seem like an integer overflow, at least not a straightforward one.
SDS seems to be a case of some function potentially returning 0 (maybe due to integer overflow? The statistics of the issue and how it suddenly "spiked" after a number of months of usage scream overflow to me), and that 0 then being treated as data instead of an error, corrupting data structures right and left.
int __fastcall f_to_be_patched_function(_DWORD *out, int val)
{
    int ret; // r2@1

    ret = 0;
    if ( *off_5FC60 == val )
    {
        *out = off_5FC60;
        return 1;
    }
    if ( *off_5FC64 == val )
    {
        *out = off_5FC64;
        return 1;
    }
    *out = 0;
    return ret;
}

void __fastcall f_new_function_by_patch(_DWORD *out, int val)
{
    if ( !f_to_be_patched_function(out, val) )
    {
        while ( 1 )
            ;
    }
}
Unfortunately, as we don't have a security-dropped IBL that is signed for Exynos 4210, there is no SDCard or USB recovery available for 4210 devices like there is for Exynos3 and for 4412 devices. If you kill the bootloaders, JTAG is it.
I'll just start filling in this myself...
a) what exact devices are having problems?
- GT-I9300/3 with 16 GB MoviNAND
b) What exact eMMC cards do they have? (Samsung part-no/name)
c) Is that leaked datasheet in OP, for any of those in (b)?
- <unknown>
d) What die size "technology" are these eMMC's using? (25 nm, 34 nm or other?)
e) Do you know anything about how people with eMMC problems use their devices?
f) The Linux kernel version for problematic devices...
Code:
Model: Samsung GT-I9300
Chip: KMVTU000LM-B503
Part No: 1108-000424 ?
Size: eMMC(16GB)+MDDR(64MB)
eMMC ID: VTU00M
eMMC FW Rev.: 0xF1
It doesn't seem like an integer overflow, at least not a straightforward one. This is the function they patch:
...
Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus? If so, why isn't SDS fixable?
Correct me if I'm wrong... the firmware NAND you're talking about is the same as the eMMC (not a separate one), right?
it's easy to find exactly where the firmware is stored on the NAND.
[SIZE=2]2.6.36
• ERASE, SECURE ERASE, TRIM, and SECURE TRIM operations (JEDEC 4.4)
• mmc_block: Discard and secure discard support
• SD-combo (IO+mem) support
• Performance tests
2.6.37
• New sdhci-pxa driver for Marvell SoCs
• MMC 4.4 DDR support
• sdhci-pltfm: Platform driver for imx35/51
• USB SD host controller (USHC) driver
2.6.39
• mxs-mmc: MMC host driver for i.MX23/28
3.0
• MMC CMD+ACMD passthrough IOCTL reliable write support
• MMC boot partition support
• New VUB300 USB-to-SD/SDIO/MMC driver
• SD: Support for signal voltage switch procedure
3.2
• Enabled HPI for MMC cards that support this feature
• Cache control for e·MMC 4.5 devices
• e·MMC hardware reset support
• Random fault injection
• General-purpose MMC partition support (JEDEC 4.4)
• SDHCI: e·MMC hardware reset support
• sdhci-pci: Runtime PM support
• mmc-test: e·MMC hardware reset test
[/SIZE]
So it sorta looks like in the original firmware, it's (bear with me, this is really fugly pseudocode)
It doesn't seem like an integer overflow, at least not a straightforward one.
This is the function they patch:
Code:
int __fastcall f_to_be_patched_function(_DWORD *out, int val)
{
    int ret; // r2@1

    ret = 0;
    if ( *off_5FC60 == val )
    {
        *out = off_5FC60;
        return 1;
    }
    if ( *off_5FC64 == val )
    {
        *out = off_5FC64;
        return 1;
    }
    *out = 0;
    return ret;
}
Both off_5FC60 and off_5FC64 point to some FTL-related contexts.
This is the wrapper function they write to the RAM:
Code:
void __fastcall f_new_function_by_patch(_DWORD *out, int val)
{
    if ( !f_to_be_patched_function(out, val) )
    {
        while ( 1 )
            ;
    }
}
The BL instruction that is being patched used to call the old function (f_to_be_patched_function), without checking its return value, hence the bug.
What's so strange about it is that "f_to_be_patched_function" is called from many other locations in the code, without checking the return value! So the bug exists in other locations as well.
Either the other locations don't cause internal metadata corruption, or they are just so rare that Samsung didn't even bother to patch them.
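To make the failure mode concrete, here is a minimal host-side sketch of the same pattern. It is not the firmware code: `struct ftl_ctx`, `ctx_a`, `ctx_b`, `lookup_ctx`, `lookup_ctx_or_halt` and the `halted` flag are all made up for illustration, and the flag stands in for the `while ( 1 )` hang so the sketch can actually return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the FTL contexts behind off_5FC60 / off_5FC64. */
struct ftl_ctx {
    int id;  /* first word of the context, compared against val */
};

struct ftl_ctx ctx_a = { 1 };
struct ftl_ctx ctx_b = { 2 };

/* Same shape as f_to_be_patched_function: 1 on a match, else 0 and *out = NULL. */
int lookup_ctx(struct ftl_ctx **out, int val)
{
    if (ctx_a.id == val) {
        *out = &ctx_a;
        return 1;
    }
    if (ctx_b.id == val) {
        *out = &ctx_b;
        return 1;
    }
    *out = NULL;
    return 0;
}

/* Set where the patched firmware would spin forever. */
bool halted = false;

/* Same shape as f_new_function_by_patch: stop instead of using a NULL context. */
void lookup_ctx_or_halt(struct ftl_ctx **out, int val)
{
    if (!lookup_ctx(out, val))
        halted = true;  /* the real wrapper executes while (1); here */
}
```

A caller that uses `lookup_ctx` directly and ignores the return value goes on to treat a NULL `*out` as a valid context, which matches the "0 treated as data" description; the wrapper stops before that can happen.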
/* original firmware (guess) */
if( !some_sanity_check_here())
{
    crater_the_chip();
}
/* with the patch applied (guess) */
if( !some_sanity_check_here())
{
    hang_chip_until_reset();
}
It's possible to boot from the MMC1 bus.
Wait, so we do have a way to boot Exynos 4412 devices (Galaxy S3) from the mmc1 bus?
If so, why isn't SDS fixable?
ROM:00040300 PUSH {R4,LR}
ROM:00040302 LDR R2, =0x59D73
ROM:00040304 BLX R2
ROM:00040306 CMP R0, #0
ROM:00040308 BNE locret_4030C
ROM:0004030A
ROM:0004030A loc_4030A ; CODE XREF: ROM:loc_4030Aj
ROM:0004030A B loc_4030A
ROM:0004030C ; ---------------------------------------------------------------------------
ROM:0004030C
ROM:0004030C locret_4030C ; CODE XREF: ROM:00040308j
ROM:0004030C POP {R4,PC}
ROM:0004030C ; ---------------------------------------------------------------------
ROM:0005C7EA BL 0x40300
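For anyone wanting to check the patched call site by hand, here is a minimal sketch of how a Thumb BL at 0x5C7EA that targets 0x40300 would be encoded. It assumes the standard two-halfword Thumb BL form with an offset within +/-4 MB; `thumb_bl_encode` and `thumb_bl_target` are names made up for this example, and the resulting halfwords have not been compared against a real dump.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a Thumb BL located at insn_addr that branches to target.
 * The offset is relative to insn_addr + 4 and is stored in halfword units:
 * the upper 11 bits go in the first halfword, the lower 11 in the second.
 */
void thumb_bl_encode(uint32_t insn_addr, uint32_t target,
                     uint16_t *hi, uint16_t *lo)
{
    int32_t offset = (int32_t)(target - (insn_addr + 4u));

    /* This simple form only covers even offsets within +/-4 MB. */
    assert(offset >= -(1 << 22) && offset < (1 << 22) && (offset & 1) == 0);

    *hi = (uint16_t)(0xF000u | (((uint32_t)offset >> 12) & 0x7FFu));
    *lo = (uint16_t)(0xF800u | (((uint32_t)offset >> 1) & 0x7FFu));
}

/* Decode the same form back into the branch target. */
uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t hi, uint16_t lo)
{
    int32_t imm22 = (int32_t)(((uint32_t)(hi & 0x7FFu) << 11) | (lo & 0x7FFu));

    if (imm22 & (1 << 21))   /* sign-extend the 22-bit halfword offset */
        imm22 -= (1 << 22);
    return insn_addr + 4u + (uint32_t)(imm22 * 2);
}
```

With these assumptions, `thumb_bl_encode(0x5C7EA, 0x40300, &hi, &lo)` gives the halfwords 0xF7E3 and 0xFD89, and decoding them at the same address gives back 0x40300.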
U/ 4002.738352 c0 [keys]PWR 1
U/ 4002.983296 c0 [keys]PWR 0
...
U/ 4587.514100 c0 mshci: ===========================================
W/ 4587.514336 c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
....
V/ 4587.850296 c0 mmc0: recovering eMMC has been done
...
W/ 4587.850849 c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
W/ 4587.851982 c0 end_request: I/O error, dev mmcblk0, sector 3126872
W/ 4587.852174 c0 end_request: I/O error, dev mmcblk0, sector 3126880
W/ 4587.852330 c0 end_request: I/O error, dev mmcblk0, sector 3126888