eMMC sudden death research


Oranav

Senior Member
Oct 9, 2010
53
265
Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

It seems to be runnable on the same hardware, which means that we can probably field-upgrade I9300 devices, just as Samsung does with the I9100.
The interesting question is whether we can preserve the data on the eMMC during the process. If not, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device doesn't end up bricked.
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
As requested, HERE are the ./drivers/mmc sources (GT-I9100 JB).


[EDIT] (According to Entropy512, the stuff below is not related to FFU...)

To get an idea what chips they're fixing, we look in:
./drivers/mmc/card/block.c
Code:
...
static const struct mmc_fixup blk_fixups[] =
{
    MMC_FIXUP("SEM02G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM04G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM08G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM16G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM32G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    /*
     * Some MMC cards experience performance degradation with CMD23
     * instead of CMD12-bounded multiblock transfers. For now we'll
     * black list what's bad...
     * - Certain Toshiba cards.
     *
     * N.B. This doesn't affect SD cards.
     */
    MMC_FIXUP("MMC08G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    MMC_FIXUP("MMC16G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    MMC_FIXUP("MMC32G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    /*Some issue about secure erase/secure trim for Samsung MoviNAND*/
    MMC_FIXUP("M8G2FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MAG4FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MBG8FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MCGAFA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VAL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VYL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("KYL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VZL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),

    END_FIXUP
};
...
To decipher that you may need to look at ./include/linux/mmc/card.h HERE.

Code:
...
#define _FIXUP_EXT(_name, _manfid, _oemid, _rev_start, _rev_end, _cis_vendor, _cis_device, _fixup, _data) \
    {                                   \
        .name = (_name),                \
        .manfid = (_manfid),            \
        .oemid = (_oemid),              \
        .rev_start = (_rev_start),      \
        .rev_end = (_rev_end),          \
        .cis_vendor = (_cis_vendor),    \
        .cis_device = (_cis_device),    \
        .vendor_fixup = (_fixup),       \
        .data = (_data),                \
     }

#define MMC_FIXUP_REV(_name, _manfid, _oemid, _rev_start, _rev_end, _fixup, _data) \
        _FIXUP_EXT(_name, _manfid, _oemid, _rev_start, _rev_end, SDIO_ANY_ID, SDIO_ANY_ID, _fixup, _data)
#define MMC_FIXUP(_name, _manfid, _oemid, _fixup, _data) \
        MMC_FIXUP_REV(_name, _manfid, _oemid, 0, -1ull, _fixup, _data)
...
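
For anyone trying to read the table above: each MMC_FIXUP entry is basically "if the card's CID matches this name/manfid/oemid, apply this quirk". Below is a rough user-space sketch of that matching, not the kernel's code. The struct names, `collect_quirks()` and the wildcard values are made up for illustration, the revision range (`rev_start`/`rev_end`) is left out, and 0x15 as Samsung's manufacturer ID in the usage example is an assumption you should check against your own sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wildcards, modeled on the kernel's CID_*_ANY idea (values assumed here). */
#define ANY_MANFID 0xFFFFFFFFu
#define ANY_OEMID  0xFFFFu

/* Hypothetical, trimmed-down stand-in for struct mmc_fixup. */
struct fixup_sketch {
    const char  *name;    /* product name from the CID, e.g. "VYL00M" */
    unsigned int manfid;  /* manufacturer ID, or ANY_MANFID */
    unsigned int oemid;   /* OEM ID, or ANY_OEMID */
    unsigned int quirk;   /* quirk bit to set on a match */
};

/* Hypothetical, trimmed-down stand-in for the card's CID fields. */
struct cid_sketch {
    char         prod_name[8];
    unsigned int manfid;
    unsigned int oemid;
};

/* Return the OR of all quirk bits whose table entry matches the card. */
unsigned int collect_quirks(const struct fixup_sketch *table, size_t n,
                            const struct cid_sketch *cid)
{
    unsigned int quirks = 0;

    assert(table != NULL && cid != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct fixup_sketch *f = &table[i];

        if (f->manfid != ANY_MANFID && f->manfid != cid->manfid)
            continue;   /* manufacturer differs */
        if (f->oemid != ANY_OEMID && f->oemid != cid->oemid)
            continue;   /* OEM ID differs */
        if (strncmp(f->name, cid->prod_name, sizeof(cid->prod_name)) != 0)
            continue;   /* product name differs */
        quirks |= f->quirk;
    }
    return quirks;
}
```

With a table holding a "SEM16G" entry (0x2, 0x100) and a "VYL00M" entry (0x15, any OEM ID), a card reporting "VYL00M" with manfid 0x15 picks up only the second quirk bit, and a "VTU00M" card picks up nothing.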
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.
Yeah. My guess is that the new FFU firmware is specially designed to fix Superbrick without requiring a low-level reformat (which most firmware upgrades do require). I know that, at least a year ago, there was no known way to go from 0x19 to 0x25 without a full reset.

As requested, HERE are the ./drivers/mmc sources (GT-I9100 JB).

To get an idea what chips they're fixing, we look in:
./drivers/mmc/card/block.c
...
That's a completely different patch than the FFU stuff. That's Samsung's original Superbrick patch, which was merged to mainline Linux in early September, showed up in I9300 JB (despite being of no use on I9100), and first showed up on an affected device with the I9100 HK JB source drop (which did NOT have FFU).

The original patch would just take all secure erase calls and translate them into nonsecure erase.

The FFU code is completely self-contained and has no interaction with the quirks code.
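
To make "translate secure erase into nonsecure erase" concrete, here is a minimal sketch of the idea, not Samsung's actual patch. The argument values are the ones I know from mainline include/linux/mmc/mmc.h; `downgrade_secure_erase()` is a hypothetical helper, and skipping the second secure-trim pass is my own assumption (a nonsecure trim is a single step, so there is nothing to map it to).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Erase argument values as defined in mainline include/linux/mmc/mmc.h. */
#define MMC_ERASE_ARG         0x00000000u
#define MMC_SECURE_ERASE_ARG  0x80000000u
#define MMC_TRIM_ARG          0x00000001u
#define MMC_SECURE_TRIM1_ARG  0x80000001u
#define MMC_SECURE_TRIM2_ARG  0x80008000u
#define MMC_SECURE_ARGS       0x80000000u

/*
 * Hypothetical helper: downgrade an erase argument when the quirk is set.
 * Writes the argument to use into *out and returns 1 if the erase command
 * should still be sent, or 0 if it should be skipped altogether.
 */
int downgrade_secure_erase(int quirk_set, uint32_t arg, uint32_t *out)
{
    assert(out != NULL);
    *out = arg;

    /* No quirk, or already a nonsecure operation: send unchanged. */
    if (!quirk_set || !(arg & MMC_SECURE_ARGS))
        return 1;

    /* Assumption: the second secure-trim pass has no nonsecure counterpart. */
    if (arg == MMC_SECURE_TRIM2_ARG)
        return 0;

    /* Clear the "secure" bit: secure erase -> erase, secure trim -> trim. */
    *out = arg & ~MMC_SECURE_ARGS;
    return 1;
}
```

The point is only that the card never sees an argument with the secure bit set; how the real patch handles the two-step secure trim may well differ.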
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Hmm. I need to take a closer look at this FFU stuff.

When running on my N7000 (VYL00M 0x19):
Code:
<1>[  338.705203] c0 Found platform driver dw_mmc : c0cf8db8
<1>[  338.705294] c0 Found platform device dw_mmc : c0cc8fb8
<1>[  338.705310] c0 FFU getting host data success.
<1>[  338.705333] c0 mmc0: clock 33333333Hz busmode 2 powermode 2 cs 0 Vdd 21 width 3 timing 5
<1>[  338.720523] c0 FFU It's not a VHX. 0000

So whatever it's meant to update, it appears to be something other than a Superbrick-vulnerable eMMC.

I'll take a deeper look at this tonight - such as modifying it to display what it DID get from the Smart Report.
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
Does anyone know what CMD60 and CMD61 do?

According to THIS old (2006) EE-Times article, it seems that they are connected
with the CE-ATA interface, to provide backward compatibility when eMMC chips
are used in a CF+ configuration. I guess they could also be involved when the
chips are used in an SSD RAID configuration. But this should be confirmed... Here's
what they say. [Attention, the EE-Times bastards only allow you to see 2 pages!]
CE-ATA is based primarily on a combination of the MMC electrical interface
standard and on an optimized subset of the ATA command protocol. CE-ATA
specifications include:

  • Scalable transfer rates up to 52 MB/s
  • Low pin count with six or ten interface signals, depending on data transfer rate requirements
  • A streamlined ATA command set — only five commands
  • Five MMC commands, two exclusively developed for CE-ATA
  • A simple protocol
MMC commands in CE-ATA scope

CE-ATA utilizes five MMC commands during the course of normal execution.
Resets are performed using the “GO_IDLE_STATE (CMD0)”, aborting an ATA command
is done by issuing “STOP_TRANSMISSION (CMD12)”, byte-wise access to the ATA
taskfile register is achieved using “FAST_IO (CMD39)”, issuance of an ATA
command or access to the status and control registers is executed by
“RW_MULTIPLE_REGISTER (CMD60)”, and ATA command data transfer is achieved
using “RW_MULTIPLE_BLOCK (CMD61)”.

CMD60 and CMD61 are MMC commands newly defined by CE-ATA for efficient ATA
command execution. CE-ATA utilizes the same MMC command sequence for
initialization as traditional MMC devices. The ATA operation occurs within the
MMC “TRAN” state.

ATA processing on MMC

CE-ATA maps the streamlined ATA command set onto the MMC interface. The ATA
taskfile is mapped onto the MMC register space starting at MMC address zero as
outlined in Figure 1. To accommodate future capacity growth, and to ensure
that large transfers can be done, 48-bit addressing is accommodated from the
start. Transferring a single song from the device typically requires more than
128KB of data, which is the maximum transfer size with only 28-bit LBA
addressing.

...

CMD60 : RW multiple register

CMD60 allows multiple registers to be read/written in a single transaction
sequence. Using CMD60, an ATA command can be issued using a single MMC
command. Alternatively, software may have to execute up to 13 individual
taskfile register writes with "FAST_IO" in order to issue a single ATA
command.

CMD61 : RW multiple block

CMD61 is used to transfer the data for an ATA command (like READ DMA EXT).
Data transfer for media access commands must be multiples of the CE-ATA sector
size that is a minimum of 4KB. Each transfer is broken into multiple MMC data
blocks that are 512 bytes, 1KB, or 4KB in size as negotiated between the host
and the device.

The question is why they have included CMD60 in movinand.c, since it is never
actually used anywhere, as far as I can see.
So perhaps it has a different use altogether...

Code:
int mmc_vendor_cmd60(unsigned int arg) {
    int err;
    struct mmc_host *host = ffu_mmc_host();
    unsigned int claim = 0;
    int resp;
    struct mmc_command cmd = {0};

    if( host ) {
        /* Remember the host's claim state so it can be restored afterwards. */
        claim = host->claimed;
        cmd.opcode = 60;
        cmd.arg = arg;
        cmd.flags = MMC_RSP_SPI_R1B;

        /* Mark the host as claimed, send CMD60 (up to 3 retries), restore. */
        host->claimed = 1;
        err = mmc_wait_for_cmd(host, &cmd, 3);
        host->claimed = claim;
        if (err) {
            printk(KERN_ALERT"FFU : CMD60 error with arg : %x\n",arg);
            return err;
        }

        /* Poll the card status until it reads 0x900 (TRAN state, READY_FOR_DATA). */
        do{ mmc_send_status(&resp); }while(resp != 0x900);
        return 0;
    }
    return -1;
}
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
@ Entropy512:

Yes, the FW binary in fw.h seems to be for the "VHX" microcontroller. However,
there are two problems that could occur here. One is that the I9300 uses a
different chip and thus probably also a different µ-Controller and/or
µC FW version (as we have seen). That's why I asked earlier to check exactly
which eMMC they use in these phones. The best way to check is to open the
phone and look! The second is that if you're on an older chip that has not
been updated, you'll get hit in the face by the "Catch-22" bug. See below.

Check HERE:
Code:
Size    Cont*   Part Number        PKG            PKG Dimension    MMC Version   Class
----------------------------------------------------------------------------------------
4GB     VFX_U   KLM4G1HE3F-B00x    153-ball BGA   11.5mm x 13mm    eMMC 4.41     Class50
8GB     VFX_U   KLM8G2FE3B-B00x    153-ball BGA   11.5mm x 13mm    eMMC 4.41     Class100
16GB    VHX     KLMAG2GE4A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.41     Class400
32GB    VHX     KLMBG4GE4A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.41     Class400
64GB    VHX     KLMCG8GE4A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.41     Class400
16GB    VHX2    KLMAG2GE2A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.5      Class1500
32GB    VHX2    KLMBG4GE2A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.5      Class1500
64GB    VHX2    KLMCG8GE2A-A00x    169-ball BGA   12mm x 16mm      eMMC 4.5      Class1500
----------------------------------------------------------------------------------------
Cont* = Controller
The check is done in movinand.c, by comparing the first 4 bytes of the 16-byte
"FW Patch Version" field of the eMMC Smart Report [327:312]. This is located
at offset 0x1c in the fw.bin. It should be noted that the Smart Report data
structure, as specified in the MoviNAND document (in the OP), was changed with
newer chips. This could cause problems for updating: an older FW version would
still use the older structure, without the "FW Patch Version" data, so the
check would fail before the update could run, potentially causing a Catch-22
error!

Code:
int mmc_smartreport(void)
{
...
if( !(buff[312] == 'V' && buff[313] == 'H' && buff[314] == 'X' && buff[315] == '0') ) {
    printk(KERN_ALERT"FFU It's not a VHX. %x%x%x%x", buff[312], buff[313], buff[314], buff[315]);
    return -1;
}
...
Perhaps you can dump your FW binary, so that we can have a look and compare?

The "old" MoviNAND Smart Report structure:
Code:
#-------------------------------------------------------------------------------
# Smart Report Output Data:     (from: http://tiny.cc/2jacsw )
#-------------------------------------------------------------------------------
# DataSlice   Width   Field                               Remark
#-------------------------------------------------------------------------------
# [3:0]       4       Error Mode                          Normal:             0xD2D2D2D2
#                                                         OpenFatalError:     0x37373737
#                                                         RuntimeFatalError:  0x5C5C5C5C
#                                                         MetaBrokenError:    0xE1E1E1E1
# [7:4]       4       Super Block Size                    [1] Total Size of simultaneously erasable physical blocks
# [11:8]      4       Super Page Size                     [2] Total Size of simultaneously programmable physical pages
# [15:12]     4       Optimal Write Size                  [3] Write size at which the device performs best
# [19:16]     4       Number Of Banks                     Number of banks connecting to each NAND flash
# [23:20]     4       Bank0 Init Bad Block                Number of initial defective physical blocks
# [27:24]     4       Bank0 Runtime Bad Block             Number of runtime defective physical blocks
# [31:28]     4       Bank0 Remain Reserved Block         Number of remain reserved physical blocks
# [35:32]     4       Bank1 Init Bad Block                "
# [39:36]     4       Bank1 Runtime Bad Block             "
# [43:40]     4       Bank1 Remain Reserved Block         "
# [47:44]     4       Bank2 Init Bad Block                "
# [51:48]     4       Bank2 Runtime Bad Block             "
# [55:52]     4       Bank2 Remain Reserved Block         "
# [59:56]     4       Bank3 Init Bad Block                "
# [63:60]     4       Bank3 Runtime Bad Block             "
# [67:64]     4       Bank3 Reserved Block                "
# [71:68]     4       Max. Erase Count                    Maximum erase count from among all physical blocks
# [75:72]     4       Min. Erase Count                    Minimum erase count from among all physical blocks
# [79:76]     4       Avg. Erase Count                    Average erase count of all physical blocks
#*****************  CHANGES FROM HERE ON  **************************************
# [83:80]     4       Number of ECC Uncorrectable Error
# [143:84]    30x2    ECC Uncorrectable Error Location    Physical Block Address of ECC Uncorrectable Error
# [203:144]   30x2    ECC Uncorrectable Error Location    Physical Page Offset of ECC Uncorrectable Error
# [219:204]   (16)    Reserved
# [223:220]   4       Read Reclaim Count                  Number of Read Reclaim Count
# [511:224]   (288)   Reserved
#-------------------------------------------------------------------------------
#[1] Number of Channel * N-way Interleaving * physical block size
#[2] Number of Channel * physical page size
#[3] Super Page Size * N-way Interleaving
#-------------------------------------------------------------------------------
The "new" structure:
Code:
...
# [83:80]     4         Read Reclaim Count
# [87:84]     4         Optimal Trim Size
# [119:88]    32        Firmware Hash Code
# [123:120]   4         SLC Erase Count Max.
# [127:124]   4         SLC Erase Count Min.
# [131:128]   4         SLC Erase Count Avg.
# [135:132]   4         MLC Erase Count Max.
# [139:136]   4         MLC Erase Count Min.
# [143:140]   4         MLC Erase Count Avg.
# [147:144]   4         ECC Uncorrectable Errors (ECC_UE)
# [307:148]   10x2x8    ECC_UE Location (physical block address)
# [311:308]   4         Erase Unit Size
# [327:312]   16        FW Patch Version
# [331:328]   4         FCB Scan Result
# [335:332]   4         FTL Open Count
# [511:336]   (176)     Reserved
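
If you want to poke at a dumped report yourself, the offsets in the tables above are enough for a small parser. This is only a sketch under assumptions: the function names are mine, the multi-byte fields are assumed to be little-endian, and the report is assumed to be a plain 512-byte buffer you have already read from the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMART_REPORT_LEN  512
#define OFF_ERROR_MODE    0    /* [3:0]     */
#define OFF_NUM_BANKS     16   /* [19:16]   */
#define OFF_FW_PATCH_VER  312  /* [327:312] */

/* Read a 32-bit value, assuming little-endian byte order. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Map the Error Mode word to the names used in the datasheet table. */
const char *error_mode_name(const uint8_t *report)
{
    assert(report != NULL);
    switch (rd_le32(report + OFF_ERROR_MODE)) {
    case 0xD2D2D2D2u: return "Normal";
    case 0x37373737u: return "OpenFatalError";
    case 0x5C5C5C5Cu: return "RuntimeFatalError";
    case 0xE1E1E1E1u: return "MetaBrokenError";
    default:          return "Unknown";
    }
}

/* Number Of Banks field. */
uint32_t number_of_banks(const uint8_t *report)
{
    assert(report != NULL);
    return rd_le32(report + OFF_NUM_BANKS);
}

/* Same test as the FFU check quoted above: do bytes 312..315 spell "VHX0"? */
int is_vhx0(const uint8_t *report)
{
    assert(report != NULL);
    return memcmp(report + OFF_FW_PATCH_VER, "VHX0", 4) == 0;
}
```

Note that the four Error Mode values are the same byte repeated, so they decode identically in either byte order. A report whose bytes 312..315 are all zero fails `is_vhx0()`.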
 

Oranav

Senior Member
Oct 9, 2010
53
265
Does anyone know what CMD60 and CMD61 do?

According to THIS old (2006) EE-Times article...

No, this documentation is irrelevant.
As I said before in this thread, there are 2 vendor-specific MMC commands that Samsung has implemented: CMD60 and CMD62. It's their own implementation; you won't see any documentation for it unless you sign an NDA.
I have reversed most of the CMD60 and CMD62 interface though. There are some interesting features there.
MoviNAND doesn't have any CMD61 implementation.

They included it probably because they just copy-pasted some engineering code.
Besides that, an interesting fact is that CMD60 has an interface for doing firmware upgrades (so you don't need a stub as they've done here); they don't use it however.

Yes, the FW binary in fw.h seem to be for the "VHX" microcontroller.
Right, thanks. I didn't know it was the microcontroller type.

This is located at offset 0x1c in the fw.bin.
Actually, offset 0x1c is the 4 reserved DWORDs of the Cortex-M3 vector table.
Note that it is only located there for this firmware; in our RAM dumps, it's located elsewhere (hard-coded in the smart report generation function).

It should be noted that the Smart Report data
structure, as specified for the MoviNAND document (in OP) was changed with
newer chips. So this could potentially cause problems for updating, since an
older FW version would also have a different structure without the "FW Patch
Version" data, than the newer version, potentially causing a Catch-22 error!
This isn't true. As I've said, there is more than one kind of smart report, and it depends on which "block" you read.
If you read block 0 (as in the MoviNAND documentation), you get the regular report.
If you read block 0x1000 (as in the FFU and in the SDS temporary patch), you get the extended report.
There are more possible values. One, for example, is 0x2000 - to read the FTL context structure (0x3E4C bytes long).
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
... This isn't true. As I've said, there is more than one kind of smart report, and it depends on which "block" you read.
If you read block 0 (as in the MoviNAND documentation), you get the regular report. If you read block 0x1000 (as in the FFU and in the SDS temporary patch), you get the extended report. There are more possible values. One, for example, is 0x2000 - to read the FTL context structure (0x3E4C bytes long).

You sound awfully sure about what you're saying. Are you really that sure? I
think you might be confused by the 3 partitions (2 boot partitions and 1 USER
area) or possibly the 4 banks, which are shown (but not necessarily present)
in the Smart Report section of the "moviNAND" datasheet (Jan 2010). [I assume
that the datasheet you refer to is the one marked "KLMxGxxEHx".] This is quite
different from the 4-partition layout that I'm looking at. There are a lot of
datasheets out there, and the one I'm looking at is for the KMKUS000VM-B410
(March 2012).

Certainly it would not make any sense to have an "extended" Smart Report for
each, and indeed there is no such report mentioned in any of the Samsung
eMMC/NAND documents I have found to date. So I'm sure you're wrong.

If you feel differently and think that I'm in error, please provide a better
explanation of what's going on. Also tell us exactly which eMMC chip you're
basing your analysis on. And most importantly, please provide the Smart Report
binary dump, as you have found it. (It should not be longer than 0x200 bytes!)
It will clearly show you how many banks your chip is using, among other
things. I have a script which parses this for you. The results look like this:
Code:
$ ./smartview.pl smart.bin
Parsing Smart Records from: smart.bin

Error Mode: Normal

004: 4  : Super Block Size [1]              :00200000
008: 4  : Super Page Size [2]               :00004000
012: 4  : Optimal Write Size [3]            :00004000
016: 4  : Number Of Banks                   :00000001
020: 4  : Bank0 Initial bad blocks          :00000004
024: 4  : Bank0 Runtime bad blocks          : <zero>
028: 4  : Bank0 Remaining reserved blocks   :00000038
032: 4  : Bank1 Initial bad blocks          : <zero>
036: 4  : Bank1 Runtime bad blocks          : <zero>
040: 4  : Bank1 Remaining reserved blocks   : <zero>
044: 4  : Bank2 Initial bad blocks          : <zero>
048: 4  : Bank2 Runtime bad blocks          : <zero>
052: 4  : Bank2 Remaining reserved blocks   : <zero>
056: 4  : Bank3 Initial bad blocks          : <zero>
060: 4  : Bank3 Runtime bad blocks          : <zero>
064: 4  : Bank3 Remaining reserved blocks   : <zero>
068: 4  : Max. Erase Count                  :00000079
072: 4  : Min. Erase Count                  : <zero>
076: 4  : Avg. Erase Count                  :0000002c
080: 4  : Read Reclaim Count                : <zero>
084: 4  : Optimal Trim Size                 :00002000
088: 32 : Firmware Hash Code                :

          30aedf3e2295c9241457415f0f7c29a5
          4cdb54ee338ab1dd96ab785f9e0b80bd

120: 4  : SLC Erase Count Max.              :00000054
124: 4  : SLC Erase Count Min.              : <zero>
128: 4  : SLC Erase Count Avg.              :0000001e
132: 4  : MLC Erase Count Max.              :00000079
136: 4  : MLC Erase Count Min.              :00000001
140: 4  : MLC Erase Count Avg.              :0000002c
144: 4  : ECC Uncorrectable Errors          : <zero>
148: 160: ECC_UEL Physical Block Address [4]: <zero>
308: 4  : Erase Unit Size                   : <zero>
312: 16 : FW Patch Version                  : <zero>
328: 4  : FCB Scan Result                   : <zero>
332: 4  : FTL Open Count                    : <zero>
336: 176: Reserved                          : <zero>
[This is not for our chip!]

Also, it should not be necessary to use any kernel module to do the dumping.
An ioctl for the eMMC CMDs should suffice... (The PoC is the result above.)

It's pointless to beat around the bush. If you have some data/code to show,
dump it somewhere for others to see, so that we can compare notes.

Finally, it can't be emphasized enough that Samsung's documentation is often
full of errors!
 

Oranav

Senior Member
Oct 9, 2010
53
265
You sound awfully sure about what you're saying. Are you really that sure? ...
Yes, I'm 100% sure.
This is pseudo-code of the MMC_READ_SINGLE_BLOCK command handler that runs after you issue CMD62(0xEFAC62EC) CMD62(0x0000CCEE):
Code:
void __fastcall f_smart_report_send(mmc_command *cmd)
{
  uint32_t arg; // r1@1
  int arg_high_byte; // r4@1
  int arg_low_byte; // r5@1
...
  arg = cmd->arg;
  arg_high_byte = arg & 0xFF00;
  arg_low_byte = (unsigned __int8)arg;
...
  if ( arg_high_byte == 0x4100 )
  {
    sub_41954((int)g_output, arg_low_byte);
    goto done;
  }
...
    if ( arg_high_byte == 0x2000 )
    {
      f_ftl_get_context(&ftl);
      f_memcpy(g_output, ftl, 0x3E4Cu);
    }
...
          if ( arg_high_byte == 0x1000 )
          {
            f_memcpy(g_output, &g_smart_report_output, 0x200u);
          }
...
        if ( arg_high_byte == 0 )
        {
          f_memcpy(g_output, &g_smart_report_output, 0x90u);
        }
...
}
It was obtained using Hex-Rays decompiler over the 0xf1 firmware. It's exactly the same in 0xf7.
There are more possible values for arg, which I didn't include.
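
Boiled down, the part of the handler shown above maps the read argument to how many bytes end up in the output buffer. The sketch below restates just that mapping; `smart_report_len()` is a hypothetical name, it covers only the three selectors whose sizes are visible in the pseudo-code, and it treats everything else (including 0x4100, whose size is not visible here) as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the selector test in the pseudo-code above.
 * Only bits 15..8 of the read argument choose the report type.
 * Stores the number of bytes copied in *len and returns 1 for a selector
 * whose size is visible in the excerpt, or stores 0 and returns 0 otherwise.
 */
int smart_report_len(uint32_t arg, size_t *len)
{
    assert(len != NULL);
    switch (arg & 0xFF00u) {
    case 0x0000u: *len = 0x90;   return 1;  /* regular report      */
    case 0x1000u: *len = 0x200;  return 1;  /* extended report     */
    case 0x2000u: *len = 0x3E4C; return 1;  /* FTL context         */
    default:      *len = 0;      return 0;  /* not covered here    */
    }
}
```

So reading "block" 0 gives the 0x90-byte regular report, 0x1000 gives the full 0x200-byte report, and the low byte of the argument does not change which report is selected.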
 

Rob2222

Senior Member
Feb 18, 2008
413
306
Hello,

Sourcecode update 8 is out and has some differences in the MMC driver compared to update 7.
They seem to use a feature called Power Off Notification (PON) now.
Do they maybe use it to detect locked state (by sd-fix) faster?
However, the sd-fix patch bytes they are writing to the eMMC are still the same.

BR
Rob
 

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
Hello,

Sourcecode update 8 is out and has some differences in the MMC driver compared to update 7.
They seem to use a feature called Power Off Notification (PON) now.
Do they maybe use it to detect locked state (by sd-fix) faster?
However, the sd-fix patch bytes they are writing to the eMMC are still the same.

BR
Rob
We're aware of it. Although there is no absolute confirmation, they enable the chip's notification so that it can tell the kernel about a restart; it'll probably avoid the 10-minute lockups.
 

Rob2222

Senior Member
Feb 18, 2008
413
306
We're aware of it. Although no absolute confirmation, they enable the chip's notification to communicate to the kernel of a restart; it'll probably avoid the 10 minute lockups.

Hello,
but isn't this notification sent on power-down from the host to the eMMC?
At least that's the way I understood the code. But I am not sure.

BR
Robert
 

E:V:A

Inactive Recognized Developer
Dec 6, 2011
1,447
2,222
-∇ϕ
Hello, but isn't this notification sent on power-down from the host to the eMMC? At least that's the way I understood the code. But I am not sure.

We don't think this is very interesting, as it just gives the device time to shut down properly in case the watchdog wants to shut it down. It's probably the watchdog that is responsible for shutting down the eMMC when it reaches the infinite loop mentioned earlier.

You can read all about it in JEDEC JESD84-B451 on page 185 (207 in PDF).
Code:
7.4.86 POWER_OFF_NOTIFICATION [34]

This field allows host to notify the device before the device is powered off.
Values not in the below table are invalid and setting them will result in
SWITCH_ERROR.

Value   Name                    Description
-----------------------------------------------------------
0x00    NO_POWER_NOTIFICATION   Power off notification is not supported by host, device shall not assume any notification
0x01    POWERED_ON              Host shall notify before powering off the device, and keep power supplies alive and active until then
0x02    POWER_OFF_SHORT         Host is going to power off the device. The device shall respond within GENERIC_CMD6_TIME.
0x03    POWER_OFF_LONG          Host is going to power off the device. The device shall respond within POWER_OFF_LONG_TIME.
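
Since this is an EXT_CSD byte, the host sets it with an ordinary SWITCH (CMD6). As a rough illustration of what goes over the wire, here is a sketch that builds the CMD6 argument for each value in the table. The enum and function names are mine; the argument layout (access mode in bits 25:24, index in 23:16, value in 15:8) and the write-byte access mode 0x03 are the standard ones as far as I know, and the command set bits are simply left at 0.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_CSD_POWER_OFF_NOTIFICATION  34    /* byte index from the excerpt */
#define MMC_SWITCH_MODE_WRITE_BYTE      0x03  /* CMD6 access mode: write byte */

/* Values from the table above (names prefixed to avoid clashes). */
enum pon_value {
    PON_NO_POWER_NOTIFICATION = 0x00,
    PON_POWERED_ON            = 0x01,
    PON_POWER_OFF_SHORT       = 0x02,
    PON_POWER_OFF_LONG        = 0x03
};

/* Anything outside the table is invalid and would produce SWITCH_ERROR. */
int pon_value_is_valid(unsigned int value)
{
    return value <= PON_POWER_OFF_LONG;
}

/* Build the CMD6 (SWITCH) argument that writes `value` to EXT_CSD[34]. */
uint32_t pon_switch_arg(enum pon_value value)
{
    assert(pon_value_is_valid((unsigned int)value));
    return ((uint32_t)MMC_SWITCH_MODE_WRITE_BYTE << 24) |
           ((uint32_t)EXT_CSD_POWER_OFF_NOTIFICATION << 16) |
           ((uint32_t)value << 8);
}
```

For example, announcing POWERED_ON works out to the argument 0x03220100. Note the direction: it is the host writing to the device, which fits Rob's reading of the code.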
 

phoenixdigital

Senior Member
Aug 8, 2010
127
0
I have just had SDS this morning....

the morning after the update..

My friend was also running a ROM with the 'patch' from Samsung and his phone still died of SDS.

I know that when my phone came back from repair it was still vulnerable to SDS. I am no longer content that this supposed 'fix' in the kernel and recovery is enough to protect my phone. I am almost certain this will happen again.

Has there been any more research into patching the firmware on the eMMC ourselves?
 

MSK1

Senior Member
Nov 15, 2012
823
75
Huawei Mate 20 X
My friend was also running a ROM with the 'patch' from Samsung and his phone still died of SDS.

I know that when my phone came back from repair it was still vulnerable to SDS. I am no longer content that this supposed 'fix' in the kernel and recovery is enough to protect my phone. I am almost certain this will happen again.

Has there been any more research into patching the firmware on the eMMC ourselves?

Oh no!!!!!!!!!

Can anyone verify this. .... !!!!!!
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
You know, there are plenty of other ways a device can fail with the same symptoms as SDS other than SDS itself... We have no idea if that device actually did have SDS occur. Considering that the number of damage reports has plummeted to almost nothing since the patch was deployed, I'm suspecting he had some other sort of failure... It happens, bad luck.
 

Product F(RED)

Senior Member
Sep 6, 2010
9,883
2,105
Brooklyn, NY
You know, there are plenty of other ways a device can fail with the same symptoms as SDS other than SDS itself... We have no idea if that device actually did have SDS occur. Considering that the number of damage reports has plummeted to almost nothing since the patch was deployed, I'm suspecting he had some other sort of failure... It happens, bad luck.

Basically, unless Samsung comes out and says something (like Sony did a few days ago for the same exact issue), we can't do much. I haven't had SDS, but all I know is from now on if I buy another Samsung device, it's going to be a US model. I don't care about the specs anymore. We've reached a point where the Exynos is only marginally faster than the competition. It's not worth the risk of ending up with a dead device that won't be repaired overseas (since I'm in the US). I've already had my S3 repaired once for a separate issue and it cost me $80 to ship it and a month of waiting.
 

AndreiLux

Senior Member
Jul 9, 2011
3,209
14,598
Basically, unless Samsung comes out and says something (like Sony did a few days ago for the same exact issue), we can't do much. I haven't had SDS, but all I know is from now on if I buy another Samsung device, it's going to be a US model. I don't care about the specs anymore. We've reached a point where the Exynos is only marginally faster than the competition. It's not worth the risk of ending up with a dead device that won't be repaired overseas (since I'm in the US). I've already had my S3 repaired once for a separate issue and it cost me $80 to ship it and a month of waiting.
What does this have to do with Exynos/Qualcomm?
 

Top Liked Posts

  • 53
    Update from Feb 17th:
    Samsung has started to upgrade eMMC firmware in the field - only for GT-I9100 for now.
    See post #79 for additional details.

    Update from Feb 13th:
    If you want to dump the eMMC's RAM yourself, go ahead to post #72.
    I'm looking for a dump of firmware revision 0xf7 if you've got one.
    -----------------------


    Since it's very likely that the recent eMMC firmware patch by Samsung is their patch for the "sudden death" issue, it would be very nice to understand what is really going on there.

    According to a leaked moviNAND datasheet, it seems that MMC CMD62 is a vendor-specific command that moviNAND implements.
    If you issue CMD62(0xEFAC62EC), then CMD62(0xCCEE) - you can read a "Smart report". To exit this mode, issue CMD62(0xEFAC62EC), then CMD62(0xDECCEE).


    So what are they doing in their patch?

    1. Whenever an MMC is attached:
    a. If it is "VTU00M", revision 0xf1, they read a Smart report.
    b. The DWORD at Smart[324:328] represents a date (little-endian); if it is not 0x20120413, they don't patch the firmware. (Maybe only chips from 2012/04/13 are buggy?)
    2. If the chip is buggy, whenever an MMC is attached or the device is resumed:
    a. Issue CMD62(0xEFAC62EC) CMD62(0x10210000) to enter RAM write mode. Now you can write to RAM by issuing MMC_ERASE_GROUP_START(Address to write) MMC_ERASE_GROUP_END(Value to be written) MMC_ERASE(0).
    b. *(0x40300) = 10 B5 03 4A 90 47 00 28 00 D1 FE E7 10 BD 00 00 73 9D 05 00
    c. *(0x5C7EA) = E3 F7 89 FD
    d. Exit RAM write mode by issuing CMD62(0xEFAC62EC) CMD62(0xDECCEE).
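The gating check in step 1 can be sketched like this. It is only a rough Python illustration, not Samsung's code: the helper names `smart_date` and `needs_patch` and the 512-byte report size are my assumptions; only the product name, revision, offset and date value come from the steps above.

```python
# Hypothetical helpers; only the constants are taken from the analysis above.
PATCH_NAME = "VTU00M"
PATCH_REV = 0xF1
PATCH_DATE = 0x20120413   # the hex digits read as 2012/04/13
DATE_OFFSET = 324         # Smart[324:328]


def smart_date(report: bytes) -> int:
    """Return the little-endian DWORD stored at Smart[324:328]."""
    if len(report) < DATE_OFFSET + 4:
        raise ValueError("smart report too short")
    return int.from_bytes(report[DATE_OFFSET:DATE_OFFSET + 4], "little")


def needs_patch(name: str, rev: int, report: bytes) -> bool:
    """Mirror the three conditions from step 1: name, revision, date."""
    return (name == PATCH_NAME
            and rev == PATCH_REV
            and smart_date(report) == PATCH_DATE)


# Fake report for illustration: all zeros except for the date field.
report = bytearray(512)   # assumed size: one 512-byte block
report[324:328] = bytes([0x13, 0x04, 0x12, 0x20])
print(hex(smart_date(bytes(report))))
print(needs_patch("VTU00M", 0xF1, bytes(report)))
```

Note that the bytes are stored in reverse order (13 04 12 20), which is why the little-endian read is needed to get 0x20120413 back.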
    10 B5 looks like a common Thumb push (in ARM architecture). Disassembling the bytes that they write to 0x40300 yields the following code:
    Code:
    ROM:00040300                 PUSH    {R4,LR}
    ROM:00040302                 LDR     R2, =0x59D73
    ROM:00040304                 BLX     R2
    ROM:00040306                 CMP     R0, #0
    ROM:00040308                 BNE     locret_4030C
    ROM:0004030A
    ROM:0004030A loc_4030A                               ; CODE XREF: ROM:loc_4030Aj
    ROM:0004030A                 B       loc_4030A
    ROM:0004030C ; ---------------------------------------------------------------------------
    ROM:0004030C
    ROM:0004030C locret_4030C                            ; CODE XREF: ROM:00040308j
    ROM:0004030C                 POP     {R4,PC}
    ROM:0004030C ; ---------------------------------------------------------------------
    Disassembling what they write to 0x5C7EA yields this:
    Code:
    ROM:0005C7EA                 BL      0x40300
    Looks like it is indeed Thumb code.
    If we could dump the eMMC RAM, we would understand what has been changed.
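For anyone who wants to double-check the BL at 0x5C7EA without a disassembler, here is a small Python sketch that decodes the 32-bit Thumb BL encoding by hand. The function name `decode_thumb_bl` is mine; the bit layout follows the standard Thumb-2 BL encoding (S, imm10, J1, J2, imm11).

```python
def decode_thumb_bl(addr: int, raw: bytes) -> int:
    """Return the branch target of a 4-byte Thumb BL located at addr."""
    hw1 = int.from_bytes(raw[0:2], "little")
    hw2 = int.from_bytes(raw[2:4], "little")
    # BL: first halfword 11110xxxxxxxxxxx, second halfword 11x1xxxxxxxxxxxx
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a Thumb BL instruction")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)          # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)          # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:                      # sign-extend the 25-bit offset
        offset -= 1 << 25
    # In Thumb state the PC reads as the instruction address + 4.
    return (addr + 4 + offset) & 0xFFFFFFFF


patched = bytes.fromhex("E3F789FD")
print(hex(decode_thumb_bl(0x5C7EA, patched)))
```

Feeding it the four patched bytes gives 0x40300, matching the listing above.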


    By inspecting some code, it seems that we know how to dump the eMMC RAM:
    Look at the function mmc_set_wearlevel_page in line 206. It patches the RAM (using the method mentioned before), then it validates what it has written (in lines 255-290). It seems that the procedure to read the RAM is as follows:
    1. CMD62(0xEFAC62EC) CMD62(0x10210002) to enter RAM reading mode
    2. MMC_ERASE_GROUP_START(Address to read) MMC_ERASE_GROUP_END(Length to read) MMC_ERASE(0)
    3. MMC_READ_SINGLE_BLOCK to read the data
    4. CMD62(0xEFAC62EC) CMD62(0xDECCEE) to exit RAM reading mode
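The four steps above can be written down as an ordered command list. This Python sketch does not talk to any hardware; it only builds the (opcode, argument) sequence so it is easier to review before porting it to kernel code. The opcode numbers for the erase and read commands are the standard MMC ones; the function name and the tuple layout are my own, and the argument for the single-block read is left as `None` because the procedure above does not state it.

```python
# Standard MMC opcodes (same values as in the Linux mmc headers).
MMC_READ_SINGLE_BLOCK = 17
MMC_ERASE_GROUP_START = 35
MMC_ERASE_GROUP_END = 36
MMC_ERASE = 38
MMC_VENDOR_CMD62 = 62          # vendor-specific, per the moviNAND datasheet

VENDOR_MAGIC = 0xEFAC62EC      # precedes every vendor-mode argument
RAM_READ_MODE = 0x10210002
EXIT_VENDOR_MODE = 0xDECCEE


def ram_read_sequence(address: int, length: int) -> list:
    """Build the command sequence for one eMMC RAM read (no I/O is done)."""
    return [
        (MMC_VENDOR_CMD62, VENDOR_MAGIC),
        (MMC_VENDOR_CMD62, RAM_READ_MODE),     # 1. enter RAM reading mode
        (MMC_ERASE_GROUP_START, address),      # 2. address to read
        (MMC_ERASE_GROUP_END, length),         #    length to read
        (MMC_ERASE, 0),
        (MMC_READ_SINGLE_BLOCK, None),         # 3. argument not stated above
        (MMC_VENDOR_CMD62, VENDOR_MAGIC),
        (MMC_VENDOR_CMD62, EXIT_VENDOR_MODE),  # 4. exit RAM reading mode
    ]


for opcode, arg in ram_read_sequence(0x40300, 0x100):
    print("CMD%d" % opcode, "-" if arg is None else hex(arg))
```

The address 0x40300 and length 0x100 in the usage line are just example values.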


    I don't want to run this on my device, because messing with the eMMC doesn't sound like a very good idea when I don't have a spare one.
    Does anyone have a development device they don't mind risking, and want to dump the eMMC firmware from it? :)
    28
    Okay, got a RAM dump :)
    I won't post it here (or anywhere else for that matter) because I don't want to get sued by Samsung.

    I might release a kernel which allows you to dump the RAM yourself if there's enough demand, but I don't want to right now, because:
    1. The code is ugly as hell, not implemented as a kernel module, not thread-safe etc.
    2. It is highly dangerous (messing with the eMMC chip - I really don't know how stable this thing is), so if you want to do it on your device, you should be an expert. In that case, you can write the code yourself (with little effort) :)


    Anyway, I hope the FTL is Whimory, since I'm familiar with it. Would be easier.
    I'll let you know if I find anything interesting.


    PS I've attached a little teaser. (Yes, this is the patched function. 0x40300 is red because I've opened a partial RAM dump.)



    EDIT - Some initial results:
    0. The CPU is a Cortex-M3.
    1. No strings at all :( Just some uninteresting release asserts ("REL_ASSERT")
    2. Found the Smart Report generator function -> found the MMC command handlers.
    3. Most MMC command handlers are stored in a function table. There are 3 special commands: MMC60, MMC62, MMC64. Depending on the arguments they are given, these special commands modify the function table (this is the so-called "vendor mode").
    4. There are a lot of possible arguments for MMC62, not only the ones we know.
    5. If you trace the function they patch all the way up the call stack, you get to the MMC24 and MMC25 handlers. These commands are MMC_WRITE_BLOCK and MMC_WRITE_MULTIPLE_BLOCK. Since the function they patch is deep down the call stack, it's very likely that it is part of the wear leveling.

    Anyway, because of the lack of strings I guess it would be very hard to truly understand the SDS bug we're facing :(
    18
    Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

    It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
    The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.
    16
    So I decided to do a small RAM dump after all.

    Before the patch, 0x5C7EA reads FD F7 C2 FA, which is "BL 0x59D72".
    As I thought, they replace a function call to the new one.
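To make the before/after comparison concrete, here is a Python sketch that goes the other way and encodes a Thumb BL from an address and a target. The function name `encode_thumb_bl` is mine, and this is just my reading of the standard Thumb-2 BL encoding, not anything taken from the firmware. It clears bit 0 of the target, since a direct BL encodes the even address (0x59D72) while the BLX R2 in the patch stub needs the Thumb bit set (0x59D73).

```python
def encode_thumb_bl(addr: int, target: int) -> bytes:
    """Encode a 4-byte Thumb BL placed at addr that calls target."""
    offset = (target & ~1) - (addr + 4)     # PC reads as addr + 4
    if addr % 2 or not -(1 << 24) <= offset < (1 << 24):
        raise ValueError("misaligned address or target out of BL range")
    imm = offset & 0x1FFFFFF                # 25-bit two's complement
    s = (imm >> 24) & 1
    i1 = (imm >> 23) & 1
    i2 = (imm >> 22) & 1
    imm10 = (imm >> 12) & 0x3FF
    imm11 = (imm >> 1) & 0x7FF
    j1 = (1 - i1) ^ s                       # J1 = NOT(I1) XOR S
    j2 = (1 - i2) ^ s                       # J2 = NOT(I2) XOR S
    hw1 = 0xF000 | (s << 10) | imm10
    hw2 = 0xD000 | (j1 << 13) | (j2 << 11) | imm11
    return hw1.to_bytes(2, "little") + hw2.to_bytes(2, "little")


before = encode_thumb_bl(0x5C7EA, 0x59D72)  # original call
after = encode_thumb_bl(0x5C7EA, 0x40300)   # call redirected to the stub
print(before.hex(" ").upper())
print(after.hex(" ").upper())
```

The two encodings come out as FD F7 C2 FA and E3 F7 89 FD, i.e. the bytes read before the patch and the bytes the patch writes.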

    I will dump function 0x59D72 later this week.
    16
    Got a kernel log from just after such a freeze.

    I was about to power on the screen but nothing happened. Then I waited around 10 minutes, the screen finally came up, and I dumped the log.

    Is this interesting? :D

    Full log is attached.

    Code:
    U/ 4002.738352  c0 [keys]PWR 1
    U/ 4002.983296  c0 [keys]PWR 0
    ...
    U/ 4587.514100  c0 mshci: ===========================================
    W/ 4587.514336  c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
    ....
    V/ 4587.850296  c0 mmc0: recovering eMMC has been done
    ...
    W/ 4587.850849  c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
    W/ 4587.851982  c0 end_request: I/O error, dev mmcblk0, sector 3126872
    W/ 4587.852174  c0 end_request: I/O error, dev mmcblk0, sector 3126880
    W/ 4587.852330  c0 end_request: I/O error, dev mmcblk0, sector 3126888
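The "card status 0x900" value in the log can be unpacked with a few lines of Python. This is only a sketch with names I made up (`decode_card_status`, `STATES`); the bit positions are the standard MMC R1 card status layout (CURRENT_STATE in bits 12:9, READY_FOR_DATA in bit 8). It also converts the failing sector numbers to byte offsets, assuming the usual 512-byte block-layer sectors.

```python
# Standard MMC card states, indexed by the CURRENT_STATE field.
STATES = ["idle", "ready", "ident", "stby", "tran", "data",
          "rcv", "prg", "dis", "btst", "slp"]
SECTOR_SIZE = 512   # Linux block layer sector size


def decode_card_status(status: int):
    """Return (current state name, ready-for-data flag) from an R1 status."""
    state = (status >> 9) & 0xF
    ready = bool(status & (1 << 8))
    name = STATES[state] if state < len(STATES) else "reserved"
    return name, ready


print(decode_card_status(0x900))

# Sectors reported by end_request in the log above.
for sector in (3126872, 3126880, 3126888):
    print(sector, "->", hex(sector * SECTOR_SIZE))
```

So after the recovery the card reports itself back in the transfer state and ready for data, even though the requests still fail. The failing sectors are 8 apart, i.e. consecutive 4 KiB chunks.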


    EDIT: Added another log. Will add more, if I get more.


    BR
    Rob