eMMC sudden death research

Oranav · Feb 17, 2013

Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.

E:V:A · Feb 18, 2013

As requested, HERE are the ./drivers/mmc sources (GT-I9100 JB).

[EDIT] (According to Entropy512, the stuff below is not related to FFU...)

To get an idea what chips they're fixing, we look in:
./drivers/mmc/card/block.c

Code:

...
static const struct mmc_fixup blk_fixups[] =
{
    MMC_FIXUP("SEM02G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM04G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM08G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM16G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    MMC_FIXUP("SEM32G", 0x2, 0x100, add_quirk, MMC_QUIRK_INAND_CMD38),
    /*
     * Some MMC cards experience performance degradation with CMD23
     * instead of CMD12-bounded multiblock transfers. For now we'll
     * black list what's bad...
     * - Certain Toshiba cards.
     *
     * N.B. This doesn't affect SD cards.
     */
    MMC_FIXUP("MMC08G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    MMC_FIXUP("MMC16G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    MMC_FIXUP("MMC32G", 0x11, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23),
    /*Some issue about secure erase/secure trim for Samsung MoviNAND*/
    MMC_FIXUP("M8G2FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MAG4FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MBG8FA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("MCGAFA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VAL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VYL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("KYL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),
    MMC_FIXUP("VZL00M", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_MOVINAND_SECURE),

    END_FIXUP
};
...

To decipher that you may need to look at ./include/linux/mmc/card.h HERE.

Code:

...
#define _FIXUP_EXT(_name, _manfid, _oemid, _rev_start, _rev_end, _cis_vendor, _cis_device, _fixup, _data) \
    {                                   \
        .name = (_name),                \
        .manfid = (_manfid),            \
        .oemid = (_oemid),              \
        .rev_start = (_rev_start),      \
        .rev_end = (_rev_end),          \
        .cis_vendor = (_cis_vendor),    \
        .cis_device = (_cis_device),    \
        .vendor_fixup = (_fixup),       \
        .data = (_data),                \
    }

#define MMC_FIXUP_REV(_name, _manfid, _oemid, _rev_start, _rev_end, _fixup, _data) \
        _FIXUP_EXT(_name, _manfid, _oemid, _rev_start, _rev_end, SDIO_ANY_ID, SDIO_ANY_ID, _fixup, _data)
#define MMC_FIXUP(_name, _manfid, _oemid, _fixup, _data) \
        MMC_FIXUP_REV(_name, _manfid, _oemid, 0, -1ull, _fixup, _data)
...
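
For anyone who wants to see the matching idea without reading the kernel, here is a rough userspace sketch of how a table like blk_fixups gets applied to a card. It is not the kernel code: the struct layouts, the quirk bit values and the 0x15 manufacturer ID standing in for CID_MANFID_SAMSUNG are assumptions made for illustration, and the revision and SDIO fields from _FIXUP_EXT are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define ANY_OEMID             0xFFFFu    /* plays the role of CID_OEMID_ANY */
#define QUIRK_BLK_NO_CMD23    (1u << 0)  /* hypothetical bit value */
#define QUIRK_MOVINAND_SECURE (1u << 1)  /* hypothetical bit value */

struct card  { const char *name; unsigned manfid, oemid, quirks; };
struct fixup { const char *name; unsigned manfid, oemid, quirk; };

/* Two rows modelled on blk_fixups[]; a NULL name ends the table. */
const struct fixup example_fixups[] = {
    { "MMC16G", 0x11, ANY_OEMID, QUIRK_BLK_NO_CMD23 },
    { "VYL00M", 0x15, ANY_OEMID, QUIRK_MOVINAND_SECURE }, /* 0x15: assumed Samsung ID */
    { NULL, 0, 0, 0 }
};

/* OR in the quirk of every row whose name, manfid and oemid match the card. */
void apply_fixups(struct card *c, const struct fixup *t)
{
    assert(c != NULL && t != NULL);
    for (; t->name != NULL; t++) {
        if (strcmp(t->name, c->name) != 0 || t->manfid != c->manfid)
            continue;
        if (t->oemid != ANY_OEMID && t->oemid != c->oemid)
            continue;
        c->quirks |= t->quirk;
    }
}
```

With this model, a card reporting name "VYL00M" and manufacturer 0x15 ends up with the secure-erase quirk set, while the same name under another manufacturer ID is left alone.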

Entropy512 · Feb 18, 2013

Oranav said:
Just a quick update: thanks to a kernel compiled by AndreiLux, and thanks to artesea for doing an eMMC RAM dump on his device, we've got the 0xf7 firmware!

It seems that it is runnable on the same hardware. It means that we can probably field upgrade I9300 devices, just as Samsung does with I9100.
The interesting question is whether we're able to preserve the data on the eMMC during the process. If the answer is no, a firmware upgrade would require PIT repartitioning and reflashing of SBOOT so that the device won't become a brick.

Yeah. My guess is that the new FFU firmware is specially designed to fix Superbrick without requiring a low-level reformat (as most firmware upgrades do). I know that, at least a year ago, there was no known way to go from 0x19 to 0x25 without a full reset.

E:V:A said:

As requested, HERE are the ./drivers/mmc sources (GT-I9100 JB).

To get an idea what chips they're fixing, we look in:
./drivers/mmc/card/block.c

That's a completely different patch than the FFU stuff. That's Samsung's original Superbrick patch, which was merged to mainline Linux in early September, showed up in I9300 JB (despite being of no use on I9100), and first showed up on an affected device with the I9100 HK JB source drop (which did NOT have FFU).

The original patch would just take all secure erase calls and translate them into nonsecure erase.
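
As a rough illustration of what "translate secure erase into nonsecure erase" amounts to, the decision can be modelled as below. This is only a sketch: the function and enum names and the quirk bit value are invented here, and the real patch makes this choice inside the block driver rather than in a standalone helper.

```c
#include <assert.h>
#include <stdbool.h>

#define QUIRK_MOVINAND_SECURE (1u << 0)  /* hypothetical bit value */

enum erase_path { ERASE_NONSECURE, ERASE_SECURE };

/* Decide which erase path a discard request takes.
 * secure_requested: the caller asked for secure erase/trim.
 * quirks: quirk bits set on the card by the fixup table. */
enum erase_path choose_erase_path(bool secure_requested, unsigned quirks)
{
    if (secure_requested && !(quirks & QUIRK_MOVINAND_SECURE))
        return ERASE_SECURE;
    return ERASE_NONSECURE;   /* flagged cards never get the secure variant */
}

/* Usage: the same request on an unflagged and a flagged card. */
void demo_erase_path(void)
{
    assert(choose_erase_path(true, 0) == ERASE_SECURE);
    assert(choose_erase_path(true, QUIRK_MOVINAND_SECURE) == ERASE_NONSECURE);
}
```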

The FFU code is completely self-contained and has no interaction with the quirks code.

Entropy512 · Feb 18, 2013

Hmm. I need to take a closer look at this FFU stuff.

When running on my N7000 (VYL00M 0x19):

Code:

<1>[  338.705203] c0 Found platform driver dw_mmc : c0cf8db8
<1>[  338.705294] c0 Found platform device dw_mmc : c0cc8fb8
<1>[  338.705310] c0 FFU getting host data success.
<1>[  338.705333] c0 mmc0: clock 33333333Hz busmode 2 powermode 2 cs 0 Vdd 21 width 3 timing 5
<1>[  338.720523] c0 FFU It's not a VHX. 0000

So whatever it's meant to update, it appears to be something other than a Superbrick-vulnerable eMMC.

I'll take a deeper look at this tonight - such as modifying it to display what it DID get from the Smart Report.

E:V:A · Feb 18, 2013

Does anyone know what CMD60 and CMD61 do?

According to THIS old (2006) EE-Times article, it seems that they are connected
with the CE-ATA interface, to provide backward compatibility when the eMMC chips
are used in a CF+ configuration. I guess they could also be involved when the
chips are used in an SSD RAID configuration. But this should be confirmed... Here's
what they say. [Attention, the EE-Times bastards only allow you to see 2 pages!]

CE-ATA is based primarily on a combination of the MMC electrical interface
standard and on an optimized subset of the ATA command protocol. CE-ATA
specifications include:

Scalable transfer rates up to 52 MB/s
Low pin count with six or ten interface signals, depending on data transfer rate requirements
A streamlined ATA command set — only five commands
Five MMC commands, two exclusively developed for CE-ATA
A simple protocol

MMC commands in CE-ATA scope

CE-ATA utilizes five MMC commands during the course of normal execution.
Resets are performed using the “GO_IDLE_STATE (CMD0)”, aborting an ATA command
is done by issuing “STOP_TRANSMISSION (CMD12)”, byte-wise access to the ATA
taskfile register is achieved using “FAST_IO (CMD39)”, issuance of an ATA
command or access to the status and control registers is executed by
“RW_MULTIPLE_REGISTER (CMD60)”, and ATA command data transfer is achieved
using “RW_MULTIPLE_BLOCK (CMD61)”.

CMD60 and CMD61 are MMC commands newly defined by CE-ATA for efficient ATA
command execution. CE-ATA utilizes the same MMC command sequence for
initialization as traditional MMC devices. The ATA operation occurs within the
MMC “TRAN” state.

ATA processing on MMC

CE-ATA maps the streamlined ATA command set onto the MMC interface. The ATA
taskfile is mapped onto the MMC register space starting at MMC address zero as
outlined in Figure 1. To accommodate future capacity growth, and to ensure
that large transfers can be done, 48-bit addressing is accommodated from the
start. Transferring a single song from the device typically requires more than
128KB of data, which is the maximum transfer size with only 28-bit LBA
addressing.

...

CMD60 : RW multiple register

CMD60 allows multiple registers to be read/written in a single transaction
sequence. Using CMD60, an ATA command can be issued using a single MMC
command. Alternatively, software may have to execute up to 13 individual
taskfile register writes with "FAST_IO" in order to issue a single ATA
command.

CMD61 : RW multiple block

CMD61 is used to transfer the data for an ATA command (like READ DMA EXT).
Data transfer for media access commands must be multiples of the CE-ATA sector
size that is a minimum of 4KB. Each transfer is broken into multiple MMC data
blocks that are 512 bytes, 1KB, or 4KB in size as negotiated between the host
and the device.

The question is why they have included CMD60 in movinand.c, since it is never
actually used anywhere, as far as I can see.
So perhaps it has a different use altogether...

Code:

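/* Send the vendor-specific CMD60 with the given argument. */
int mmc_vendor_cmd60(unsigned int arg) {
    int err;
    struct mmc_host *host = ffu_mmc_host();
    unsigned int claim = 0;
    int resp;
    struct mmc_command cmd = {0};

    if( host ) {
        claim = host->claimed;          /* remember the current claim state */
        cmd.opcode = 60;
        cmd.arg = arg;
        cmd.flags = MMC_RSP_SPI_R1B;

        host->claimed = 1;              /* mark the host as claimed for this command */
        err = mmc_wait_for_cmd(host, &cmd, 3);  /* third argument: retries */
        host->claimed = claim;          /* restore the saved claim state */
        if (err) {
            printk(KERN_ALERT"FFU : CMD60 error with arg : %x\n",arg);
            return err;
        }

        /* Poll until the status reads exactly 0x900; note there is no timeout. */
        do{ mmc_send_status(&resp); }while(resp != 0x900);
        return 0;
    }
    return -1;
}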
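
A side note on the magic 0x900 in the polling loop: in the standard card status (R1) layout, bit 8 is READY_FOR_DATA and bits 12:9 hold the current state, where 4 means "tran". So 0x900 is simply "back in transfer state and ready for data". The sketch below decodes that; the helper name is made up, and it is a little more lenient than the FFU loop, which insists on an exact match with every other status bit clear.

```c
#include <assert.h>
#include <stdint.h>

/* Card status (R1) fields as laid out in the eMMC spec. */
#define R1_READY_FOR_DATA   (1u << 8)            /* bit 8 */
#define R1_CURRENT_STATE(s) (((s) >> 9) & 0xFu)  /* bits 12:9 */
#define R1_STATE_TRAN       4u                   /* transfer state */

/* True when the card is back in TRAN and ready for data. */
int card_ready_in_tran(uint32_t status)
{
    return (status & R1_READY_FOR_DATA) != 0 &&
           R1_CURRENT_STATE(status) == R1_STATE_TRAN;
}

void demo_status(void)
{
    assert(card_ready_in_tran(0x900));   /* the value the FFU loop waits for */
    assert(!card_ready_in_tran(0xE00));  /* state 7 (programming), not ready */
}
```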

E:V:A · Feb 18, 2013

@ Entropy512:

Yes, the FW binary in fw.h seems to be for the "VHX" microcontroller. However,
there are two problems that could occur here. The first is that the I9300 is
using a different chip, and thus probably also a different µ-Controller and/or
µC FW version (as we have seen). That's why I asked earlier to check what
exact eMMC they use in these phones. The best way to check is to open and
look! The second is that if you're on an older chip that has not been updated,
you'll get hit in the face by the "Catch-22" bug. See below.

Check HERE:

Code:

Size    Cont*   Part Number        PKG             PKG Dimension    MMC Version   Class
-----------------------------------------------------------------------------------------
4GB     VFX_U   KLM4G1HE3F-B00x    153-ball BGA    11.5mm x 13mm    eMMC 4.41     Class50
8GB     VFX_U   KLM8G2FE3B-B00x    153-ball BGA    11.5mm x 13mm    eMMC 4.41     Class100
16GB    VHX     KLMAG2GE4A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.41     Class400
32GB    VHX     KLMBG4GE4A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.41     Class400
64GB    VHX     KLMCG8GE4A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.41     Class400
16GB    VHX2    KLMAG2GE2A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.5      Class1500
32GB    VHX2    KLMBG4GE2A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.5      Class1500
64GB    VHX2    KLMCG8GE2A-A00x    169-ball BGA    12mm x 16mm      eMMC 4.5      Class1500
-----------------------------------------------------------------------------------------
Cont* = Controller

The check is done in movinand.c, by comparing the first 4 bytes of the 16-byte
"FW Patch Version" field of the eMMC Smart Report [327:312]. This is located
at offset 0x1c in the fw.bin. It should be noted that the Smart Report data
structure, as specified in the MoviNAND document (in OP), was changed with
newer chips. So this could potentially cause problems for updating, since an
older FW version would also use a different structure than the newer version,
one without the "FW Patch Version" data, potentially causing a Catch-22 error!

Code:

int mmc_smartreport(void)
{
...
if( !(buff[312] == 'V' && buff[313] == 'H' && buff[314] == 'X' && buff[315] == '0') ) {
    printk(KERN_ALERT"FFU It's not a VHX. %x%x%x%x", buff[312], buff[313], buff[314], buff[315]);
    return -1;
}
...

Perhaps you can dump your FW binary, so that we can have a look and compare?

The "old" MoviNAND Smart Report structure:

Code:

#-------------------------------------------------------------------------------
# Smart Report Output Data:     (from: http://tiny.cc/2jacsw )
#-------------------------------------------------------------------------------
# DataSlice   Width  Field                              Remark
#-------------------------------------------------------------------------------
# [3:0]           4  Error Mode                         Normal:            0xD2D2D2D2
#                                                       OpenFatalError:    0x37373737
#                                                       RuntimeFatalError: 0x5C5C5C5C
#                                                       MetaBrokenError:   0xE1E1E1E1
# [7:4]           4  Super Block Size                   [1] Total Size of simultaneously erasable physical blocks
# [11:8]          4  Super Page Size                    [2] Total Size of simultaneously programmable physical pages
# [15:12]         4  Optimal Write Size                 [3] Write size at which the device performs best
# [19:16]         4  Number Of Banks                    Number of banks connecting to each NAND flash
# [23:20]         4  Bank0 Init Bad Block               Number of initial defective physical blocks
# [27:24]         4  Bank0 Runtime Bad Block            Number of runtime defective physical blocks
# [31:28]         4  Bank0 Remain Reserved Block        Number of remain reserved physical blocks
# [35:32]         4  Bank1 Init Bad Block               "
# [39:36]         4  Bank1 Runtime Bad Block            "
# [43:40]         4  Bank1 Remain Reserved Block        "
# [47:44]         4  Bank2 Init Bad Block               "
# [51:48]         4  Bank2 Runtime Bad Block            "
# [55:52]         4  Bank2 Remain Reserved Block        "
# [59:56]         4  Bank3 Init Bad Block               "
# [63:60]         4  Bank3 Runtime Bad Block            "
# [67:64]         4  Bank3 Reserved Block               "
# [71:68]         4  Max. Erase Count                   Maximum erase count from among all physical blocks
# [75:72]         4  Min. Erase Count                   Minimum erase count from among all physical blocks
# [79:76]         4  Avg. Erase Count                   Average erase count of all physical blocks
#*****************  CHANGES FROM HERE ON  **************************************
# [83:80]         4  Number of ECC Uncorrectable Error
# [143:84]     30x2  ECC Uncorrectable Error Location   Physical Block Address of ECC Uncorrectable Error
# [203:144]    30x2  ECC Uncorrectable Error Location   Physical Page Offset of ECC Uncorrectable Error
# [219:204]    (16)  Reserved
# [223:220]       4  Read Reclaim Count                 Number of Read Reclaim Count
# [511:224]   (288)  Reserved
#-------------------------------------------------------------------------------
# [1] Number of Channel * N-way Interleaving * physical block size
# [2] Number of Channel * physical page size
# [3] Super Page Size * N-way Interleaving
#-------------------------------------------------------------------------------
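
To make the table a bit more concrete, here is a small sketch of reading two of the common fields out of a dumped report buffer. The function names are invented, and the little-endian byte order used for multi-byte fields is an assumption (the Error Mode patterns are the same in either byte order, so that part does not depend on it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit field; little-endian byte order is an assumption here. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Map the Error Mode field [3:0] to the names from the table. */
const char *smart_error_mode(const uint8_t *report, size_t len)
{
    if (report == NULL || len < 4)
        return "Unknown";
    switch (read_le32(report)) {
    case 0xD2D2D2D2u: return "Normal";
    case 0x37373737u: return "OpenFatalError";
    case 0x5C5C5C5Cu: return "RuntimeFatalError";
    case 0xE1E1E1E1u: return "MetaBrokenError";
    default:          return "Unknown";
    }
}

/* Number Of Banks lives at bytes [19:16]. */
uint32_t smart_num_banks(const uint8_t *report, size_t len)
{
    assert(report != NULL && len >= 20);
    return read_le32(report + 16);
}
```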

The "new" structure:

Code:

...
# [83:80]       4               Read Reclaim Count
# [87:84]       4               Optimal Trim Size
# [119:88]      32              Firmware Hash Code
# [123:120]     4               SLC Erase Count Max.
# [127:124]     4               SLC Erase Count Min.
# [131:128]     4               SLC Erase Count Avg.
# [135:132]     4               MLC Erase Count Max.
# [139:136]     4               MLC Erase Count Min.
# [143:140]     4               MLC Erase Count Avg.
# [147:144]     4               ECC Uncorrectable Errors (ECC_UE)
# [307:148]     10x2x8          ECC_UE Location (physical block address)
# [311:308]     4               Erase Unit Size
# [327:312]     16              FW Patch Version
# [331:328]     4               FCB Scan Result
# [335:332]     4               FTL Open Count
# [511:336]     (176)           Reserved
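
One way to sanity-check the offsets in this listing is to write the layout down as a struct and let the compiler verify it. The sketch below does that; the struct and field names are invented for illustration, and the first 80 bytes are lumped together because they match the old layout. The last helper repeats the "VHX0" comparison that the FFU code performs on bytes 312..315.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extended report layout as listed above; the field names are made up. */
struct smart_report_ext {
    uint8_t  common[80];            /* [79:0]    same fields as the old layout */
    uint32_t read_reclaim_count;    /* [83:80]   */
    uint32_t optimal_trim_size;     /* [87:84]   */
    uint8_t  fw_hash[32];           /* [119:88]  */
    uint32_t slc_erase[3];          /* [131:120] max, min, avg */
    uint32_t mlc_erase[3];          /* [143:132] max, min, avg */
    uint32_t ecc_ue_count;          /* [147:144] */
    uint8_t  ecc_ue_location[160];  /* [307:148] */
    uint32_t erase_unit_size;       /* [311:308] */
    uint8_t  fw_patch_version[16];  /* [327:312] */
    uint32_t fcb_scan_result;       /* [331:328] */
    uint32_t ftl_open_count;        /* [335:332] */
    uint8_t  reserved[176];         /* [511:336] */
};

/* Compile-time checks that the struct matches the table (C11). */
static_assert(offsetof(struct smart_report_ext, fw_hash) == 88, "fw_hash");
static_assert(offsetof(struct smart_report_ext, fw_patch_version) == 312,
              "fw_patch_version");
static_assert(sizeof(struct smart_report_ext) == 512, "report size");

/* Same test as the FFU code: does FW Patch Version start with "VHX0"? */
int report_is_vhx(const struct smart_report_ext *r)
{
    return r != NULL && memcmp(r->fw_patch_version, "VHX0", 4) == 0;
}
```

If the listing were off by even one byte, the static_assert lines would refuse to compile, which makes this a cheap cross-check when comparing datasheets.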

Oranav · Feb 18, 2013

E:V:A said:
Does anyone know what CMD60 and CMD61 do?

According to THIS old (2006) EE-Times article...

No, this documentation is irrelevant.
As I said before in this thread, there are 2 vendor-specific MMC commands Samsung has implemented: CMD60 and CMD62. It's their own implementation, so you won't see any documentation for it unless you sign an NDA.
I have reversed most of the CMD60 and CMD62 interface though. There are some interesting features there.
MoviNAND doesn't have any CMD61 implementation.

They included it probably because they just copy-pasted some engineering code.
Besides that, an interesting fact is that CMD60 has an interface for doing firmware upgrades (so you don't need a stub as they've done here); they don't use it however.

E:V:A said:
Yes, the FW binary in fw.h seem to be for the "VHX" microcontroller.

Right, thanks. I didn't know it was the microcontroller type.

E:V:A said:
This is located at offset 0x1c in the fw.bin.

Actually, offset 0x1c is the 4 reserved DWORDs of the Cortex-M3 vector table.
Note that it is only located there for this firmware; in our RAM dumps, it's located elsewhere (hard-coded into the function that generates the smart report).

E:V:A said:
It should be noted that the Smart Report data
structure, as specified for the MoviNAND document (in OP) was changed with
newer chips. So this could potentially cause problems for updating, since an
older FW version would also have a different structure without the "FW Patch
Version" data, than the newer version, potentially causing a Catch-22 error!

This isn't true. As I've said, there is more than one kind of smart report, and it depends on which "block" you read.
If you read block 0 (as in the MoviNAND documentation), you get the regular report.
If you read block 0x1000 (as in the FFU and in the SDS temporary patch), you get the extended report.
There are more possible values. One, for example, is 0x2000 - to read the FTL context structure (0x3E4C bytes long).

E:V:A · Feb 18, 2013

Oranav said:
... This isn't true. As I've said, there is more than one kind of smart report, and it depends on which "block" you read.
If you read block 0 (as in the MoviNAND documentation), you get the regular report. If you read block 0x1000 (as in the FFU and in the SDS temporary patch), you get the extended report. There are more possible values. One, for example, is 0x2000 - to read the FTL context structure (0x3E4C bytes long).

You sound awfully sure about what you're saying. Are you really that sure? I
think you might be confused by the 3 partitions (2 boot partitions and 1 USER
area) or possibly the 4 banks, which are shown (but not necessarily present)
in the Smart Report section of the "moviNAND" datasheet (Jan 2010). [I assume
that the datasheet you refer to is the one marked "KLMxGxxEHx".] This is quite
different from the 4-partition layout that I'm looking at. There are a lot of
datasheets out there, and the one I'm looking at is for the KMKUS000VM-B410
(March 2012).

Certainly it would not make any sense to have an "extended" Smart Report for
each, and indeed there is no such report mentioned in any of the Samsung
eMMC/NAND documents I have found to date. So I'm sure you're wrong.

If you feel differently and think that I'm in error, please provide a better
explanation of what's going on. Also tell us the exact eMMC chip
that you're basing your analysis on. And most importantly, please provide the
Smart Report binary dump, as you have found it. (It should not be longer than
0x200 bytes!) It will clearly show you how many banks your chip is using, among
other things. I have a script which parses this for you. The results look like this:

Code:

$ ./smartview.pl smart.bin
Parsing Smart Records from: smart.bin

Error Mode: Normal

004: 4  : Super Block Size [1]              :00200000
008: 4  : Super Page Size [2]               :00004000
012: 4  : Optimal Write Size [3]            :00004000
016: 4  : Number Of Banks                   :00000001
020: 4  : Bank0 Initial bad blocks          :00000004
024: 4  : Bank0 Runtime bad blocks          : <zero>
028: 4  : Bank0 Remaining reserved blocks   :00000038
032: 4  : Bank1 Initial bad blocks          : <zero>
036: 4  : Bank1 Runtime bad blocks          : <zero>
040: 4  : Bank1 Remaining reserved blocks   : <zero>
044: 4  : Bank2 Initial bad blocks          : <zero>
048: 4  : Bank2 Runtime bad blocks          : <zero>
052: 4  : Bank2 Remaining reserved blocks   : <zero>
056: 4  : Bank3 Initial bad blocks          : <zero>
060: 4  : Bank3 Runtime bad blocks          : <zero>
064: 4  : Bank3 Remaining reserved blocks   : <zero>
068: 4  : Max. Erase Count                  :00000079
072: 4  : Min. Erase Count                  : <zero>
076: 4  : Avg. Erase Count                  :0000002c
080: 4  : Read Reclaim Count                : <zero>
084: 4  : Optimal Trim Size                 :00002000
088: 32 : Firmware Hash Code                :

          30aedf3e2295c9241457415f0f7c29a5
          4cdb54ee338ab1dd96ab785f9e0b80bd

120: 4  : SLC Erase Count Max.              :00000054
124: 4  : SLC Erase Count Min.              : <zero>
128: 4  : SLC Erase Count Avg.              :0000001e
132: 4  : MLC Erase Count Max.              :00000079
136: 4  : MLC Erase Count Min.              :00000001
140: 4  : MLC Erase Count Avg.              :0000002c
144: 4  : ECC Uncorrectable Errors          : <zero>
148: 160: ECC_UEL Physical Block Address [4]: <zero>
308: 4  : Erase Unit Size                   : <zero>
312: 16 : FW Patch Version                  : <zero>
328: 4  : FCB Scan Result                   : <zero>
332: 4  : FTL Open Count                    : <zero>
336: 176: Reserved                          : <zero>

[This is not for our chip!]

Also, it should not be necessary to use any kernel module to do the dumping.
An ioctl for the eMMC CMDs should suffice... (The PoC is the result above.)

It's pointless to beat around the bush. If you have some data/code to show,
dump it somewhere for others to see, so that we can compare notes.

Finally, it can't be emphasized enough that Samsung's documentation is often
full of errors!

Oranav · Feb 18, 2013

E:V:A said:
You sound awfully sure about what you're saying. Are you really that sure? ...

Yes, I'm 100% sure.
This is pseudo-code of the MMC_READ_SINGLE_BLOCK command handler, as it behaves after you issue CMD62(0xEFAC62EC), CMD62(0x0000CCEE):

Code:

void __fastcall f_smart_report_send(mmc_command *cmd)
{
  uint32_t arg; // r1@1
  int arg_high_byte; // r4@1
  int arg_low_byte; // r5@1
...
  arg = cmd->arg;
  arg_high_byte = arg & 0xFF00;
  arg_low_byte = (unsigned __int8)arg;
...
  if ( arg_high_byte == 0x4100 )
  {
    sub_41954((int)g_output, arg_low_byte);
    goto done;
  }
...
    if ( arg_high_byte == 0x2000 )
    {
      f_ftl_get_context(&ftl);
      f_memcpy(g_output, ftl, 0x3E4Cu);
    }
...
          if ( arg_high_byte == 0x1000 )
          {
            f_memcpy(g_output, &g_smart_report_output, 0x200u);
          }
...
        if ( arg_high_byte == 0 )
        {
          f_memcpy(g_output, &g_smart_report_output, 0x90u);
        }
...
}

It was obtained using Hex-Rays decompiler over the 0xf1 firmware. It's exactly the same in 0xf7.
There are more possible values for arg, which I didn't include.
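
Condensed into a lookup table, the three cases from the pseudo-code that have a known copy length look like this. It is only a model of the dispatch for reference, with made-up names; the 0x4100 case is left out because its output length is not visible in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct report_kind { uint32_t selector; size_t length; const char *what; };

/* Selector values and lengths as they appear in the pseudo-code above. */
const struct report_kind report_kinds[] = {
    { 0x0000u, 0x90u,   "regular smart report"  },
    { 0x1000u, 0x200u,  "extended smart report" },
    { 0x2000u, 0x3E4Cu, "FTL context structure" },
};

/* Return the entry selected by a read argument, or NULL if not modelled. */
const struct report_kind *lookup_report(uint32_t read_arg)
{
    uint32_t selector = read_arg & 0xFF00u;  /* same mask as the handler */
    for (size_t i = 0; i < sizeof report_kinds / sizeof report_kinds[0]; i++)
        if (report_kinds[i].selector == selector)
            return &report_kinds[i];
    return NULL;
}

void demo_lookup(void)
{
    assert(lookup_report(0x1000)->length == 0x200);
    assert(lookup_report(0x4100) == NULL);  /* exists in firmware, not modelled here */
}
```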

Rob2222 · Feb 28, 2013

Hello,

Sourcecode update 8 is out and has some differences in the MMC driver compared to update 7.
They seem to use a feature called Power Off Notification (PON) now.
Do they maybe use it to detect locked state (by sd-fix) faster?
However, the sd-fix patch bytes they are writing to the eMMC are still the same.

BR
Rob

AndreiLux · Mar 1, 2013

Rob2222 said:
Hello,

Sourcecode update 8 is out and has some differences in the MMC driver compared to update 7.
They seem to use a feature called Power Off Notification (PON) now.
Do they maybe use it to detect locked state (by sd-fix) faster?
However, the sd-fix patch bytes they are writing to the eMMC are still the same.

BR
Rob

We're aware of it. Although there's no absolute confirmation, they seem to enable the chip's notification to tell the kernel about a restart; it'll probably avoid the 10-minute lockups.

Rob2222 · Mar 1, 2013

AndreiLux said:
We're aware of it. Although no absolute confirmation, they enable the chip's notification to communicate to the kernel of a restart; it'll probably avoid the 10 minute lockups.

Hello,
but isn't this notification sent on power-down from the host to the eMMC?
At least that's the way I understood the code. But I am not sure.

BR
Robert

E:V:A · Mar 1, 2013

Rob2222 said:
Hello, but isn't this notification sent on power-down from the host to the eMMC? At least that's the way I understood the code. But I am not sure.

We don't think this is very interesting, as it just gives the device time to shut down properly in case the watchdog wants to shut it down. It's probably the watchdog that is responsible for shutting down the eMMC when it reaches the infinite loop mentioned earlier.

You can read all about it in JEDEC JESD84-B451 on page 185 (207 in PDF).

Code:

7.4.86 POWER_OFF_NOTIFICATION [34]

This field allows host to notify the device before the device is powered off.
Values not in the below table are invalid and setting them will result in
SWITCH_ERROR.

Value   Name                    Description
-----------------------------------------------------------
0x00    NO_POWER_NOTIFICATION   Power off notification is not supported by host, device shall not assume any notification
0x01    POWERED_ON              Host shall notify before powering off the device, and keep power supplies alive and active until then
0x02    POWER_OFF_SHORT         Host is going to power off the device. The device shall respond within GENERIC_CMD_6_TIME.
0x03    POWER_OFF_LONG          Host is going to power off the device. The device shall respond within POWER_OFF_LONG_TIME.
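
Since this field is a byte in EXT_CSD, the host sets it with an ordinary CMD6 (SWITCH) in "write byte" mode. Below is a minimal sketch of how the 32-bit CMD6 argument is put together for index 34. The helper and enum names are invented, and the command-set bits [2:0] are simply left at zero; this is not taken from Samsung's update 8 sources.

```c
#include <assert.h>
#include <stdint.h>

#define MMC_SWITCH_MODE_WRITE_BYTE      0x03u  /* CMD6 access mode: write byte */
#define EXT_CSD_POWER_OFF_NOTIFICATION  34u    /* EXT_CSD byte index from the table */

enum pon_value {
    PON_NONE       = 0x00,
    PON_POWERED_ON = 0x01,
    PON_OFF_SHORT  = 0x02,
    PON_OFF_LONG   = 0x03
};

/* Build the CMD6 argument: access [25:24], index [23:16], value [15:8]. */
uint32_t switch_arg(uint8_t index, uint8_t value)
{
    return (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
           ((uint32_t)index << 16) |
           ((uint32_t)value << 8);
}

void demo_pon(void)
{
    assert(switch_arg(EXT_CSD_POWER_OFF_NOTIFICATION, PON_POWERED_ON) == 0x03220100u);
    assert(switch_arg(EXT_CSD_POWER_OFF_NOTIFICATION, PON_OFF_SHORT)  == 0x03220200u);
}
```

Note the direction: the host writes this value to the device, which fits Rob2222's reading that the notification goes from the host to the eMMC.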

MSK1 · Mar 2, 2013

I have just had SDS this morning....

the morning after the update..

phoenixdigital · Mar 25, 2013

MSK1 said:
I have just had SDS this morning....

the morning after the update..

My friend was also running a ROM with the 'patch' from Samsung and his phone still died of SDS.

I know that when my phone came back from repair it was still vulnerable to SDS. I am no longer content that this supposed 'fix' in the kernel and recovery is enough to protect my phone. I am almost certain this will happen again.

Has there been any more research into patching the firmware on the eMMC ourselves?

MSK1 · Mar 26, 2013

phoenixdigital said:
My friend was also running a ROM with the 'patch' from Samsung and his phone still died of SDS.

I know that when my phone came back from repair it was still vulnerable to SDS. I am no longer content that this supposed 'fix' in the kernel and recovery is enough to protect my phone. I am almost certain this will happen again.

Has there been any more research into patching the firmware on the eMMC ourselves?

Oh no!!!!!!!!!

Can anyone verify this. .... !!!!!!

phoenixdigital · Mar 26, 2013

MSK1 said:
Oh no!!!!!!!!!

Can anyone verify this. .... !!!!!!

To further clarify he was running Super Nexus Build 2
http://xdaforums.com/showthread.php?t=2076672

which contains
I9300: Patched kernel with sudden death fix from Samsung's kernel update 7 sources

I have been informed there is an update 8 out in the wild.

Entropy512 · Mar 26, 2013

You know, there are plenty of other ways a device can fail with the same symptoms as SDS other than SDS itself... We have no idea if that device actually did have SDS occur. Considering that the number of damage reports has plummeted to almost nothing since the patch was deployed, I'm suspecting he had some other sort of failure... It happens, bad luck.

Product F(RED) · Mar 26, 2013

Entropy512 said:
You know, there are plenty of other ways a device can fail with the same symptoms as SDS other than SDS itself... We have no idea if that device actually did have SDS occur. Considering that the number of damage reports has plummeted to almost nothing since the patch was deployed, I'm suspecting he had some other sort of failure... It happens, bad luck.

Basically, unless Samsung comes out and says something (like Sony did a few days ago for the same exact issue), we can't do much. I haven't had SDS, but all I know is from now on if I buy another Samsung device, it's going to be a US model. I don't care about the specs anymore. We've reached a point where the Exynos is only marginally faster than the competition. It's not worth the risk of ending up with a dead device that won't be repaired overseas (since I'm in the US). I've already had my S3 repaired once for a separate issue and it cost me $80 to ship it and a month of waiting.

AndreiLux · Mar 26, 2013

Product F(RED) said:
Basically, unless Samsung comes out and says something (like Sony did a few days ago for the same exact issue), we can't do much. I haven't had SDS, but all I know is from now on if I buy another Samsung device, it's going to be a US model. I don't care about the specs anymore. We've reached a point where the Exynos is only marginally faster than the competition. It's not worth the risk of ending up with a dead device that won't be repaired overseas (since I'm in the US). I've already had my S3 repaired once for a separate issue and it cost me $80 to ship it and a month of waiting.

What does this have to do with Exynos/Qualcomm?
