
***WARNING!!! PLEASE READ! ICS Kernels and Recovery

Status: Not open for further replies.

Entropy512, Senior Recognized Developer (joined Aug 31, 2007; Owego, NY)
I have realized this post was sorely outdated. I have updated it partially, but more updates are needed.

Reinbeau asked me to summarize the situation with ICS leaks in a way that can be stickied, so here it is:

DO NOT FLASH OR WIPE USING ANY ICS REPACK OR STOCK KERNEL

While stock recovery is safer than CWM recovery, there is still evidence that it is dangerous. Almost all ICS kernels for the GT-N7000 are affected. Only two are currently known to be safe - see the list below.

These kernels are fundamentally dangerous. They can trigger an underlying defect in the flash chip, and once this defect is triggered, the damage is irreparable, even with JTAG. For the danger to arise, all three of the following are needed:
1) A defective eMMC chip - Nearly every Note has this
2) A kernel that allows MMC ERASE commands to go through - All Samsung ICS stock kernels and leaks fall into this category
3) A recovery or update-binary within a ZIP that attempts to erase a partition when formatting. Unmodified CWM performs a secure erase, which is the most dangerous. Stock recovery performs a nonsecure erase, which is less dangerous but there is still evidence of danger.
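The three conditions combine as a logical AND: remove any one of them and the known trigger is gone. A minimal sketch of that reading follows; the enum and function names are hypothetical, and the risk labels only restate the post's qualitative ranking of secure vs. nonsecure erase, not measured probabilities.

```python
from enum import Enum


class EraseKind(Enum):
    """How a recovery or update-binary formats a partition (per the post above)."""
    NONE = "none"            # formats without issuing any MMC ERASE
    NONSECURE = "nonsecure"  # stock recovery behaviour described above
    SECURE = "secure"        # unmodified CWM behaviour described above


def superbrick_exposure(defective_emmc: bool,
                        kernel_passes_erase: bool,
                        erase: EraseKind) -> str:
    """Qualitative reading of the three conditions listed above.

    Every condition must hold for the danger to exist. The secure vs.
    nonsecure ranking follows the post and is not a measured probability.
    """
    if not defective_emmc or not kernel_passes_erase or erase is EraseKind.NONE:
        return "no known exposure"
    if erase is EraseKind.SECURE:
        return "most dangerous"
    return "less dangerous, not proven safe"


# A typical Note on a stock ICS kernel, wiping in unmodified CWM.
print(superbrick_exposure(True, True, EraseKind.SECURE))
```

Since nearly every Note has the defective chip, the only conditions a user actually controls are the kernel and the kind of erase, which is why the advice below focuses on those.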

The issue is not limited to ClockworkMod Recovery - stock recovery, along with factory resetting a device from Settings, can also be dangerous. When XXLPY first came out, a number of people bricked this way. Stock recovery is less dangerous because it uses nonsecure erase rather than secure erase, but it has yet to be proven fully safe.

Kernels that have been confirmed affected are:
All ICS leaks for the Samsung Epic 4G Touch (SPH-D710)
All ICS leaks for the Samsung Galaxy Note (GT-N7000)
All ICS official releases for the Samsung Galaxy Note (GT-N7000) as of late May 2012 - This includes XXLPY, ZSLPF, and DXLP9, and future kernels should be assumed affected until further notice.
UCLD3 ICS leak for the AT&T Samsung Galaxy S II (SGH-I777) - Other leaks may also be affected
Kernels built using the most recent SHW-M250S/K/L official source code release as of May 3, 2012 - This includes SiyahKernel 3.1rc6 for GT-I9100 (all other Siyah releases are safe)

Damage is not guaranteed - it may only affect a small percentage of users, but even a 5% chance is far more dangerous than the effectively 0% chance of hardbricking due to kernel bugs in safe kernels.
Not all users hardbrick - some wind up with /system, /data, or another partition becoming unwriteable, which leads to an effectively useless phone even though they are able to flash kernels in Odin.

Kernels that have been confirmed safe:

All known Gingerbread kernels for the Galaxy Note and other affected devices listed above
Kernels built from the GT-I9100 Update4 source code release - this includes XplodWILD's CM9 release and my DAFUQ release, hopefully more kernel choices will become available soon
Kernels with MMC_CAP_ERASE removed from mshci.c should be safe - look for it in the listed features of any kernel based on N7000 Update3. (Unmodified N7000 Update3 source code, without this change, is dangerous.)
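Why removing one capability flag helps: in the Linux MMC core of this era, the block driver only offers erase (discard/secure discard) to the rest of the system when mmc_can_erase() succeeds, and that check requires the host driver to advertise MMC_CAP_ERASE. The Python below is a simplified model of that check, not the kernel code. The bit values are placeholders, and Samsung's mshci driver may differ in details.

```python
# Placeholder bit values for illustration only. The real MMC_CAP_* constants
# are defined in include/linux/mmc/host.h; these numbers are hypothetical.
OTHER_CAP = 1 << 0
MMC_CAP_ERASE = 1 << 10


def can_erase(host_caps: int, card_has_erase_class: bool, erase_size: int) -> bool:
    """Simplified model of the MMC core's mmc_can_erase() check.

    Erase is only offered to the block layer when the host driver advertises
    MMC_CAP_ERASE, the card supports the erase command class, and the card
    reports a non-zero erase size.
    """
    return bool(host_caps & MMC_CAP_ERASE) and card_has_erase_class and erase_size > 0


def without_erase_cap(host_caps: int) -> int:
    """Model of the mshci.c change: clear only the erase capability bit."""
    return host_caps & ~MMC_CAP_ERASE


stock_caps = OTHER_CAP | MMC_CAP_ERASE
patched_caps = without_erase_cap(stock_caps)

# Same card in both cases (hypothetical erase size); only the host caps differ.
print("stock:", can_erase(stock_caps, True, 1024))
print("patched:", can_erase(patched_caps, True, 1024))
```

The design point is that the fix lives in the host driver: the card still supports ERASE, but with the bit cleared the kernel should refuse erase requests from recovery instead of forwarding them to the eMMC.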

If you are running an affected kernel:
STOP USING IT IMMEDIATELY. FLASH A SAFE KERNEL USING ODIN/HEIMDALL.
DO NOT wipe in recovery
DO NOT flash anything else in recovery
In general, DO NOT use recovery at all

Right now, what we know:
Some people can wipe with affected kernels as often as they want without problems. Just because you didn't brick, DO NOT advise other users that they will be OK.
Based on reports from the Epic 4G Touch community, some people can wipe/flash 20-30 times before hardbricking - Just because you didn't brick once, DO NOT continue flashing with an affected kernel
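The 5% figure earlier in this post is a hypothetical, not a measured rate. Still, a short calculation shows why surviving one wipe says little. Assuming (purely for illustration) that each wipe independently carries the same chance of triggering the bug, the chance of at least one failure grows quickly with the number of wipes:

```python
def cumulative_brick_probability(p_per_wipe: float, wipes: int) -> float:
    """Probability of at least one failure over `wipes` attempts.

    Assumes each wipe fails independently with the same probability,
    which is a simplification, not something the reports establish.
    """
    if not 0.0 <= p_per_wipe <= 1.0:
        raise ValueError("p_per_wipe must be between 0 and 1")
    if wipes < 0:
        raise ValueError("wipes must be non-negative")
    return 1.0 - (1.0 - p_per_wipe) ** wipes


for n in (1, 5, 20, 30):
    print(n, round(cumulative_brick_probability(0.05, n), 3))
```

Under these assumptions, 20 wipes at 5% each come to roughly a 64% chance of at least one failure.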

What we don't know:
*this needs to be completely updated*

More info:
http://forum.xda-developers.com/showthread.php?t=1607112 - Hardbricks on I777 UCLD3
http://forum.xda-developers.com/showthread.php?t=1615058 - Issues with SHW-M250L Update4
http://forum.xda-developers.com/showpost.php?p=26255080&postcount=159 - Information indicating that our eMMC chip has a serious firmware bug. All of the issues with fwrev 0x19 match our symptoms PERFECTLY. It explains partially why I9100 update4 is safe - MMC_CAP_ERASE is not enabled in the I9100 update4 MMC driver.
 

reinbeau, Retired Forum Moderator (joined Sep 14, 2010; South of Boston, MA)
Additional information

From this post
I know this thread is not about bricks, but your warning must have saved lots of souls from bricks and sorrow. I just want to ask about those currently running ROMs with affected kernels: even if the ROM seems to work, is undetected damage possible without one's knowing?

Entropy replies: Possible - With the Siyah 3.1rc6 fiasco, some people had situations where just one or two partitions became hosed up (usually /data)

It could be possible that some devices might only have a few damaged sectors in the eMMC - but I haven't seen confirmed reports of this yet. It has always been my suspicion that deleting large files on affected kernels could be dangerous - but I'm not sure about this.
 

Entropy512, Senior Recognized Developer
Mod note: I copied this post here from its original thread as it contains pertinent information.

Ok so it still isn't really clear if the problem is still present in LPY, or if the issue is "left-over" due to previous ROMs.

It's all possible. Either way, what I want to make absolutely clear again:

If the problem is present in CF-Root LPY (and it's not a "leftover" problem from a previous ROM, which is possibly the case), then it's also present in stock LPY. This means you *can* randomly brick your phone even when fully booted into Android, under heavy I/O load.

(On the Note the normal and recovery kernels are the same, and CWM is just a userspace program, nothing special(!) - so if the problem can happen in CWM, it can happen in Android.)
Additional information: I've seen at least one report of someone's device dying after a wipe in stock recovery, and one report of someone's device dying by doing a factory reset in Settings.

Which confirms everything you just said.
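Because CWM, stock recovery and Android all run on the same kernel, whether an erase can reach the chip is decided by the kernel, not by the program asking for it. A read-only sketch of one way to see what the running kernel advertises, assuming the ICS-era 3.0 kernel exposes the standard block-queue attribute /sys/block/mmcblk0/queue/discard_max_bytes (0 means no discard is offered). The helper names are hypothetical. A non-zero value only says erase requests can reach the eMMC; given the open question about Gingerbread kernels later in this thread, it is a hint, not a verdict, and 0 is not proof of safety on its own.

```python
from pathlib import Path


def discard_max_bytes(block_dev: str = "mmcblk0") -> int:
    """Read the block queue's advertised discard limit from sysfs (read-only)."""
    path = Path("/sys/block") / block_dev / "queue" / "discard_max_bytes"
    return int(path.read_text().strip())


def interpret(limit: int) -> str:
    """Turn the sysfs value into a cautious, human-readable hint."""
    if limit > 0:
        return "erase requests can reach the eMMC on this kernel"
    return "no discard advertised (not proof of safety on its own)"


if __name__ == "__main__":
    try:
        print(interpret(discard_max_bytes()))
    except OSError as exc:
        # Missing device or attribute, e.g. when run on a non-Android machine.
        print(f"could not read sysfs: {exc}")
```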
 

garwynn, Retired Forum Mod / Inactive Recognized Developer (joined Jul 30, 2011; NE Ohio)
Latest from Android Team

Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

Issue: fwrev not set properly.
As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)
Ken Sumrall said:
The patch includes a line in mmc.c setting fwrev to the right bits from the CID register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.

(On second inquiry)
fwrev is zero until the patch is applied.
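Because an unpatched kernel leaves fwrev at zero, the firmware revision has to be recovered from the raw CID instead. A minimal sketch, assuming (as the 0x19/0x25 values suggest) that the patch takes the full 8-bit product revision (PRV) field, which the eMMC 4.x CID places at bits [55:48], and that the kernel's cid attribute sits next to fwrev in sysfs. The function names are hypothetical.

```python
from pathlib import Path

CID_PATH = Path("/sys/class/block/mmcblk0/device/cid")


def fwrev_from_cid(cid_hex: str) -> int:
    """Return the 8-bit product revision (PRV) field, CID bits [55:48].

    The kernel prints the 128-bit CID as 32 hex digits, most significant
    word first, so shifting right by 48 and keeping the low byte selects PRV.
    """
    cid = int(cid_hex.strip(), 16)
    if cid.bit_length() > 128:
        raise ValueError("CID must be at most 128 bits")
    return (cid >> 48) & 0xFF


def read_fwrev_from_cid(path: Path = CID_PATH) -> int:
    """Read the CID from sysfs (read-only) and extract the revision byte."""
    return fwrev_from_cid(path.read_text())


# Made-up CID whose PRV byte is 0x25 and all other fields are zero.
example_cid = f"{0x25 << 48:032x}"
print(hex(fwrev_from_cid(example_cid)))  # 0x25
```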

Question: Revision didn't match the fix
(Emphasis mine in red as it discusses the superbrick issue.)
Ken Sumrall said:
You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, and we found it had another bug: if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.

However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.

I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.

As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.
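Sumrall's two warnings (the firmware patch targets exactly revision 0x25, and 0x19 is decimal 25) can be illustrated with a tiny gate. The names are hypothetical, and this is not his firmware-update code; the real selection criteria are stricter than the single check shown here.

```python
KNOWN_ERASE_BUG_FWREV = 0x19  # erase bug described by Ken Sumrall
PATCH_TARGET_FWREV = 0x25     # the only revision the firmware patch was written for


def fmt_fwrev(fwrev: int) -> str:
    """Always render eMMC firmware revisions as 0x-prefixed hex."""
    return f"0x{fwrev:02x}"


def patch_may_apply(fwrev: int) -> bool:
    """Exact-match gate on the firmware revision.

    The real selection criteria are stricter and are not reproduced here;
    this only illustrates why 0x19 must be rejected.
    """
    return fwrev == PATCH_TARGET_FWREV


# A "25" read off a decimal display is really 0x19, the buggy firmware.
for rev in (KNOWN_ERASE_BUG_FWREV, PATCH_TARGET_FWREV, 25):
    print(fmt_fwrev(rev), patch_may_apply(rev))
```

Printing every revision through one hex formatter is the code equivalent of his "always precede the number with 0x" rule.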

Question: Why the /data partition?
Ken Sumrall (Android SE) said:
Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly for receiving OTAs).

Question: Why won't JTAG work?
Ken Sumrall said:
As I mentioned above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that caused the chip to lock up when a particular sector was accessed. The only fix was to wipe the chip and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.

Question: Can a corrupted file system be repaired (on the eMMC)?
Ken Sumrall said:
e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.
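To put "many inodes" into numbers: ext4 inodes are commonly 256 bytes (128 bytes on older layouts). Assuming the 32 Kbyte run of zeros lands entirely inside an inode table (an assumption; where it actually lands depends on the filesystem layout), a quick calculation gives the number of inodes wiped out:

```python
def inode_slots_overwritten(zero_run_bytes: int = 32 * 1024, inode_size: int = 256) -> int:
    """Inode-table entries a zero-filled run can wipe out, assuming the run
    falls entirely inside an inode table."""
    if inode_size <= 0:
        raise ValueError("inode_size must be positive")
    return zero_run_bytes // inode_size


for size in (128, 256):
    print(size, inode_slots_overwritten(inode_size=size))
```

That is 128 inodes at 256 bytes each, or 256 at 128 bytes, every one of them a file or directory e2fsck can no longer recover.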

So, while the fix doesn't apply to us at the moment, we've been given great insight into the superbrick issue, as well as word that a fix has already been developed (hopefully we'll see it released!). The bug likely applies to us, and if the fix for the 0x19 firmware is released, it would apply to our devices.

On a lighter note, I wanted to include his close:
Ken Sumrall said:
You are getting a glimpse into the exciting life of an Android kernel developer. :) Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
 

Entropy512, Senior Recognized Developer
(Quoting garwynn's "Latest from Android Team" post above.)

BOOYA! (unfortunately, my quote of your post likely doesn't include Mr. Sumrall's comments)

So, in short - the AOSP patch in 4.0.4 is not relevant to our devices, HOWEVER:

There is a known issue that does affect our devices.

A bug with the ERASE command is EXACTLY what one would expect to cause this kind of behavior.

The remaining question is:
Gingerbread kernels also have MMC_CAP_ERASE enabled in the MMC driver. Why don't they cause problems, unless something elsewhere prevents the ERASE function from being used even though the controller supports it?

This also explains why I9100 update4 is safe - one of the major differences between I9100 update4 and SHW-M250* sources is that MMC_CAP_ERASE is enabled in the SHW-M250* sources and not in I9100 update4.
 

reinbeau, Retired Forum Moderator
This post on the portal is a very informative read. The takeaway quote for me:

Originally Posted by Ken Sumrall
You are getting a glimpse into the exciting life of an Android kernel developer. Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
Please stay clear of flashing anything ICS onto your devices until this has been solved.
 