Thread Closed

***WARNING! PLEASE READ!! ICS Kernels and Recovery - HARDBRICK WARNING

OP Entropy512

3rd May 2012, 10:40 PM   |  #1  
OP Senior Recognized Developer
Flag Owego, NY
Thanks Meter: 24,470
 
13,360 posts
Join Date: Aug 2007
Reinbeau asked me to summarize the situation with ICS leaks in a way that can be stickied, so here it is:

DO NOT FLASH OR WIPE USING ANY ICS REPACK

The list of "known safe" kernels is below in this post. There are currently only two known-safe ICS kernels.

The affected ICS kernels are fundamentally dangerous. Samsung introduced some sort of bug in the eMMC driver that can permanently damage the eMMC flash storage of the phone. This leads to unusable partitions at best, and at worst a hardbricked device. The nature of the failure is so severe that the usual method for hardbrick recovery (JTAG) is unable to recover devices damaged in this manner.

Kernels that have been confirmed affected are:
All ICS leaks for the Samsung Epic 4G Touch (SPH-D710)
All ICS leaks for the Samsung Galaxy Note (GT-N7000)
XXLPY official ICS release for GT-N7000 - at least one hardbrick (chasmodo), two people with damaged partitions (unusable /data or /system), at least one person with unusable /data after a wipe in *factory* recovery (so it's not just CWM), and one person who hardbricked after wiping in Settings
ZSLPF official ICS release for GT-N7000 - http://forum.xda-developers.com/show....php?t=1661590 and http://forum.xda-developers.com/show...1#post26275391 are the first two reports.
UCLD3 ICS leak for the AT&T Samsung Galaxy S II (SGH-I777) - Other leaks may also be affected
Kernels built using the most recent SHW-M250S/K/L official source code release as of May 3, 2012 - This includes SiyahKernel 3.1rc6 for GT-I9100 (all other Siyah releases are safe)

Damage is not guaranteed - it may only affect a small percentage of users, but even a 5% chance is far more dangerous than the effectively 0% chance of hardbricking due to kernel bugs in safe kernels.
Also, some people will not fully hardbrick - /data or /system will become unwritable, resulting in a phone that can enter download mode, can flash kernels, can write to some partitions in recovery, but is overall unusable due to one or more critical partitions being unusable.
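
To put a per-wipe risk in perspective, here is a minimal Python sketch of how the odds compound over repeated wipes. It assumes every wipe is an independent event with the same probability, which is a simplification (reports suggest some devices never fail at all). The 5% figure is only the illustrative number used above, not a measured failure rate, and `cumulative_brick_risk` is a hypothetical helper name.

```python
def cumulative_brick_risk(per_wipe_risk: float, wipes: int) -> float:
    """Chance of at least one failure in `wipes` independent attempts."""
    if not 0.0 <= per_wipe_risk <= 1.0:
        raise ValueError("per_wipe_risk must be between 0 and 1")
    if wipes < 0:
        raise ValueError("wipes must be non-negative")
    # P(at least one failure) = 1 - P(no failure in any attempt)
    return 1.0 - (1.0 - per_wipe_risk) ** wipes


if __name__ == "__main__":
    # 0.05 is the illustrative figure from the post, not measured data.
    for n in (1, 5, 10, 20):
        risk = cumulative_brick_risk(0.05, n)
        print(f"{n:2d} wipes at 5% each -> {risk:.1%}")
```

Under those assumptions, ten wipes at 5% each already works out to roughly a 40% chance of at least one failure, which is why "it worked for me once" says very little.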

Kernels that have been confirmed safe:
All known Gingerbread kernels for the Galaxy Note and other affected devices listed above
Kernels built from the GT-I9100 Update4 source code release - this includes XplodWILD's CM9 release and my DAFUQ release. Hopefully more kernel choices will become available soon.
Kernels that have had MMC_CAP_ERASE disabled in mshci.c should be safe. Look for this in the listed features of the kernel - preliminary results are good, and no bricks have been reported yet by anyone confirmed to be actually running such a kernel.

If you are running an affected kernel:
STOP USING IT IMMEDIATELY. FLASH A SAFE KERNEL USING ODIN/HEIMDALL.
DO NOT wipe in recovery
DO NOT flash anything else in recovery
In general, DO NOT use recovery at all

Right now, what we know:
Some people can wipe with affected kernels as often as they want without problems. Just because you didn't brick, DO NOT advise other users that they will be OK.
Based on reports from the Epic 4G Touch community, some people can wipe/flash 20-30 times before hardbricking - Just because you didn't brick once, DO NOT continue flashing with an affected kernel
The source of the problem is somewhere within the changes between I9100 Update4 and SHW-M250S Update5 - https://github.com/Entropy512/kernel...250s_dangerous

What we don't know:
Exactly which source commit above is responsible
How to determine if a future kernel or source release is safe without putting users' devices at risk - You only need to reproduce the problem once to be hosed.
Last edited by Entropy512; 30th May 2012 at 01:58 PM. Reason: Edit title
6th May 2012, 02:51 PM   |  #2  
reinbeau
Retired Forum Moderator
Flag South of Boston, MA
Thanks Meter: 4,691
 
7,373 posts
Join Date: Sep 2010
Additional information
From this post
Quote:
Originally Posted by Elle233

I know this thread is not about bricks, but your warning must have saved lots of souls from bricks and sorrow. Just wanting to ask: for those currently running ROMs with affected kernels, even though the ROMs seem to be working, could any undetected/unobservable damage be possible without one's knowing?

Entropy replies: Possible - With the Siyah 3.1rc6 fiasco, some people had situations where just one or two partitions became hosed up (usually /data)

It could be possible that some devices might only have a few damaged sectors in the eMMC - but I haven't seen confirmed reports of this yet. It has always been my suspicion that deleting large files on affected kernels could be dangerous - but I'm not sure about this.
15th May 2012, 10:58 PM   |  #3  
OP Senior Recognized Developer
Mod note: I copied this post here from its original thread as it contains pertinent information.

Quote:
Originally Posted by Chainfire

Ok so it still isn't really clear if the problem is still present in LPY, or if the issue is "left-over" due to previous ROMs.

It's all possible. Either way, what I want to make absolutely clear again:

If the problem is present in CF-Root LPY (and it's not a "leftover" problem from previous ROM - this is possibly the case), then it's also present in stock LPY. This means you *can* randomly brick your phone even if fully booted into Android, under heavy I/O load.

(on the Note the normal and recovery kernels are the same, and CWM is just a userspace program, nothing special(!) - so if the problem can happen in CWM, it can happen in Android)

Additional information: I've seen at least one report of someone's device dying after a wipe in stock recovery, and one report of someone's device dying by doing a factory reset in Settings.

Which confirms everything you just said.
Last edited by reinbeau; 15th May 2012 at 11:36 PM.
18th May 2012, 03:55 PM   |  #4  
garwynn
Forum Moderator / Recognized Developer
Flag Chi-town
Thanks Meter: 7,597
 
4,698 posts
Join Date: Jul 2011
Latest from Android Team
Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

Issue: fwrev not set properly.
As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)
Quote:
Originally Posted by Ken Sumrall

The patch includes a line in mmc.c setting fwrev to the right bits from the CID register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.

(On second inquiry)
fwrev is zero until the patch is applied.

Question: Revision didn't match the fix
(Emphasis mine in red as it discusses the superbrick issue.)
Quote:
Originally Posted by Ken Sumrall

You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, and we found it had another bug where, if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.

However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.

I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.

As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.
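
Since the 0x19 vs. 25 mix-up is easy to repeat, here is a minimal Python sketch for reading the firmware revision and always printing it in hexadecimal. The sysfs path is the one quoted in the reply above. The assumption that the file holds a single number (with or without a 0x prefix) and the helper names `parse_fwrev` and `format_fwrev` are mine, and on kernels without the fwrev fix the value will simply read as zero.

```python
from pathlib import Path

# Path quoted in the reply above; present only on kernels that expose it.
FWREV_PATH = Path("/sys/class/block/mmcblk0/device/fwrev")


def parse_fwrev(raw: str) -> int:
    """Parse a revision string such as '0x19' (hex) or '25' (decimal)."""
    # Base 0 lets int() honor a 0x prefix and treat bare digits as decimal.
    return int(raw.strip(), 0)


def format_fwrev(rev: int) -> str:
    """Always render the revision in hexadecimal with a 0x prefix."""
    return f"0x{rev:x}"


if __name__ == "__main__":
    if FWREV_PATH.exists():
        rev = parse_fwrev(FWREV_PATH.read_text())
        print("eMMC firmware revision:", format_fwrev(rev))
    else:
        print("fwrev not exposed at", FWREV_PATH)
```

Printing through `format_fwrev` means a revision of decimal 25 always shows up as 0x19 and cannot be mistaken for revision 0x25.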

Question: Why the /data partition?
Quote:
Originally Posted by Ken Sumrall (Android SE)

Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly to receive OTAs).

Question: Why JTAG won't work?
Quote:
Originally Posted by Ken Sumrall

As I mention above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that causes the chip to lock up when a particular sector is accessed. The only fix was to wipe the chip, and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.

Question: Can a corrupted file system be repaired (on the eMMC)?
Quote:
Originally Posted by Ken Sumrall

e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.

So, while the fix doesn't apply to us at the moment, we've been given a great insight into the superbrick issue as well as information that a fix is already developed (hopefully we'll see it released!). The bug likely applies to us and assuming the fix for the 0x19 firmware is given then it would apply to our devices.

On a lighter note, I wanted to include his close:
Quote:
Originally Posted by Ken Sumrall

You are getting a glimpse into the exciting life of an Android kernel developer. :) Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.

18th May 2012, 05:54 PM   |  #5  
OP Senior Recognized Developer
Quote:
Originally Posted by garwynn

Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

Issue: fwrev not set properly.
As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)


Question: Revision didn't match the fix
(Emphasis mine in red as it discusses the superbrick issue.)


Question: Why the /data partition?


Question: Why JTAG won't work?


Question: Can a corrupted file system be repaired (on the eMMC)?


So, while the fix doesn't apply to us at the moment, we've been given a great insight into the superbrick issue as well as information that a fix is already developed (hopefully we'll see it released!). The bug likely applies to us and assuming the fix for the 0x19 firmware is given then it would apply to our devices.

On a lighter note, I wanted to include his close:

BOOYA! (unfortunately, my quote of your post likely doesn't include Mr. Sumrall's comments)

So, in short - the AOSP patch in 4.0.4 is not relevant to our devices, HOWEVER:

There is a known issue that does affect our devices.

A bug with the ERASE command is EXACTLY what one would expect to cause this kind of behavior.

The remaining question is:
Gingerbread kernels have MMC_CAP_ERASE enabled for the MMC driver - why don't they cause problems? Is there something somewhere else that prevents the ERASE function from being used even though the controller supports it?

This also explains why I9100 update4 is safe - one of the major differences between I9100 update4 and SHW-M250* sources is that MMC_CAP_ERASE is enabled in the SHW-M250* sources and not in I9100 update4.
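
For anyone comparing source trees, here is a rough Python sketch of that kind of check: it strips C comments from a driver source file and reports whether MMC_CAP_ERASE is still referenced. The sample snippets and the helper `references_mmc_cap_erase` are hypothetical, and a textual match is only a hint (the flag could be set elsewhere or compiled out), so this is no substitute for a kernel whose feature list explicitly mentions the change.

```python
import re

# Matches /* block */ and // line comments (ignores string-literal corner cases).
_C_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def references_mmc_cap_erase(source: str) -> bool:
    """Return True if MMC_CAP_ERASE appears outside of C comments."""
    code_only = _C_COMMENT.sub("", source)
    return re.search(r"\bMMC_CAP_ERASE\b", code_only) is not None


if __name__ == "__main__":
    # Hypothetical snippets, not copied from any real kernel tree.
    enabled = "mmc->caps |= MMC_CAP_ERASE;\n"
    disabled = "/* mmc->caps |= MMC_CAP_ERASE; */\n"
    print("enabled snippet references flag:", references_mmc_cap_erase(enabled))
    print("disabled snippet references flag:", references_mmc_cap_erase(disabled))
```

To try it on a real file, read the text of the driver source (for example mshci.c) and pass it to `references_mmc_cap_erase`.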
19th May 2012, 02:42 PM   |  #6  
reinbeau
Retired Forum Moderator
This post on the portal is a very informative read. The takeaway quote for me:

Quote:
Originally Posted by Ken Sumrall

You are getting a glimpse into the exciting life of an Android kernel developer. Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.

Please stay clear of flashing anything ICS onto your devices until this has been solved.
