XDA Developers Android and Mobile Development Forum

***WARNING!!! PLEASE READ! ICS Kernels and Recovery

 
Entropy512
(Last edited by Entropy512; 6th August 2012 at 12:51 PM.) Reason: edit title
#1  
Senior Recognized Developer - OP
Thanks Meter 24,366
Posts: 13,270
Join Date: Aug 2007
Location: Owego, NY

 

I have realized this post was sorely outdated. I have updated it partially, but more updates are needed.

Reinbeau asked me to summarize the situation with ICS leaks in a way that can be stickied, so here it is:

DO NOT FLASH OR WIPE USING ANY ICS REPACK OR STOCK KERNEL

While stock recovery is safer than CWM recovery, there is still evidence that it is dangerous. Almost all ICS kernels for the GT-N7000 are affected. Only two are currently known to be safe - see the list below.

These kernels are fundamentally dangerous. They can trigger an underlying defect in the flash chip, and once this defect is triggered, the damage cannot be repaired, not even with JTAG. For a device to be in danger, all three of the following are needed:
1) A defective eMMC chip - Nearly every Note has this
2) A kernel that allows MMC ERASE commands to go through - All Samsung ICS stock kernels and leaks fall into this category
3) A recovery or update-binary within a ZIP that attempts to erase a partition when formatting. Unmodified CWM performs a secure erase, which is the most dangerous. Stock recovery performs a nonsecure erase, which is less dangerous but there is still evidence of danger.
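
The three conditions above combine as a logical AND: take any one of them away and the erase never reaches a vulnerable chip. A minimal sketch of that reasoning in Python (the function and parameter names are made up for illustration and do not come from any real tool):

```python
def at_risk(defective_emmc: bool,
            kernel_passes_erase: bool,
            recovery_issues_erase: bool) -> bool:
    """Return True only when all three risk conditions hold at once."""
    return defective_emmc and kernel_passes_erase and recovery_issues_erase


# Defective chip, kernel that lets MMC ERASE through, recovery that erases:
print(at_risk(True, True, True))    # True

# Same phone, but on a kernel where MMC ERASE cannot go through:
print(at_risk(True, False, True))   # False
```

Since nearly every Note has the chip, and wiping is what recovery is for, the kernel is the only condition a user can realistically change.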

The issue is not limited to ClockworkMod Recovery - stock recovery, along with factory resetting a device from Settings, can also be dangerous. When XXLPY first came out, a number of people bricked this way. Stock recovery is less dangerous due to using nonsecure erase rather than secure erase, but it has yet to be proven to be fully safe.

Kernels that have been confirmed affected are:
All ICS leaks for the Samsung Epic 4G Touch (SPH-D710)
All ICS leaks for the Samsung Galaxy Note (GT-N7000)
All ICS official releases for the Samsung Galaxy Note (GT-N7000) as of late May 2012 - This includes XXLPY, ZSLPF, and DXLP9, and future kernels should be assumed affected until further notice.
UCLD3 ICS leak for the AT&T Samsung Galaxy S II (SGH-I777) - Other leaks may also be affected
Kernels built using the most recent SHW-M250S/K/L official source code release as of May 3, 2012 - This includes SiyahKernel 3.1rc6 for GT-I9100 (all other Siyah releases are safe)

Damage is not guaranteed - it may only affect a small percentage of users, but even a 5% chance is far more dangerous than the effectively 0% chance of hardbricking due to kernel bugs in safe kernels.
Not all users hardbrick - some wind up with /system, /data, or another partition becoming unwritable, which leaves the phone effectively useless even though they can still flash kernels in Odin.

Kernels that have been confirmed safe:

All known Gingerbread kernels for the Galaxy Note and other affected devices listed above
Kernels built from the GT-I9100 Update4 source code release - this includes XplodWILD's CM9 release and my DAFUQ release. Hopefully more kernel choices will become available soon.
Kernels with MMC_CAP_ERASE removed from mshci.c should be safe - look for it in the listed features of any kernel based off of N7000 Update3. (N7000 Update3 source code without this change is dangerous.)
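
For anyone building from source, a crude textual check for whether MMC_CAP_ERASE is still referenced in a driver file could look like the sketch below. This is an assumption-heavy illustration: the helper name is invented, the sample lines are hypothetical and not copied from mshci.c, and a clean result only means the token is absent from uncommented code, not that the kernel is proven safe.

```python
import re


def mentions_mmc_cap_erase(source_text: str) -> bool:
    """Return True if MMC_CAP_ERASE appears outside of C comments."""
    # Drop /* ... */ block comments first, then // line comments.
    no_block = re.sub(r"/\*.*?\*/", "", source_text, flags=re.DOTALL)
    no_comments = re.sub(r"//[^\n]*", "", no_block)
    return re.search(r"\bMMC_CAP_ERASE\b", no_comments) is not None


# Hypothetical snippets, NOT real lines from mshci.c:
unpatched = "host->caps |= MMC_CAP_ERASE;"
patched = "/* host->caps |= MMC_CAP_ERASE; */"

print(mentions_mmc_cap_erase(unpatched))  # True
print(mentions_mmc_cap_erase(patched))    # False
```

To run it against a real tree, read the file contents with `open(path).read()` and pass the text in.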

If you are running an affected kernel:
STOP USING IT IMMEDIATELY. FLASH A SAFE KERNEL USING ODIN/HEIMDALL.
DO NOT wipe in recovery
DO NOT flash anything else in recovery
In general, DO NOT use recovery at all

Right now, what we know:
Some people can wipe with affected kernels as often as they want without problems. Just because you didn't brick, DO NOT advise other users that they will be OK.
Based on reports from the Epic 4G Touch community, some people can wipe/flash 20-30 times before hardbricking - Just because you didn't brick once, DO NOT continue flashing with an affected kernel
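
To see why "I wiped once and nothing happened" proves very little, here is a rough back-of-the-envelope sketch. It assumes every wipe is an independent event with a fixed chance of bricking, which is a simplification and not something anyone has measured; the 5% figure is only the illustrative number used earlier in this post.

```python
def survival_probability(p_brick_per_wipe: float, wipes: int) -> float:
    """Chance of getting through `wipes` wipes without a brick,
    assuming each wipe is independent with the same brick probability."""
    return (1.0 - p_brick_per_wipe) ** wipes


# Illustrative only: 5% per wipe is an assumed figure, not a measurement.
for n in (1, 10, 20, 30):
    print(n, round(survival_probability(0.05, n), 3))
```

Under those assumptions a single wipe goes fine 95% of the time, but only roughly a third of devices would get through 20 wipes untouched.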

What we don't know:
*this needs to be completely updated*

More info:
http://forum.xda-developers.com/show....php?t=1607112 - Hardbricks on I777 UCLD3
http://forum.xda-developers.com/show....php?t=1615058 - Issues with SHW-M250L Update4
http://forum.xda-developers.com/show...&postcount=159 - Information indicating that our eMMC chip has a serious firmware bug. All of the issues with fwrev 0x19 match our symptoms PERFECTLY. It explains partially why I9100 update4 is safe - MMC_CAP_ERASE is not enabled in the I9100 update4 MMC driver.
*so much sig updating needed*

My Github profile - Some Android stuff, some AVR stuff

An excellent post on "noobs vs. developers"

A few opinions on kernel development "good practices"

Note: I have chosen not to use XDA's "friends" feature - I will reject all incoming "friend" requests.

Code:
<MikeyMike01> Smali is a spawn of hell
<shoman94> ^^^ +!
Code:
<Entropy512> gotta be careful not to step on each other's work.  :)
<Bumble-Bee> thats true
<jerdog> compeete for donations
 
reinbeau
#2
Retired Forum Moderator
Thanks Meter 4,690
Posts: 7,373
Join Date: Sep 2010
Location: South of Boston, MA
Additional information

From this post
Quote:
Originally Posted by Elle233 View Post
I know this thread is not about bricks, but your warning must have saved lots of souls from bricks and sorrow. Just wanting to ask: for those currently running ROMs with affected kernels, even though the ROMs appear to be working, could there be undetected or unobservable damage without one's knowing?
Entropy replies: Possible - With the Siyah 3.1rc6 fiasco, some people had situations where just one or two partitions became hosed up (usually /data).

It could be possible that some devices might only have a few damaged sectors in the eMMC - but I haven't seen confirmed reports of this yet. It has always been my suspicion that deleting large files on affected kernels could be dangerous - but I'm not sure about this.
Ann - Not a man

Did you read the first post of the thread?

The real purpose of XDA explained in one post - Click here!

| Samsung US/Canada GSM|

Μολὼν Λαβέ
 
Entropy512
#3
Mod note: I copied this post here from its original thread as it contains pertinent information.

Quote:
Originally Posted by Chainfire View Post
Ok so it still isn't really clear if the problem is still present in LPY, or if the issue is "left-over" due to previous ROMs.

It's all possible. Either way, what I want to make absolutely clear again:

If the problem is present in CF-Root LPY (and it's not a "leftover" problem from previous ROM - this is possibly the case), then it's also present in stock LPY. This means you *can* randomly brick your phone even if fully booted into Android, under heavy I/O load.

(on the Note the normal and recovery kernels are the same, and CWM is just a userspace program, nothing special(!) - so if the problem can happen in CWM, it can happen in Android)
Additional information: I've seen at least one report of someone's device dying after a wipe in stock recovery, and one report of someone's device dying by doing a factory reset in Settings.

Which confirms everything you just said.
 
garwynn
#4
Forum Moderator / Recognized Developer
Thanks Meter 7,470
Posts: 4,667
Join Date: Jul 2011
Location: Chi-town

 
Latest from Android Team

Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

Issue: fwrev not set properly.
As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)
Quote:
Originally Posted by Ken Sumrall
The patch includes a line in mmc.c setting fwrev to the right bits from the cid register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.

(On second inquiry)
fwrev is zero until the patch is applied.
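
For reference, reading that sysfs file from a script could look roughly like the sketch below. The path is the one quoted above; the helper names are invented, and it assumes the file holds a hex string such as `0x19`. Keep in mind that on a kernel without the patch a reading of zero only means the field was never filled in, not that the chip is actually at revision 0.

```python
from pathlib import Path
from typing import Optional

# Path as quoted in the post; it may differ on other devices.
FWREV_PATH = Path("/sys/class/block/mmcblk0/device/fwrev")


def parse_fwrev(text: str) -> int:
    """Parse a sysfs fwrev string such as '0x19' into an integer."""
    return int(text.strip(), 16)


def read_fwrev(path: Path = FWREV_PATH) -> Optional[int]:
    """Return the firmware revision, or None if it cannot be read or parsed."""
    try:
        return parse_fwrev(path.read_text())
    except (OSError, ValueError):
        return None


rev = read_fwrev()
print("fwrev unavailable" if rev is None else f"fwrev = 0x{rev:02x}")
```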
Question: Revision didn't match the fix
(Emphasis mine in red as it discusses the superbrick issue.)
Quote:
Originally Posted by Ken Sumrall
You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, but we found it had another bug that if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.

However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.
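
The 0x19 versus 25 mix-up described in the quote is easy to reproduce. A small sketch (the helper name is made up) that always prints a revision in 0x-prefixed hex so the two firmware versions cannot be confused:

```python
def fmt_fwrev(value: int) -> str:
    """Format a firmware revision as 0x-prefixed, two-digit hex."""
    return f"0x{value:02x}"


print(fmt_fwrev(25))   # 0x19  (the older, erase-bug firmware)
print(fmt_fwrev(37))   # 0x25  (the revision the AOSP patch targets)
print(int("19", 16))   # 25    (why "rev 25" on its own is ambiguous)
```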

I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.

As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.
Question: Why the /data partition?
Quote:
Originally Posted by Ken Sumrall (Android SE)
Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly for receiving OTAs).
Question: Why JTAG won't work?
Quote:
Originally Posted by Ken Sumrall
As I mention above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that causes the chip to lock up when a particular sector was accessed. The only fix was to wipe the chip, and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.
Question: Can a corrupted file system be repaired (on the eMMC)?
Quote:
Originally Posted by Ken Sumrall
e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.
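
To get a feel for how many inodes a single 32 Kbyte run of zeros could wipe out, here is a quick arithmetic sketch. It assumes the zeroed region falls entirely inside an inode table, and it uses 128 and 256 bytes as inode sizes because those are common ext2/3/4 values; neither assumption is confirmed for these devices.

```python
ZEROED_BYTES = 32 * 1024  # the 32 Kbytes of zeros described in the quote


def inodes_overwritten(inode_size_bytes: int) -> int:
    """Number of whole inodes covered by the zeroed region."""
    return ZEROED_BYTES // inode_size_bytes


print(inodes_overwritten(256))  # 128
print(inodes_overwritten(128))  # 256
```

Each lost inode is a file or directory that e2fsck can no longer attach to its data, which fits the "many files getting lost" description.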
So, while the fix doesn't apply to us at the moment, we've been given a great insight into the superbrick issue as well as information that a fix is already developed (hopefully we'll see it released!). The bug likely applies to us and assuming the fix for the 0x19 firmware is given then it would apply to our devices.

On a lighter note, I wanted to include his close:
Quote:
Originally Posted by Ken Sumrall
You are getting a glimpse into the exciting life of an Android kernel developer. :) Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
 
Entropy512
#5
Quote:
Originally Posted by garwynn View Post
Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

BOOYA! (unfortunately, my quote of your post likely doesn't include Mr. Sumrall's comments)

So, in short - the AOSP patch in 4.0.4 is not relevant to our devices, HOWEVER:

There is a known issue that does affect our devices.

A bug with the ERASE command is EXACTLY what one would expect to cause this kind of behavior.

The remaining question is:
Gingerbread kernels have MMC_CAP_ERASE enabled for the MMC driver - so why don't they cause problems, unless something elsewhere prevents the ERASE function from being used even though the controller supports it?

This also explains why I9100 update4 is safe - one of the major differences between I9100 update4 and SHW-M250* sources is that MMC_CAP_ERASE is enabled in the SHW-M250* sources and not in I9100 update4.
 
reinbeau
#6
This post on the portal is a very informative read. The takeaway quote for me:

Quote:
Originally Posted by Ken Sumrall
You are getting a glimpse into the exciting life of an Android kernel developer. Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
Please steer clear of flashing anything ICS onto your devices until this has been solved.
THREAD CLOSED