4.0.3 LPY Bug Discussion (STOP POSTING eMMC CID data)

Still Using LPY?

  • Yes, it rocks. No problems whatsoever.

    Votes: 139 92.1%
  • No, I'm really scared and reverted to GB.

    Votes: 12 7.9%

  • Total voters
    151
  • Poll closed.

Sine.

Senior Member
Jan 5, 2011
623
391
0
😁
^ lol

VYL00M
0x0
0x0
0x000015
0x0100
02/2012
MMC
0x08e872a1

15010056594c30304d1908e872a12f41

Hi Guys,

Could you shed any light on what you're looking at, and on the fix you may have, given the data that I and other XDA'ers have provided you with? What are the trends, besides the chip being the same but the manufacturing date being different?
See here: Discussion thread for /data eMMC corruption bug
 

garwynn

Retired Forum Moderator / Inactive Recognized Deve
Jul 30, 2011
5,182
8,589
0
NE Ohio
www.extra-life.org
It's a long discussion but here's the thread with our discussion of it:
http://forum.xda-developers.com/showthread.php?t=1644364

It started when we saw something a little unexpected - the Epic 4G Touch test builds of ICS jumped from 4.0.3 to 4.0.4. Naturally curiosity got the best of me and so I started looking at the changelog - that's when we found this.

Since it's only in 4.0.4_r1.1 and higher it may explain why ICS has taken so long to get out to devices. Why push it out if you know there's a prevalent bug out there? It seems prudent to fix that first and then push out a stable release.

---------- Post added at 07:37 PM ---------- Previous post was at 07:34 PM ----------

Thanks to all that posted the results - I think I've got enough results now. I'll pass the results on and hopefully this will help them fix this problem!

If anyone finds, though, that they have two characters other than 19 at that position in the cid string (right after the product name), please post it here.
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,095
25,085
0
Owego, NY
The thing is:
1) Vendor kernel trees are almost NEVER synced to AOSP - it's common for Samsung kernels for non-Nexus devices to be missing patches present in the AOSP kernel trees for Nexus devices
2) The failure mode described for that bug is different from ours - the failure mode in question wouldn't cause eMMC blocks to become unwritable.
3) It's a fix for the Galaxy Nexus, which never encountered this particular bug; that is more evidence that the patch in 4.0.4 is unrelated to Superbrick

(and as I understand it - your latest releases are still exhibiting Superbrick behavior)
 

jgaviota

Senior Member
Jan 4, 2008
383
63
48
VYL00M
0x0
0x0
0x000015
0x0100
02/2012
MMC
0x149c5998
15010056594c30304d19149c59982f53

---------- Post added at 04:31 PM ---------- Previous post was at 04:12 PM ----------

What I think is happening is that the information about the partitions is being corrupted by the format, in a way that cannot easily be recovered.

As far as I can see ALL the bricks resulted from a format, either via a wipe from recovery or from the install script of Stunner.

Was there any change in the way a format is done between the GB kernel and the ICS one?

If the partition info that gets corrupted belongs to a partition that does not hold the PBL or SBL, then the phone can still get into download mode or recovery; but if the corrupted partition info is for either of those two, then you have a black-screen hard brick.

Has anyone with a bootloop brick (where you can still get into Download mode) tried repartitioning with the latest Heimdall on Linux?

If it is not the partition info, it's something like that, maybe some kind of access table...

Does anyone have any technical info on the eMMC chip used here, and on how the partitions are stored? Also any info on the boot process for our Exynos? Has AdamOutler made an UnbrickableMod for any of the afflicted devices?

An UnbrickableMod could help us by giving lower-level access to the boot process.

BTW I'm an electronics engineer, but I don't have access to the documentation or to the right tools, and I don't have the soldering skills to try to develop an UnbrickableMod.
 

tbong777

Senior Member
Sep 27, 2010
2,335
350
0
Myrtle Beach
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Tom>cd \android-sdk-windows\platform-tools

C:\android-sdk-windows\platform-tools>adb shell
root@android:/ # su
root@android:/ # cd /sys/class/block/mmcblk0/device
root@android:/sys/class/block/mmcblk0/device # cat name hwrev fwrev manfid oemid date type serial
VYL00M
0x0
0x0
0x000015
0x0100
09/2011
MMC
0x15584faf
root@android:/sys/class/block/mmcblk0/device # cat cid
15010056594c30304d1915584faf9edd
 

garwynn

Retired Forum Moderator / Inactive Recognized Deve
Jul 30, 2011
5,182
8,589
0
NE Ohio
www.extra-life.org
Entropy512 said:
The thing is:
1) Vendor kernel trees are almost NEVER synced to AOSP - it's common for Samsung kernels for non-Nexus devices to be missing patches present in the AOSP kernel trees for Nexus devices
2) The failure mode described for that bug is different from ours - the failure mode in question wouldn't cause eMMC blocks to become unwritable.
3) It's a fix for the Galaxy Nexus, which never encountered this particular bug; that is more evidence that the patch in 4.0.4 is unrelated to Superbrick

(and as I understand it - your latest releases are still exhibiting Superbrick behavior)

Thanks for the reply.

Trying to address #1 by asking those who can submit bug reports to Sammy - at least on the E4GT leaks - to do so. Since I don't have any direct contact with their dev teams I can only speculate at best. What you mentioned is plausible - but if they merged the changes to roll up to 4.0.4 on the E4GT they had to merge from somewhere. I can't see it in the original 4.0.4 (r1.1 and above only) so it is possible that this was missed.

However, there is a way to bypass their kernel build - and I've been trying to look into that now. Those that take the kernel source from AOSP and include it should have the fix - and at the least we should see whether fwrev is populating correctly. If we verify that the fix is in and still don't see fwrev populated, then it's a strong indication that something is still not right.

For the AOSP kernel I'm still hoping we'll get something back from Mr. Sumrall. Since he got the code originally from Samsung I'm hoping he's taking the notes back to them.

#2) I'm going to give a little on this given I'm the amateur among professionals.

If a Google SE - with 20+ years experience (based on his LinkedIn profile) and extensive knowledge on his OS - says this would corrupt the file system to an uncorrectable point I have to give credence to it. The fact that the fix came from Samsung and he applied it would also suggest he's had at least some, if not extended, dialogue with their dev team on this issue.

Please don't misunderstand this as trying to dismiss your credentials either. It's why I didn't believe that it's a physical issue to begin with - and I was always taught in my college classes that short of physical damage it's feasible to fix it.

So here's where I think it stands: It's feasible but the sure-fire solution hasn't been developed yet. I think that would involve modifying the eMMC's embedded controller firmware, and without specs on that we're speculating at best about whether that's even possible. We have a hint in the code that the workaround was the best they could do, which would suggest the controller's firmware either cannot be modified at all or cannot be modified easily. There may be a way to manipulate the controller to get the desired effect - but that will probably have to come from Samsung unless they release the specs to the public.

Sorry for the long post - I too believe it's unlikely that a fix is impossible... but I can believe that, with eMMC specifications around 3 years old and Sammy apparently not conforming to them (as mentioned in the code fix), it is plausible that they simply haven't found a way to fix it permanently yet.

#3 has been bugging me for a while because I know it's not the AOSP that I'm looking at. I've gotten from the AOSP changelog that it's there - I'm just linking to a bitbucket that had it - but I haven't been able to get repo to work from my office to pull the AOSP repository. And yes, there are multiple reports of file system corruption on the Nexus - took me less than 5 minutes to find similar examples by searching.

Still working on this and if I don't see it there it would explain a lot. It may end up waiting until Samsung releases their source to confirm.

---------- Post added at 06:16 PM ---------- Previous post was at 06:12 PM ----------

jgaviota said:
BTW I'm an electronics engineer, but I don't have access to the documentation or to the right tools, and I don't have the soldering skills to try to develop an UnbrickableMod.

Didn't realize that several of you are EEs - you were just the first to come out and say it. I think having documentation or specs would go a long way toward getting past speculation and into a solution. Doubt it will happen though unless Samsung decides it's worth getting the community involved to try and get a permanent fix designed. (Best guess is that they don't want to release the IP associated with the eMMC and embedded controller specifications)

Hope I'm not patronizing you guys. I'll just say I know I'm the amateur here and just using the grey matter between the ears as I was taught to do and trying to figure this out. To me it shows promise - and in the absence of any other solution it doesn't hurt to use this as a good point to better understand both the problem and possible ways to solve it.
 

mikhil92

Member
May 8, 2012
27
0
0
28
Mumbai
ICS

Well... I've used ICS for a week now.
It runs perfectly, but I'm just not satisfied with ICS itself: too much battery drain, and the phone heats up way too easily. I would suggest people go back to GB.
 

prabhu1980

Senior Member
Dec 13, 2009
288
127
0
Indore
Hi All,

I feel this eMMC bug is related more to partitioning than to the eMMC being fried.

I hate hearing the idea of frying. I have never fried my hard disks before.

I'm looking for solid proof of frying...

a) All the posts say that maybe a bug in the eMMC driver might have fried the eMMC.

b) 32 KB of zeroes... insert them and prove it, boss.

I am sick and tired of hearing about hardware issues, because I have had hard disks "fried" with bad boot sectors after physical damage and got them back 100%, at a time when many users were saying their disks were fried.

Now a few users have replaced their MoBo too fast I would say.
(Please understand I know the pain of loss - I may be the first to change the Galaxy Note Mobo - Mar '12 - LP1 bug)

How many are there to back up this theory?

Please don't hate me because I have a different theory.
I have some queries which I would like to discuss.


I am nowhere close to being even a low-level developer. But believe me, I am logical.


The Epic 4G forums are discussing the bug in much the same manner.
 

Braxos

Senior Member
Nov 18, 2011
693
144
73
prabhu1980 said:
Now a few users have replaced their MoBo too fast I would say.
(Please understand I know the pain of loss - I may be the first to change the Galaxy Note Mobo - Mar '12 - LP1 bug)

Hmm, so it was LP1; I thought it was LA1. Good to know I wasn't alone in this. It took a month to get a 100% functional phone back from service. Simply pathetic.


 

PoisonWolf

Senior Member
Feb 8, 2009
2,166
274
0
litigation for what? people using modified kernels then having devices brick?
Litigation over the stock kernel exhibiting the superbrick bug when the phone is under heavy I/O load (see stickied topic). Wiping the phone, even from the phone's stock recovery, or even resetting it via Settings USING the stock ICS kernel, can result in the phone superbricking.

In my mind, that's definitely justified insofar as folks are not allowed to get their motherboards replaced free of charge.
 

garwynn

Retired Forum Moderator / Inactive Recognized Deve
Jul 30, 2011
5,182
8,589
0
NE Ohio
www.extra-life.org
Update

Ken Sumrall from Android has gotten back to us with a lot of good information on the Superbrick. More coming soon on the Epic 4G Touch thread that I linked before - I'll repost here immediately after.
 

garwynn

Retired Forum Moderator / Inactive Recognized Deve
Jul 30, 2011
5,182
8,589
0
NE Ohio
www.extra-life.org
Latest from Android Team

Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

Issue: fwrev not set properly.
As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)
Ken Sumrall said:
The patch includes a line in mmc.c setting fwrev to the right bits from the cid register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.

(On second inquiry)
fwrev is zero until the patch is applied.
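
To make the fwrev point concrete, here is a minimal Python sketch that pulls the fields out of the raw cid string by hand. The byte layout is my reading of how the eMMC CID register is laid out (manufacturer ID, two bytes that sysfs reports as oemid, a six-character product name, one revision byte, a 32-bit serial, then a month/year byte); the function name parse_emmc_cid is made up for illustration, and this is not the kernel's code.

```python
def parse_emmc_cid(cid_hex):
    """Split a 32-hex-digit eMMC cid string into its fields (illustrative only)."""
    raw = bytes.fromhex(cid_hex.strip())
    if len(raw) != 16:
        raise ValueError("expected 16 bytes (32 hex digits) of CID")
    return {
        "manfid": raw[0],                              # manufacturer ID
        "oemid": int.from_bytes(raw[1:3], "big"),      # shown by sysfs as oemid
        "name": raw[3:9].decode("ascii"),              # six-character product name
        "fwrev": raw[9],                               # the two hex digits asked about
        "serial": int.from_bytes(raw[10:14], "big"),   # 32-bit serial number
        # High nibble is the month, low nibble is years since 1997 (assumed layout).
        "date": "%02d/%d" % (raw[14] >> 4, 1997 + (raw[14] & 0x0F)),
    }


if __name__ == "__main__":
    fields = parse_emmc_cid("15010056594c30304d1908e872a12f41")
    print(fields["name"], hex(fields["fwrev"]), fields["date"])
```

For the cid in the first post this gives name VYL00M, revision 0x19 and date 02/2012. The name, serial and date match the sysfs values posted with it; only fwrev differs, since sysfs reported 0x0 there.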
Question: Revision didn't match the fix
(Emphasis mine in red as it discusses the superbrick issue.)
Ken Sumrall said:
You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, but we found it had another bug that if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.

However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.
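
The 0x19 versus 25 confusion is easy to reproduce, so here is a small Python sketch of it. It always reads the revision as hexadecimal and then compares against 0x25. The names parse_fwrev and patch_applies are hypothetical, and the single comparison is a simplification on my part: the real patch's selection criteria are described above as strict and are not reproduced here.

```python
PATCHABLE_REV = 0x25  # per the quote above, the patch targets firmware revision 0x25


def parse_fwrev(text):
    """Read a firmware revision as hexadecimal, with or without a leading 0x."""
    return int(text, 16)


def patch_applies(fwrev):
    """Simplified stand-in for the real check: revision must be exactly 0x25."""
    return fwrev == PATCHABLE_REV


if __name__ == "__main__":
    rev = parse_fwrev("19")            # the two digits seen in our cid strings
    print(hex(rev), rev)               # same value shown in hex and in decimal
    print(patch_applies(rev))          # whether the 0x25-only patch would apply
```

Reading "19" as hex gives decimal 25, which is not the same thing as revision 0x25 (decimal 37), so a device reporting 19 in its cid does not meet the check.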

I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.

Ken Sumrall said:
As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.
Question: Why the /data partition?
Ken Sumrall (Android SE) said:
Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly for receiving OTAs).
Question: Why JTAG won't work?
Ken Sumrall said:
As I mentioned above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that causes the chip to lock up when a particular sector was accessed. The only fix was to wipe the chip, and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.
Question: Can a corrupted file system be repaired (on the eMMC)?
Ken Sumrall said:
e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.
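
As a rough back-of-the-envelope illustration of why 32 Kbytes of zeros can cost so many files: if the whole span lands inside an inode table, every inode in it is wiped. The inode sizes below (128 and 256 bytes) are my assumption about common ext4 settings, not something stated in the quote, and the figure is an upper bound, since part of the span may fall on other metadata instead.

```python
ZEROED_BYTES = 32 * 1024  # the 32 Kbytes of zeros described in the quote


def inodes_wiped(inode_size, zeroed_bytes=ZEROED_BYTES):
    """Upper bound on inodes lost if the zeroed span sits entirely in an inode table."""
    return zeroed_bytes // inode_size


if __name__ == "__main__":
    for size in (128, 256):  # assumed inode sizes in bytes
        print(size, inodes_wiped(size))
```

Each wiped inode is a file or directory that e2fsck can no longer reconnect to its data, which fits the description of many files getting lost after a repair.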
So, while the fix doesn't apply to us at the moment, we've been given great insight into the superbrick issue, as well as information that a fix is already developed (hopefully we'll see it released!). The bug likely applies to us and, assuming the fix for the 0x19 firmware is released, it would apply to our devices.

On a lighter note, I wanted to include his close:
Ken Sumrall said:
You are getting a glimpse into the exciting life of an Android kernel developer. :) Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.