Discussion thread for /data EMMC lockup/corruption bug

jgaviota · May 21, 2012

Most probably he won't be allowed to give us the firmware fix, and we couldn't apply it anyway because we don't have fastboot, so the follow-up questions must be:

What is the command to wipe or reset the eMMC?

Can it be applied via JTAG, since we don't have fastboot?

Is there any public documentation available?

This is where UnbrickableMod would help; as I recall, Adam can use lower-level commands from his bootloaders.

garwynn · May 21, 2012

Robotu said:
The point is that this can be a solution, and a good one. He did what he could. He doesn't have enough knowledge, and neither do I; he was guided remotely through the forum by another guy, who maybe wasn't skillful enough himself. The point is that the guys who can do this should give this method some credit, and then, I'm 99% sure, it could be improved by skillful people in a matter of days. A few days ago I tried to get your attention (the devs) with this theory, but I was ignored. Maybe you thought it was only the pain of loss.... Ha, ha, ha...! That it would pass, and afterwards I would understand that the only solution is replacing the MOBO....... Well, eventually I will, but not so soon. I am not a Linux programmer, but still, my head doesn't have room to swallow an issue that isn't logical...

Actually, I went over those posts - and it seems to be an attempt to do on the Note what drnull did on the E4GT: repartition to avoid the corrupted area. It hasn't been tested long term, nor does it fix the underlying problem, which we now know. I'm glad that someone read up on the potential workaround, but it's not a fix - which is why at least the E4GT community moved past it.

We have reason to believe a fix (prior to bricking) does exist but is not available to the public. There are even cases where it may be able to save a bricked device without having to replace the mb - but we need more info on it. Oddly enough, AT&T's Note (I believe the D717 is the Note) has the 0x25 firmware on it, so it's not susceptible to the "superbrick". Anyone who has connections at AT&T might want to ask and see if they have the fix...

So please understand that this is why the discussion is going in a different direction - your potential fix is several months old and was concluded to be a "bandage" at best, not a "cure."

---------- Post added at 03:16 PM ---------- Previous post was at 03:09 PM ----------

jgaviota said:
Most probably he won't be allowed to give us the firmware fix, and we couldn't apply it anyway because we don't have fastboot, so the follow-up questions must be:

What is the command to wipe or reset the eMMC?

Can it be applied via JTAG, since we don't have fastboot?

Is there any public documentation available?

I'm trying every avenue I can to get it released. We're waiting on a reply from Mr. Sumrall and, separately, from Samsung, and I'm going to pass this info on to Sprint in about an hour. It doesn't even have to get to the community - if it's code and it can just be added to the ICS build, I'd be happy with that as a solution.

Regarding the follow up questions:

1) Find the difference between the global build that doesn't brick and the ones that do - and then apply it to all builds for affected devices. I believe Entropy has done a lot of legwork on that already and has a good idea of what it may be.

2) Sounds like no - but we really need the patch code itself to send to someone and have them check. I'm holding out hope that it will work...

3) Very little. Most of the legwork done on this problem so far has been by scouring the internet and available Android code... but even that didn't answer everything.

Robotu · May 21, 2012

garwynn said:
Actually, I went over those posts - and it seems to be an attempt to do on the Note what drnull did on the E4GT: repartition to avoid the corrupted area. It hasn't been tested long term, nor does it fix the underlying problem, which we now know. I'm glad that someone read up on the potential workaround, but it's not a fix - which is why at least the E4GT community moved past it.

We have reason to believe a fix (prior to bricking) does exist but is not available to the public. There are even cases where it may be able to save a bricked device without having to replace the mb - but we need more info on it. Oddly enough, AT&T's Note (I believe the D717 is the Note) has the 0x25 firmware on it, so it's not susceptible to the "superbrick". Anyone who has connections at AT&T might want to ask and see if they have the fix...

So please understand that this is why the discussion is going in a different direction - your potential fix is several months old and was concluded to be a "bandage" at best, not a "cure."

Well, this is another thing. I am neither a developer nor a programmer, but I understand almost everything; I have a virus called IT, whatever that means. So in general I give up only once I understand why: what is and what is not. Most things come to me through intuition, so what you just said had already been in my mind for a few days, from connecting all the known and unknown info scattered all over... And I'm not joking here.
My questions are as follows:
1) I do not think the eMMC is hardware bricked; it just won't allow access to it (maybe because, as a Russian guy said, of the 32 KB of zeros written that can no longer be read...). If it can be formatted at a low level with the right tools, it can be revived; or the firmware could be changed so the controller can wipe and rewrite everything....
2) Even if the eMMC is hardware bricked, I think it can be repartitioned to exclude the corrupted section. That means finding exactly where the first bad sector starts and where the last bad sector ends, leaving that range blank and unformatted so the controller never reads it, and moving all the partitions into the safe area. Somebody would have to do a lot of work to convince me that none of these theories is possible...
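The offset arithmetic behind the repartitioning idea in point 2 can be sketched as below. This is a minimal illustration, assuming the bad area is one contiguous, already-known range of logical sectors and that partition sizes are fixed; the Partition, overlaps and relayout names and all sizes are hypothetical, and the code only computes new start offsets, it does not read or write any real partition table.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical partition record; offsets and lengths are in 512-byte sectors."""
    name: str
    length: int
    start: int = 0


def overlaps(start: int, length: int, bad_first: int, bad_last: int) -> bool:
    """True if [start, start + length) touches the inclusive bad range [bad_first, bad_last]."""
    return start <= bad_last and start + length > bad_first


def relayout(parts, first_sector, disk_sectors, bad_first, bad_last):
    """Place partitions back to back, jumping over the bad range.

    Any partition that would overlap the bad range is started at bad_last + 1
    instead, so the bad range stays unallocated. Raises ValueError if the
    resulting layout does not fit on the disk.
    """
    next_sector = first_sector
    for part in parts:
        if overlaps(next_sector, part.length, bad_first, bad_last):
            next_sector = bad_last + 1
        part.start = next_sector
        next_sector += part.length
        if next_sector > disk_sectors:
            raise ValueError(f"{part.name} does not fit on the disk")
    return parts
```

With made-up sizes (system 1000, cache 500, data 4000 sectors), a first usable sector of 100 and a bad range of 1200-1799, system lands at 100, cache is pushed to 1800 and data follows at 2300. The sectors between the end of system and the bad range are simply wasted, which mirrors the shrinking of /data reported later in the thread.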

My English isn't good enough to say everything, but still.........

Sent from my SGS2 HD LTE SHV-E120L

mobilevil · May 21, 2012

Even if this fix can make the system boot, IMHO sooner or later the wear leveler will try to use the corrupted area again and freeze.
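Why avoiding a range of partitions may not keep the controller away from a bad spot can be shown with a toy model. This is only an illustration under a loud assumption: it pretends the controller maps logical blocks to physical blocks and always writes to the least-worn free physical block. Real eMMC firmware is proprietary and far more complex, and the lockup Mr. Sumrall describes is tied to erase commands, not to a known physical block; ToyWearLeveler and its behavior are hypothetical.

```python
class ToyWearLeveler:
    """Toy flash translation layer: each logical write goes to the least-worn free physical block.

    Illustration only; it does not model any real eMMC controller.
    """

    def __init__(self, physical_blocks: int):
        self.erase_counts = [0] * physical_blocks
        self.free = set(range(physical_blocks))
        self.mapping = {}  # logical block -> physical block currently holding its data

    def write(self, logical: int) -> int:
        """Write one logical block and return the physical block that received the data."""
        old = self.mapping.get(logical)
        # Least-worn free block first; ties broken by lowest index.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.mapping[logical] = target
        if old is not None:
            # The previous copy is stale: count an erase and return it to the free pool.
            self.erase_counts[old] += 1
            self.free.add(old)
        return target
```

With 8 physical blocks and a host that only ever rewrites logical blocks 0 and 1, twenty writes end up touching all 8 physical blocks, including one we might imagine is the "bad" block 5. Steering the host away from certain logical sectors does not, by itself, say anything about where the controller puts the data.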

garwynn · May 21, 2012

Robotu said:
My questions are as follows:
1) I do not think the eMMC is hardware bricked; it just won't allow access to it (maybe because, as a Russian guy said, of the 32 KB of zeros written that can no longer be read...). If it can be formatted at a low level with the right tools, it can be revived; or the firmware could be changed so the controller can wipe and rewrite everything....

2) Even if the eMMC is hardware bricked, I think it can be repartitioned to exclude the corrupted section. That means finding exactly where the first bad sector starts and where the last bad sector ends, leaving that range blank and unformatted so the controller never reads it, and moving all the partitions into the safe area. Somebody would have to do a lot of work to convince me that none of these theories is possible...

My English isn't good enough to say everything, but still.........

No worries - you're asking the same questions that have been tossed around here for quite a while. I too was a big supporter of the "it's not firmware, it's software..." side of the story until we got some significant information from Mr. Sumrall of the Android team.

There is a misunderstanding - which I also believed until it was proven otherwise - that the 32 KB of zeros was somehow corrupting the file system to cause the "superbrick." It didn't make sense to anyone... now we know why. I'll repost Mr. Sumrall's summary here:

Ken Sumrall (Android Kernel Team) said:
Firmware rev 0x19 has a bug where the eMMC chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.

A locking issue means that we're beyond just the file system at this point and back to the embedded controller. If the controller locks up just from trying to access that memory, you can't even flag the sector(s) as bad - which is why drnull's solution of simply repartitioning to avoid the area altogether worked. There are potential problems when flashing or using ODIN to put something else on, as it may try to restore that area and cause the device to lock up again, so flashing would be out. It would also be susceptible to the same problem (in theory), so one could end up corrupting the data even further... that's bad news.

Possible? Drnull showed it was. Good idea? If you're in a situation where you need a device until you can replace it in the short term, yes. Long term - not recommended.

I'm by no means an "expert" in Android or these devices - many out there have much higher qualifications than I'll probably ever have. But I'm willing to spitball ideas like you are - as well as being open to listening to those more experienced (and honestly probably smarter) than me. It's how I learn and improve as well. And as long as we move the conversation forward, I doubt anyone will hold it against you. Just please be willing to understand their position as well and why they're coming to these conclusions.

Robotu · May 21, 2012

garwynn said:
No worries - you're asking the same questions that have been tossed around here for quite a while. I too was a big supporter of the "it's not firmware, it's software..." side of the story until we got some significant information from Mr. Sumrall of the Android team.

There is a misunderstanding - which I also believed until it was proven otherwise - that the 32 KB of zeros was somehow corrupting the file system to cause the "superbrick." It didn't make sense to anyone... now we know why. I'll repost Mr. Sumrall's summary here:

A locking issue means that we're beyond just the file system at this point and back to the embedded controller. If the controller locks up just from trying to access that memory, you can't even flag the sector(s) as bad - which is why drnull's solution of simply repartitioning to avoid the area altogether worked. There are potential problems when flashing or using ODIN to put something else on, as it may try to restore that area and cause the device to lock up again, so flashing would be out. It would also be susceptible to the same problem (in theory), so one could end up corrupting the data even further... that's bad news.

Possible? Drnull showed it was. Good idea? If you're in a situation where you need a device until you can replace it in the short term, yes. Long term - not recommended.

I'm by no means an "expert" in Android or these devices - many out there have much higher qualifications than I'll probably ever have. But I'm willing to spitball ideas like you are - as well as being open to listening to those more experienced (and honestly probably smarter) than me. It's how I learn and improve as well. And as long as we move the conversation forward, I doubt anyone will hold it against you. Just please be willing to understand their position as well and why they're coming to these conclusions.

Thank you very much! I don't know you, but I already like you, mostly for your pragmatism...

Maybe a shameful question: who is Mr. Sumrall, and what is his position? I guess he is some kind of Samsung employee...?

Sent from my SGS2 HD LTE SHV-E120L

garwynn · May 21, 2012

Robotu said:
Thank you very much! I don't know you, but I already like you, mostly for your pragmatism...

Maybe a shameful question: who is Mr. Sumrall, and what is his position? I guess he is some kind of Samsung employee...?

Sent from my SGS2 HD LTE SHV-E120L

No shame in asking a question. I know he's a programmer for the Android kernel (works for Google) - a quick look at his LinkedIn profile indicates he's been there for a few years but has been in the industry for a LONG time. I contacted him in particular because he was the one who checked in the source for the original bug fix we were investigating - which meant he had to have enough knowledge of the issue to at least sign off on the code change.

One thing I have to give to most developers - they're willing to share their knowledge as long as they have time and it doesn't violate any Non-Disclosure Agreements that may be in place. Based on my experience, I also think they make great teachers/professors; I've even thought about doing that myself when I'm much older and still want to share what I've learned.

Robotu · May 21, 2012

Guys, another user followed forest's instructions and he has a fully functional device: the camera works, video recording, voice calls; as far as he says, everything is fine...

sfhub · May 21, 2012

Robotu said:
The point is that this can be a solution, and a good one. He did what he could. He doesn't have enough knowledge, and neither do I; he was guided remotely through the forum by another guy, who maybe wasn't skillful enough himself. The point is that the guys who can do this should give this method some credit, and then, I'm 99% sure, it could be improved by skillful people in a matter of days. A few days ago I tried to get your attention (the devs) with this theory, but I was ignored. Maybe you thought it was only the pain of loss.... Ha, ha, ha...! That it would pass, and afterwards I would understand that the only solution is replacing the MOBO....... Well, eventually I will, but not so soon. I am not a Linux programmer, but still, my head doesn't have room to swallow an issue that isn't logical...

It has already been known that you can change the partition tables to work around the bad area. We did that early on (i.e. months ago) in the "odin hangs on data.img" thread. The issue is that so much of /data was unwritable that we effectively ended up with a few MB of /data. I'm sure we could create kernels that remap /data to the external SD if we wanted to, but I don't consider that a real "solution", as you won't be able to get any upgrades. You'll always have to wait for someone to give you custom kernels to work around the problem.

sfhub · May 21, 2012

Garwynn, not sure if this is in your follow-up questions, but I would like to make sure Samsung and Google are aware that whatever bugs are in the EMMC firmware, they don't appear to be triggered by the sequence of EMMC commands issued by the GB kernel. They can theoretically avoid all these problems if they spend the time to investigate what changed between GB and ICS in the code path triggered by wipe data/factory reset in recovery.

I think most of us are aware of this on this thread, but it may not have been obvious to Samsung and Google because we have been asking specific questions about a fix that got checked in rather than spending time describing the history of the problem we are seeing.

Robotu · May 21, 2012

sfhub said:
It has already been known that you can change the partition tables to work around the bad area. We did that early on (i.e. months ago) in the "odin hangs on data.img" thread. The issue is that so much of /data was unwritable that we effectively ended up with a few MB of /data. I'm sure we could create kernels that remap /data to the external SD if we wanted to, but I don't consider that a real "solution", as you won't be able to get any upgrades. You'll always have to wait for someone to give you custom kernels to work around the problem.

I understand what all of you are saying; it is very specific and comprehensive, and maybe if I had had enough time to read the entire thread I would have come to the same conclusion. But I read only a few posts (across so many threads), because trying to understand/fix this phone has already cost me more time and money than replacing the motherboard would have... So I am trying to end this adventure, because I keep losing money this way. So please, regarding the few posts above, and more specifically: what is your frank and direct diagnosis and recommendation for this situation? I would like to hear a few professional, straightforward opinions.
My device can still do everything else (CWM, DM, adb.....) except boot.
Considering what garwynn says, even if Samsung and Google give you more info about the firmware and so on, is replacing the motherboard still the only solution for mine or any other Note in the same situation? Or, once you (devs) have that information (if ever), will you certainly be able to find a solution for devices at that stage?
Thanks in advance!

garwynn · May 21, 2012

sfhub said:
Garwynn, not sure if this is in your follow-up questions, but I would like to make sure Samsung and Google are aware that whatever bugs are in the EMMC firmware, they don't appear to be triggered by the sequence of EMMC commands issued by the GB kernel. They can theoretically avoid all these problems if they spend the time to investigate what changed between GB and ICS in the code path triggered by wipe data/factory reset in recovery.

I think most of us are aware of this on this thread, but it may not have been obvious to Samsung and Google because we have been asking specific questions about a fix that got checked in rather than spending time describing the history of the problem we are seeing.

This is the #1 question that I'm going to add. I just want to go over Entropy's notes and see what he's noticed already, then sum it up so we can hopefully point in a direction without telling a long story. I'm still waiting for a response from Samsung, and I left a voicemail for the person at Sprint - I'll be passing this on to them as well. (From Sprint's angle, using the GB code is a good solution if they want to go OTA - patching the eMMC would probably be better in the long term, if possible.)

I'll give it until tomorrow around 9 am CDT in case anyone else has some lingering questions - then send it off. Figured it was better to make sure everyone gets a chance to catch up after the weekend before doing so.

jgaviota · May 21, 2012

Sorry garwynn, I did not make myself clear...

garwynn said:
I'm trying every avenue I can to get it released. We're waiting on a reply from Mr. Sumrall and, separately, from Samsung, and I'm going to pass this info on to Sprint in about an hour. It doesn't even have to get to the community - if it's code and it can just be added to the ICS build, I'd be happy with that as a solution.

Regarding the follow-up questions:

jgaviota said:

Most probably he won't be allowed to give us the firmware fix, and we couldn't apply it anyway because we don't have fastboot, so the follow-up questions must be:

What is the command to wipe or reset the eMMC?

Can it be applied via JTAG, since we don't have fastboot?

Is there any public documentation available?

This is where UnbrickableMod would help; as I recall, Adam can use lower-level commands from his bootloaders.


1) Find the difference between the global build that doesn't brick and the ones that do - and then apply it to all builds for affected devices. I believe Entropy has done a lot of legwork on that already and has a good idea of what it may be.

2) Sounds like no - but we really need the patch code itself to send to someone and have them check. I'm holding out hope that it will work...

3) Very little. Most of the legwork done on this problem so far has been by scouring the internet and available Android code... but even that didn't answer everything.

Please relay these questions to Mr. Sumrall:

1) It doesn't have to do with the Android source, but with the actual commands sent to the eMMC to reset it.
2 and 3) He should have access to that documentation or know some way to get it.

Also ask him if there is documentation for the development board of the SGSII or Note.

Again: with the eMMC documentation a fix can be made. Ask him if there is some way to get it, even if he cannot release the firmware upgrade.

garwynn · May 21, 2012

jgaviota said:
Sorry garwynn, I did not make myself clear...

Please relay these questions to Mr. Sumrall:

1) It doesn't have to do with the Android source, but with the actual commands sent to the eMMC to reset it.
2 and 3) He should have access to that documentation or know some way to get it.

Also ask him if there is documentation for the development board of the SGSII or Note.

Again: with the eMMC documentation a fix can be made. Ask him if there is some way to get it, even if he cannot release the firmware upgrade.

Got it, sorry for any misunderstanding.

Sent from my SPH-D710 using XDA

Entropy512 · May 21, 2012

garwynn said:
This is the #1 question that I'm going to add. I just want to go over Entropy's notes and see what he's noticed already, then sum it up so we can hopefully point in a direction without telling a long story. I'm still waiting for a response from Samsung, and I left a voicemail for the person at Sprint - I'll be passing this on to them as well. (From Sprint's angle, using the GB code is a good solution if they want to go OTA - patching the eMMC would probably be better in the long term, if possible.)

I'll give it until tomorrow around 9 am CDT in case anyone else has some lingering questions - then send it off. Figured it was better to make sure everyone gets a chance to catch up after the weekend before doing so.

What I've been able to figure out so far:
The main difference between I9100 update4 source and SHW-M250S Update5 source is that in I9100 update4, MMC_CAP_ERASE is not enabled in either the MSHCI drivers or SDHCI drivers. The MSHCI driver is the driver for our internal storage.

Even before Mr. Sumrall's responses, based on the fact this usually happened during a wipe/format, I suspected it was MMC_CAP_ERASE.

However, in Gingerbread kernels for I777/I9100, the MSHCI driver also has MMC_CAP_ERASE enabled. The driver does appear to be heavily customized with Samsung code, however. So I'm kind of baffled as to why MMC_CAP_ERASE seems safe on Gingerbread - Either the customization in the driver works around fwrev 0x19's bug, OR there's something elsewhere in the kernel that causes ERASE commands not to get fired at the chip even though it's enabled in the MSHCI driver.

The fact that Gingerbread was safe caused me to move away from MMC_CAP_ERASE until Mr. Sumrall's response regarding the ERASE bug in 0x19.

Either way, I believe that removing the MMC_CAP_ERASE from the MSHCI driver will render it safe. I'm just curious as to why Gingerbread isn't dangerous despite having this feature flagged as enabled in the MSHCI driver.
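The effect of dropping MMC_CAP_ERASE from the host's capability mask can be modeled in a few lines. This is a sketch only: the bit values below are placeholders and NOT the real constants from the kernel's include/linux/mmc/host.h, and core_would_erase is a hypothetical, loose model of the kind of gate the mainline MMC core applies in mmc_can_erase() (the host must advertise MMC_CAP_ERASE and the card must report a usable erase size). It does not reproduce the Samsung MSHCI driver.

```python
# Placeholder bit values for this sketch only; NOT the kernel's actual MMC_CAP_* values.
MMC_CAP_4_BIT_DATA = 1 << 0
MMC_CAP_8_BIT_DATA = 1 << 1
MMC_CAP_ERASE = 1 << 2


def strip_erase(caps: int) -> int:
    """Return the host capability mask with MMC_CAP_ERASE cleared.

    The C equivalent in a driver would be `host->caps &= ~MMC_CAP_ERASE;`.
    """
    return caps & ~MMC_CAP_ERASE


def core_would_erase(host_caps: int, card_erase_size: int) -> bool:
    """Loose model of the MMC core's erase gate: the host must advertise
    MMC_CAP_ERASE and the card must report a nonzero erase size."""
    return bool(host_caps & MMC_CAP_ERASE) and card_erase_size > 0
```

With the flag set, the modeled core is willing to issue ERASE; after strip_erase() it never does, while the other capability bits are left untouched. That is the reasoning behind "removing MMC_CAP_ERASE will render it safe", though it does not explain why Gingerbread is safe with the flag enabled.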

What I know for sure:
GT-I9100 Update3 - Gingerbread - MMC_CAP_ERASE enabled but still somehow safe
GT-I9100 Update4 - ICS - MMC_CAP_ERASE disabled and safe
SHW-M250S Update5 - ICS - MMC_CAP_ERASE enabled and dangerous

What we really need at this point is a way for the JTAG services to do a full reset of the chip - Even if they can't update the firmware past 0x19, as long as they can recover the chip to a normal working state, then people can have damaged devices repaired for $30-50 instead of $200+.

RootTheMachine · May 21, 2012

I think this explains the issue we are having on original Galaxy S phones (i9000, Captivate, Vibrant etc.) and other devices & tablets that are getting the ICS "Encryption Unsuccessful" error (which has nothing to do with the use of encryption) and, in the process, losing /data and the internal SD card. We have already sent an affected device to Adam Outler for UBM, and it wasn't recoverable. Since yesterday, I have been asking people for their fwrev, and they all seem to be 0x0. Mine is also 0x0, but I don't have the issue. We have also had cases where people reboot and everything they had lost is back in place, or at least the partitions are accessible. Is there anything we can do to investigate this further on our devices? We aren't very familiar with what is discussed in your thread or how it's relevant to us.

These are our threads
http://xdaforums.com/showthread.php?t=1447303
http://xdaforums.com/showthread.php?t=1649613

garwynn · May 21, 2012

korockinout13 said:
I think this explains the issue we are having on original Galaxy S phones (i9000, Captivate, Vibrant etc.) and other devices & tablets that are getting the ICS "Encryption Unsuccessful" error (which has nothing to do with the use of encryption) and, in the process, losing /data and the internal SD card. We have already sent an affected device to Adam Outler for UBM, and it wasn't recoverable. Since yesterday, I have been asking people for their fwrev, and they all seem to be 0x0. Mine is also 0x0, but I don't have the issue. We have also had cases where people reboot and everything they had lost is back in place, or at least the partitions are accessible. Is there anything we can do to investigate this further on our devices? We aren't very familiar with what is discussed in your thread or how it's relevant to us.

These are our threads
http://xdaforums.com/showthread.php?t=1447303
http://xdaforums.com/showthread.php?t=1649613

The eMMC models - at least, those posted so far - don't match the ones that were identified in this discussion so it would be a guess at best if these were included at the moment. I'll try to dig more into those threads when I have a chance - saw one was 60 pages long.

sfhub · May 21, 2012

Robotu said:
I understand what all of you are saying, it is very specific and comprehensiv, and maybe if i would have had enough time to read the entire thread i would have come to the same conclusion, but i read it only a few posts (in so many threads) cause trying to understand/fix this phone already charged me more time/money than changing the motherboard... So i am trying to end this adventure cause i am losing more money like this. So please, related to the few posts above, and more specific, what is your frankly and directly diagnostic and recommendation for this situation, i would like to here a few profy and straightly opinions?
Mi device still can do anything else, CWM, DM, adb....., except it can not boot.
Considering what garwynn seize, even if samsung and google will give you more info about firmware and stuff..., even so, the only solution for mine or other Note in the same situation is to replace motherboard, or after you (devs) will have that informations (if ever) you can (and/or you will) certainly find a solution for devices in that stage?
Thanks in advance!

If it were me, I would just get the m/b replaced. With these workarounds, things may crop up later that you don't notice at first, and I'm actually not that confident we'll see something that can repair a damaged unit anytime soon. I think a realistic goal is just to get some changes made so it doesn't happen in the future, but that wouldn't help an already damaged unit.

Entropy512 · May 21, 2012

garwynn said:
The eMMC models - at least, those posted so far - don't match the ones that were identified in this discussion so it would be a guess at best if these were included at the moment. I'll try to dig more into those threads when I have a chance - saw one was 60 pages long.

The original Galaxy S series has a completely different flash architecture from Exynos devices. They don't even use eMMC; they use OneNAND instead.

Whatever issue they are encountering is something completely different.

garwynn · May 21, 2012

Entropy512 said:
The original Galaxy S series has a completely different flash architecture from Exynos devices. They don't even use eMMC; they use OneNAND instead.

Whatever issue they are encountering is something completely different.

Thanks - explains some of the information being posted.
