Discussion thread for /data EMMC lockup/corruption bug

jgaviota

Senior Member
Jan 4, 2008
387
65
Most probably he won't be allowed to give us the firmware fix, and we couldn't apply it anyway because we don't have fastboot, so the follow-up questions must be:

What is the command to wipe or reset the eMMC?

Can it be applied by JTAG, as we don't have fastboot?

Is there any public documentation available?


Here is where an UnbrickableMod would help, as I recall Adam can use lower-level commands from his bootloaders.
 

garwynn

Retired Forum Mod / Inactive Recognized Developer
Jul 30, 2011
5,179
8,589
NE Ohio
www.extra-life.org
Robotu said:
The point is that this can be a solution, and a good one. He did what he could. He does not have enough knowledge, and neither do I; he was guided remotely through the forum by another guy, who may not have been skillful enough himself. The point is that the people who can do this should give some credit to this method, and then 99% of it could be improved by skilled people in a matter of days. I tried a few days ago to get your attention (the devs) with this theory, but I was ignored. Maybe you thought it was only the pain of loss... Ha, ha, ha! That it would pass, and that afterwards I would understand that the only solution is replacing the motherboard. Well, eventually I will, but not so soon. I am not a Linux programmer, but in my head there is not enough room to swallow an issue that is not logical...

Actually, I went over those posts, and it seems to be an attempt to do on the Note what drnull did on the E4GT: repartition to avoid the corrupted area. It hasn't been tested long term, nor does it fix the underlying problem, which we now know. I'm glad that someone read the potential workaround, but it's not a fix, which is why at least the E4GT community moved on past it.

We have reason to believe a fix (prior to bricking) does exist but is not available to the public. There are even cases where it may be able to save a bricked device without having to replace the motherboard, but we need more info on it. Oddly enough, AT&T's Note (I believe the I717 is the Note) has the 0x25 firmware on it, so it is not susceptible to the "superbrick". Anyone who has connections at AT&T might want to ask and see if they have the fix...

So please understand this is why it is going a different direction - your potential fix is several months old and was concluded to be a "bandage" at best, not a "cure."

---------- Post added at 03:16 PM ---------- Previous post was at 03:09 PM ----------

I'm trying every avenue I can to get it released - We're waiting on a reply from Mr. Sumrall and separately from Samsung, and I'm going to pass this info on to Sprint in about an hour. It doesn't even have to get to the community - if it's code and it can just be added to the ICS build I'd be happy with that as a solution.

Regarding the follow up questions:

1) Find the difference between the global build that doesn't brick and the ones that do - and then apply to all builds for affected devices. I believe Entropy has done a lot of legwork on that already and has a good idea on what it may be.

2) Sounds like no - but we really need the patch code itself to send to someone and have them check. I'm holding out hope that it will work...

3) Very little. Most of the legwork done on this problem so far has been by scouring the internet and available Android code... but even that didn't answer everything.
 

Robotu

Senior Member
Apr 24, 2010
156
21
45
Bucharest
Well, this is another thing. I am not a developer nor a programmer, but I understand almost everything; I have a virus called IT, whatever that means. So in general I give up only when I understand why something is the way it is, what it is and what it is not. Most things come to me through intuition, so what you just said had already been in my mind for a few days, from connecting all the information, known and unknown, that is all around us... And I am not joking here.
My questions are as follows:
1) I do not think the eMMC is hardware bricked; it just cannot allow access to it (maybe because, as a Russian guy said, of the 32 KB of zeros written, which can no longer be read...). If it can be formatted at a low level with the right tools, it can be revived; or the firmware could be changed so the controller can wipe and rewrite everything...
2) If the eMMC is nevertheless hardware bricked, I think it can be repartitioned to eliminate the corrupted section. That means finding the exact place where the first bad sector starts and where the last bad sector ends, leaving it blank and unformatted so the controller never reads that section, and moving all the partitions into the safe area. I think it would take a lot of work for somebody to convince me that neither of these theories is possible...

I do not have enough English to say everything, but still...

Sent from my SGS2 HD LTE SHV-E120L
 

mobilevil

Senior Member
Nov 3, 2010
113
13
Hong Kong
Even if this fix can make the system boot, IMHO sooner or later the wear leveler will try to use the corrupted area again and freeze.
 

garwynn

No worries - you're asking the same questions that have been tossed around here for quite a while. I too was a big supporter of the "it's not firmware, it's software..." side of the story until we got some significant information from Mr. Sumrall of the Android team.

There is a misunderstanding, which I also believed until it was proven otherwise, that the 32 KB of zeros was somehow corrupting the file system to cause the "superbrick." It didn't make sense to anyone... now we know why. I'll repost Mr. Sumrall's summary here:

Ken Sumrall (Android Kernel Team) said:
Firmware rev 0x19 has a bug where the emmc chip can lockup after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lockup during the boot process. Very rarely, it can lockup even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.

A locking issue means that we're beyond just file system at this point and back to the embedded controller. If the controller causes a lock up even by trying to access that memory you can't even flag the sector(s) as bad - which is why drnull's solution worked of just repartitioning and avoiding it altogether. There are potential problems when flashing or using ODIN to put something else on as it may try to restore that area and then cause the device to lock again, so flashing would be out. It would also be susceptible to the same problem (in theory) so one could end up corrupting the data system even further... that's bad news.

Possible? Drnull showed it was. Good idea? If you're in a situation where you need a device until you can replace it in the short term, yes. Long term - not recommended.

I'm by no means an "expert" in Android or these devices - many out there have much higher qualifications than I'll probably ever see. But I'm willing to spitball ideas like you are - as well as being open to listening to those more experienced (and honestly probably smarter) than me. It's how I learn as well and improve. And as long as we move the conversation forward I doubt anyone will go against you for doing so. Just please be willing to understand their position as well and why they're coming to these conclusions.
 

Robotu

Thank you very much! I do not know you, but I already like you, mostly for your pragmatism...

A question I am a bit ashamed to ask: who is Mr. Sumrall, and what is his position? I guess he is some kind of Samsung employee...?

Sent from my SGS2 HD LTE SHV-E120L
 

garwynn

No shame in asking a question. I know he's a programmer for the Android kernel (works for Google). A quick look at his LinkedIn profile indicates he's been there for a few years but has been in the industry for a LONG time. I contacted him in particular as he was the one who checked in the source for the original bug fix we were investigating, which meant he had to have enough knowledge of the issue to at least sign off on the code change.

One thing that I have to give to most developers - they're willing to share their knowledge so long as they have time and it doesn't violate any Non-Disclosure Agreements that may be in place. I also think they make great teachers/professors based on my experiences; I've even thought about it when I get much older but still want to share what I've learned.
 

Robotu

Guys, another guy followed forest's instructions and he has a fully functional device: the camera is working, video recording, voice calls; as far as he says, everything is fine...
 

sfhub

Senior Member
Oct 23, 2008
5,350
7,231
It has already been known that you can change the partition tables to work around the bad area. We did that early on (i.e. months ago) in the "odin hangs on data.img" thread. The issue is that so much of /data was unwritable that we effectively ended up with only a few MB of /data. I'm sure we could create kernels that remap /data to the external SD if we wanted to, but I don't consider that a real "solution", as you won't be able to get any upgrades. You'll always have to wait for someone to give you custom kernels to work around the problem.
 

sfhub

Garwynn, not sure if this is in your followup questions, but I would like to make sure Samsung and Google are aware that whatever bugs are in the EMMC firmware, they don't appear to be triggered by the sequence of EMMC commands issued in the GB kernel. They can theoretically avoid all these problems if they spend the time to investigate what changed between GB and ICS for the code path generated by wipe data/factory reset in recovery.

I think most of us are aware of this on this thread, but it may not have been obvious to Samsung and Google because we have been asking specific questions about a fix that got checked in rather than spending time describing the history of the problem we are seeing.
 

Robotu

I understand what all of you are saying; it is very specific and comprehensive, and maybe if I had had enough time to read the entire thread I would have come to the same conclusion. But I read only a few posts (in so many threads), because trying to understand and fix this phone has already cost me more time and money than changing the motherboard... So I am trying to end this adventure, because I am losing more money this way. So please, in relation to the few posts above, and more specifically: what is your frank and direct diagnosis and recommendation for this situation? I would like to hear a few professional, straightforward opinions.
My device can still do everything else (CWM, DM, adb...), except it cannot boot.
Considering what garwynn says: even if Samsung and Google give you more info about the firmware and so on, is the only solution for my Note, or others in the same situation, still to replace the motherboard? Or, once you (devs) have that information (if ever), can you (and will you) certainly find a solution for devices in that state?
Thanks in advance!
 

garwynn

sfhub said:
Garwynn, not sure if this is in your followup questions, but I would like to make sure Samsung and Google are aware that whatever bugs are in the EMMC firmware, they don't appear to be triggered by the sequence of EMMC commands issued in the GB kernel. They can theoretically avoid all these problems if they spend the time to investigate what changed between GB and ICS for the code path generated by wipe data/factory reset in recovery.

I think most of us are aware of this on this thread, but it may not have been obvious to Samsung and Google because we have been asking specific questions about a fix that got checked in rather than spending time describing the history of the problem we are seeing.

This is the #1 question that I'm going to add. I just want to go over Entropy's notes and see what he's noticed already and then sum it up so we can hopefully point in a direction without telling a long story. I'm still waiting for a response from Samsung and I left a VM for the person at Sprint - I'll be passing this on to them as well. (From Sprint's angle using the GB code is a good solution if they want to go OTA - patching the eMMC would probably be better in the long term if possible.)

I'll give it until tomorrow around 9 am CDT in case anyone else has some lingering questions - then send it off. Figured it was better to make sure everyone gets a chance to catch up after the weekend before doing so.
 

jgaviota

Sorry garwynn, I did not make myself clear...

Please relay those questions to Mr. Sumrall.

1) It doesn't have to do with the Android source, but with the actual commands sent to the eMMC to reset it.
2 and 3) He should have access to that documentation, or know some way to get it.

Also ask him if there is documentation for the development board of the SGSII or Note.

Again: with the eMMC documentation a fix can be made. Ask him if there is some way to get it, even if he cannot release the firmware upgrade.
 

garwynn

Got it, sorry for any misunderstanding.

Sent from my SPH-D710 using XDA
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
What I've been able to figure out so far:
The main difference between I9100 update4 source and SHW-M250S Update5 source is that in I9100 update4, MMC_CAP_ERASE is not enabled in either the MSHCI drivers or SDHCI drivers. The MSHCI driver is the driver for our internal storage.

Even before Mr. Sumrall's responses, based on the fact this usually happened during a wipe/format, I suspected it was MMC_CAP_ERASE.

However, in Gingerbread kernels for I777/I9100, the MSHCI driver also has MMC_CAP_ERASE enabled. The driver does appear to be heavily customized with Samsung code, however. So I'm kind of baffled as to why MMC_CAP_ERASE seems safe on Gingerbread - Either the customization in the driver works around fwrev 0x19's bug, OR there's something elsewhere in the kernel that causes ERASE commands not to get fired at the chip even though it's enabled in the MSHCI driver.

The fact that Gingerbread was safe caused me to move away from MMC_CAP_ERASE until Mr. Sumrall's response regarding the ERASE bug in 0x19.

Either way, I believe that removing the MMC_CAP_ERASE from the MSHCI driver will render it safe. I'm just curious as to why Gingerbread isn't dangerous despite having this feature flagged as enabled in the MSHCI driver.

What I know for sure:
GT-I9100 Update3 - Gingerbread - MMC_CAP_ERASE enabled but still somehow safe
GT-I9100 Update4 - ICS - MMC_CAP_ERASE disabled and safe
SHW-M250S Update5 - ICS - MMC_CAP_ERASE enabled and dangerous
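For anyone who wants to repeat the comparison on another source drop, a rough sketch of the check follows. It only greps the MMC host driver directory of an unpacked kernel tree for the flag; `find_cap_erase` is a made-up helper name, and the tree path is whatever you unpacked the source to. A hit only shows the flag is mentioned, so each line still needs to be read by hand (it could be commented out or behind an #ifdef).

```shell
#!/bin/sh
# Sketch: list host-driver source lines that mention MMC_CAP_ERASE.
# find_cap_erase is a hypothetical helper name, not part of any tool.
# $1 is the root of an unpacked kernel source tree (assumed layout:
# the MMC host drivers live under drivers/mmc/host).
find_cap_erase() {
    tree=$1
    # -r recurses, -n prints line numbers; exit status is non-zero
    # when the flag is not mentioned anywhere in that directory.
    grep -rn 'MMC_CAP_ERASE' "$tree/drivers/mmc/host" 2>/dev/null
}
```

Running it against each of the three trees above should give a quick list of files to inspect; it does not by itself say whether ERASE commands actually reach the chip.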

What we really need at this point is a way for the JTAG services to do a full reset of the chip - Even if they can't update the firmware past 0x19, as long as they can recover the chip to a normal working state, then people can have damaged devices repaired for $30-50 instead of $200+.
 

RootTheMachine

Senior Member
Oct 4, 2011
2,088
533
I think this explains the issue we are having on original Galaxy S phones (i9000, Captivate, Vibrant etc.) and other devices & tablets that are getting the ICS "Encryption Unsuccessful" error (having nothing to do with the use of encryption), and in the process, losing /data and the internal sd card. We have already sent an affected device to Adam Outler for UBM, and it wasn't recoverable. Since yesterday, I have been asking people for their fwrev, and they all seem to be 0x0. Mine is also 0x0, but I don't have the issue. We have also had cases where people reboot and everything they once lost is back in place, or at least the partitions are accessible. Is there anything we can do to investigate this further on our devices? We aren't very well informed of the information discussed in your thread, and how it's relevant to us.


These are our threads
http://xdaforums.com/showthread.php?t=1447303
http://xdaforums.com/showthread.php?t=1649613
 

garwynn

The eMMC models - at least, those posted so far - don't match the ones that were identified in this discussion so it would be a guess at best if these were included at the moment. I'll try to dig more into those threads when I have a chance - saw one was 60 pages long.
 

sfhub

Robotu said:
I understand what all of you are saying; it is very specific and comprehensive, and maybe if I had had enough time to read the entire thread I would have come to the same conclusion. But I read only a few posts (in so many threads), because trying to understand and fix this phone has already cost me more time and money than changing the motherboard... So I am trying to end this adventure, because I am losing more money this way. So please, in relation to the few posts above, and more specifically: what is your frank and direct diagnosis and recommendation for this situation? I would like to hear a few professional, straightforward opinions.
My device can still do everything else (CWM, DM, adb...), except it cannot boot.
Considering what garwynn says: even if Samsung and Google give you more info about the firmware and so on, is the only solution for my Note, or others in the same situation, still to replace the motherboard? Or, once you (devs) have that information (if ever), can you (and will you) certainly find a solution for devices in that state?
Thanks in advance!

If it were me, I would just get the m/b replaced. Things may crop up later you don't realize at first with these workarounds and I'm actually not that confident we'll even see something that can repair a damaged unit anytime soon. I think a realistic goal is just to get some changes made so it doesn't happen in the future, but that wouldn't help an already damaged unit.
 

Entropy512

The original Galaxy S series has a completely different flash architecture than Exynos devices. They don't even use eMMC; they use OneNAND instead.

Whatever issue they are encountering is something completely different.
 

Top Liked Posts

    This post will evolve over time as more info is found:

    Latest Updates
    6/24 Custom PIT repartition workaround posted by Jaymoon.
    You lose 8GB (some of which might be further recoverable with extra work) and restores using the original PIT will lockup your phone again (a scenario that could happen if you brought your phone back to Sprint for some unrelated problem) so if you have the opportunity to get your phone replaced with little to no cost, IMO that should be your primary option.
    http://xdaforums.com/showthread.php?p=27852689#post27852689

    E4GT specific PIT file here (theoretically instead of losing 8GB, you'll only lose 2GB):
    http://xdaforums.com/showpost.php?p=28070569&postcount=654

    6/8 Update for other platforms waiting for fix
    Codeworkx's contact with Samsung got following response [discussion]
    Update 14:56 CEST:
    Patches will be out in form of new official ROMs and also sourcecode releases after testing, which might take some time.

    6/7 Update
    Test plan posted - see bottom of post for results so far (esoteric68, krazy_smokezalot report success)
    BIG THANKS to Esoteric68 (and robertm2011 before her) who took the plunge to benefit everyone else. She has completed the test plan and more. 6 flashes of CM9, 3 flashes of AOKP, 3 wipe data/factory resets, and 3 nandroid restores, 1 stock FF02 flash, all successful. We are ready to have more testers try out the test ROM installs. We are getting more confident the code analysis was correct.

    6/2 Update
    Less technical summary and preparation for new round of testing

    5/31 Lots of discussion on the code path detailing how the problem occurs and where to put the workaround, select posts below
    Call trace for CWM Recovery - wipe data/factory reset
    Call trace for CWM Recovery - restore
    Section of update-binary afflicted by same issue as wipe data/factory reset
    Recap of where workarounds can be placed
    MD5s of various update-binary executables
    Pros/Cons of placing workaround in kernel vs libext4_utils.a
    Are ICS nandroid backup/restores safe?
    Are ICS recoveries safe?
    Why do CM9/AOKP installs often brick in ICS but not in GB?

    5/24 Update pretty much ties up all the loose ends - Thanks Mr. Sumrall, Garwynn, Entropy, and everyone else who pitched in!
    http://xdaforums.com/showthread.php?p=26521643#post26521643

    Potentially very GOOD NEWS
    It appears Sprint/Samsung tested the EMMC brick issue, confirmed the problem, and tested a fix that appears to resolve the problem:
    http://xdaforums.com/showthread.php?p=26465085#post26465085
    To clarify this...in testing done over the weekend, there was a small "subtest" group which consisted of 20 devices. This group was put together STRICTLY for the purpose of testing the emmc bug and fix. The devices were all programmed with the data known to have caused bricks when wiping. Of those 20, all but 6 also had the code patch to resolve that issue, so there was a possibility for 6 hard bricks, only 4 actually bricked, therefore, on the build currently being tested, the "emmc break issue" has been deemed "resolved".



    We now have an update on why this bug is happening and which PRV/fwrevs are affected. PRV/fwrev 0x19 are susceptible to the EMMC /data corruption issue (which should now be referred to as EMMC lockup issue). PRV/fwrev 0x25 has the fix for the lockup issue but has a separate 32KB of zeros data corruption issue, which is being patched in the kernel (our kernels don't have that patch). All these problems are in the EMMC firmware. It can potentially be updated, but nothing is publicly available. EMMC lockup issue is triggered on erasing the EMMC. The only piece we have not been able to explain is why GB-based kernels seem immune to the EMMC lockup problem whereas ICS seems more susceptible to the problem. Presumably both are doing ERASE commands, but possibly in slightly different ways. See these posts for more details [#1 / #2]

    To get your PRV/fwrev, you can use this (if you have busybox installed):
    shell@android:/ $ su
    shell@android:/ # cd /sys/class/block/mmcblk0/device

    shell@android:/sys/class/block/mmcblk0/device # cat cid | cut -b 19,20
    19
    If you don't have busybox installed, just visually parse the line: find the serial # (0xd3f24fe6 - example only - yours will be different) inside the cid, and look at the 2 hex digits immediately before the serial #.
    shell@android:/ $ su
    shell@android:/ # cd /sys/class/block/mmcblk0/device

    shell@android:/sys/class/block/mmcblk0/device # cat serial cid
    0xd3f24fe6
    1501004d414734464119d3f24fe68e8b
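
    The lookup above can be wrapped in a small helper. This is only a sketch: the function names cid_prv and prv_status are made up for illustration, and the status strings simply restate what this post says about revisions 0x19 and 0x25. Anything else is reported as unknown, not as safe.

```shell
# Extract the PRV/fwrev byte (hex characters 19-20) from a 32-character CID string.
cid_prv() {
    printf '%s\n' "$1" | cut -c19-20
}

# Map a PRV value (two hex digits, no 0x prefix) to the status described
# in this thread. Unknown revisions are flagged instead of assumed safe.
prv_status() {
    case "$1" in
        19) echo "0x19: susceptible to the EMMC lockup issue on erase" ;;
        25) echo "0x25: lockup fixed, but has the 32KB-of-zeros corruption issue" ;;
        *)  echo "0x$1: not covered by this thread" ;;
    esac
}

# Example use on the device (as root):
#   prv_status "$(cid_prv "$(cat /sys/class/block/mmcblk0/device/cid)")"
```

    Reading the revision from the cid instead of the fwrev file is deliberate in this sketch, since the cid is what the commands above already rely on.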

    It appears after looking at the code more closely and examining the results of the card info dumps, we do not have this fix in our kernel. It isn't clear whether the fix would resolve our /data EMMC brick issues, but the point is moot right now because we don't have the fix.

    Possible BRICK here. Please do NOT do any more testing until further notice. Please do NOT use Wipe Data/Factory Reset. It is the main difference between first and 2nd round of testing and is the current suspect

    FE10 repacks added to Resource section

    Esoteric68, azyouthinkeyeiz, and robertm2011 are testing flashing different ROMs with FE07/FE10 repacked with unlocked recovery. We all owe them our thanks for risking their phones to help the community (taking one for the team). No bricks so far.

    Separately we are still discussing whether the fix Samsung checked in will get applied to our phone. No firm conclusions yet. Even if it doesn't apply, the hope is the data we get from testing will help us produce more flexible "safe" flashing practices.

    Please do NOT test CWM Touch for now. We want to isolate just the FE07 kernel and unlocked stock recovery before introducing new variables.


    Executive Summary
    Garwynn has found a recent checkin from Samsung in the kernel code handling EMMC memory that fixes a data corruption problem. It is possible this might fix the /data EMMC corruption we have been seeing, but we aren't sure if it is fixing the same problem. The first release to include that checked in code is FE07. There has been some communication with the developers in charge of that area to gather further info.

    This thread's purpose is to foster discussion on the issue and to determine if the potential fix actually does fix our issue. Even if the fix doesn't address the issue, it is hoped in the process we are able to gather more info into specific "safe" and "unsafe" scenarios.

    Please do NOT jump ahead and think it is fixed. It is TOO EARLY to make that claim.


    Background
    As many of you are aware since ICS has come out, there has been a nagging issue where in some situations flashing ROMs with an ICS-based kernel and custom recovery has left the phone with EMMC corruption. This EMMC corruption is so far non-recoverable, even with JTAG bit blasting, which should bypass all but hardware issues.

    This problem is NOT limited to the Epic 4G Touch. Other GS2 models as well as Galaxy Note are experiencing the same thing as can be seen by this Public Service Announcement in the Galaxy Note section.

    The problem first cropped up when people used ROM Manager to temporarily "fake" flash CWM Touch onto an ICS-based kernel to do their flashing needs. In particular wipe data/factory reset seemed to often trigger the /data EMMC corruption. However, later we found it wasn't limited to just CWM Touch and temporary flashing, as CWM repacks with the ICS-based kernel also exhibited that behavior, albeit not as often.

    Even more frustrating is that this bug is not always deterministic, in that you could do some operation 3 times and have it work fine, then on the 4th, trigger the /data EMMC corruption.

    Complicating the testing/debugging is the issue that once the problem is triggered, your phone is basically not recoverable. You can try and ODIN a stock ROM on top which will basically work for all the components except the /data partition. Once it reaches the /data partition, ODIN will hang. Similarly if you try and wipe data/factory reset, it will hang or timeout after a while. Attempts to repartition and reformat using ODIN have not changed this behavior. Attempts to edit the partition info manually have not been successful. JTAG bit blasting has not been successful.

    You can read about the past experiences in the Stuck at "Data.img" thru odin thread. By the time you get to ODIN, the damage to /data EMMC is already done. ODIN is NOT causing the damage. ODIN is hanging on data.img because the hardware won't let it write successfully to that area of EMMC.

    This has led to many custom ROMs giving special procedures to go back to a GB-based kernel repacked with CWM recovery to do all your flashing (EL26+CWM). It is also the motivation for the How Not To Brick Your E4GT thread.


    Details
    The code checkin that has piqued our interest concerns data corruption caused by a problem in the wear-leveling firmware code of the emmc. This is low-level code that runs on a processor in the emmc module. It tries to spread data writes evenly so that no one section of emmc memory gets worn out prematurely. This code apparently can corrupt data by writing 32KB of incorrect data in some situations.

    https://bitbucket.org/franciscofranco/android-tuna-omap/changeset/cea631bdac53

    The code appears to restrict the firmware fix to only certain "affected" emmc modules. Also it is not able to persistently/permanently patch the firmware so this code must run at each startup. The following modules were identified in the code:

    Name: VYL00M
    HwRev: 0x0
    FwRev: 0x25

    Name: KYL00M
    HwRev: 0x0
    FwRev: 0x25

    Name: MAG4FA
    HwRev: 0x0
    FwRev: 0x25

    Unfortunately during ad-hoc polling we have found a case of an EMMC /data bricked phone with fwrev 0x0, so either we are not understanding what Samsung's fix is doing or they may not have addressed the full scope of the problem. Do NOT assume if your fwrev is 0x0 you are safe.

    At this point, this does NOT mean the fix is not applicable. We might be looking at the wrong data. The kernel might not be exporting the data to us. The fix might need to be expanded to more modules. The fix could be for something else entirely but we might be able to avoid the bug anyway using stock recovery.

    To determine what version you have (keep in mind we are at the preliminary stage, so this info might not be the right info to collect or could be meaningless for the /data EMMC corruption issue)
    shell@android:/ $ su
    shell@android:/ # cd /sys/class/block/mmcblk0/device

    shell@android:/sys/class/block/mmcblk0/device # cat name hwrev fwrev manfid oemid date type serial cid
    MAG4FA
    0x0
    0x0
    0x000015
    0x0100
    08/2011
    MMC
    0xd3f24fe6
    1501004d414734464119d3f24fe68e8b
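
    For anyone comparing their dump against the cid line, here is a rough sketch that splits a cid string into its fields. The field positions are my reading of the standard eMMC CID layout (manufacturer ID, product name, product revision, serial, manufacturing date) and should be treated as an assumption; the function names hex_to_ascii and decode_cid are invented for this example.

```shell
# Convert a string of hex byte pairs (e.g. 4d4147344641) to ASCII text.
hex_to_ascii() {
    hex=$1
    out=
    while [ -n "$hex" ]; do
        byte=$(printf '%s' "$hex" | cut -c1-2)
        hex=$(printf '%s' "$hex" | cut -c3-)
        # Turn the byte into an octal escape and let printf emit the character.
        out="$out$(printf "\\$(printf '%03o' "$((0x$byte))")")"
    done
    printf '%s\n' "$out"
}

# Print selected fields of a 32-hex-digit CID, one per line.
decode_cid() {
    cid=$1
    mdt=$(printf '%s' "$cid" | cut -c29-30)
    echo "manfid=0x$(printf '%s' "$cid" | cut -c1-2)"
    echo "name=$(hex_to_ascii "$(printf '%s' "$cid" | cut -c7-18)")"
    echo "prv=0x$(printf '%s' "$cid" | cut -c19-20)"
    echo "serial=0x$(printf '%s' "$cid" | cut -c21-28)"
    # Date byte: high nibble = month, low nibble = years since 1997 (assumed).
    echo "date=$(printf '%02d/%d' "$((0x$mdt >> 4))" "$((1997 + (0x$mdt & 15)))")"
}

# Example: decode_cid 1501004d414734464119d3f24fe68e8b
```

    Run against the sample cid above, the decoded name, serial and date line up with the name, serial and date files in the dump, while the prv field reads 0x19 even though the fwrev file shows 0x0.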

    The comments for the code checkin give the following info:
    /*
    * There is a bug in some Samsung emmc chips where the wear leveling
    * code can insert 32 Kbytes of zeros into the storage. We can patch
    * the firmware in such chips each time they are powered on to prevent
    * the bug from occurring. Only apply this patch to a particular
    * revision of the firmware of the specified chips. Date doesn't
    * matter, so include all possible dates in min and max fields.
    */

    The critical piece of code appears to be the following:
    Code:
    	/* set value 0x000000FF : It's hidden data
    	 * When in vendor command mode, the erase command is used to
    	 * patch the firmware in the internal sram.
    	 */
    	err = mmc_movi_erase_cmd(card, 0x0004DD9C, 0x000000FF);
    	if (err) {
    		pr_err("Fail to Set WL value1\n");
    		goto err_set_wl;
    	}
    	/* set value 0xD20228FF : It's hidden data */
    	err = mmc_movi_erase_cmd(card, 0x000379A4, 0xD20228FF);
    	if (err) {
    		pr_err("Fail to Set WL value2\n");
    		goto err_set_wl;
    	}

    Action items
    At this point we would like to

    1) gather more info on which emmc modules folks have and see if we can detect any patterns, so if you could post your EMMC info and optionally include whether you have the ability to do testing (presumably because you have a way to replace your phone if it is damaged)

    2) solicit one volunteer to try different flashing scenarios using the unlocked stock recovery and FE07 kernel repack (bigpeng indicated earlier he would be willing to do this for the community, but that was before the fwrev info, so he might have had a false sense of security, so no pressure on him if he changed his mind)

    If we find that the volunteer does not see any corruption despite trying to do so, then we can expand testing to a few more people and also work on getting CWM repacks.

    If the volunteer hits the bug, then we will know the issue is still there even with stock recovery and FE07 kernel.

    Keep in mind, at some point someone will need to take one for the team or we will be forever in fear of bricking our phones using ICS-based kernels.


    Resources

    1) FE07-based repacks
    Unlocked Recovery Only [update.zip / tar]
    Plus (unlocked recovery, init.d, adb-root) [update.zip / tar]

    2) FE10-based repacks
    Unlocked Recovery Only [update.zip / tar]
    Plus (unlocked recovery, init.d, adb-root) [update.zip / tar]

    3) JEDEC eMMC documentation

    Related threads
    Galaxy Note CID investigation thread
    I posted this less technical writeup on agat63's cwm repacked thread but figured it would be useful to have here also. I am working with CM9/AOKP to have their install scripts replace format("/system") with delete_recursive("/system"). After that I think if we still have volunteers, we are ready to do some more testing.

    I'll provide more details when the pieces become available, but if you'd like to take one for the team and help test, please post. If our understanding of the problem is accurate, you should be safe this time around, but there is always the chance we are not understanding the problem completely.

    ====

    I would like to stress that the information I gave agat was based on code tracing and NOT based on real-world testing. You should treat this as a *testing period* to confirm the analysis. You MAY BRICK if the analysis is incorrect.

    The following is all assuming you are repacked with ICS kernel (ie we aren't talking about the GB kernel)...

    Background
    The nature of the problem is a call to the function make_ext4fs(). This function isn't provided by the kernel, rather it is provided as a library (libext4_utils.a) that is used when compiling Recovery and the update installer. It does end up eventually calling kernel mmc driver routines, which then trigger the EMMC firmware lockup/superbrick bug.

    The make_ext4fs() function changed between GB and ICS. In GB the function didn't try to erase the partition before creating the EXT4 fs. In ICS it tries to erase it first. The erase, which reaches the chip via the kernel MMC driver, triggers an EMMC firmware bug that was always there. GB is also "doubly" safe in that not only do Recovery and update-binary never attempt to do the erase, even if they did, the request to erase is blocked in the GB-kernel and never run.

    The EMMC firmware bug will lock up your phone and corrupt internal EMMC meta-data which cannot be accessed or repaired at this time. It isn't crashing your hard drive per se; it is crashing your hard drive controller in a way that prevents the hard drive controller from accessing parts of your disk. We don't have any way of updating the EMMC hard drive controller at this time.

    The EMMC firmware lockup/superbrick bug is likely contained in the wear-leveling firmware code which shifts mmc-internal memory block usage around to prevent any one area from overuse. The bug MIGHT NOT be triggered every time, so you can do the same operation with no issues and then brick on your Nth attempt.

    Details
    So what does that mean for you? There are 2 executables we are concerned with, Custom Recovery and the update installer (update-binary)

    Custom Recovery is responsible for 2 potential bricking points:
    1) wipe data/factory reset
    2) nandroid backup/restore

    These are both handled by the Recovery itself so if your Recovery is "safe" then these operations should be safe. The nandroid backup is safe regardless. Our concern is only for wipe data/factory reset and nandroid restore. Both of these make the call to make_ext4fs(), so if they are using the GB-based version they are safe. If they are using the ICS version, they are not safe (when used with an ICS kernel). Agat has made the effort to make sure the recovery he has provided is compiled against GB CM7 source.

    You may ask what about Installing ROMs, you thought Recovery was responsible for that too?
    This is only partially true. You use the menu option in Recovery to choose to install an update.zip. Recovery is responsible for providing the location of the update.zip and verifying the signature, but when it comes to actually "installing" the update.zip, Recovery uses a "helper app" called update-binary contained in the update.zip.

    This update-binary helper app is responsible for running the Edify install script in the update.zip. It communicates with Recovery just to update the progress bar, output ui messages, and set up the updating of firmware. The rest of the script functions, it handles by itself directly, so Recovery isn't involved.

    update-binary also calls make_ext4fs() so it can also do potentially "unsafe" operations, just like we discussed for Recovery above. If the update-binary, that was included in the update.zip, was compiled using GB-sources, then it is "safe". If it was compiled against ICS sources then there is one function in the Edify script that can potentially cause bricking, format().

    To be clear, Recovery has no control over the update-binary that is included in the update.zip. Whoever built the ROM update.zip package made that decision. So this is why even with a "safe" Recovery, you can brick your phone installing ROMs (with an ICS kernel).

    Even if the Recovery is "safe", if you ask it to use an "unsafe" update-binary to install a ROM AND that ROM install script chose to do a format(), then the EMMC lockup/superbrick bug can be triggered.

    The reason why most stock-based ROMs don't brick in ICS is because
    1) most of them probably include a GB-based update-binary
    2) most of them are not performing a format() within their Edify updater-script

    So a ROM builder has 2 ways to make a ROM update.zip install "safe" to install in ICS. Either package a known GB-based update-binary OR eliminate format(), if present, from the Edify install script.

    So why does Calk's format_all seem to never brick even on ICS? Given the date on the update-binary and when he created the package, it is most likely using a GB-based update-binary

    So why does CM9/AOKP seem to brick more often than stock-based ICS ROM installs? The Edify install script for CM9/AOKP uses new functions that were introduced in the ICS update-binary. This in turn is why they bundle the ICS-based update-binary. They could still potentially be safe, but in the install script a format("/system") is performed. If that format is run under an ICS kernel it will trigger the EMMC firmware lockup/superbrick bug. Under a GB-kernel, the request to erase "/system" is blocked by the GB-kernel.

    What can CM9/AOKP do to make their installs "safe" to install in ICS? All they need to do is replace the format("/system") with delete_recursive("/system"). They could also replace the ICS-based update-binary with a GB-based update-binary, but that would require more rewrites to the install script. Replacing the format() call is simpler/easier.

    Why are some superbricks blue-light specials and others only make ODIN hang at data.img? This likely has to do with whether you got your brick from

    1) the format() in the CM9/AOKP install
    2) restoring nandroid backup
    3) doing the wipe data/factory reset in Recovery

    The first two tend to be blue-light specials as they affect /system and/or kernel. The last one tends to affect /data and/or /cache.

    So how do you make sure you are totally safe?
    1) make sure you are using a "safe" recovery repacked with the stock ICS kernel. This is a Recovery that was compiled against GB-based libext4_utils.a (ie GB source). This will assure you that wipe data/factory reset and nandroid restores are safe
    2) whenever you install a ROM for the first time, verify EITHER
    a) the ROM install script is NOT performing any format() calls
    b) the ROM install has bundled a GB-based update-binary

    If neither 2a NOR 2b are true (ie ICS-based update-binary and install performs format) then you DO NOT want to flash that ROM in Recovery while on an ICS kernel. Flash that ROM on a GB-based kernel/recovery.
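
    Check 2a can be done by eye, but a quick grep helps on long scripts. The sketch below is illustrative only: find_format_calls is a made-up helper name, and it assumes you have already extracted the Edify script from the update.zip (normally stored as META-INF/com/google/android/updater-script) to a file.

```shell
# List format() calls in an Edify updater-script, with line numbers.
# Exit status is 0 if at least one call was found, 1 if none.
# Names that merely end in "format" (e.g. unformat) are not matched.
find_format_calls() {
    grep -nE '(^|[^[:alnum:]_])format[[:space:]]*\(' "$1"
}

# Example:
#   unzip -p rom.zip META-INF/com/google/android/updater-script > /tmp/updater-script
#   find_format_calls /tmp/updater-script || echo "no format() calls found"
```

    A hit only means the script asks for a format; whether that is dangerous still depends on which update-binary is bundled and which kernel you are running, as explained above.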

    Hope that clears things up, and once again, remember, this analysis is only based on tracing code. I may have made a mistake in the analysis or our understanding of the problem could be wrong. We will not be sure if all these statements hold UNTIL WE DO REAL-WORLD TESTING.
    Morning Update

    Well, it's been some time but thankfully Mr. Sumrall from Android did get back to us on our questions. I think the community will find that this was worth the wait.

    Issue: fwrev not set properly.
    As we suspected the bugfix is not in our build. (The patch applies this unconditionally.)
    Ken Sumrall said:
    The patch includes a line in mmc.c setting fwrev to the right bits from the cid register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.

    (On second inquiry)
    fwrev is zero until the patch is applied.

    Question: Revision didn't match the fix
    (Emphasis mine in red as it discusses the superbrick issue.)
    Ken Sumrall said:
    You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, but we found it had another bug that if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.

    However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.
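
    The 0x19 versus 25 confusion is easy to reproduce, so here is a tiny sketch for normalizing whatever number you were given to 0x-prefixed hex before comparing. The helper name fwrev_hex is hypothetical.

```shell
# Print a firmware revision in unambiguous 0x-prefixed hexadecimal.
# Accepts decimal (25) or hex (0x19) input.
fwrev_hex() {
    printf '0x%x\n' "$(($1))"
}

# fwrev_hex 25    prints 0x19 (the lockup-prone revision)
# fwrev_hex 0x25  prints 0x25 (decimal 37, the later revision)
```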

    I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.

    As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lock up after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lock up during the boot process. Very rarely, it can lock up even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.

    Question: Why the /data partition?
    Ken Sumrall (Android SE) said:
    Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly to receive OTAs).

    Question: Why JTAG won't work?
    Ken Sumrall said:
    As I mention above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that causes the chip to lock up when a particular sector was accessed. The only fix was to wipe the chip, and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.

    Question: Can a corrupted file system be repaired (on the eMMC)?
    Ken Sumrall said:
    e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.

    So, while the fix doesn't apply to us at the moment, we've been given a great insight into the superbrick issue as well as information that a fix is already developed (hopefully we'll see it released!). The bug likely applies to us and assuming the fix for the 0x19 firmware is given then it would apply to our devices.

    On a lighter note, I wanted to include his close:
    Ken Sumrall said:
    You are getting a glimpse into the exciting life of an Android kernel developer. :) Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
    5/24 Update

    Fresh off the press...

    Q: Do you know the commands to reset the eMMC controller in question?
    Ken Sumrall (Android) said:
    Once the chip has locked up, no command will reset it. It must be power cycled. If instead you are asking how to clear the metadata so the chip works again, the only solution I know is to update the firmware inside the chip, and that also wipes all the data. This probably includes factory calibration data that must be saved before the firmware is updated, and restored after. Also, the boot loader is probably in the chip, and must be restored after the firmware update, or the device will be bricked. This is a dangerous operation, because if something goes wrong, the device will probably be bricked. (emphasis added)

    Q: Is there any documentation available on this issue? If so and it is private is it possible to have it released?
    Ken Sumrall (Android) said:
    It is private, I'm asking if I can release it, along with the code to update the emmc firmware. Don't get your hopes up, my guess is the answer will be no.

    Q: Alternate erase method?
    Ken Sumrall (Android) said:
    If you really want to erase data on a rev 0x19 samsung emmc chip, I suggest you just write zeros to the entire partition.
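
    As a rough illustration of that suggestion, zero-filling can be done with plain dd, which issues ordinary writes. This is a sketch, not a tested procedure for the phone: zero_fill is an invented helper name, the block size and count must match the real partition, and pointing it at the wrong device node destroys data just as surely as a wipe.

```shell
# Overwrite the first COUNT blocks of BS bytes in TARGET with zeros.
# conv=notrunc keeps a regular file at its original length, which makes
# the helper safe to try on a scratch file before touching a partition.
zero_fill() {
    target=$1 bs=$2 count=$3
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=notrunc 2>/dev/null
}

# Example on a scratch file:
#   zero_fill /tmp/scratch.img 4096 256
# On a device the target would be the partition's block device node, with
# bs * count equal to the partition size (both are for you to look up).
```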

    Q: Difference between GB/ICS Wipe
    Ken Sumrall (Android) said:
    IIRC, when using recovery to wipe the phone on GB, the make_ext4fs() library would not issue an erase command first, it would just write out a new blank filesystem. This, of course, doesn't really erase all the private data of the user, so we changed make_ext4fs() to first erase the partition, then write out the new filesystem. You can see the code in system/extras/ext4_utils/wipe.c, which didn't exist in gingerbread, but does in honeycomb and later. It is the erase operation on the rev 0x19 firmware that can cause the emmc chip to lockup. (emphasis added)

    Regarding Entropy512's summary of observations:
    Ken Sumrall (Android) said:
    Regarding the notes on MMC_CAP_ERASE, that just lists the card's ability to perform the erase command. In other words, whether the mmc_erase() function works. What is more important is whether anyone calls the mmc_erase() function. Looking at the mmc driver, in drivers/mmc/card/block.c, it is only called when a secure discard or discard request is made. As far as I know, those requests are only sent if the filesystem is mounted with the "discard" option, or if userspace code does an ioctl() to erase a partition, like make_ext4fs does. So check the mount options on the filesystems. If they don't specify "discard", the erase operations are probably not happening.

    Of course, a simple debugging printk() in the mmc_erase() function will tell you if anyone is calling it.
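
    Checking the mount options as suggested can be scripted. The sketch below is only an illustration; mounts_with_discard is a made-up name and it assumes the usual /proc/mounts column order (device, mount point, filesystem type, options).

```shell
# Print device and mount point of every entry in a mounts table
# (same format as /proc/mounts) whose option list contains "discard".
# With no argument it reads /proc/mounts.
mounts_with_discard() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}

# Example on the device: mounts_with_discard
# No output means no filesystem is currently mounted with "discard".
```

    The option list is split on commas and compared exactly, so an option such as "nodiscard" is not mistaken for "discard".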

    Additional info not directly related to questions:
    Ken Sumrall (Android) said:
    The lockup doesn't happen immediately after power-on. The chip doesn't lock up until a sector is referenced that has corrupted wear leveling data inside the chip. Once that sector is referenced, the chip will lock up hard, and the only thing that will get it talking again is to power cycle it. Once it is power cycled, the chip will talk again, until that bogus sector is accessed.
    I was able to confirm with chris41g and T.C.P that the update-binary, included with both the CM9 and AOKP ROM update.zip files, was compiled against ICS source code.

    This means that this update-binary can potentially trigger the EMMC firmware lockup/superbrick bug if "format()" is used in the Edify install script. Both CM9 and AOKP updater-script format("/system") as part of their installs.

    This explains why they brick so often with the bluelight-only style brick: the format("/system") elicits the lockup when make_ext4fs() is called on the /system partition and eventually results in an mmc_erase(). With a borked /system, you are left with the blue-light special and ODIN will hang on factoryfs.img.

    The other style of brick is the one that hangs on the logo. That one is caused by Wipe Data/Factory Reset from an "unsafe" CWM Recovery. Just as above, the "unsafe" CWM Recovery would have been compiled against the ICS libext4_utils.a and make_ext4fs() would be called on /data and /cache. Eventually that would lead to mmc_erase() being called. With a borked /data, you are left with a phone that hangs on the logo and ODIN will hang on data.img.

    As an aside, I now believe the main "unsafe" Recovery we were dealing with was CWM Touch fake-flashed onto the ICS kernel. I don't have the source code for CWM Touch, but I believe it was compiled against the ICS libext4_utils.a. I am not sure if Rogue CWM repacked with an ICS kernel was compiled against ICS sources, but the version that Steady checked into github was not.

    Now previously I recall discussions where it was determined that GB source/kernels also had mmc_erase() and it was an open question why, if both GB and ICS had mmc_erase(), Recovery Wipe Data/Factory Reset would brick one but not the other. Well we finally got the answer to that, the GB version of make_ext4fs() from libext4_utils.a did not perform the ioctl() calls which would eventually lead to mmc_erase() being called, while the ICS version of libext4_utils.a had the ioctl() calls.

    Now if you've been following closely so far, you may be wondering why if the update-binary that is bundled with both CM9 and AOKP ROM update.zip files is using the ICS version of libext4_utils.a (that makes the ioctl() calls which lead to mmc_erase()) it doesn't brick when run under a GB-kernel/CWM. Both ICS and GB have mmc_erase() functionality so it should brick both right?

    Well I wondered the same thing, went digging a little, and believe I have found the answer.

    It turns out that even though the GB kernel MMC driver has mmc_erase(), the ioctl() calls that would eventually lead to mmc_erase() being called were ifdef'd out. So if you had an "unsafe" update-binary which made the ioctl() calls, they would NOT result in mmc_erase() being called when the ROM's update.zip was run under a GB kernel/recovery.

    Previously the discussion centered around MMC_CAP_ERASE as the explanation, which I believe is a red herring. This is neither a pre-processor directive nor a direct control mechanism. It is a bit field used to specify whether an MMC device supports the capability of doing erases. It was not ifdef'd out of GB so the functionality was still there.

    The actual conditional compile was on the ioctl() function for mmc cards, which was basically disabled under GB kernels. This is how I came to that conclusion:

    Notice in the kernel config file Samsung provided, both CONFIG_MMC_DISCARD and CONFIG_TARGET_LOCALE_NTT are disabled.

    .config
    Code:
    # CONFIG_MMC_DEBUG is not set
    # CONFIG_MMC_DISCARD is not set
    CONFIG_MMC_UNSAFE_RESUME=y
    # CONFIG_MMC_EMBEDDED_SDIO is not set
    # CONFIG_MMC_PARANOID_SD_INIT is not set
    Code:
    # CONFIG_MACH_C1_NA_SPR_EPIC2_REV00 is not set
    # CONFIG_TARGET_LOCALE_EUR is not set
    # CONFIG_TARGET_LOCALE_KOR is not set
    # CONFIG_TARGET_LOCALE_NTT is not set
    CONFIG_TARGET_LOCALE_NA=y
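
    If you want to check these options on another kernel, a grep against its .config is enough. A minimal sketch follows; config_enabled is an invented helper name, and getting the config off a running phone via /proc/config.gz only works if that kernel was built to expose it, which I have not verified for our kernels.

```shell
# Succeed (exit status 0) if OPTION is set to y in the given kernel .config file.
# Lines of the form "# OPTION is not set" do not match.
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Example:
#   config_enabled .config CONFIG_MMC_DISCARD && echo "discard ioctl path compiled in"
```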

    Now if you look at the mmc block device driver, due to the kernel config above, the ioctl() function call in the function table is left unpopulated. This basically means ioctl() calls are not supported for the mmc device (based on E4GT GB source, don't know about other platforms).

    drivers/mmc/card/block.c
    Code:
    static const struct block_device_operations mmc_bdops = {
    	.open			= mmc_blk_open,
    	.release		= mmc_blk_release,
    #ifdef CONFIG_MMC_DISCARD
    	.ioctl			= mmc_blk_ioctl,
    #endif
    #if defined(CONFIG_TARGET_LOCALE_NTT)
    #if 0 //def CONFIG_MMC_CPRM
    	.ioctl			= mmc_blk_ioctl,	//int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);  in blkdev.h
    #endif
    #endif

    Now when the "unsafe" update-binary from CM9/AOKP is paired with a GB kernel the wipe.c from libext4_utils.a will try and make these calls:

    system/extras/ext4_utils/wipe.c
    Code:
    int wipe_block_device(int fd, s64 len)
    {
      u64 range[2];
      int ret;
    
      range[0] = 0;
      range[1] = len;
    
      ret = ioctl(fd, BLKSECDISCARD, &range);
    
      if (ret < 0) {
        range[0] = 0;
        range[1] = len;
        ret = ioctl(fd, BLKDISCARD, &range);
      }
      return 0;
    }

    I believe that both those calls have basically been disabled because ioctl() calls are not supported in the mmc block device driver under the GB kernel (for E4GT EL29 GB kernel source).

    So what does that all mean?

    I believe GB systems are doubly "safe". The kernel will not execute mmc_erase() even though the functionality is there because the ioctl() call entry point from user space to kernel space has been disabled and also GB recovery and GB update-binary never even attempt to make the ioctl() calls.

    That explains why, if you pair an "unsafe" update-binary from the CM9/AOKP ROM update.zip with a GB kernel, it is still safe.
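
    The "doubly safe" reasoning boils down to two gates that must both be open before an erase can reach the chip. The sketch below just encodes that conclusion so the four combinations are easy to see; erase_reaches_emmc is a made-up name, and "gb" for the kernel means a GB kernel built like the E4GT EL29 source examined here, with the mmc ioctl() entry point compiled out.

```shell
# Report whether formatting a partition can end up issuing an eMMC erase.
#   $1 = source the Recovery/update-binary was built from: gb | ics
#   $2 = kernel it runs under: gb | ics
# Gate 1: only ICS libext4_utils.a makes the BLKSECDISCARD/BLKDISCARD ioctl() calls.
# Gate 2: only the ICS kernel routes those ioctl() calls on to mmc_erase().
erase_reaches_emmc() {
    if [ "$1" = ics ] && [ "$2" = ics ]; then
        echo yes
    else
        echo no
    fi
}

# erase_reaches_emmc ics gb   prints no  (CM9/AOKP update-binary under a GB kernel)
# erase_reaches_emmc ics ics  prints yes (the combination that has been bricking phones)
```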