FORUMS
Remove All Ads from XDA

Discussion thread for /data EMMC lockup/corruption bug

5,342 posts
Thanks Meter: 7,242
 
By sfhub, Senior Member on 9th May 2012, 01:08 PM
Post Reply Email Thread
30th May 2012, 10:55 AM |#391  
Senior Member
Thanks Meter: 65
 
More
Quote:
Originally Posted by ykk_five

hi sfhub


i agreed with you and i finally found a guy who shares similar thoughts with me. i have read the source codes for /fs/ext4 and some other relevant files last week. in my opinion, it is not a good way to just disable the MMC_CAP_ERASE only, but to, as u quoted before, to return a false when the wipe or wipe_rq function is called and let other old functions to handle the requests. since the wipe_rq and related functions are black boxes actually, we don need to care about the MMC_CAP_ERASE or other variables in other programs since we have stopped every code to call the wipe and format request

hope this helps

Kernel developers should disable MMC_CAP_ERASE because then, no matter how the wipe is triggered it's safe. It also covers the case of deleting a very large file, which is a case that Entropy suspects could cause the emmc fault.

Quote:
Originally Posted by musashiro

just an update guys

They cannot reproduce the fault most probably because they fixed it in their recovery. There is a recovery log that they should post, I bet it says wipe blocked. See a post by hardcore and one by me in the Note's speedmod thread.
 
 
30th May 2012, 12:09 PM |#392  
Senior Member
Thanks Meter: 65
 
More
I think Samsung opted to workaround the problem in the recovery instead of disabling MMC_CAP_ERASE because of a possible loss of performance over time if the Trim command isn't issued, what do you think?

Also if we knew for certain what are ALL the triggers for the emmc bug in the 0x19 firmware we could determine the best place for the fix.
30th May 2012, 12:57 PM |#393  
Member
Flag Cardiff
Thanks Meter: 1
 
More
Quote:
Originally Posted by sfhub

Don't worry about it, I already updated to include the FE10 update-binary and it is the same.

snip

If they patch the EMMC firmware itself, then everything is "safe" regardless. If they patch the kernel MMC driver, then everything packed with that patched kernel is safe, even without recompiling/relinking.

snip

Great analysis and completely agree with you.

But was wondering that even if Samsung patches EMMC firmware itself then still they can't provide it as a flashable solution for everyone, right ? As according to my understanding flashing of eMMC low-level firmware won't be possible from high-levels and needs some sort of JTAG/programmer to program eMMC. So if it is correct then they will do other workarounds (likely what you mentioned) in upper layers. CMIIW.
30th May 2012, 01:54 PM |#394  
Senior Recognized Developer
Flag Owego, NY
Thanks Meter: 25,477
 
Donate to Me
More
Quote:
Originally Posted by jgaviota

I think Samsung opted to workaround the problem in the recovery instead of disabling MMC_CAP_ERASE because of a possible loss of performance over time if the Trim command isn't issued, what do you think?

Also if we knew for certain what are ALL the triggers for the emmc bug in the 0x19 firmware we could determine the best place for the fix.

So far, there's only one case where there is evidence that Samsung worked around the eMMC defect - and they did it in the kernel by disabling MMC_CAP_ERASE. (I9100 update4 sources).

With a kernel fix, you guarantee to have fixed EVERY possible method for triggering the bug on the target device.

If you try to fix it in recovery - there are, as sfhub pointed out, 2-4 different places you have to fix it, and in many cases (update-binaries), there could be "dangerous" items still floating out there. This also means that you have to fork your recovery source code from AOSP and maintain THAT as a separate branch and hope everyone else for your device does too (otherwise, the third-party developer community can still give you a PR fiasco). Fix it in the kernel and you fix it for all situations.

As to performance impacts - all of the evidence we have says that TRIM/ERASE simply aren't issued in any "safe" firmware configuration.
The Following User Says Thank You to Entropy512 For This Useful Post: [ View ]
30th May 2012, 02:10 PM |#395  
Senior Member
Thanks Meter: 65
 
More
Quote:
Originally Posted by Entropy512

So far, there's only one case where there is evidence that Samsung worked around the eMMC defect - and they did it in the kernel by disabling MMC_CAP_ERASE. (I9100 update4 sources).

With a kernel fix, you guarantee to have fixed EVERY possible method for triggering the bug on the target device.

If you try to fix it in recovery - there are, as sfhub pointed out, 2-4 different places you have to fix it, and in many cases (update-binaries), there could be "dangerous" items still floating out there. This also means that you have to fork your recovery source code from AOSP and maintain THAT as a separate branch and hope everyone else for your device does too (otherwise, the third-party developer community can still give you a PR fiasco). Fix it in the kernel and you fix it for all situations.

As to performance impacts - all of the evidence we have says that TRIM/ERASE simply aren't issued in any "safe" firmware configuration.


Are TRIM/ERASE commands issued on normal gingerbread file operations?

And is there evidence that anything else but a wipe or format with mmc_erase causes the bug?
30th May 2012, 02:31 PM |#396  
Inactive Recognized Developer
Thanks Meter: 2,835
 
Donate to Me
More
Quote:
Originally Posted by jgaviota

Kernel developers should disable MMC_CAP_ERASE because then, no matter how the wipe is triggered it's safe. It also covers the case of deleting a very large file, which is a case that Entropy suspects could cause the emmc fault.


MMC_CAP_ERASE is just a boolean that trigger the bug. yes, it can be disabled but u cant say others wont turn it on later in other functions. so once we return a false in the wipe function, we don need to care about the MMC_CAP_ERASE

eg, your computer wont turn on no matter the swith (MMC_CAP_ERASE) is on or off in the front panel, since the power cord is unplugged (deny all wipe requests). that is to say, there is no way to power it up unless it is plugged (accept wipe request)




Quote:
Originally Posted by jgaviota

Are TRIM/ERASE commands issued on normal gingerbread file operations?

i havent check if they exist or working properly in gb, but i can say there were lots of bugs regarding the trim in ext4 before. i dono if these bugs were all fixed already
30th May 2012, 03:08 PM |#397  
Senior Recognized Developer
Flag Owego, NY
Thanks Meter: 25,477
 
Donate to Me
More
Quote:
Originally Posted by ykk_five

MMC_CAP_ERASE is just a boolean that trigger the bug. yes, it can be disabled but u cant say others wont turn it on later in other functions. so once we return a false in the wipe function, we don need to care about the MMC_CAP_ERASE

This makes no sense. MMC_CAP_ERASE is the lowest-level situation where an erase function can be blocked.

Your claimed logic is EXACTLY why disabling MMC_CAP_ERASE is the best option - It prevents ERASE commands from ever reaching the flash chip, no matter what the upper level software tries to do. Dicking around with wipe functions in recovery means that users are still at risk if someone accidentally slips up and puts in an ICS update-binary in one of their CWM zips.

There is no "other functions" "later" - MMC_CAP_ERASE is a capability flag at the lowest level of the software stack, in the MMC driver itself. There is no lower-level way to disable ERASE commands from being sent to the chip (except ugly hacks like commenting out all of the mmc_erase() function...)

Disabling MMC_CAP_ERASE is exactly how one sanely achieves forcing mmc_erase() to always return -ENOTSUPP - There is NO way to override or work around that change without making another change deep down in the kernel driver (either re-adding the flag or totally rewriting the mmc_erase() function, ignoring all of the checks that are already there.)
30th May 2012, 04:23 PM |#398  
Inactive Recognized Developer
Thanks Meter: 2,835
 
Donate to Me
More
Quote:
Originally Posted by Entropy512

This makes no sense. MMC_CAP_ERASE is the lowest-level situation where an erase function can be blocked.

Your claimed logic is EXACTLY why disabling MMC_CAP_ERASE is the best option - It prevents ERASE commands from ever reaching the flash chip, no matter what the upper level software tries to do. Dicking around with wipe functions in recovery means that users are still at risk if someone accidentally slips up and puts in an ICS update-binary in one of their CWM zips.

There is no "other functions" "later" - MMC_CAP_ERASE is a capability flag at the lowest level of the software stack, in the MMC driver itself. There is no lower-level way to disable ERASE commands from being sent to the chip (except ugly hacks like commenting out all of the mmc_erase() function...)

Disabling MMC_CAP_ERASE is exactly how one sanely achieves forcing mmc_erase() to always return -ENOTSUPP - There is NO way to override or work around that change without making another change deep down in the kernel driver (either re-adding the flag or totally rewriting the mmc_erase() function, ignoring all of the checks that are already there.)


that's exactly what i meant, do it on the kernel driver
30th May 2012, 04:35 PM |#399  
Senior Member
Thanks Meter: 65
 
More
I agree with you that MMC_CAP_ERASE is the safest and thoroughest place to fix this.

Now let's look at all of this through Samsung's perspective:

They have a emmc firmware problem.
They probably sent errata to all the affected partners.
They fixed it on stock firmwares (This has to be confirmed with the help of a kernel that logs if mmc_erase is called from stock recovery, but from all I have read it seems that not one brick is from unmodified stock roms, even on the Note)
No one that stays in stock will brick his Note
If someone bricks his phone with leaked or custom roms tough luck.
Everyone (Samsung) is happy.

What is the problem with this scenario?
Easy Samsung forgot to consider us important partners.
If they had made the firmware errata and their fix public, it would have saved us a lot of headaches.

A suggestion for Samsung, be more welcoming of developers and forthcoming with us.
One thing many would love to see for example is a package containing all the needed drivers and documentation to port the next version of Android when they themselves are working with it.
30th May 2012, 04:40 PM |#400  
OP Senior Member
Thanks Meter: 7,242
 
More
You can put the workaround in either place, each has positives and negatives. I suggest you take both approaches.

If you do the workaround int he kernel mmc driver, all Recovery and update-binary executables (even if not recompiled/relinked) will no longer trigger the issue *IF* they are repacked with the new kernel.

However in the case that Samusng didn't place their workaround in the kernel mmc driver, you are then forcing everyone to use a "recompiled" kernel to get the "safe" feature, rather than having the flexibility to use "repacked" stock kernels. Some people prefer "repacked" stock kernels.

If you do the workaround in libext4_utils.a, there are 2 ways you can go about it:
1) just use the pre-existing GB binaries for Recovery and update-binary. This doesn't even need a recompile. This doesn't even need ICS source.
2) put the workaround in the ICS libext4_utils.a. You don't need to make changes in 2-4 places. You just need to make the one change in wipe.c mentioned earlier, then *relink* your existing Recovery and update-binary against the patched libext4_utils.a. There is no need to "fork" Recovery or update-binary, since you just make the change once at the lowest "user-space" level code.

The advantage of the workaround in Recovery and update-binary, is it will work with any kernel (even the existing "unsafe" ones) and you can do it right now, even without needing to access any ICS source code.

You would need to have the people who put together kernel images start using the GB-based Recovery binary and also instruct the people putting together ROM packages to use the GB-based update-binary, the latter is something most ROM developers already do on E4GT, except possibly the AOSP/AOKP folks.

If you opt for the source code change, you are going to fork a file somewhere, either a build make file or some include file in the case of the mmc driver workaround, or a single function in wipe.c. It is no more or less difficult to maintain a fork of the build make file as it is to maintain a fork of wipe.c.

We already know having the workaround in Recovery and update-binary coupled with an "unsafe" kernels that do mmc_erase() works fine as most of us have been running this way since the beginning. We never heard of these EMMC lockup/superbricks in GB and the "workaround" in GB was essentially placed in Recovery and update-binary, because the kernel had MMC_CAP_ERASE enabled.

So why don't you just do both methods? That addresses the most situations and gives the most flexibility in producing "safe" environments.
The Following 2 Users Say Thank You to sfhub For This Useful Post: [ View ] Gift sfhub Ad-Free
30th May 2012, 04:52 PM |#401  
Inactive Recognized Developer
Thanks Meter: 2,835
 
Donate to Me
More
Quote:
Originally Posted by sfhub

You can put the workaround in either place, each has positives and negatives. I suggest you take both approaches.

If you do the workaround int he kernel mmc driver, all Recovery and update-binary executables (even if not recompiled/relinked) will no longer trigger the issue *IF* they are repacked with the new kernel.

However in the case that Samusng didn't place their workaround in the kernel mmc driver, you are then forcing everyone to use a "recompiled" kernel to get the "safe" feature, rather than having the flexibility to use "repacked" stock kernels. Some people prefer "repacked" stock kernels.

If you do the workaround in libext4_utils.a, there are 2 ways you can go about it:
1) just use the pre-existing GB binaries for Recovery and update-binary. This doesn't even need a recompile. This doesn't even need ICS source.
2) put the workaround in the ICS libext4_utils.a. You don't need to make changes in 2-4 places. You just need to make the one change in wipe.c mentioned earlier, then *relink* your existing Recovery and update-binary against the patched libext4_utils.a. There is no need to "fork" Recovery or update-binary, since you just make the change once at the lowest "user-space" level code.

The advantage of the workaround in Recovery and update-binary, is it will work with any kernel (even the existing "unsafe" ones) and you can do it right now, even without needing to access any ICS source code.

You would need to have the people who put together kernel images start using the GB-based Recovery binary and also instruct the people putting together ROM packages to use the GB-based update-binary, the latter is something most ROM developers already do on E4GT, except possibly the AOSP/AOKP folks.

If you opt for the source code change, you are going to fork a file somewhere, either a build make file or some include file in the case of the mmc driver workaround, or a single function in wipe.c. It is no more or less difficult to maintain a fork of the build make file as it is to maintain a fork of wipe.c.

We already know having the workaround in Recovery and update-binary coupled with an "unsafe" kernels that do mmc_erase() works fine as most of us have been running this way since the beginning. We never heard of these EMMC lockup/superbricks in GB and the "workaround" in GB was essentially placed in Recovery and update-binary, because the kernel had MMC_CAP_ERASE enabled.

So why don't you just do both methods? That addresses the most situations and gives the most flexibility in producing "safe" environments.


you are right

and u have pointed out the pros and cons for the 2 approaches clearly

i am pleased with respects (hope i say this correctly)
Post Reply Subscribe to Thread

Guest Quick Reply (no urls or BBcode)
Message:
Previous Thread Next Thread
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes