I posted this less technical writeup on agat63's cwm repacked thread but figured it would be useful to have here also. I am working with CM9/AOKP to have their install scripts replace format("/system") with delete_recursive("/system") After that I think if we still have volunteers, we are ready to do some more testing.
I'll provide more details when the pieces become available, but if you'd like to take one for the team and help test, please post.
If our understanding of the problem is accurate, you should be safe this time around, but there is always the chance we are not understanding the problem completely.
I would like to stress that the information I gave agat was based on code tracing and NOT based on real-world testing. You should treat this as a *testing period* to confirm the analysis. You MAY BRICK if the analysis is incorrect.
The following is all assuming you are repacked with ICS kernel (ie we aren't talking about the GB kernel)...
The nature of the problem is a call to the function make_ext4fs(). This function isn't provided by the kernel, rather it is provided as a library (libext4_utils.a) that is used when compiling Recovery and the update installer. It does end up eventually calling kernel mmc driver routines, which then trigger the EMMC firmware lockup/superbrick bug.
The make_ext4fs() function changed between GB and ICS. In GB the function didn't try to erase the partition before creating the EXT4 fs. In ICS it tries to erase it first. The erase is triggering an EMMC firmware bug that was always there via the kernel MMC driver. GB is also "doubly" safe in that not only do Recovery and update-binary never attempt to do the erase, even if they did, the request to erase is blocked in the GB-kernel and never run.
The EMMC firmware bug will lockup your phone and corrupt internal EMMC meta-data which cannot be accessed or repaired at this time. It isn't crashing your hard drive per-se, it is crashing your hard drive controller in a way that prevents the hard drive controller from accessing parts of your disk. We don't have any way of updating the EMMC hard drive controller at this time.
The EMMC firmware lockup/superbrick bug is likely contained in the wear-level firmware code which shifts mmc-internal memory block usage around to prevent any one area from overuse. The bug MIGHT NOT be triggered every time, so you can do the same operation with no issues then on your Nth attempt it bricks.
So what does that mean for you? There are 2 executables we are concerned with, Custom Recovery and the update installer (update-binary)
Custom Recovery is responsible for 2 potential bricking points:
1) wipe data/factory reset
2) nandroid backup/restore
These are both handled by the Recovery itself so if your Recovery is "safe" then these operations should be safe. The nandroid backup is safe regardless. Our concern is only for wipe data/factory reset and nandroid restore. Both of these make the call to make_ext4fs(), so if they are using the GB-based version they are safe. If they are using the ICS version, they are not safe (when used with ICS kernel) Agat has made the effort to make sure the recovery he has provided is compiled against GB CM7 source.
You may ask what about Installing ROMs, you thought Recovery was responsible for that too?
This is only partially true. You use the menu option in Recovery to choose to install an update.zip. Recovery is responsible for providing the location of the update.zip and verifying the signature, but when it comes to actually "installing" the update.zip, Recovery uses a "helper app" called update-binary contained in the update.zip.
This update-binary helper app is responsible for running the Edify install script in the update.zip. It communicates with Recovery just to update the progress bar, output ui messages, and set up the updating of firmware. The rest of the script functions, it handles by itself directly, so Recovery isn't involved.
update-binary also calls make_ext4fs() so it can also do potentially "unsafe" operations, just like we discussed for Recovery above. If the update-binary, that was included in the update.zip, was compiled using GB-sources, then it is "safe". If it was compiled against ICS sources then there is one function in the Edify script that can potentially cause bricking, format().
To be clear, Recovery has no control over the update-binary that is included in the update.zip. Whomever built the ROM update.zip package made that decision. So this is why even with a "safe" Recovery, you can brick your phone installing ROMs (with an ICS kernel).
Even if the Recovery is "safe", if you ask it to use an "unsafe" update-binary to install a ROM AND that ROM install script chose to do a format(), then the EMMC lockup/superbrick bug can be triggered.
The reason why most stock-based ROMs don't brick in ICS is because
1) most of them probably include a GB-based update-binary
2) most of them are not performing a format() within their Edify updater-script
So a ROM builder has 2 ways to make a ROM update.zip install "safe" to install in ICS. Either package a known GB-based update-binary OR eliminate format(), if present, from the Edify install script.
So why does Calk's format_all seem to never brick even on ICS?
Given the date on the update-binary and when he created the package, it is most likely using a GB-based update-binary
So why does CM9/AOKP seem to brick more often than stock-based ICS ROM installs?
The Edify install script for CM9/AOKP uses new functions that were introduced in the ICS update-binary. This in turn is why they bundle the ICS-based update-binary. They could still potentially be safe, but in the install script a format("/system") is performed. If that format is run under an ICS kernel it will trigger the EMMC firmware lockup/superbrick bug. Under a GB-kernel, the request to erase "/system" is blocked by the GB-kernel.
What can CM9/AOKP do to make their installs "safe" to install in ICS?
All they need to do is replace the format("system") with delete_recursive("/system"). They could also replace the ICS-based update-binary with a GB-based update-binary, but that would require more rewrites to the install script. Replacing the format() call is simpler/easier.
Why are some superbricks blue-light specials and others only make ODIN hang at data.img?
This likely has to do with whether you got your brick from
1) the format() in the CM9/AOKP install
2) restoring nandroid backup
3) doing the wipe data/factory reset in Recovery
The first two tend to be blue-light specials as they affect /system and/or kernel. The last one tends to affect /data and/or /cache.
So how do you make sure you are totally safe?
1) make sure you are using a "safe" recovery repacked with the stock ICS kernel. This is a Recovery that was compiled against GB-based libext4_utils.a (ie GB source) This will assure you that wipe data/factory reset and nandroid restores are safe
2) whenever you install a ROM for the first time, verify EITHER
a) the ROM install script is NOT performing any format() calls
b) the ROM install has bundled a GB-based update-binary
If neither 2a NOR 2b are true (ie ICS-based update-binary and install performs format) then you DO NOT want to flash that ROM in Recovery while on an ICS kernel. Flash that ROM on a GB-based kernel/recovery.
Hope that clears things up, and once again, remember, this analysis is only based on tracing code. I may have made a mistake in the analysis or our understanding of the problem could be wrong. We will not be sure if all these statements hold UNTIL WE DO REAL-WORLD TESTING.