Just got a break and a little late to the excitement here.
Thanks Esoteric68 for being willing to give this a go! And once again thanks to sfhub for the work and being right on top of it. Watching this with excitement. More XDA awesomeness here.
I just uploaded a BRICKFIX AOKP 37, so if you can, at the end ODIN flash AGAT CWM and flash AOKP 37 once so we know it is safe too.
Go through steps 1-6 again, but with the AOKP brickfix and update kernel instead of CM9, yeah?
If you have the energy to go through the whole set, sure, but I was just going to say flash AOKP BRICKFIXv1 once to make sure it works too (make sure to flash the AGAT CWM repack first).
I think if you actually flashed AOKP for the whole test plan, we could be very confident there is no luck involved, as that would be something like 6 flashes of CM9, 3 flashes of AOKP, 3 factory resets, 3 nandroid restores, and 1 flash of mostly stock FF02.
I'm in the groove, may as well give it one more run-through just to be sure.
Will report back shortly (btw I might need new volume and power buttons for my phone when I'm done lol)
I think you already know, but for the Recovery portion, you can use the capacitive buttons.
Up, Down, Back, Select (in left to right order)
Thankfully lol
Okay the first flash of AOKP brickfix went fine, so I'm going to continue on down the list and see how it goes.
---------- Post added at 11:33 PM ---------- Previous post was at 11:10 PM ----------
Okay, steps 1-6 completed using the AOKP files. No issues to report. Home screen change was present following the Nandroid restore.
You are awesome. Thanks so much for taking the risk.
Now I'm off to recruit a few more testers to make sure you don't have a super-phone that is immune to bricks.
My pleasure, glad I could help.
Looks like you guys may have it licked. Congratulations!
Way to go, Esoteric!
I would offer my phone up for testing, but...
Well, I hope y'all understand...
Totally understand. Your contribution (sacrifice) is what helped move this thing forward and kept others from bricking during the same process.
Sounds like the bug is fixed... I'll have to try flashing with AGAT's FF02 repack.
I figured it was safe to use, but I can't afford a replacement if anything were to happen...
Just keep in mind that the public CM9 and AOKP versions that were posted can still potentially brick your phone. Darchstar and T.C.P have mentioned they will make the necessary change in the next release.
If you plan on running through this test plan, you should be able to deal with a superbrick (having TEP or within return period would be best). Through code analysis, it is believed these are all safe operations, but until we test, we don't know for sure. I would like to start with one volunteer, then expand from there if successful.
Downloads
a) SPH-D710.ODIN_FF02_KERNEL_CWM_AGAT.exe
b) update-BRICKFIXv1-cm-9-20120606-SNAPSHOT-epic4gtouch-alpha5-signed.zip
c) update-kernel-CM9a5.zip
d) SPH-D710.FF02_CL663858_ROOTED-oc-sfx.exe
e) AOKP_BRICKFIXv1_Build-37_epic4gtouch.zip
f) update-kernel-AOKP-37.zip
Test Plan
1a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
1b) boot to recovery (Power+VolUp)
1c) Flash CM9a5 BRICKFIXv1
1d) reboot to CM9
2a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
2b) boot to recovery (Power+VolUp)
2c) Flash CM9a5 BRICKFIXv1
2d) reboot to CM9
3a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
3b) reboot to recovery (Power+VolUp)
3c) perform wipe data/factory reset
3d) Flash CM9a5 *kernel* update.zip
3e) boot to CM9
4a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
4b) boot to recovery (Power+VolUp)
4c) Flash CM9a5 BRICKFIXv1
4d) reboot to CM9
4e) make change to home workspace like adding/removing an app
5a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
5b) reboot to recovery (Power+VolUp)
5c) Flash CM9a5 *kernel* update.zip
5d) perform nandroid backup
5e) reboot to CM9
6a) ODIN AGAT FF02 CWM Repack (Power+VolDwn)
6b) reboot to recovery (Power+VolUp)
6c) perform nandroid restore
6d) reboot to CM9
6e) confirm your change from 4e is present
7a) ODIN Flash FF02 Stock (Power+VolDown)
7b) will automatically boot to stock FF02
8) Repeat Steps 1-6 using AOKP BRICKFIXv1 instead of CM9 BRICKFIXv1
Hey sfhub or Esoteric68, for steps (1c) & (2c) I just flash 'CM9a5 BRICKFIXv1' without wiping, right? Just want to double-check before I perform this... other than that I understand the instructions CLEARLY.
Edit: works for me, and my changes from 4e are present on CM9 and AOKP. Looks like you may have got something here, sfhub :thumbup:
Sent from my PaRAnO!D aNDr0id!
Yes, just want to get 2 straight flashes before doing a wipe. Nothing bad would happen if you didn't follow those steps and did a wipe; it is just a choice for what to test.
Sorry I didn't respond earlier, but I was asleep lol. Glad it went well, and I appreciate you running the test.
Keep in mind, at some point someone will need to take one for the team or we will be forever in fear of bricking our phones using ICS-based kernels.
A donate button for sfhub will allow him to take MANY for the team. Congrats on the fix!
I would just like to stress that the *core* EMMC firmware lockup/superbrick bug is still there, and unless that is fixed folks will ALWAYS be exposed to it to some extent.
Update 14:56 CEST:
Patches will be out in the form of new official ROMs and also source code releases after testing, which might take some time.
To clarify this... in testing done over the weekend, there was a small "subtest" group which consisted of 20 devices. This group was put together STRICTLY for the purpose of testing the emmc bug and fix. The devices were all programmed with the data known to have caused bricks when wiping. Of those 20, all but 6 also had the code patch to resolve that issue, so there was a possibility of 6 hard bricks. Only 4 actually bricked; therefore, on the build currently being tested, the "emmc break issue" has been deemed "resolved".
shell@android:/ $ su
shell@android:/ # cd /sys/class/block/mmcblk0/device
shell@android:/sys/class/block/mmcblk0/device # cat cid | cut -b 19,20
19
If you don't have busybox installed, just visually parse the line: match the serial # (0xd3f24fe6 - example only - yours will be different) against the cid, and look at the 2 digits just before the serial #.
shell@android:/ $ su
shell@android:/ # cd /sys/class/block/mmcblk0/device
shell@android:/sys/class/block/mmcblk0/device # cat serial cid
0xd3f24fe6
1501004d414734464119d3f24fe68e8b
shell@android:/ $ su
shell@android:/ # cd /sys/class/block/mmcblk0/device
shell@android:/sys/class/block/mmcblk0/device # cat name hwrev fwrev manfid oemid date type serial cid
MAG4FA
0x0
0x0
0x000015
0x0100
08/2011
MMC
0xd3f24fe6
1501004d414734464119d3f24fe68e8b
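For anyone who would rather do the same lookup in code than with cut, here is a minimal sketch in C. It assumes the CID is the 32-character hex string shown above and that the firmware revision sits at hex characters 19 and 20, exactly as in the `cut -b 19,20` command. The name cid_fwrev is made up for this example and is not part of any kernel or Android API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Number of hex characters in the CID string as printed by sysfs. */
#define CID_HEX_LEN 32

/*
 * Return the firmware revision byte from a CID hex string, or -1 if the
 * string is malformed. The offset (hex characters 19 and 20, counting
 * from 1) is taken from the `cut -b 19,20` command in the thread; treat
 * it as an assumption for any chip other than the one shown there.
 */
int cid_fwrev(const char *cid_hex)
{
    char byte[3];

    assert(cid_hex != NULL);
    if (strlen(cid_hex) != CID_HEX_LEN)
        return -1;

    /* Characters 19-20 (1-based) are indexes 18-19 (0-based). */
    byte[0] = cid_hex[18];
    byte[1] = cid_hex[19];
    byte[2] = '\0';

    if (!isxdigit((unsigned char)byte[0]) || !isxdigit((unsigned char)byte[1]))
        return -1;

    return (int)strtol(byte, NULL, 16);
}
```

If you read the string straight from the cid file, strip the trailing newline first, otherwise the length check rejects it. Print the result with "0x%02x" so 0x19 does not get mistaken for decimal 19.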
/*
 * There is a bug in some Samsung emmc chips where the wear leveling
 * code can insert 32 Kbytes of zeros into the storage. We can patch
 * the firmware in such chips each time they are powered on to prevent
 * the bug from occurring. Only apply this patch to a particular
 * revision of the firmware of the specified chips. Date doesn't
 * matter, so include all possible dates in min and max fields.
 */
/* set value 0x000000FF : It's hidden data
 * When in vendor command mode, the erase command is used to
 * patch the firmware in the internal sram.
 */
err = mmc_movi_erase_cmd(card, 0x0004DD9C, 0x000000FF);
if (err) {
    pr_err("Fail to Set WL value1\n");
    goto err_set_wl;
}
/* set value 0xD20228FF : It's hidden data */
err = mmc_movi_erase_cmd(card, 0x000379A4, 0xD20228FF);
if (err) {
    pr_err("Fail to Set WL value2\n");
    goto err_set_wl;
}
Ken Sumrall said: The patch includes a line in mmc.c setting fwrev to the right bits from the cid register. Before this patch, the file /sys/class/block/mmcblk0/device/fwrev was not initialized from the CID for emmc devices rev 4 and greater, and thus showed zero.
(On second inquiry)
fwrev is zero until the patch is applied.
Ken Sumrall said: You probably have the bug, but rev 0x19 was a previous version of the firmware we had in our prototype devices, but we found it had another bug that if you issued an mmc erase command, it could screw up the data structures in the chip and lead to the device locking up until it was power cycled. We discovered this when many of our developers were doing a fastboot erase userdata while we were developing ICS. So Samsung fixed the problem and moved to firmware revision 0x25. Yes, it is very annoying that 0x19 is decimal 25, and that led to lots of confusion when trying to diagnose emmc firmware issues. I finally learned to _ALWAYS_ refer to emmc version in hexadecimal, and precede the number with 0x just to be unambiguous.
However, even though 0x19 probably has the bug that can insert 32 Kbytes of zeros into the flash, you can't use this patch on devices with firmware revision 0x19. This patch does a very specific hack to two bytes of code in the revision 0x25 firmware, and the patch most likely will not work on 0x19, and will probably cause the chip to malfunction at best, and lose data at worst. There is a reason the selection criteria are so strict for applying this patch to the emmc firmware.
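Since the 0x19 versus decimal 25 mix-up is so easy to make, here is a small sketch of how the two revisions discussed above could be told apart in code. The enum, classify_fwrev and format_fwrev are hypothetical names for this example only. Note that the real patch also checks which chip it is running on, and the sketch does not model that, so a revision match here is necessary but not sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical labels for the two firmware revisions discussed above. */
enum fw_status {
    FW_REV_PATCH_TARGET,  /* 0x25: the revision the wear-leveling patch was written for */
    FW_REV_ERASE_LOCKUP,  /* 0x19: erase can lock the chip; the patch must NOT be applied */
    FW_REV_UNKNOWN        /* anything else: nothing in the thread covers it */
};

/* Classify a firmware revision byte. Chip model checks are not modeled. */
enum fw_status classify_fwrev(unsigned int fwrev)
{
    switch (fwrev) {
    case 0x25:
        return FW_REV_PATCH_TARGET;
    case 0x19:
        return FW_REV_ERASE_LOCKUP;
    default:
        return FW_REV_UNKNOWN;
    }
}

/* Always print the revision in hex with a 0x prefix to avoid the mix-up. */
int format_fwrev(char *buf, size_t len, unsigned int fwrev)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%02x", fwrev);
}
```

Passing decimal 25 lands in the 0x19 case and decimal 37 lands in the 0x25 case, which is exactly the confusion Ken describes.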
I passed on our results a few days later mentioning that the file system didn't corrupt until the wipe. This is a response to that follow-up.
As I mentioned in the previous post, firmware rev 0x19 has a bug where the emmc chip can lockup after an erase command is given. Not every time, but often enough. Usually, the device can reboot after this, but then lockup during the boot process. Very rarely, it can lockup even before fastboot is loaded. Your tester was unlucky. Since you can't even start fastboot, the device is probably bricked. :-( If he could run fastboot, then the device could probably be recovered with the firmware update code I have, assuming I can share it. I'll ask.
Ken Sumrall (Android SE) said: Because /data is the place on the chip that experiences the most write activity. /system is never written to (except during a system update) and /cache is rarely used (mostly for receiving OTAs).
Ken Sumrall said: As I mention above, the revision 0x19 firmware had a bug that after an emmc erase command, it could leave the internal data structures of the emmc chip in a bad state that causes the chip to lock up when a particular sector was accessed. The only fix was to wipe the chip, and update the firmware. I have code to do that, but I don't know if I can share it. I'll ask.
Ken Sumrall said: e2fsck can repair the filesystem, but often the 32 Kbytes were inserted at the start of a block group, which erased many inodes, and thus running e2fsck would often result in many files getting lost.
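To make the "32 Kbytes of zeros" symptom a bit more concrete, here is a rough sketch that scans a buffer (say, a chunk of a partition dump) for a run of at least 32 KiB of zero bytes. This is only a heuristic I am assuming for illustration; nothing in the thread says anyone diagnosed the bug this way, and long zero runs also occur in perfectly healthy filesystems. find_zero_run and ZERO_RUN_BYTES are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the zero run described in the patch comment: 32 Kbytes. */
#define ZERO_RUN_BYTES (32u * 1024u)

/*
 * Return the offset of the first run of at least run_len consecutive
 * zero bytes in buf, or -1 if there is no such run.
 */
long find_zero_run(const unsigned char *buf, size_t len, size_t run_len)
{
    size_t i;
    size_t run = 0;

    assert(buf != NULL && run_len > 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == 0) {
            run++;
            if (run == run_len)
                return (long)(i + 1 - run_len);  /* start of the run */
        } else {
            run = 0;  /* run broken, start counting again */
        }
    }
    return -1;
}
```

A hit only tells you where to look; it does not prove the chip inserted the zeros.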
Ken Sumrall said: You are getting a glimpse into the exciting life of an Android kernel developer. Turns out the job is mostly fighting with buggy hardware. At least, it seems that way sometimes.
Ken Sumrall (Android) said: Once the chip has locked up, no command will reset it. It must be power cycled. If instead you are asking how to clear the metadata so the chip works again, the only solution I know is to update the firmware inside the chip, and that also wipes all the data. This probably includes factory calibration data that must be saved before the firmware is updated, and restored after. Also, the boot loader is probably in the chip, and must be restored after the firmware update, or the device will be bricked. This is a dangerous operation, because if something goes wrong, the device will probably be bricked. (emphasis added)
Ken Sumrall (Android) said: It is private, I'm asking if I can release it, along with the code to update the emmc firmware. Don't get your hopes up, my guess is the answer will be no.
Ken Sumrall (Android) said: If you really want to erase data on a rev 0x19 samsung emmc chip, I suggest you just write zeros to the entire partition.
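As a rough idea of what "just write zeros to the entire partition" could look like without any erase or discard command, here is a sketch using plain POSIX write(). The function names are invented for this example, and the caller has to supply the length; how you obtain a partition's size is not covered here. It is destructive by design, so only ever point it at something you mean to wipe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_CHUNK 4096

/*
 * Overwrite the first len bytes of fd with zeros using ordinary writes.
 * Returns 0 on success, -1 on error. No erase/discard ioctl is issued.
 */
int zero_fill_fd(int fd, uint64_t len)
{
    static const char zeros[ZERO_CHUNK];  /* static storage starts zeroed */

    assert(fd >= 0);
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;

    while (len > 0) {
        size_t want = len < ZERO_CHUNK ? (size_t)len : ZERO_CHUNK;
        ssize_t done = write(fd, zeros, want);

        if (done < 0) {
            if (errno == EINTR)
                continue;  /* interrupted before writing anything, retry */
            return -1;
        }
        if (done == 0)
            return -1;     /* no progress, give up rather than spin */
        len -= (uint64_t)done;
    }
    return fsync(fd);      /* push the zeros out to the device */
}

/* Convenience wrapper: open an existing path and zero its first len bytes. */
int zero_fill_path(const char *path, uint64_t len)
{
    int fd;
    int rc;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    rc = zero_fill_fd(fd, len);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Writing in small chunks keeps memory use flat; a larger chunk size would likely be faster on real hardware.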
Ken Sumrall (Android) said: IIRC, when using recovery to wipe the phone on GB, the make_ext4fs() library would not issue an erase command first, it would just write out a new blank filesystem. This, of course, doesn't really erase all the private data of the user, so we changed make_ext4fs() to first erase the partition, then write out the new filesystem. You can see the code in system/extras/ext4_utils/wipe.c, which didn't exist in gingerbread, but does in honeycomb and later. It is the erase operation on the rev 0x19 firmware that can cause the emmc chip to lockup. (emphasis added)
Ken Sumrall (Android) said: Regarding the notes on MMC_CAP_ERASE, that just lists the card's ability to perform the erase command. In other words, whether the mmc_erase() function works. What is more important is whether anyone calls the mmc_erase() function. Looking at the mmc driver, in drivers/mmc/card/block.c, it is only called when a secure discard or discard request is made. As far as I know, those requests are only sent if the filesystem is mounted with the "discard" option, or if userspace code does an ioctl() to erase a partition, like make_ext4fs does. So check the mount options on the filesystems. If they don't specify "discard", the erase operations are probably not happening.
Of course, a simple debugging printk() in the mmc_erase() function will tell you if anyone is calling it.
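For the "check the mount options" suggestion, here is a small sketch that looks for the discard option in one line of /proc/mounts style text. It assumes the usual space-separated layout (device, mount point, filesystem type, options, then two numbers); both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the comma-separated list contains opt as a whole token, else 0. */
int has_mount_option(const char *options, const char *opt)
{
    size_t opt_len;
    const char *p;

    assert(options != NULL && opt != NULL);
    opt_len = strlen(opt);
    if (opt_len == 0)
        return 0;

    p = options;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);

        /* Compare whole tokens so "nodiscard" does not match "discard". */
        if (tok_len == opt_len && strncmp(p, opt, opt_len) == 0)
            return 1;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return 0;
}

/*
 * Check one mounts-style line for the "discard" option.
 * Returns 1 if present, 0 if absent, -1 if the line cannot be parsed.
 */
int mounts_line_has_discard(const char *line)
{
    char opts[256];
    size_t n;
    int field;

    assert(line != NULL);
    /* Skip device, mount point and filesystem type. */
    for (field = 0; field < 3; field++) {
        line = strchr(line, ' ');
        if (line == NULL)
            return -1;
        line++;  /* step past the separating space */
    }
    n = strcspn(line, " \n");
    if (n == 0 || n >= sizeof(opts))
        return -1;
    memcpy(opts, line, n);
    opts[n] = '\0';
    return has_mount_option(opts, "discard");
}
```

A result of 0 only covers the mount-option path. As Ken says, userspace can still request an erase through ioctl(), so the printk() in mmc_erase() remains the more direct check.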
Ken Sumrall (Android) said: The lockup doesn't happen immediately after power-on. The chip doesn't lock up until a sector is referenced that has corrupted wear leveling data inside the chip. Once that sector is referenced, the chip will lockup hard, and the only thing that will get it talking again is to power cycle it. Once it is power cycled, the chip will talk again, until that bogus sector is accessed.
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_DISCARD is not set
CONFIG_MMC_UNSAFE_RESUME=y
# CONFIG_MMC_EMBEDDED_SDIO is not set
# CONFIG_MMC_PARANOID_SD_INIT is not set
# CONFIG_MACH_C1_NA_SPR_EPIC2_REV00 is not set
# CONFIG_TARGET_LOCALE_EUR is not set
# CONFIG_TARGET_LOCALE_KOR is not set
# CONFIG_TARGET_LOCALE_NTT is not set
CONFIG_TARGET_LOCALE_NA=y
static const struct block_device_operations mmc_bdops = {
    .open = mmc_blk_open,
    .release = mmc_blk_release,
#ifdef CONFIG_MMC_DISCARD
    .ioctl = mmc_blk_ioctl,
#endif
#if defined(CONFIG_TARGET_LOCALE_NTT)
#if 0 //def CONFIG_MMC_CPRM
    .ioctl = mmc_blk_ioctl, //int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); in blkdev.h
#endif
#endif
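If the #ifdef dance above is hard to follow, here is a tiny standalone sketch of the same preprocessor mechanics. Every name in it (DEMO_MMC_DISCARD, demo_bdops and so on) is invented for the example. With the macro left undefined, as in the config excerpt, the ioctl member is simply never assigned and stays NULL. The sketch only shows how the member gets compiled out; it makes no claim about how the real block layer routes discard requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_MMC_DISCARD. It is left undefined
 * here, mirroring "# CONFIG_MMC_DISCARD is not set" in the config. */
/* #define DEMO_MMC_DISCARD 1 */

struct demo_bdops {
    int (*open)(void);
    int (*ioctl)(unsigned int cmd);
};

int demo_open(void)
{
    return 0;
}

int demo_ioctl(unsigned int cmd)
{
    (void)cmd;  /* unused in this sketch */
    return 0;
}

const struct demo_bdops demo_ops = {
    .open = demo_open,
#ifdef DEMO_MMC_DISCARD
    .ioctl = demo_ioctl,  /* only wired up when the option is enabled */
#endif
};

/* Return 1 if an ioctl handler was compiled in, 0 otherwise. */
int demo_has_ioctl(const struct demo_bdops *ops)
{
    assert(ops != NULL);
    return ops->ioctl != NULL;
}
```

Uncommenting the #define and rebuilding flips demo_has_ioctl(&demo_ops) from 0 to 1.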
int wipe_block_device(int fd, s64 len)
{
    u64 range[2];
    int ret;

    /* Ask for a secure discard of the whole range first. */
    range[0] = 0;
    range[1] = len;
    ret = ioctl(fd, BLKSECDISCARD, &range);
    if (ret < 0) {
        /* Secure discard failed, fall back to a plain discard. */
        range[0] = 0;
        range[1] = len;
        ret = ioctl(fd, BLKDISCARD, &range);
    }
    return 0;
}
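One thing worth noticing in the excerpt as quoted is that it returns 0 whether or not either ioctl succeeded. Below is a hedged, Linux-only sketch of the same two-step fallback that reports which path was taken. The function name and the WIPE_* codes are invented for this example, and it is not the actual ext4_utils code. BLKSECDISCARD and BLKDISCARD come from <linux/fs.h>. It is destructive on a real block device, and on rev 0x19 firmware this is exactly the kind of erase the thread warns about, so treat it as illustration only.

```c
#include <assert.h>
#include <linux/fs.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical result codes for this sketch. */
#define WIPE_SECURE   0   /* BLKSECDISCARD succeeded */
#define WIPE_PLAIN    1   /* fell back to BLKDISCARD */
#define WIPE_FAILED (-1)  /* neither ioctl succeeded */

/*
 * Discard the first len bytes of the block device behind fd, trying a
 * secure discard first and a plain discard second.
 */
int wipe_block_device_checked(int fd, uint64_t len)
{
    uint64_t range[2];

    assert(len > 0);  /* a zero-length wipe is a caller bug */

    range[0] = 0;     /* start offset in bytes */
    range[1] = len;   /* length in bytes */
    if (ioctl(fd, BLKSECDISCARD, range) == 0)
        return WIPE_SECURE;

    /* Same range again for the fallback, as in the excerpt. */
    range[0] = 0;
    range[1] = len;
    if (ioctl(fd, BLKDISCARD, range) == 0)
        return WIPE_PLAIN;

    return WIPE_FAILED;
}
```

On anything that is not a block device, or on an invalid descriptor, both ioctls fail and the function returns WIPE_FAILED.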