
Discussion thread for /data EMMC lockup/corruption bug

By sfhub, Senior Member on 9th May 2012, 01:08 PM
24th May 2012, 08:44 PM |#291  
Member
Quote:
Originally Posted by sfhub

I understand what you are suggesting and it is a reasonable suggestion, except that trying to determine the bad block list is an exercise in frustration, because every time the affected blocks are accessed, the system hangs. This leads to just lopping off chunks of EMMC, and even that is frustrating because of the trial and error and the hangs.

1. I would be surprised if the kernel didn't complain about it with a sector ID.
2. Just a small script should be able to find the block, something along the lines of: for each 4 KB block, do dd skip=offset bs=4k count=1, then echo the block number followed by \r (so the screen doesn't scroll); done. When it freezes, you've got about a 256 KB window where the problem is.
3. Maybe the driver can reset the controller, or that could be implemented (the Linux kernel likes to do that for IDE/SATA/SCSI drivers when there is a problem).
4. e2fsck can take a list of bad blocks as an option (-l), so you don't need to format the partition.
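Points 2 and 4 can be sketched together. The following is a minimal Python sketch under stated assumptions, not a tested tool: it reads a device (or any file) front to back in 4 KiB blocks, overwriting one progress line with \r every 64 blocks, so that if the system hangs, the last offset on screen brackets the bad region to within a 256 KiB window. A second helper turns such a window into filesystem block numbers, one per line, which is the list format e2fsck -l reads. The function names, the 64-block reporting interval, and the assumption that the scan runs against the partition device itself (so offsets are partition-relative) are illustrative choices, not anything from the thread. As sfhub points out in the reply below, on an affected chip the read that hits the bad region hangs the whole phone until it is power cycled.

```python
import os
import sys

BLOCK_SIZE = 4096    # 4 KiB reads, matching "dd bs=4k"
REPORT_EVERY = 64    # 64 x 4 KiB = 256 KiB uncertainty window (assumed interval)


def scan_blocks(path, block_size=BLOCK_SIZE, report_every=REPORT_EVERY,
                out=sys.stdout):
    """Read `path` front to back and return the number of blocks read.

    Every `report_every` blocks the current position is printed on a single
    line (using a carriage return, so the screen does not scroll). If a read
    never returns, the last printed offset marks the start of the suspect window.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        block = 0
        while True:
            if block % report_every == 0:
                out.write("\rblock %d (offset %d)" % (block, block * block_size))
                out.flush()
            data = os.pread(fd, block_size, block * block_size)
            if not data:          # end of device/file reached
                break
            block += 1
        out.write("\n")
        return block
    finally:
        os.close(fd)


def badblocks_lines(last_offset, window=REPORT_EVERY * BLOCK_SIZE,
                    fs_block_size=BLOCK_SIZE):
    """Filesystem block numbers covering [last_offset, last_offset + window).

    The numbers are only meaningful if `last_offset` is relative to the start
    of the partition that holds the filesystem (an assumption of this sketch).
    """
    first = last_offset // fs_block_size
    last = (last_offset + window - 1) // fs_block_size
    return [str(n) for n in range(first, last + 1)]
```

Usage would be along the lines of scan_blocks("<partition device>"); if it freezes after printing offset N, writing "\n".join(badblocks_lines(N)) to a file gives a candidate list for e2fsck -l. Given sfhub's observation that damaged areas are often large or scattered, one window per run is likely to be a starting point rather than a complete list.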
 
 
24th May 2012, 09:31 PM |#292  
OP Senior Member
Quote:
Originally Posted by monoko

1. I would be surprised if the kernel didn't complain about it with a sector ID.
2. Just a small script should be able to find the block, something along the lines of: for each 4 KB block, do dd skip=offset bs=4k count=1, then echo the block number followed by \r (so the screen doesn't scroll); done. When it freezes, you've got about a 256 KB window where the problem is.
3. Maybe the driver can reset the controller, or that could be implemented (the Linux kernel likes to do that for IDE/SATA/SCSI drivers when there is a problem).
4. e2fsck can take a list of bad blocks as an option (-l), so you don't need to format the partition.

My understanding of the issue is that the kernel is making blocking calls to the underlying functions, and those calls never return.

In many cases it isn't a small block. It is often a large block or a large set of scattered blocks.

If the eMMC driver/controller actually returned an error, with errno saying this block cannot be accessed, it would be much easier to deal with.

Perhaps you can work with somebody who has the EMMC lockup problem and develop something semi-automated as a proof of concept. You could even get some stats on what percentage of the EMMC is damaged.
24th May 2012, 09:54 PM |#293  
garwynn
Retired Forum Moderator / Inactive Recognized Developer / XDA Portal Team
NE Ohio
5/24 Update
Fresh off the press...

Q: Do you know the commands to reset the eMMC controller in question?
Quote:
Originally Posted by Ken Sumrall (Android)

Once the chip has locked up, no command will reset it. It must be power cycled. If instead you are asking how to clear the metadata so the chip works again, the only solution I know is to update the firmware inside the chip, and that also wipes all the data. This probably includes factory calibration data that must be saved before the firmware is updated, and restored after. Also, the boot loader is probably in the chip, and must be restored after the firmware update, or the device will be bricked. This is a dangerous operation, because if something goes wrong, the device will probably be bricked. (emphasis added)

Q: Is there any documentation available on this issue? If so and it is private is it possible to have it released?
Quote:
Originally Posted by Ken Sumrall (Android)

It is private, I'm asking if I can release it, along with the code to update the emmc firmware. Don't get your hopes up, my guess is the answer will be no.

Q: Alternate erase method?
Quote:
Originally Posted by Ken Sumrall (Android)

If you really want to erase data on a rev 0x19 samsung emmc chip, I suggest you just write zeros to the entire partition.
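Writing zeros to the entire partition is what dd if=/dev/zero of=<partition> bs=1M does when no count is given: dd keeps writing until it reaches the end of the device (a count=1 limit would stop it after a single block). A minimal Python sketch of the same idea follows, assuming the target is a block device or regular file whose size can be found by seeking to its end. It destroys everything on the target, and, per the later replies in this thread, it is only meaningful as a replacement for the erase command before a lockup; once a chip has locked up, writes to the affected blocks hang just like reads.

```python
import os

CHUNK = 1 << 20   # write 1 MiB of zeros per call (arbitrary choice)


def zero_fill(path, chunk=CHUNK):
    """Overwrite every byte of `path` with zeros and return the byte count.

    The size is taken by seeking to the end of the file or block device,
    and nothing is written past that point.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        zeros = bytes(chunk)
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            written = os.pwrite(fd, zeros[:n], offset)
            if written == 0:
                raise OSError("write made no progress at offset %d" % offset)
            offset += written     # handles short writes by looping again
        os.fsync(fd)              # flush the zeros out to the device
        return offset
    finally:
        os.close(fd)
```

Plain writes go through the normal write path, so they should never issue the discard/erase request that Ken identifies as the trigger on rev 0x19 firmware; the cost is that every block is physically written, which is much slower than an erase.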

Q: Difference between GB/ICS Wipe
Quote:
Originally Posted by Ken Sumrall (Android)

IIRC, when using recovery to wipe the phone on GB, the make_ext4fs() library would not issue an erase command first; it would just write out a new blank filesystem. This, of course, doesn't really erase all of the user's private data, so we changed make_ext4fs() to first erase the partition, then write out the new filesystem. You can see the code in system/extras/ext4_utils/wipe.c, which didn't exist in Gingerbread, but does in Honeycomb and later. It is the erase operation on the rev 0x19 firmware that can cause the eMMC chip to lock up. (emphasis added)

Regarding Entropy512's summary of observations:
Quote:
Originally Posted by Ken Sumrall (Android)

Regarding the notes on MMC_CAP_ERASE, that just lists the card's ability to perform the erase command, in other words, whether the mmc_erase() function works. What is more important is whether anyone calls the mmc_erase() function. Looking at the MMC driver, in drivers/mmc/card/block.c, it is only called when a secure discard or discard request is made. As far as I know, those requests are only sent if the filesystem is mounted with the "discard" option, or if userspace code does an ioctl() to erase a partition, like make_ext4fs does. So check the mount options on the filesystems. If they don't specify "discard", the erase operations are probably not happening.

Of course, a simple debugging printk() in the mmc_erase() function will tell you if anyone is calling it.
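Checking the mount options, as suggested above, comes down to looking for a "discard" entry in the options field of /proc/mounts. A minimal Python sketch, assuming the standard /proc/mounts layout (device, mount point, filesystem type, comma-separated options, then two numeric fields). It only covers the mount-option path; an erase ioctl() issued from userspace, as make_ext4fs does, would not show up here, which is where the printk() approach is the more reliable check.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mount point, fs type) for mounts that use 'discard'.

    `mounts_text` is the content of /proc/mounts: one mount per line,
    whitespace-separated fields, with the options comma-separated in field 4.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Compare whole option names so "nodiscard" is not mistaken for "discard".
        if "discard" in options.split(","):
            hits.append((device, mount_point, fs_type))
    return hits
```

On a device this would be called as mounts_with_discard(open("/proc/mounts").read()); an empty list means no mounted filesystem requests online discard.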

Additional info not directly related to questions:
Quote:
Originally Posted by Ken Sumrall (Android)

The lockup doesn't happen immediately after power-on. The chip doesn't lock up until a sector is referenced that has corrupted wear leveling data inside the chip. Once that sector is referenced, the chip will lockup hard, and the only thing that will get it talking again is to power cycle it. Once it is power cycled, the chip will talk again, until that bogus sector is accessed.

24th May 2012, 11:11 PM |#294  
Senior Member
Quote:
Originally Posted by garwynn

Q: Do you know the commands to reset the eMMC controller in question?

Quote:
Originally Posted by Ken Sumrall (Android)

Once the chip has locked up, no command will reset it. It must be power cycled. If instead you are asking how to clear the metadata so the chip works again, the only solution I know is to update the firmware inside the chip, and that also wipes all the data. This probably includes factory calibration data that must be saved before the firmware is updated, and restored after. Also, the boot loader is probably in the chip, and must be restored after the firmware update, or the device will be bricked. This is a dangerous operation, because if something goes wrong, the device will probably be bricked.

Q: Is there any documentation available on this issue? If so and it is private is it possible to have it released?
Quote:
Originally Posted by Ken Sumrall (Android)

It is private, I'm asking if I can release it, along with the code to update the emmc firmware. Don't get your hopes up, my guess is the answer will be no.

OK, these two answers make clear that if a device is already bricked, the steps to fix it would be:
1) Turn off the device.
2) Using JTAG, back up the factory calibration data and related data.
3) Update the eMMC firmware.
4) Restore the backup.
5) Using JTAG, flash the whole stock ROM back, including the bootloaders.

The people who have attempted JTAG repairs haven't been successful because they cannot complete step 3, as the firmware is not available.

Quote:
Originally Posted by garwynn

Q: Alternate erase method?

Quote:
Originally Posted by Ken Sumrall (Android)

If you really want to erase data on a rev 0x19 samsung emmc chip, I suggest you just write zeros to the entire partition.

Q: Difference between GB/ICS Wipe
Quote:
Originally Posted by Ken Sumrall (Android)

IIRC, when using recovery to wipe the phone on GB, the make_ext4fs() library would not issue an erase command first; it would just write out a new blank filesystem. This, of course, doesn't really erase all of the user's private data, so we changed make_ext4fs() to first erase the partition, then write out the new filesystem. You can see the code in system/extras/ext4_utils/wipe.c, which didn't exist in Gingerbread, but does in Honeycomb and later. It is the erase operation on the rev 0x19 firmware that can cause the eMMC chip to lock up. (emphasis added)

Regarding Entropy512's summary of observations:
Quote:
Originally Posted by Ken Sumrall (Android)

Regarding the notes on MMC_CAP_ERASE, that just lists the card's ability to perform the erase command, in other words, whether the mmc_erase() function works. What is more important is whether anyone calls the mmc_erase() function. Looking at the MMC driver, in drivers/mmc/card/block.c, it is only called when a secure discard or discard request is made. As far as I know, those requests are only sent if the filesystem is mounted with the "discard" option, or if userspace code does an ioctl() to erase a partition, like make_ext4fs does. So check the mount options on the filesystems. If they don't specify "discard", the erase operations are probably not happening.

Of course, a simple debugging printk() in the mmc_erase() function will tell you if anyone is calling it.

All of this is kernel work.

1) replace the mmc_erase() function with something that either writes 0s or does nothing

2) Compare GB and ICS make_ext4fs() function

With that, the kernels would be safe.

Quote:
Originally Posted by garwynn

Additional info not directly related to questions:

Quote:
Originally Posted by Ken Sumrall (Android)

The lockup doesn't happen immediately after power-on. The chip doesn't lock up until a sector is referenced that has corrupted wear leveling data inside the chip. Once that sector is referenced, the chip will lockup hard, and the only thing that will get it talking again is to power cycle it. Once it is power cycled, the chip will talk again, until that bogus sector is accessed.

The most realistic option for people with bricked devices at the moment is to have Samsung replace their motherboards. They should do so free of charge, because it is a hardware defect (or, more precisely, a firmware defect inside the hardware).
It's a long shot that Samsung would release a firmware update to the public, as that would probably create even more problems, and I don't think it would be an easy install. Even if they did release it, people with bricked devices would have a very hard time fixing the issue.

Given enough time, another thing that could start happening is Samsung service centres doing all the JTAG and firmware-update work.
24th May 2012, 11:23 PM |#295  
Senior Member
Enon, OH
Quote:
Originally Posted by jgaviota

OK, these two answers make clear that if a device is already bricked, the steps to fix it would be:
1) Turn off the device.
2) Using JTAG, back up the factory calibration data and related data.
3) Update the eMMC firmware.
4) Restore the backup.
5) Using JTAG, flash the whole stock ROM back, including the bootloaders.

The people who have attempted JTAG repairs haven't been successful because they cannot complete step 3, as the firmware is not available.



All of this is kernel work.

1) replace the mmc_erase() function with something that either writes 0s or does nothing

2) Compare GB and ICS make_ext4fs() function

With that, the kernels would be safe.



The most realistic option for people with bricked devices at the moment is to have Samsung replace their motherboards. They should do so free of charge, because it is a hardware defect (or, more precisely, a firmware defect inside the hardware).
It's a long shot that Samsung would release a firmware update to the public, as that would probably create even more problems, and I don't think it would be an easy install. Even if they did release it, people with bricked devices would have a very hard time fixing the issue.

I'm sure Samsung's position is going to be that this bug does not occur on any publicly released firmware.

When you flash unreleased software, you roll the dice & take your chances.
24th May 2012, 11:26 PM |#296  
OP Senior Member
Fortunately, most users (assuming the fix/workaround Samsung put in works as advertised) will never know about this problem firsthand, thanks to the efforts of everyone involved.

Also a big thanks to robertm2011 and the other volunteers who risked their phones so we could get more info.
24th May 2012, 11:40 PM |#297  
RainMotorsports
Senior Member
Quote:
Originally Posted by TerryMathews

I'm sure Samsung's position is going to be that this bug does not occur on any publicly released firmware.

When you flash unreleased software, you roll the dice & take your chances.

But it does: the Note, right? They probably won't release the eMMC info, since that's private, but for, say, Note owners they will extend a repair offer.
25th May 2012, 03:39 AM |#298  
garwynn
Retired Forum Moderator / Inactive Recognized Developer / XDA Portal Team
NE Ohio
Quote:
Originally Posted by RainMotorsports

But it does: the Note, right? They probably won't release the eMMC info, since that's private, but for, say, Note owners they will extend a repair offer.

Samsung will make those who had problems whole, if for no other reason than to protect the brand. How they do it may vary by region and/or carrier...

Sent from my SPH-D710 using XDA
25th May 2012, 04:48 AM |#299  
musashiro
Senior Member
Tarlac
I really hope they provide the fix in future releases... Samsung washing their hands of the problem would be the worst scenario.

Sent from my GT-N7000 using XDA
25th May 2012, 07:24 AM |#300  
prabhu1980
Senior Member
Indore
Dear Mr Garwynn,

Can we issue the dd command to write zeros to the partition, as suggested by Ken Sumrall?

dd if=/dev/zero of=/dev/mmcblkxx bs=512 count=1

Also, people have tried to recover using the -c option, which only runs a read-only bad-block scan of the partition.
Why not use e2fsck -pFcc /dev/mmcblkxx?

-cc runs a non-destructive read-write test, so it can really check; -F flushes the device buffers first;
and -p repairs it automatically.

Another theory:

The eMMC has bad-block management built into it.
It freezes when reading the sector...

Is it possible to write without reading, or to bypass the chip locking up while reading a bad sector?
25th May 2012, 10:05 AM |#301  
OP Senior Member
Quote:
Originally Posted by prabhu1980

Dear Mr Garwynn,

Can we issue the dd command to write zeros to the partition, as suggested by Ken Sumrall?

dd if=/dev/zero of=/dev/mmcblkxx bs=512 count=1

Mr. Sumrall was suggesting writing zeros as a replacement for mmc_erase() *BEFORE* the lockup occurs, not writing zeros after the fact. Once the lockup occurs, any access to the affected blocks (including your dd command) will lock up your entire phone (requiring a power cycle).

Quote:
Originally Posted by prabhu1980

Also, people have tried to recover using the -c option, which only runs a read-only bad-block scan of the partition.
Why not use e2fsck -pFcc /dev/mmcblkxx?

-cc runs a non-destructive read-write test, so it can really check; -F flushes the device buffers first;
and -p repairs it automatically.

Same as above.

Quote:
Originally Posted by prabhu1980

Another theory:

The eMMC has bad-block management built into it.
It freezes when reading the sector...

Is it possible to write without reading, or to bypass the chip locking up while reading a bad sector?

The eMMC firmware itself and its metadata are the problem. I don't know of any way to bypass the eMMC firmware. We could try to replace the faulty firmware, and that is what Garwynn is trying to get released.