5,600,053 Members 42,561 Now Online
XDA Developers Android and Mobile Development Forum

[HOWTO] Fix and prevent severe system freezes without wiping the device

Tip us?
 
KrissN
Old
(Last edited by KrissN; 12th December 2013 at 05:31 PM.)
#1  
Member - OP
Thanks Meter 373
Posts: 83
Join Date: Aug 2012
Location: Wrocław
Default [HOWTO] Fix and prevent severe system freezes without wiping the device

Not so long ago I started experiencing severe freezes and lockups on my I9300. The symptoms were well known as pre-SDS lockups. Before SDS workarounds have been released such symptoms would mean a death sentence for the device. While this is not the case any more the freezes still make the phone virtually unusable.

So far the only way to bring the phone back to usability was a full wipe of the device combined with filling the partitions with some data in order to ensure that every sector has been written to. Some have also succeeded to get rid of such behaviour after flashing stock firmware (after having been running AOSP).

Some technical background

The reason for the freezes (at least on my device) were block read errors. This is roughly equivalent to a situation where you HDD starts having bad sectors, however flash memory is a bit different to a magnetic disk drive. The read errors encountered most likely due to data corruption resulting in CRC errors during read. In my case the dmesg (kernel log) showed the following errors:

Code:
[ 5896.313473] c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
[ 5896.482283] c0 mif: set_hsic_lpa_states: 304: called(exynos4_check_usb_op+0x84/0x100): 
[ 5896.482469] c0 mif: set hsic lpa enter: active state (0), pda active (0)
[ 5896.665712] c0 mmc0: Fixing MoviNAND SDS bug.
[ 5896.708336] c0 mmc0: Verifying SDS patch.
[ 5896.794283] c0 mmc0: recovering eMMC has been done
[ 5896.794412] c0 brq->sbc.opcode=23,brq->cmd.opcode=18.
[ 5896.794539] c0 brq->sbc.error=-131,brq->cmd.error=0, brq->stop.error=0,brq->data.error=0.
[ 5896.794789] c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
[ 5896.795869] c0 end_request: I/O error, dev mmcblk0, sector 21590248
[ 5896.796034] c0 end_request: I/O error, dev mmcblk0, sector 21590256
[ 5896.796182] c0 end_request: I/O error, dev mmcblk0, sector 21590264
[ 5896.796379] c0 CMD aborting case in MMC's block layer ret 0.
[ 5896.796512] c0 mmcblk0: CMD18, ARG=0x14970e8.
[ 5896.796613] c0 packed CMD type = 0.
[ 5896.796700] c0 mmc0, request returns 4.
Instead of doing a full wipe of the device's internal flash memory partitions I've tried out another method. I was looking for a program that would rewrite all the data without actually wiping it. Such operation would cause the data to be written again and additionally the wear levelling algorithm inside the eMMC controller would have the chance to relocate the data to another physical block on the flash memory chip.

It turned out that the program I was looking for is already there - just not fully thought of being the right one to use in such case.

When bad blocks encounter on a HDD on your PC you normally need to scan the whole media and mark all the bad blocks found on the filesystem so that they can no longer be allocated for file storage. On Linux this task is performed by the e2fsck system utility which uses another utility called badblocks to do the scan and report the unusable blocks. It is the latter that is interesting.

Among several options found in the badblocks utility there are three methods for discovering such broken blocks:
- read-only where the program will attempt to read all blocks one-by-one.
- non-destructive read-write in which case the program will for each block: read it, fill it with a pattern and write back the original content.
- destructive write where all blocks will be filled by pattern(s) destroying any existing content.

The two last modes are interesting for the purpose of rewriting your device's memory partitions to get rid of bad blocks.

The non-destructive read-write mode can be used to attempt to recover without wiping the contents of the partition. This is mostly relevant to the /data partition (and possibly /efs) as all other ones can be easily restored. Note however that when a read error occurs the data in that sector is already lost. When you attempt to recover such partition it will appear to be working again, but some apps may be crashing randomly as some of their data is corrupted. The non-destructive read-write mode can also be used to prevent freezes from happening. All you need is to treat the /data partition this way once a while (not too often - every 6 months without a factory reset).

The destructive write mode can be used in case you don't care for the data. This is equivalent to the methods used so far (wiping the partition and filling it with random junk). This mode can also be used to securely* wipe your phone before selling it or giving it away.

* Due to wear levelling this is actually not a secure wipe as some of the data will still persist on the Flash memory chip. However it is secure from a user point of view as the data is no longer visible and recovering it will most likely require advanced and expensive procedures.

Fix and prevent

Rewriting your flash partition data will fix freezes due to bad blocks. However when freezes start to happen this usually means that some data is already unreadable and therefore - lost. In order to refresh the data the non-destructive write test in badblocks can be used periodically to rewrite the data. This will result in a small speedup of your phone.

Important: You must be careful not to rewrite your partition too often as each flash memory block has a limited number of erase cycles (around 100,000 for SLC and 10,000 for MLC chips). Doing this too often will actually damage your chip in the long term making your phone unusable. I believe that rewriting your /data partition every 6 months without a factory reset should prevent any issues. You do not need to do this if you periodically factory-reset your phone as in such case the data is rewritten anyway.

Step-by-step instructions

DISCLAIMER: This may be a dangerous operation if not executed properly. I do not accept any responsibility for damage that it may cause. Please do this on your own risk.

Prerequisites

Before you begin there are some requirements that need to be fulfilled:
  1. A recent backup of your phone. Remember - there are three kinds of people in this world - those who make backups, those who have not yet lost their data and those morons who still refuse to make backups despite loosing data.
  2. SDS-proof recovery. This is true for all recent recoveries these days, but better make sure. The recovery must also allow ADB connections. While this is true for all recoveries, some have problems with it. In my case I couldn't connect to CWM 6.0.4.4 as ADB kept complaining that the connection is unauthorized. Flashing latest Philz Touch (6.00.8 in my case) solved the problem.
  3. ADB installed on your PC. Please search forum or internet to find instructions.
  4. badblocks binary for Android recovery. Since this utility is not included in the recovery by default you need to obtain it elsewhere. The one I've built is attached to this post.

Instructions

These assume we're working on the /data partition. For other partitions please check their corresponding device node and modify the instructions accordingly.
  1. Reboot to recovery.
  2. Do a nandroid backup if not done before.
  3. Connect to your phone using the USB cable.
  4. Run adb push badblocks sbin/badblocks. In case you have just downloaded the attached ZIP file unzip it first to extract the badblocks binary.
  5. Run adb shell to enter a shell on your phone.
  6. Run chmod 0755 sbin/badblocks to make the badblocks binary executable.
  7. Make sure your /data is not mounted. The quick way is to run mount. It will list all mounted filesystems. If /data on /dev/block/mmcblk0p12 is among them you'll need to unmount it by running: umount /data. If it complains about the filesystem being busy you'll need to reboot the phone and enter recovery again. After that repeat all the steps.
  8. Once you are sure that /data is not mounted run: badblocks -b 4096 -n -t 0xFF -s /dev/block/mmcblk0p12. This command is where the main magic happens. What it does is it reads every sector, fills it with 0xFF bytes and then writes back the original data. This way the data is unchanged, but every block gets rewritten. After running badblocks you should see some percentage indicator about the progress. The operation on /data should take around an hour. Observe any errors that may occur. In my case there were none, but if the flash is heavily worn out there may be some unrecoverable sectors.
  9. In case there are no errors everything should be done. Optionally you may want to trim the /data filesystem as all blocks have been rewritten and the eMMC conroller will think that there are no unused blocks on the device. You can do this by mounting the /data filesystem (mount /data) and running fstrim (fstrim -v /data).
  10. You're good to go. Type exit in the ADB shell and reboot your phone into the system.

In case you do not care for the data on the partition you may run badblocks in destructive mode. In such case in step 7 replace "-n" with "-w" in badblocks command line: badblocks -b 4096 -w -t 0xFF -s /dev/block/mmcblk0p12

The same procedure can be repeated for other filesystems (/system, /cache). I would not recommend doing this on the partitions containing the kernel, recovery, efs or bootloader unless you're asking for trouble.
Attached Files
File Type: zip badblocks-i9300-recovery.zip - [Click for QR Code] (57.1 KB, 35 views)
The Following 2 Users Say Thank You to KrissN For This Useful Post: [ Click to Expand ]
 
KrissN
Old
#2  
Member - OP
Thanks Meter 373
Posts: 83
Join Date: Aug 2012
Location: Wrocław
Reserved for future use.
 
trein91
Old
#3  
Senior Member
Thanks Meter 72
Posts: 363
Join Date: Jun 2012
This should seriously be stickied. Works better than using the dummy file generator that everyone recommends.
 
JustArchi
Old
#4  
JustArchi's Avatar
Recognized Contributor
Thanks Meter 9090
Posts: 4,591
Join Date: Mar 2013
Location: Warsaw

 
DONATE TO ME
Actually, if you're getting SDS freeze on patched firmware with V2 fix (XXEMB2+) it is SDS fix recovering bad block. It's even stated in your kernel message.

This method is good if you want to get rid of misc freezes and fix all blocks at once. Because badblocks forces kernel to rewrite all blocks instead of only one/affected block (during normal freeze).

May be useful, but it's not a magic trick. If somebody gets freezes and kernel can't fix it itself then magic badblocks binary won't help him, unfortunately.
The Following User Says Thank You to JustArchi For This Useful Post: [ Click to Expand ]
 
KrissN
Old
#5  
Member - OP
Thanks Meter 373
Posts: 83
Join Date: Aug 2012
Location: Wrocław
I wonder if you could use e2fsck to mark some permanently broken blocks as bad blocks on the filesystem so that they won't be used for filesystem data. A similar thing used to be done on HDDs. The problem that I see here is that the FTL may relocate that block somewhere and it may end up being reused somewhere else and keep doing damage.

I'm really curious how does the eMMC controller on the I9300 handle permanently broken blocks. Does it have some reservoir of usable blocks (like in modern SSD disks) that can be used as a replacement?
 
JustArchi
Old
#6  
JustArchi's Avatar
Recognized Contributor
Thanks Meter 9090
Posts: 4,591
Join Date: Mar 2013
Location: Warsaw

 
DONATE TO ME
Quote:
Originally Posted by KrissN View Post
I wonder if you could use e2fsck to mark some permanently broken blocks as bad blocks on the filesystem so that they won't be used for filesystem data. A similar thing used to be done on HDDs. The problem that I see here is that the FTL may relocate that block somewhere and it may end up being reused somewhere else and keep doing damage.

I'm really curious how does the eMMC controller on the I9300 handle permanently broken blocks. Does it have some reservoir of usable blocks (like in modern SSD disks) that can be used as a replacement?
This is actually a very good question, I'm wondering as well. If you're clever enough then you may download my latest ArchiDroid 2.X Experimental ROM with built-in debian and try to discover that with native linux gnu utilities.
 
trein91
Old
#7  
Senior Member
Thanks Meter 72
Posts: 363
Join Date: Jun 2012
Quote:
Originally Posted by JustArchi View Post
This is actually a very good question, I'm wondering as well. If you're clever enough then you may download my latest ArchiDroid 2.X Experimental ROM with built-in debian and try to discover that with native linux gnu utilities.
Desctructive mode? Will that be equivalent to a lets say factory reset? ie losing all user made data.

Tags
badblocks, freeze, sds, wipe
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes


XDA PORTAL POSTS

Control TWRP from within Android with TWRP Coordinator

You may recall that back when TWRP2 introduced a couple of years ago, it brought with … more

Keep Track of Everything Your Device Does with Event Logger

Regardless of their OS choice, computing power users generally share one common … more

A More Competitive Spin on the Addictive 2048 Puzzle

You may recall that a few weeks ago, we talked about a rather interesting take on … more

Multiboot in Progress for the Sony Xperia Z1

As we’ve mentioned quite a few times in the past, multiboot is quite the interesting … more