[HOWTO] Fix and prevent severe system freezes without wiping the device
Not so long ago I started experiencing severe freezes and lockups on my I9300. The symptoms were well known as pre-SDS lockups. Before SDS workarounds have been released such symptoms would mean a death sentence for the device. While this is not the case any more the freezes still make the phone virtually unusable.
So far the only way to bring the phone back to usability was a full wipe of the device combined with filling the partitions with some data in order to ensure that every sector has been written to. Some have also succeeded to get rid of such behaviour after flashing stock firmware (after having been running AOSP).
Some technical background
The reason for the freezes (at least on my device) were block read errors. This is roughly equivalent to a situation where you HDD starts having bad sectors, however flash memory is a bit different to a magnetic disk drive. The read errors encountered most likely due to data corruption resulting in CRC errors during read. In my case the dmesg (kernel log) showed the following errors:
[ 5896.313473] c0 mmc0: it occurs a critical error on eMMC it'll try to recover eMMC to normal state
[ 5896.482283] c0 mif: set_hsic_lpa_states: 304: called(exynos4_check_usb_op+0x84/0x100):
[ 5896.482469] c0 mif: set hsic lpa enter: active state (0), pda active (0)
[ 5896.665712] c0 mmc0: Fixing MoviNAND SDS bug.
[ 5896.708336] c0 mmc0: Verifying SDS patch.
[ 5896.794283] c0 mmc0: recovering eMMC has been done
[ 5896.794412] c0 brq->sbc.opcode=23,brq->cmd.opcode=18.
[ 5896.794539] c0 brq->sbc.error=-131,brq->cmd.error=0, brq->stop.error=0,brq->data.error=0.
[ 5896.794789] c0 mmcblk0: unknown error -131 sending read/write command, card status 0x900
[ 5896.795869] c0 end_request: I/O error, dev mmcblk0, sector 21590248
[ 5896.796034] c0 end_request: I/O error, dev mmcblk0, sector 21590256
[ 5896.796182] c0 end_request: I/O error, dev mmcblk0, sector 21590264
[ 5896.796379] c0 CMD aborting case in MMC's block layer ret 0.
[ 5896.796512] c0 mmcblk0: CMD18, ARG=0x14970e8.
[ 5896.796613] c0 packed CMD type = 0.
[ 5896.796700] c0 mmc0, request returns 4.
Instead of doing a full wipe of the device's internal flash memory partitions I've tried out another method. I was looking for a program that would rewrite all the data without actually wiping it. Such operation would cause the data to be written again and additionally the wear levelling algorithm inside the eMMC controller would have the chance to relocate the data to another physical block on the flash memory chip.
It turned out that the program I was looking for is already there - just not fully thought of being the right one to use in such case.
When bad blocks encounter on a HDD on your PC you normally need to scan the whole media and mark all the bad blocks found on the filesystem so that they can no longer be allocated for file storage. On Linux this task is performed by the e2fsck
system utility which uses another utility called badblocks
to do the scan and report the unusable blocks. It is the latter that is interesting.
Among several options found in the badblocks
utility there are three methods for discovering such broken blocks:
- read-only where the program will attempt to read all blocks one-by-one.
- non-destructive read-write in which case the program will for each block: read it, fill it with a pattern and write back the original content.
- destructive write where all blocks will be filled by pattern(s) destroying any existing content.
The two last modes are interesting for the purpose of rewriting your device's memory partitions to get rid of bad blocks.
The non-destructive read-write mode can be used to attempt to recover without wiping the contents of the partition. This is mostly relevant to the /data partition (and possibly /efs) as all other ones can be easily restored. Note however that when a read error occurs the data in that sector is already lost. When you attempt to recover such partition it will appear to be working again, but some apps may be crashing randomly as some of their data is corrupted.
The non-destructive read-write mode can also be used to prevent freezes from happening. All you need is to treat the /data partition this way once a while (not too often - every 6 months without a factory reset).
The destructive write mode can be used in case you don't care for the data. This is equivalent to the methods used so far (wiping the partition and filling it with random junk). This mode can also be used to securely* wipe your phone before selling it or giving it away.
* Due to wear levelling this is actually not a secure wipe as some of the data will still persist on the Flash memory chip. However it is secure from a user point of view as the data is no longer visible and recovering it will most likely require advanced and expensive procedures.
Fix and prevent
Rewriting your flash partition data will fix freezes due to bad blocks. However when freezes start to happen this usually means that some data is already unreadable and therefore - lost. In order to refresh the data the non-destructive write test in badblocks
can be used periodically to rewrite the data. This will result in a small speedup of your phone.
You must be careful not to rewrite your partition too often as each flash memory block has a limited number of erase cycles (around 100,000 for SLC and 10,000 for MLC chips). Doing this too often will actually damage your chip in the long term making your phone unusable
. I believe that rewriting your /data partition every 6 months without a factory reset should prevent any issues. You do not need to do this if you periodically factory-reset your phone as in such case the data is rewritten anyway.
DISCLAIMER: This may be a dangerous operation if not executed properly. I do not accept any responsibility for damage that it may cause. Please do this on your own risk.
Before you begin there are some requirements that need to be fulfilled:
- A recent backup of your phone. Remember - there are three kinds of people in this world - those who make backups, those who have not yet lost their data and those morons who still refuse to make backups despite loosing data.
- SDS-proof recovery. This is true for all recent recoveries these days, but better make sure. The recovery must also allow ADB connections. While this is true for all recoveries, some have problems with it. In my case I couldn't connect to CWM 184.108.40.206 as ADB kept complaining that the connection is unauthorized. Flashing latest Philz Touch (6.00.8 in my case) solved the problem.
- ADB installed on your PC. Please search forum or internet to find instructions.
- badblocks binary for Android recovery. Since this utility is not included in the recovery by default you need to obtain it elsewhere. The one I've built is attached to this post.
These assume we're working on the /data partition. For other partitions please check their corresponding device node and modify the instructions accordingly.
- Reboot to recovery.
- Do a nandroid backup if not done before.
- Connect to your phone using the USB cable.
- Run adb push badblocks sbin/badblocks. In case you have just downloaded the attached ZIP file unzip it first to extract the badblocks binary.
- Run adb shell to enter a shell on your phone.
- Run chmod 0755 sbin/badblocks to make the badblocks binary executable.
- Make sure your /data is not mounted. The quick way is to run mount. It will list all mounted filesystems. If /data on /dev/block/mmcblk0p12 is among them you'll need to unmount it by running: umount /data. If it complains about the filesystem being busy you'll need to reboot the phone and enter recovery again. After that repeat all the steps.
- Once you are sure that /data is not mounted run: badblocks -b 4096 -n -t 0xFF -s /dev/block/mmcblk0p12. This command is where the main magic happens. What it does is it reads every sector, fills it with 0xFF bytes and then writes back the original data. This way the data is unchanged, but every block gets rewritten. After running badblocks you should see some percentage indicator about the progress. The operation on /data should take around an hour. Observe any errors that may occur. In my case there were none, but if the flash is heavily worn out there may be some unrecoverable sectors.
- In case there are no errors everything should be done. Optionally you may want to trim the /data filesystem as all blocks have been rewritten and the eMMC conroller will think that there are no unused blocks on the device. You can do this by mounting the /data filesystem (mount /data) and running fstrim (fstrim -v /data).
- You're good to go. Type exit in the ADB shell and reboot your phone into the system.
In case you do not care for the data on the partition you may run badblocks
in destructive mode. In such case in step 7 replace "-n" with "-w" in badblocks command line: badblocks -b 4096 -w -t 0xFF -s /dev/block/mmcblk0p12
The same procedure can be repeated for other filesystems (/system, /cache). I would not recommend doing this on the partitions containing the kernel, recovery, efs or bootloader unless you're asking for trouble.