Discussion thread for /data EMMC lockup/corruption bug

nCoder · May 28, 2012

I found the e-mail addresses of the people who signed off on the code:

Hyunsung Jang <hs79.jang@samsung.com>
Hyuk Lee <hyuk1.lee@samsung.com>
Kukjin Kim <kgene.kim@samsung.com>

Could you please contact them with the technical details of the problem to see if they at least reply with more technical details, and hopefully a solution?

Cheers

musashiro · May 28, 2012

Why don't you try emailing them? Tell them the scenarios, especially the bricking on the Open German release, since that's official.

I really want flawless ICS on my Note. I'm a stock+root lover, and this bug literally f*cks those of us who love it that way.

nCoder · May 28, 2012

musashiro said:
Why don't you try emailing them? Tell them the scenarios, especially the bricking on the Open German release, since that's official.

I really want flawless ICS on my Note. I'm a stock+root lover, and this bug literally f*cks those of us who love it that way.

My problem is that I'm not able to hold a technical conversation with these guys.

It would be much more efficient if someone familiar with the problem, and with what Mr. Sumrall says about it, contacted them.

Cheers

garwynn · May 28, 2012

I forwarded everything I had to them. I requested the firmware patch if possible, but asked that they at least stop calling mmc_erase(), or replace it with a format that writes all zeros to the partition. I will pass it on if I get a response.

CyberpodS2 · May 28, 2012

sfhub said:
DrNull did this on the "ODIN stuck on data.img" thread months ago. It just depends how badly the EMMC is broken. DrNull could only find a few MB for /data on his phone. I believe Note has more EMMC to work with, so you have more areas to relocate. I could theoretically build a custom kernel and relocate /data to external SD and workaround the problem also.

Personally I'd rather just get the thing replaced as you basically have a hard drive that crashed and you've lost some big areas of your EMMC. You may or may not be able to map around it, but it will be a different remap for everyone, assuming they have enough space available to make it useful.

Maybe a stupid question, but couldn't a utility be created to permanently map out the bad areas, the way a low-level format does on a SCSI drive? I just don't know how you would get access to run it if the phone never starts at all.

Sent from my SPH-D710 using xda premium

sfhub · May 28, 2012

CyberpodS2 said:
Maybe a stupid question, but couldn't a utility be created to permanently map out the bad areas, the way a low-level format does on a SCSI drive? I just don't know how you would get access to run it if the phone never starts at all.

The issue is finding the bad areas. You could create a tool to help, but it would require some manual involvement and might not be very intuitive: every time the tool accesses a bad area while searching, the phone will lock up to the point where you must pull the battery.

If one were to create such a tool, you could probably simplify it (and make it semi-usable by the general user) by scanning up from the start until the first lockup, then scanning down from the end until the first lockup. Hopefully you are then left with enough usable area to remap.
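As a rough illustration of that two-sided scan, here is a minimal sketch in C. It is only a simulation: probe_fn, simulated_probe, struct bad_range and find_usable_ranges are made-up names, and the "probe" just compares a block number against a pretend bad region. On a real phone a failed probe would be a hard lockup rather than a return value, so an actual tool would have to record its progress somewhere that survives a battery pull before touching each block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe callback: returns true if the block can be accessed. */
typedef bool (*probe_fn)(uint64_t block, void *ctx);

/* Pretend damaged region, used only by the simulated probe below. */
struct bad_range {
    uint64_t first;  /* first bad block (inclusive) */
    uint64_t last;   /* last bad block (inclusive) */
};

/* Result of the scan: blocks [0, low_end) and [high_start, total) are usable. */
struct usable_ranges {
    uint64_t low_end;
    uint64_t high_start;
};

/* Simulated probe: a block is "good" if it lies outside the bad range. */
static bool simulated_probe(uint64_t block, void *ctx)
{
    const struct bad_range *bad = ctx;
    return block < bad->first || block > bad->last;
}

/* Scan up from block 0 until the first failure, then scan down from the
 * last block until the first failure. Everything in between is treated
 * as unusable, even if parts of it might actually be fine. */
static struct usable_ranges find_usable_ranges(uint64_t total,
                                               probe_fn probe, void *ctx)
{
    struct usable_ranges r = { 0, total };

    while (r.low_end < total && probe(r.low_end, ctx))
        r.low_end++;

    while (r.high_start > r.low_end && probe(r.high_start - 1, ctx))
        r.high_start--;

    return r;
}
```

With a simulated bad region covering blocks 40 to 59 of a 100-block device, the scan reports blocks 0 to 39 and 60 to 99 as usable. The scan deliberately never probes the middle, which keeps the number of lockups to two at the cost of possibly giving up some good blocks.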

Personally I would still rather get the unit replaced while it is still relatively new as there are probably limitations to this approach that may not be immediately apparent. For example, you might need to create a custom .pit file for ODIN to work or the damage might not be limited to the sectors that couldn't be accessed and further wear-level activities might bring them to surface, even with fixed kernels.

sfhub · May 28, 2012

sjsharksfan420 said:
Is it not possible to "extract" the emmc firmware from data already available? Or do we need source code for that?

Even with the source code for the ICS kernel, it likely wouldn't include anything on updating the EMMC firmware. We might be able to "guess" how to do it based on similar code, but attempting it without documentation is a relatively dangerous operation that would probably brick phones during testing.

sfhub · May 29, 2012

I went back and traced the CWM recovery code to see how mmc_erase() was being called.

The call chain looks like this (the code is simplified for readability):

recovery.c

Code:

wipe_data() {
   erase_volume("/data")
   erase_volume("/cache")
}

Code:

erase_volume() {
  format_volume()
}

roots.c

Code:

format_volume() {
  make_ext4fs()
}

system/extras/ext4_utils/make_ext4fs.c

Code:

make_ext4fs() {
  make_ext4fs_internal(wipe=TRUE)
}

Code:

make_ext4fs_internal(wipe=TRUE) {
  write_ext4_image(wipe=TRUE)
}

Code:

write_ext4_image(wipe=TRUE) {
  open_output_file(wipe=TRUE)
}

system/extras/ext4_utils/output_file.c

Code:

open_output_file(wipe=TRUE) {
  if (wipe)
    wipe_block_device(out->fd, info.len);
}

system/extras/ext4_utils/wipe.c

Code:

wipe_block_device(fd, len) {
  u64 range[2];

  range[0] = 0;    /* start offset */
  range[1] = len;  /* length of the area to discard */

  ioctl(fd, BLKSECDISCARD, &range);
  ioctl(fd, BLKDISCARD, &range);
}

I haven't compiled CWM recovery on ICS, so this is based only on the CWM code people made available for their GB compiles. It is possible I haven't interpreted things correctly for the ICS-based CWM recovery.

So basically, in GB there was no support for automatic "wipe" functionality using mmc_erase() (in the kernel mmc driver) via ioctl() (from userspace). This functionality was added to the libext4_utils.a library for ICS, and make_ext4fs() was modified to enable "wipe" unconditionally whenever it is called. The change was added on 26-Jan-2011 by Colin Cross [ccross@android.com].

[Diff1 - make_ext4fs.c]
[Diff2 - output_file.c]
[Initial checkin - wipe.c]

I am only looking at the GB-based CWM code, as I couldn't find any ICS-based CWM checked in for E4GT, so I might have this part wrong. My guess is that when people ported CWM to ICS, they linked against the new libext4_utils.a library and therefore called the new make_ext4fs(), which always "wipes" when called. That eventually results in the ioctl() that triggers mmc_erase() in the kernel mmc driver, which in turn triggers the EMMC firmware lockup/superbrick bug.

Given that, my guess is that Samsung will likely make their patch in libext4_utils.a (and in libext4_utils.so, which is not relevant for us since recovery is statically linked; it would only matter for Android utilities).

That means the change likely will NOT be in the kernel proper, but rather in the libraries CWM is linked against.

Since CWM is statically linked (at least the copy I saw checked in was) then that means even if Samsung patches libext4_utils.a, the CWM binaries we have now will not get that change.

They will need to be recompiled against the new libext4_utils.a that will be available when Samsung releases the source code.

Also there is actually no need to wait for the Samsung source code. If the people who compile CWM simply NOOP the "wipe" code in their current source tree, their recoveries should then be "safe". This can be done in

system/extras/ext4_utils/wipe.c

by replacing wipe_block_device() with

Code:

int wipe_block_device(int fd, s64 len)
{
  return 0;
}

The above code would be the simplest change to make recoveries "safe" again.

It would be better, though, to replace it with code that writes zeros to the area, which should be possible with a loop that calls write() on the file descriptor with a zero-filled buffer.
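A minimal sketch of that zero-writing idea follows. It is untested on a real device and makes some assumptions: the function is given a different name (wipe_block_device_zero) so it is clearly not the stock code, int64_t stands in for the s64 typedef used in ext4_utils, and the 64 KB buffer size is arbitrary. On a 32-bit build, partitions larger than 2 GB would also need large-file support, which this sketch does not set up.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replacement for wipe_block_device(): overwrite the first
 * `len` bytes behind `fd` with zeros using plain write() calls instead of
 * the BLKSECDISCARD/BLKDISCARD ioctls.
 * Returns 0 on success and 1 on failure. */
int wipe_block_device_zero(int fd, int64_t len)
{
    static const char zeros[64 * 1024];  /* static storage is zero-filled */
    int64_t remaining = len;

    /* Start from the beginning of the device or file. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return 1;

    while (remaining > 0) {
        size_t chunk = remaining < (int64_t)sizeof(zeros)
                           ? (size_t)remaining
                           : sizeof(zeros);
        ssize_t written = write(fd, zeros, chunk);

        if (written < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, just retry */
            return 1;
        }
        if (written == 0)
            return 1;       /* no progress, give up rather than spin */

        remaining -= written;  /* short writes are handled by looping */
    }

    /* Push the zeros out to the underlying storage. */
    return fsync(fd) == 0 ? 0 : 1;
}
```

This is much slower than a discard because every block is actually written, but it only uses the ordinary write path rather than the erase path described in the call chain above.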

Feel free to comment if I've made some mistake in my analysis.

sniper · May 29, 2012

sfhub said:
So basically in GB there was no support for "wipe" functionality using mmc_erase() (in kernel mmc driver) via ioctl() (in userspace). This was added in the libext4_utils.a library for ICS. The function make_ext4fs() unconditionally always enables "wipe" whenever it is called. This change was added on 26-Jan-2011 by Colin Cross [ccross@android.com]

So send all of our hate mail to Colin Cross?

Sfhub, thank you again for all your dedicated hard work; you never cease to amaze us. I read through that and I think I understood most of it

I'm glad we're(you guys, not really me, haha) still making progress with this.

Sent from my SPH-D710 using Tapatalk 2

sfhub · May 29, 2012

Does anybody have the source code for update-binary that is included in the update.zips?

I want to see how that is built. If it is linked against libext4_utils.a there is a possibility it will have the same behavior as recovery.

I stress that I have not confirmed this by looking at the source, since I can't find it right now, so it is pure conjecture, but this *MIGHT* explain why certain ROMs tend to trigger the EMMC lockup/superbrick bug. If their update.zip was packaged using an update-binary that was linked against libext4_utils.a from an ICS environment with the "wipe" change, then the install using that update.zip might trigger the bug.

If instead, folks on other ROMs bundled with GB-based update-binary linked against GB libext4_utils.a, then even though they are installing ICS-based ROMs, the calls by the updater would be GB-based and possibly not susceptible to the wipe problem.

Again this is even more conjecture, but theoretically if people took their GB CWM recovery that was linked against GB libext4_utils.a and repacked with ICS kernels, it would probably be safe. It is only the CWM that is linked against ICS libext4_utils.a that appears dangerous because it has the "wipe" functionality.

I don't build these recoveries myself. Can someone tell me what they did to produce CWM for an ICS environment? Did they just take their GB CWM binary (compiled against GB code base) and repack with ICS kernel, or did they recompile CWM within an ICS environment using the generic ICS code, then repack that with ICS kernel?

biliskner · May 29, 2012

sfhub said:
I went back and traced the CWM recovery code to see how mmc_erase() was being called.

The call chain looks like this:

Wow. Great explanation. Thanks!

I agree that Sam will just change the libs rather than release a FW rev for their emmc. But one can always hope.......

sfhub · May 29, 2012

biliskner said:
Wow. Great explanation. Thanks!

I agree that Sam will just change the libs rather than release a FW rev for their emmc. But one can always hope.......

To clarify, the lockup/superbrick bug is ultimately in the EMMC firmware. This is hardest to change and I doubt we will see a newer EMMC firmware for our EMMC chips.

Given that, there are 2 likely places they will "workaround" the issue:

1) in the kernel mmc driver
2) in libext4_utils.a which is linked into utilities like recovery

I don't think they will put the workaround in the kernel mmc driver. The GB kernel mmc driver called mmc_erase() also, it just wasn't invoked by wipe data/factory reset so the EMMC firmware bug never got triggered. If by some chance, they put the workaround in the mmc driver, then all recoveries would be rendered "safe" when packed with the kernel with the workaround in the mmc driver. This would be the case even if the Recovery binaries were not recompiled.

If they put the workaround in libext4_utils.a (which would essentially be backing out the ICS change) that would also workaround the problem, but only for Recoveries that are relinked/compiled against the libext4_utils.a that has the workaround. It is my opinion Samsung will put the workaround in here.

We'll know for sure once the source is released. I just got tired of waiting so wanted to look into the issue in more detail and consider the different places the workaround could show up.

By looking into the issue I also realized that if Samsung is "working around" the problem in the userspace libraries (#2 above), then there is no reason for people who build CWM/custom recoveries to wait for Samsung. They can make a change in their own source tree right now to implement a workaround.

I hope my analysis of the issue is accurate. If someone feels I'm reading this wrong, please feel free to add your comments.

prabhu1980 · May 29, 2012

Dear Super Users,

Case 1: Reproducing the EMMC bug
An improperly downloaded firmware file, flashed without a proper md5 check, will cause this EMMC brick.
(Example: I downloaded the leaked ICS version of Midnote 1.3 from Hotfile. The file was supposed to be 800+ MB, whereas the file I flashed was 676 MB. I didn't notice, and I bricked my Note.)

This may explain why very few people are hit by this bug.

The flash got stuck at FactoryFS.img and could not be recovered by any method. Some people get stuck at Data.img instead.

Case 2: The EMMC is now locking up the damaged areas (probably soft damage caused by the corrupt file that was copied).

Why can't we dump the EMMC firmware and decode it?
Step 1: Dump the EMMC firmware using the dd command.
Step 2: Decode it using a hex/ARM decoding tool and disable the lockup (is that possible?).
Step 3: Reflash the modified EMMC firmware.
Step 4: Run fsck to solve the issue.

Please reply without mixing the two cases.

Case 1 is what happened to me.
Case 2 is my theory for debugging.

Sorry, I am not a developer, so I can't handle Case 2 myself. I am just throwing out some ideas.

sfhub · May 29, 2012

There is no interface readily available to us to retrieve the EMMC firmware as far as we know and we don't have the documentation for the interfaces.

RainMotorsports · May 29, 2012

prabhu1980 said:
Case 2: The EMMC is now locking up the damaged areas (probably soft damage caused by the corrupt file that was copied).

Why can't we dump the EMMC firmware and decode it?
Step 1: Dump the EMMC firmware using the dd command.

The eMMC firmware can only be read or written via low-level commands to the eMMC controller. It is not presented to the device as writable storage at any level. Not having access to the documentation, which eMMC partners pay for and sign a non-disclosure agreement to get, makes this a real challenge.

prabhu1980 · May 29, 2012

Dear Electronic Gizmos and Most Respectable Developers ,

I have downloaded the documentation, and it describes the eMMC bit by bit.
Maybe now you can help us.

It describes bad block management.
It describes power cycling.
It describes how the factory settings are made.

I know you guys will now open up your partitions and change the EMMC bits and help us... but where is the partition?
The N900 EMMC documentation may help. Source: Google

I also know that this information may already be available to you.
But do ask for whatever you would like to have.

Looking for a Samsung insider.

RainMotorsports said:
The eMMC firmware can only be read or written via low level commands to the eMMC controller. It is not presented as a writable storage to the device at any level. Not having access to the documentation eMMC partners pay for and then sign a non disclosure agreement for makes a good challenge.

Thanks, RainMotorsports, for bringing this up.

sfhub said:
There is no interface readily available to us to retrieve the EMMC firmware as far as we know and we don't have the documentation for the interfaces.

Yeah, when I tried accessing the partition containing the firmware, the device kicked me out of the program trying to access it, even with root privileges.
But there should be a workaround in recovery mode. I read that the EMMC firmware on the N900 can be flashed without any complicated tools.

RainMotorsports · May 29, 2012

That is literally just a presentation with no useful information. The kind of information you are looking for is the kind over which Samsung sends cease and desist letters, ahead of lawsuits, to whoever leaked it.

There is no partition as you are thinking. This is stored internally at a low level, and it's NOT meant to be updated by end users or OTA. Even the concept of letting it out to service technicians is beyond the intended scope. It was only intended to be written at manufacture time, and maybe during factory service or refurbishment. Even carrier-level techs generally do not have any access to this information or the related tools.

The raw read/write access normally presented to the device and the OS kernel sits at a higher level in the storage hierarchy. Access to the lower-level, sensitive areas can only be done, depending on the operation, either by things hard coded in the silicon (very low-level operations) or coded in the firmware (slightly higher level). The PDF you are looking for is usually going to be 200+ pages.

I could be wrong but this is not really any different than what I have had experience with.

prabhu1980 · May 29, 2012

Not the presentation Boss !!
The JEDEC standard for EMMC .... See the second attachment .. (I uploaded later and u replied so quick ....)

RainMotorsports said:
There is no partition as you are thinking. This is stored internally on a low level and its NOT meant to be updated by end users or OTA and even the concept of letting it out to service technicians is beyond the intended scope.

The raw level of read/write access normally presented to the device, os kernel is a higher level in the storage hierarchy. Access to the lower levels, sensitive areas can only be done, depending on the operation either by things hard coded in the silicon (very low level operations) or in the coded in the firmware (slightly higher level). The PDF you are looking for is going to be 200+ pages usually.

I could be wrong but this is not really any different than what I have had experience with.

I hope you are terribly wrong and pray that u should be wrong ...
Hard Coded to Silicon .... Too Old Technology Boss .....
Everything should be revisable ...
Please read the N900 Emmc flashing of firmware in my post (Added Later ....)

SFHUB - Request you to read this post ....

RainMotorsports · May 29, 2012

prabhu1980 said:
Not the presentation Boss !!
The JEDEC standard for EMMC .... See the second attachment .. (I uploaded later and u replied so quick ....)

Oh wow that might be what we want. At first you had the wrong file.

I know part of the documentation is public, but I am not familiar with what is and is not public. This looks very good though.

prabhu1980 said:
I hope you are terribly wrong and pray that u should be wrong ...
Hard Coded to Silicon .... Too Old Technology Boss .....
Everything should be revisable ...

You still have to have some basic operations hard coded so that a blank manufactured chip can be written to initially. Since we are trying to read and write the revisable portion of the chip's operating instructions, these are likely those hard-coded/built-in commands.

Private Vendor Specific Address Space: the area of the e•MMC device that cannot be accessed by a read command from the host software. It contains vendor specific internal management data. This data can be either loaded at manufacturing or generated during device operation e.g. Memory Vendor Firmware and mapping tables. It does not contain any data (or portion of data) that was sent from the host to the device.

The glossary defines one of the areas we are discussing; however, looking through the file, it seems to cover everything except what we need. There is data in this area that needs to be read and backed up.

prabhu1980 · May 29, 2012

SFHUB - Waiting for you to feed my thoughts .... Read this Post
