Samsung working on Emmc brick fix

Mario1968

Senior Member
Jan 1, 2012
Roma
Sure: https://bitbucket.org/franciscofranco/android-tuna-omap/changeset/cea631bdac53/raw/

The value 0xD20228FF looks suspiciously like

cmp r0,#0xff
bcs there
...
...
...
there:
...

The other value 0x000000FF looks suspiciously related to that interpretation, too.
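For anyone who wants to check the Thumb reading, here is a minimal Python sketch that decodes the two 16-bit halves of 0xD20228FF. It assumes a little-endian Thumb (16-bit) instruction stream in which the low halfword executes first. It only handles the two encodings needed here, the MOV/CMP/ADD/SUB-immediate format and the conditional branch. The helper names are made up for this illustration; it is not a real disassembler.

```python
# Minimal Thumb-1 decoder covering just two encodings (illustrative helper only).
CONDS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
         "hi", "ls", "ge", "lt", "gt", "le"]

def decode_thumb16(hw: int) -> str:
    """Decode one 16-bit Thumb instruction, or fall back to a raw .hword."""
    if hw >> 13 == 0b001:
        # Format 3: 001 op(2) Rd(3) imm8 -> MOV/CMP/ADD/SUB Rd, #imm8
        op = (hw >> 11) & 0b11
        rd = (hw >> 8) & 0b111
        return f"{['mov', 'cmp', 'add', 'sub'][op]} r{rd}, #0x{hw & 0xFF:02x}"
    if hw >> 12 == 0b1101 and ((hw >> 8) & 0xF) < 0xE:
        # Format 16: 1101 cond(4) soffset8; target = this address + 4 + 2*soffset8
        off = hw & 0xFF
        if off & 0x80:
            off -= 0x100  # sign-extend the 8-bit offset
        return f"b{CONDS[(hw >> 8) & 0xF]} .{4 + 2 * off:+d}"
    return f".hword 0x{hw:04x}"

def decode_word(word: int) -> list[str]:
    """Split a 32-bit little-endian word into two Thumb halfwords (low one first)."""
    return [decode_thumb16(word & 0xFFFF), decode_thumb16(word >> 16)]

print(decode_word(0xD20228FF))  # ['cmp r0, #0xff', 'bcs .+8']
```

A branch target of .+8 from the BCS skips exactly three 16-bit instructions, which matches the three "..." lines in the listing above.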

There also seems to be code to dump firmware, which is necessary to analyse where to hook, to learn how to do IO and raw array access inside the chip, and also to make sense of the existing data structures. We could also compare vulnerable firmware to fixed firmware, highlighting which portions are relevant for the damage mechanism.
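A comparison of a vulnerable dump with a fixed one could start with something as simple as this sketch. It reports the word-aligned byte ranges where two images differ. It assumes you already have two raw dumps as bytes, which nobody here has yet. The function name and the 4-byte granularity are illustrative choices.

```python
def diff_ranges(old: bytes, new: bytes, word: int = 4) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges, in steps of `word`, where the images differ.

    Bytes past the end of the shorter image count as different.
    """
    n = max(len(old), len(new))
    ranges: list[tuple[int, int]] = []
    start = None
    for off in range(0, n, word):
        if old[off:off + word] != new[off:off + word]:
            if start is None:
                start = off  # a differing run begins here
        elif start is not None:
            ranges.append((start, off))  # the run ended at the previous word
            start = None
    if start is not None:
        ranges.append((start, n))  # the run extends to the end of the longer image
    return ranges

vulnerable = bytes(16)
fixed = bytearray(vulnerable)
fixed[5], fixed[9] = 0xFF, 0x01
print(diff_ranges(vulnerable, bytes(fixed)))  # [(4, 12)]
```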

Hi Paul, what do you think about this fix/code? Is it enough to prevent the eMMC brick? And what about the bricked devices, can this code be used to revive them?
Sorry for the noob questions, but I'd like to know what the experts think about this fix.
Thank you
Mario

Sent from my Galaxy S2 using Tapatalk2®
XWLPF Stock
Siyah 4.1.5
Jkay V14.1
 

aletheus

Senior Member
Oct 13, 2011
+1
You guys have a few readers trying to wrap their heads around this problem. I superbricked my P6800 a couple of weeks ago... it was like losing a pet.

Definitely interested in understanding this problem here.
 

PaulF8080

Senior Member
Apr 10, 2012
Thank you for the clarification - I've made it more than clear that I'm the amateur in the group. You are certainly right to criticize those remarks and I was wrong to post that, no matter what your opinion or background was. (Actually in this case it embarrasses me far more than you for clearly showing ignorance and a lack of knowledge on the subject)

I will revise that statement with an apology immediately.

As for further comments from me I think I'm better off holding any further comments/questions for now while the experts in the room hash out the more technical aspects. I've made myself look stupid enough for the day, no?
Please do not be embarrassed. I always tell people HW debug is a very weird job. You are wrong 90% of the time. Debuggers always have their own guesses. The rules are: there are no stupid guesses, and listen to everyone. I was trying to get off the "FW is borked" track.

Sent from my SAMSUNG-SGH-I717 using xda app-developers app
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
Owego, NY
Paul,

JEDEC standards specify the requirement for an embedded controller as part of the eMMC package, hence the standardized command set. (Edit: At least as far as I have read and understood it - I can certainly be wrong here, as already proven today)
(Sentence following this removed as it was offensive and did not take into account Paul's background and experience, which is far, FAR more than mine in these matters. My apologies for posting it in the first place and hope this will not detract from the discussion any further. )


---------- Post added at 01:32 PM ---------- Previous post was at 01:21 PM ----------

Refresher: Back and Forth with Mr. Sumrall (5/22/12)
Bold is me, non-bold is Mr. Sumrall's response:

A couple of things if I re-read this:
1) I suspect this is what jetmarc and Entropy are suggesting can only be done once (maybe CMD26 is a part of this?). But if so how did they develop a patch for it in the first place to update to revision 0x25? And if it was that unstable did they burn through a lot of chips or GNex phones just to fix this for Google to continue their kernel development? That seems awfully... cost insensitive.

2) Don't know if what Mr. Sumrall is mentioning is the same thing that Entropy was given access to. It's never been definitively confirmed nor can I assume it will be.
I wasn't given access to anything. The original hope was that Samsung and I would be providing this information when we met - but the guy working with the fix (He was at the meeting) was experiencing a very high failure rate. I think back in Korea he had a fairly large pile of dead chips. (They had an I9100 and an N7000 with socketed eMMC. It was pretty cool tbh.)

The problem is we'll likely never know exactly what Google had access to and if it even was the same as what the storage/memory guys gave to the engineer that was at the meeting with myself and my primary contact. (There were a total of three Samsung people there.) This stuff is all considered highly proprietary by the storage guys and there's also clearly a lot of issues with internal coordination/communication at Samsung (even within the same division, let alone cross-division.)

The eMMC standard allows vendors to implement proprietary extended commands, the update/repair process likely used one of these. (I'd link to the relevant kernel patch but I can't find it at the moment... drivers/mmc has a LONG changelog...)

It was my thinking at the time that this was being patched into RAM outside of the eMMC (since I thought it didn't have any usable RAM, just the controller and storage) - but if I put it into context as jetmarc suggested then I can see that his analysis makes more sense to me.

Question though: Would you have to re-patch the eMMC firmware again upon wake if indeed the patch took place in the SRAM (perhaps this is like a cache?) for the eMMC?
The internal controller within the eMMC (which translates MMC commands into whatever the underlying memory technology understands) appears, based on all of these patches, to have SRAM. It seems like it shadows its firmware image to SRAM on powerup. You won't see anything about this in the JEDEC spec because this is all internal manufacturer-specific implementation stuff.
Hi Paul, what do you think about this fix/code? Is it enough to prevent the eMMC brick? And what about the bricked devices, can this code be used to revive them?
Sorry for the noob questions, but I'd like to know what the experts think about this fix.
Thank you
Mario

That code will do nothing for the eMMC bug. Here's a summary of what we know from Ken for the VYL00M/MAG4FA:
Fwrev 0x19 has two bugs - one that can cause the eMMC's internal data structures to get severely corrupted when a secure erase is issued (Superbrick), and a second one where the wear leveller will occasionally insert 32kb of zeros that shouldn't be there.
Fwrev 0x25 has one known bug - the 32kb of zeroes one (it's immune to the secure erase issue). The patch that changes the firmware in SRAM fixes this bug. It won't fix the secure erase bug and in fact cannot be applied to fwrev 0x19.
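The summary above fits in a small lookup table. This is a hedged Python sketch that covers only the two VYL00M/MAG4FA revisions described here. The field and function names are invented for illustration. `sram_patched` assumes, as suggested earlier in the thread, that the patch lives in SRAM and presumably has to be reapplied after every power-up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FwStatus:
    superbrick_on_secure_erase: bool  # internal structures corrupted by secure erase
    wl_inserts_32kb_zeros: bool       # wear leveller occasionally inserts 32kb of zeros
    sram_patch_applies: bool          # the SRAM firmware patch can be applied

# Only what was reported for VYL00M/MAG4FA; any other revision is unknown here.
KNOWN_FWREVS = {
    0x19: FwStatus(True, True, False),
    0x25: FwStatus(False, True, True),
}

def remaining_bugs(fwrev: int, sram_patched: bool = False) -> list[str]:
    """List the known bugs still present; sram_patched means patched since last power-up."""
    status = KNOWN_FWREVS.get(fwrev)
    if status is None:
        raise KeyError(f"no information for fwrev 0x{fwrev:02x}")
    bugs = []
    if status.superbrick_on_secure_erase:
        bugs.append("superbrick")
    # The SRAM patch fixes the 32kb-zeros bug, but only where it can be applied.
    if status.wl_inserts_32kb_zeros and not (sram_patched and status.sram_patch_applies):
        bugs.append("32kb zeros")
    return bugs

print(remaining_bugs(0x19, sram_patched=True))  # ['superbrick', '32kb zeros']
print(remaining_bugs(0x25, sram_patched=True))  # []
```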

Please do not be embarrassed. I always tell people HW debug is a very weird job. You are wrong 90% of the time. Debuggers always have their own guesses. The rules are: there are no stupid guesses, and listen to everyone. I was trying to get off the "FW is borked" track.

The problem is, we know for a FACT that it is the firmware that is borked.

Seriously, you, as an I717 owner, could fire secure erase commands at your eMMC all day long without ever worrying, because all I717 units that shipped contained updated eMMC firmware that is immune to this bug.
 

garwynn

Retired Forum Mod / Inactive Recognized Developer
Jul 30, 2011
NE Ohio
www.extra-life.org
That code will do nothing for the eMMC bug. Here's a summary of what we know from Ken for the VYL00M/MAG4FA:

Fwrev 0x19 has two bugs - one that can cause the eMMC's internal data structures to get severely corrupted when a secure erase is issued (Superbrick), and a second one where the wear leveller will occasionally insert 32kb of zeros that shouldn't be there.
Fwrev 0x25 has one known bug - the 32kb of zeroes one (it's immune to the secure erase issue). The patch that changes the firmware in SRAM fixes this bug. It won't fix the secure erase bug and in fact cannot be applied to fwrev 0x19.

I want to bring this up again because I'm noticing something that might be explained away by this.
Lately people are reporting random hangs during flashes, and it's not quite certain what the cause is. But if I think about the possibility of corruption from the above WL bug... a corrupted APK could definitely cause a crash.

So if people are noticing odd behavior with an application it may be worth reinstalling just to make sure this bug isn't to blame. Just a thought for consideration.....
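One way to test this idea on a suspect APK (or any other file) is to look for runs of at least 32kb of zero bytes, the signature of the wear-leveller bug described above. This is a rough Python heuristic. It assumes the inserted zeros survive as one contiguous run. Legitimate files can also contain long zero runs, so a hit is only a hint, and a clean result proves nothing.

```python
import re

def zero_runs(data: bytes, min_len: int = 32 * 1024) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every run of at least `min_len` zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(data)]

# Usage (hypothetical path):
# with open("/data/app/example.apk", "rb") as f:
#     print(zero_runs(f.read()))
```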
 

alvinasnow

Senior Member
Oct 23, 2011
Islamabad
Well... looks like all this 2-3 month wait is going to go to waste. I am losing hope of finding any solution for my superbricked eMMC. The best thing is that now, people have stopped giving us hope too :(
 

baz77

Guest
I know a regular user can't replace it at home, but I worked in a cellphone lab for 8 years.
Anyway, I purchased the eMMC chip online (around 70 euro) and flashed it with my RIFFbox.
Now it works like a charm, all 16GB just like new. :)

Thanks anyway to everyone on the forum for the info...

So cool, maybe walkthrough photos?

Sent from my GT-N7000 using Tapatalk 2
 

friedje

Senior Member
Oct 29, 2006
Chartres
I want to bring this up again because I'm noticing something that might be explained away by this.
Lately people are reporting random hangs during flashes, and it's not quite certain what the cause is. But if I think about the possibility of corruption from the above WL bug... a corrupted APK could definitely cause a crash.

So if people are noticing odd behavior with an application it may be worth reinstalling just to make sure this bug isn't to blame. Just a thought for consideration.....

There has been a bug in Mobile Odin with the latest ROMs. It caused /cache to become corrupted, and I think most of the flash hangs reported lately came from this problem. Chainfire has since updated MO.

---------- Post added at 08:32 PM ---------- Previous post was at 08:23 PM ----------

So cool, maybe walkthrough photos?

Somehow, I don't think so...
But then again I might be paranoid lol :)
But someone asked on a Sunday where to buy the chip and whether it might be a SanDisk... and 2 days later he had not only ordered and received the chip, but also replaced and reflashed it... I have a hard time believing that in such a short timespan.

However, if it's been done, that would be good news for the community. If there is enough space, it could be an option to have the chip desoldered and a socket soldered in for the replacement chip.
 

AndroidSlave

Guest
I had Mobile Odin hang with LRQ... it was fine with PC Odin

Sent from my GT-N7000 using Tapatalk 2
 

PaulF8080

Senior Member
Apr 10, 2012
The internal controller within the eMMC (which translates MMC commands into whatever the underlying memory technology understands) appears, based on all of these patches, to have SRAM. It seems like it shadows its firmware image to SRAM on powerup. You won't see anything about this in the JEDEC spec because this is all internal manufacturer-specific implementation stuff.

That code will do nothing for the eMMC bug. Here's a summary of what we know from Ken for the VYL00M/MAG4FA:
Fwrev 0x19 has two bugs - one that can cause the eMMC's internal data structures to get severely corrupted when a secure erase is issued (Superbrick), and a second one where the wear leveller will occasionally insert 32kb of zeros that shouldn't be there.
Fwrev 0x25 has one known bug - the 32kb of zeroes one (it's immune to the secure erase issue). The patch that changes the firmware in SRAM fixes this bug. It won't fix the secure erase bug and in fact cannot be applied to fwrev 0x19.


The problem is, we know for a FACT that it is the firmware that is borked
Are you insisting there is a microprocessor on the chip that executes FW even though the implementation is unknown to you? I think we have a language problem. I view a controller as logic that controls some external signals. An embedded controller does the same, except the signals are internal. You can buy microcontrollers (not embedded controllers) that control slow signals, and they are just microprocessors with a lot of I/O pins. Not all controllers are CPUs. There is exynos/arm FW on the eMMC. Surely you are not saying there is a CPU on the eMMC chip that executes ARM FW code. Here is where the language comes in. Sometimes there is code implemented in the silicon, but that is called microcode, not FW, in my world. Even microcode is too slow for high-speed signals.

I am a retired engineer who did debug for a very long time and I am sorry, I can't seem to stop. :) Please note: I always use "guess" instead of "fact"(an old debugger habit). I am glad my i717 is safe, thanks.
 

baz77

Guest
Somehow, I don't think so...
But then again I might be paranoid lol :)
But someone asked on a Sunday where to buy the chip and whether it might be a SanDisk... and 2 days later he had not only ordered and received the chip, but also replaced and reflashed it... I have a hard time believing that in such a short timespan.

However, if it's been done, that would be good news for the community. If there is enough space, it could be an option to have the chip desoldered and a socket soldered in for the replacement chip.

I am pretty gullible I guess

Sent from my GT-N7000 using Tapatalk 2
 

PaulF8080

Senior Member
Apr 10, 2012
Somehow, I don't think so...
But then again I might be paranoid lol :)
But someone asked on a Sunday where to buy the chip and whether it might be a SanDisk... and 2 days later he had not only ordered and received the chip, but also replaced and reflashed it... I have a hard time believing that in such a short timespan.

However, if it's been done, that would be good news for the community. If there is enough space, it could be an option to have the chip desoldered and a socket soldered in for the replacement chip.
It's the SanDisk guy? That seals it. Good catch! I pointed out to him that there are no sector marks on a RAM disk. Being paranoid explains a reply to one of my posts. You have no way of knowing who people are here, I guess.
 

friedje

Senior Member
Oct 29, 2006
Chartres
Are you insisting there is a microprocessor on the chip that executes FW even though the implementation is unknown to you? I think we have a language problem. I view a controller as logic that controls some external signals. An embedded controller does the same, except the signals are internal. You can buy microcontrollers (not embedded controllers) that control slow signals, and they are just microprocessors with a lot of I/O pins. Not all controllers are CPUs. There is exynos/arm FW on the eMMC. Surely you are not saying there is a CPU on the eMMC chip that executes ARM FW code. Here is where the language comes in. Sometimes there is code implemented in the silicon, but that is called microcode, not FW, in my world. Even microcode is too slow for high-speed signals.

I am a retired engineer who did debug for a very long time and I am sorry, I can't seem to stop. :) Please note: I always use "guess" instead of "fact"(an old debugger habit). I am glad my i717 is safe, thanks.

Although this is off topic, Firmware = Microcode.

So I don't see the point in spending 23453455445643 posts insisting that we DON'T call it firmware but microcode. By definition, firmware IS microcode that bridges between actual software calls and the underlying hardware.

Now it doesn't matter if this is inside a CPU, a GPU, an Embedded controller or even an Electronic Toiletflusher.
 

jetmarc

Member
Jun 6, 2012
Are you insisting there is a microprocessor on the chip that executes FW even though the implementation is unknown to you?

Although you didn't direct this to me, I'll reply.

The implementation is indeed unknown to us, so we are all guessing. You said your guess is that it's implemented as random logic, just like you did for a QPI interface implementation in the past. I don't agree with this guess, but likewise you don't agree with mine. There wouldn't be any problem if it weren't for the several pages of flame posts that follow. Those make it difficult for others to find real information here.

Therefore I'll bite and give reasons why I don't agree with your guess. Please defend your guess on a technical level, not a personal one, so that the thread stays on topic. I certainly don't want to attack you (or anyone else). If it may appear so, then it is not intentional. English is not my native language and this can cause all kinds of misunderstandings.

1. All (useful!) standards are made to solve real-world problems with real-world tools. Often a reference implementation is made (public or not) and the standard is drafted with data observed from the reference implementation. QPI is a high-speed bus with low latency. Response time is defined in clock cycles. There is no way to implement QPI in software anytime soon (except when you don't need to interface to real hardware on the other side, as in a simulator). Therefore QPI is clearly meant to be implemented in hardware, just like you did.

The EMMC spec is different. It does mandate timings, but only those of the IO interface itself are specified in clock cycles. All other timings are given in ms, and some even in seconds. Those who wrote the spec didn't have a big, fast Verilog FSM in mind. The mandated features are easy to implement in software while still meeting the timing spec. All you need is a peripheral that implements the IO interface. This is very much like the situation with USB, where an IO peripheral implements the "data pump" and all the rest is usually done in software.

Along the same lines, there is another hint apart from the timing: feature diversity. Since QPI is clearly meant to be implemented in hardware, it won't require extra features that can't be implemented well in hardware. EMMC, on the other hand, does require lots of stupid extra stuff that vastly complicates a pure logic implementation, for almost no benefit.

For example, the EMMC spec includes a security feature that is probably not even used in many devices with EMMC. Yet this feature makes it necessary to implement an HMAC with SHA256. Adding those in firmware is easy; it takes just a few extra KB of code. A direct hardware implementation, however, suffers a lot from this (mostly unnecessary) burden. It will take lots of extra silicon area (and IP licences) for a feature that is rarely used.

For these and similar reasons I believe that EMMC is meant to be implemented in software, not hardware. That doesn't make it impossible to do it in hardware alone, but it certainly won't be cost effective. The utilization of the various (mandatory) circuits will be too low.

2. Samsung clearly says that their EMMC devices use "firmware", even giving their firmware names and version numbers. For example "VYL00M" 0x25.

3. The official Android kernel patch mentioned earlier contains comments that reveal information. It says "We can patch the firmware in such chips", "the wear leveling code can insert 32 bytes", "Patch the firmware in certain Samsung emmc chips", "fix bug in wear leveling firmware", "patch the firmware in the internal sram", etc.

Clearly there is firmware (code) running in the device.

4. We know two 32bit words of a valid firmware image, which are contained in the same android kernel patch. One of them decodes to two valid thumb instructions. Coincidence?

5. Also from the patch, we learn something about the internal SRAM size. The patch accesses 0x4dd9c and 0x379a4, so the SRAM is certainly at least 0x163f8 bytes (the distance between the two) big. Applying the typical rules about how memory is implemented (a power-of-two size, mapped from a naturally aligned base, so that it must also contain offset 0x4dd9c), it is probably at least 0x80000 bytes (512KB) big. Whoever put that much SRAM into the device was planning for quite a complex firmware.

6. Considering that there is firmware and it runs from sram, the only way to maintain your guess of a pure hardware implementation, is to use programmable hardware. CPLDs are not complex enough to require such big firmware images, leaving only FPGAs. It's certainly possible to patch FPGA bitstreams, and indeed they "execute from" (or better: are stored in) SRAM cells. However, all FPGA architectures that I know use configuration frames instead of 32-bit data words and 32-bit aligned memory addresses. Those configuration frames are typically longer, and the population of databits within the frame is very sparse. To me, the value 0xD20228FF (from the android kernel patch) doesn't look like a typical piece of FPGA configuration. I would expect something like 0x00110044, or 0x40840001.

7. FPGAs are used to get to the market quick. But they are not cost effective in a mass market, because they contain lots of flexibility (=extra area and layers) that is not needed once the functional requirements are frozen. EMMCs are clearly a mass market product. Including an FPGA with a bitstream size of 512KB is not the way to earn money. 512KB bitstream size is about equivalent to a XC3S1400A or a XC4VLX15.
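The number-crunching in points 5 and 6 can be reproduced in a few lines of Python. The two addresses and the three example words are the ones quoted above. The power-of-two step assumes the patch addresses are offsets into an SRAM mapped from address 0. Counting set bits is only a crude stand-in for "sparse". The FPGA-like values are illustrative guesses, not real bitstream data, so this restates the argument rather than proving it.

```python
def min_span(addrs: list[int]) -> int:
    """Distance from the lowest to the highest address (point 5's 'at least' figure)."""
    return max(addrs) - min(addrs)

def min_pow2_size(addrs: list[int]) -> int:
    """Smallest power-of-two memory mapped from 0 that still contains the highest address."""
    return 1 << max(addrs).bit_length()

def set_bits(word: int) -> int:
    """Number of 1 bits in a 32-bit word (a rough measure of point 6's 'sparseness')."""
    return bin(word & 0xFFFFFFFF).count("1")

patch_addrs = [0x4DD9C, 0x379A4]          # addresses quoted from the kernel patch
print(hex(min_span(patch_addrs)))         # 0x163f8
print(hex(min_pow2_size(patch_addrs)))    # 0x80000, i.e. 512KB

for w in (0xD20228FF, 0x00110044, 0x40840001):
    print(f"0x{w:08X}: {set_bits(w)}/32 bits set")
# 0xD20228FF: 15/32 bits set
# 0x00110044: 4/32 bits set
# 0x40840001: 4/32 bits set
```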

Of course we don't have much official information, but what we have seems to support an embedded CPU core with at least 512KB of SRAM available to it. For the reasons above, that is more likely than anything else.
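The claim in point 1, that HMAC-SHA256 costs very little in software, is easy to illustrate. The security feature is presumably the Replay Protected Memory Block (RPMB), whose frames are authenticated with HMAC-SHA256. This sketch uses only Python's standard library. The frame layout is an assumption based on the JEDEC RPMB frame: 512 bytes, MAC at bytes 196-227, MAC computed over bytes 228-511. Check it against the spec before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac

FRAME_LEN = 512
MAC_START, MAC_END = 196, 228   # assumed position of the 32-byte MAC in the frame
SIGNED_START = 228              # assumed: MAC covers everything from here to the end

def rpmb_mac(key: bytes, frame: bytes) -> bytes:
    """HMAC-SHA256 over the signed part of an RPMB-style frame."""
    if len(frame) != FRAME_LEN:
        raise ValueError("frame must be 512 bytes")
    return hmac.new(key, frame[SIGNED_START:], hashlib.sha256).digest()

def frame_is_authentic(key: bytes, frame: bytes) -> bool:
    """Compare the stored MAC with a freshly computed one in constant time."""
    return hmac.compare_digest(frame[MAC_START:MAC_END], rpmb_mac(key, frame))
```

The whole check is two library calls, which is the point being made: in firmware this is a few KB of code, while in dedicated logic it is a permanent block of silicon.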
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
Owego, NY
Also note that eMMC stands for "embedded MMC" - For all practical purposes, it is an MMC/SD card but in a BGA package with some additional features to take advantage of the fact that it is a fixed installation with extra pins available.

We also know that almost all SD/MMC cards on the market use embedded controllers that execute some sort of firmware (they are clearly NOT an FPGA and, as the above poster stated, almost surely not a CPLD because you couldn't fit stuff like this in a CPLD), often on a separate die from the NAND flash memory - see http://www.bunniestudios.com/blog/?page_id=1022

We also know for a fact that these chips can have their firmware updated, however the exact method of updating the firmware is unknown AND the update method is known to be unreliable/dangerous when performed in-system.
 

letters_to_cleo

Senior Member
Jan 23, 2012
Portsmouth, N.H.
Doesn't the eMMC have some form of embedded memory controller that is optimized to take advantage of specific NAND flash performance features, including program and read caching? I read somewhere that it's even possible to boot directly from eMMC... If so, wouldn't it follow that this embedded memory controller (block-level interface, error management and wear-leveller) is the one controlling and handling the chip itself?
 

PaulF8080

Senior Member
Apr 10, 2012
I guess I can give a technical response without revealing secrets. I was called an RTLer. RTL (register-transfer level) logic is described in a hardware description language (HDL). I used Verilog HDL at the end of my career.

Here is some pseudo code that can easily be converted to Verilog:

Code:
Case of State
    Idle:
        Case of eMMC_cmd
            Wipe:
                State = InitWipe
            more commands ...
        end case
    more states ...
end case
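Here is the same Case-of-State structure modelled in Python, so it can actually be run. It is only a behavioural sketch of the pseudocode above, with one call per clock. The NONE and READ commands and the choice to hold the current state for unlisted states are assumptions that fill in the "..." parts.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    INIT_WIPE = auto()

class Cmd(Enum):
    NONE = auto()   # hypothetical: no command pending
    WIPE = auto()
    READ = auto()   # hypothetical placeholder for "more commands ..."

def next_state(state: State, cmd: Cmd) -> State:
    """Compute the state after one clock edge, mirroring the nested case statements."""
    if state is State.IDLE:
        if cmd is Cmd.WIPE:
            return State.INIT_WIPE
        # more commands ... would get their own branches here
        return State.IDLE
    # more states ...: this sketch simply holds the current state
    return state
```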

The synthesizer would generate tens of transistors. That anyone would use hundreds of thousands of transistors for a CPU core and associated FW to do the same thing (interpret eMMC commands) is very unlikely, but not impossible, in my opinion.

As a side note the chip I worked on also added an on chip memory controller and they just grabbed the RTL from the chip set people.

As for the two-hex-digit chip revision, Intel would name the revisions too, e.g. 0x05 could be called "B1 step".

The only reason I brought up the FW issue is that an embedded programmer was discussing his guesses. It was very interesting to me until he got sidetracked into "what if the eMMC FW got corrupted".
 

jetmarc

Member
Jun 6, 2012
Paul,

The official (!) firmware patch addresses suggest an SRAM size of at least 512KB, which translates into about 25M transistors (with 6T cells). This is a given, unless you insist that Samsung doesn't align memory to natural boundaries.
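The 25M figure checks out. This small sketch counts the bit cells only, assuming the usual 6-transistor SRAM cell. Real arrays add decoders, sense amplifiers and redundancy on top of that.

```python
def sram_transistors(size_bytes: int, transistors_per_bit: int = 6) -> int:
    """Transistor count of the bit cells alone (no decoders, sense amps or redundancy)."""
    return size_bytes * 8 * transistors_per_bit

print(sram_transistors(512 * 1024))  # 25165824, i.e. about 25M
```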

Adding an extra half million (?) for an ARM7 isn't that much. The IP issues weigh in more than the silicon area. Also, ARM7 isn't a given - it's just my guess. Samsung has a 16-bit CPU core of their own, and of course there are the usual suspects too.

I think you are not deeply familiar with the (disturbing) details of NAND flash, which contradict the harddisk-like guarantees that EMMC must give. EMMC is *not* just a memory controller or IO interface.

A simple NAND chip would be more or less equivalent to what you seem to have in mind. NAND client controllers accept commands like "fetch page, die=0 core=0 block=7 page=23" and then you (or your peripheral) poll a status register. Once the page is fetched, you access it through a memory-mapped 512-byte or 4K window (or your peripheral DMAs it into main memory). This kind of controller is indeed very simple, and will be implemented in logic (Verilog if you want).

On the surface, EMMC looks similar. The spec lists quite a few commands and a few waveform diagrams. But the difference is the *guarantees* it gives. EMMC offers a harddisk emulation, where 512-byte sectors can be written and retrieved without caring about the limitations of flash!

The three most important NAND limitations are:

1. Pages can be written only once per erase-cycle (meaning that "harddisk sectors" can't be changed individually, without affecting their neighbors).

2. Erasing flash blocks wears them out. Using FAT32 on a NAND without "wear leveling" stresses it beyond spec, simply because each block is allowed to fail once it has been erased/written more than X times. Filesystems like YAFFS try to address this issue natively (=they work with pure NAND). EMMC on the other hand wants to support traditional filesystems such as FAT32, which write the same sectors over and over (FAT=file allocation table, those sectors are updated during *every* write). This is why EMMC exists after all! It offers harddisk compatibility on NAND flash memory, without extra software.

3. Appending additional pages into a block can damage previously written pages. Usually there is a spec of how many times pages can be appended before you have to abandon the block.
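Limitations 1 and 2 can be made concrete with a deliberately tiny Python model of the kind of remapping an eMMC has to do behind its "harddisk" interface. It is not XSR or anything Samsung ships. Page-level mapping, in-order programming, erasing only fully stale blocks, and "pick the least-erased block" as the wear-levelling rule are all simplifying assumptions. Limitation 3 is ignored.

```python
class ToyFTL:
    """Toy page-mapped flash translation layer (illustration only, not XSR).

    Pages are programmed in order and only once per erase, so rewriting a
    logical sector always goes to a fresh page and leaves the old one stale.
    """

    def __init__(self, blocks: int, pages_per_block: int):
        self.ppb = pages_per_block
        self.erase_count = [0] * blocks
        self.next_page = [0] * blocks                  # next unprogrammed page per block
        self.valid = [set() for _ in range(blocks)]    # pages holding current data
        self.map: dict[int, tuple[int, int]] = {}      # logical sector -> (block, page)

    def _blocks_with_free_pages(self) -> list[int]:
        return [b for b in range(len(self.erase_count)) if self.next_page[b] < self.ppb]

    def _reclaim(self) -> None:
        # Erase the least-worn full block that holds no valid data.
        stale = [b for b in range(len(self.erase_count))
                 if self.next_page[b] == self.ppb and not self.valid[b]]
        if not stale:
            raise RuntimeError("no reclaimable block; a real FTL would garbage-collect")
        b = min(stale, key=lambda blk: self.erase_count[blk])
        self.erase_count[b] += 1
        self.next_page[b] = 0

    def write(self, sector: int) -> tuple[int, int]:
        free = self._blocks_with_free_pages()
        if not free:
            self._reclaim()
            free = self._blocks_with_free_pages()
        # Crude wear levelling: program into the least-erased block with room.
        b = min(free, key=lambda blk: self.erase_count[blk])
        p = self.next_page[b]
        self.next_page[b] += 1
        old = self.map.get(sector)
        if old is not None:
            self.valid[old[0]].discard(old[1])         # previous copy is now stale
        self.valid[b].add(p)
        self.map[sector] = (b, p)
        return b, p

ftl = ToyFTL(blocks=2, pages_per_block=2)
print([ftl.write(0) for _ in range(5)])  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)]
print(ftl.erase_count)                   # [1, 0]
```

Rewriting a single sector (think of a FAT sector) five times on this 2-block, 2-page device already uses every page once and forces an erase. A plain NAND interface leaves all of that bookkeeping to someone else.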

To put your 10s of transistors into the real world:

Samsung has invented and promoted the most popular flash wear leveling system and flash translation layer. It is called XSR. I would be surprised to find anything different used in a Samsung EMMC.

It took me more than 16KB of ARM Thumb code to do an (incomplete) blackbox implementation of XSR (incomplete as in: full read support, but only limited write support). It was tedious to do in software, but doing it in hardware would have been a nightmare. It's really not hardware friendly. Many of the structures were clearly invented by a software engineer rather than by an "RTL guy" like you.

You wouldn't be able to do all that in RTL with a straight face. And even if you could, they would just hand you a new spec and ask you to "update". XSR is currently widely deployed in version 1.7. I wouldn't try to keep pace with software people's cancerous "improvements" while being handicapped by wrong implementation decisions.

Of course you and me aren't in this situation, but the Samsung EMMC designers are. The fact that they keep selling product suggest that they made the right decisions (within reason, see superbrick bug).


I also want to take advantage of the attention and repeat my offer: If somebody prepares one of the EMMC chips (damaged or not), soldered to a breakout board with 2.54mm spacing for the relevant pins, I promise to spend time on it. Any takers?
 

Top Liked Posts

    I am sooo fried after the meeting. It was quite long.

    Bad news: It turns out that the repair process turns eMMC chips from "damaged" to "completely toast" pretty often. As a result, Samsung understandably does not want to release the info to the "wild", as it can easily kill a device if used improperly. Example: if a user thinks they have eMMC damage when they don't, and then tries to repair it, they have a high likelihood of making the problem worse. I strongly recommended that they talk to the RIFFbox team and a few JTAG repair specialists so that at least they have a chance of bringing these devices back - JTAG repair services have the technical skills and equipment to make a reliable determination on whether to perform a "last resort" attempt at fully wiping/resetting the eMMC.

    Samsung is 100% certain that non-secure erase as implemented by their stock recovery is safe. I still have concerns that in rare cases it can do damage, but unfortunately, every user that has reported eMMC damage caused by stock recovery has either sent their device back for warranty service already, or fell silent when asked for more information. There are also simply not many reports of this - maybe 4-5 on Note, and 4-5 on other devices two weeks ago when a bunch of ICS updates went out to other devices. Without more evidence, there is simply nothing I can do here.

    There is of course, the issue of kernels permitting secure erase commands through when they shouldn't - I think they finally realize how many ways that a secure erase command can be triggered, and how it is critical that secure erase is blocked by the kernel. They are working on deploying a fix to newer kernels that will take secure erase commands and replace them with nonsecure erase, so at least protection will improve.

    I would feel more comfortable if erase commands were completely blocked, but there is simply not enough evidence of problems with non-secure erase to have Samsung change their opinion on its safety.

    What is interesting is that there seem to be a lot of issues with internal communication within Samsung. In some cases it could be language barrier issues. It was apparently never clear to my contact that this was seen as long ago as December/January by the Epic 4G Touch community. Also, at least some of their engineers are swearing up and down that MMC_CAP_ERASE has always been enabled in the I9100 - even though we know without any doubt that it has been absent in I9100 builds previous to XWLPM and XXLQ5. There's a possibility this may have been a miscommunication somewhere - because someone saw MMC_CAP_ERASE in sdhci.c, they said the kernel had it - but what matters is whether it is present in mshci.c

    In general, I think this whole mess is a no-win situation for everyone involved. The only really good thing to come out of it is - my contact at Samsung is working with the XDA administrators to open up more communication channels so that if something like this happens again, it'll be caught earlier.

    I still have a bunch of followup I need to do in terms of getting them some additional information... but for now, I'm exhausted.
    More info and source credits here, thanks to codeworkx https://plus.google.com/111398485184813224730/posts/21pTYfTsCkB.


    ATTENTION:
    We've contacted Samsung about the problem where performing an mmc erase could hardbrick your phone (i9100, i9100g, n7000, m250 - MAG4FA, VYL00M, and KYL00M with firmware revision 0x19 // T989 and I727 with fw rev 0x12) if it has a faulty eMMC chip.
    Read this thread for more information about it: http://xdaforums.com/showthread.php?t=1644364

    They're working as hard as possible on a clean solution which will be ready soon.

    Please be patient and try to not flash any leaked kernels or kernels based on sources where MMC_CAP_ERASE is present.

    This app http://xdaforums.com/showpost.php?p=27014974&postcount=1 can be used to identify if your phone is affected or not.

    CM9 kernels are safe to flash.

    Please share!

    Update 14:56 CEST:
    Patches will be out in the form of new official ROMs and also source code releases after testing, which might take some time.

    New clarifications by Entropy: http://xdaforums.com/showpost.php?p=28416723&postcount=403 - Must read.

    To check whether your phone has the brick bug, download the Got Brickbug app from Chainfire: http://xdaforums.com/showthread.php?t=1693704

    Also read the article on XDA for more info: http://www.xda-developers.com/android/samsung-diligently-working-towards-hardbrick-fix/
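What a checker like Got Brickbug ultimately decides is whether the chip matches the list in the announcement above. Here is a minimal Python sketch, assuming the chip name and firmware revision have already been read from the device (how the real app reads them is not covered here). has_brick_bug() is a hypothetical helper. The T989/I727 case is left out because the announcement gives only their firmware revision (0x12), not their chip names.

```python
# Hypothetical helper: flags the chip/firmware combinations listed in the
# announcement. Reading name and revision from a real device is out of scope;
# they are passed in as plain values.

# Chip names reported as affected when running firmware revision 0x19
# (i9100, i9100g, n7000, m250). T989/I727 (fw rev 0x12) are NOT modeled
# because the announcement does not name their chips.
AFFECTED_FWREV = 0x19
AFFECTED_NAMES = {"MAG4FA", "VYL00M", "KYL00M"}


def has_brick_bug(chip_name: str, fw_rev: int) -> bool:
    """Return True if the (chip name, firmware revision) pair is on the list."""
    return chip_name.strip().upper() in AFFECTED_NAMES and fw_rev == AFFECTED_FWREV
```

For example, `has_brick_bug("VYL00M", 0x19)` is True, while the same chip with any other firmware revision is not flagged by this check.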
    Right now, things aren't entirely definite, but it looks like I am probably going to be having a face-to-face meeting with Samsung next week.

    They are unwilling to provide AdamOutler with what is required to repair devices with damaged bootloaders, but we at least will be slightly better off than we were before.

    This is what is currently being discussed, since USB booting (see Unbrickable Mod for Galaxy S1/Nexus S devices; we are missing a small but critical component for Unbrickable to be deployed on Galaxy S2 family devices) is not something they are willing to discuss, even though I feel it is necessary to deploy a proper solution to end users.

    1) A solution will only be available for users whose devices can still boot a kernel. If a device has a damaged bootloader or the kernel partition is damaged, these devices will be unrepairable by users. It may become possible to recover these devices via JTAG.
    2) The solution will require wiping the chip, repartitioning, and then repopulating bootloaders all in one go. This is going to have a very high failure rate. After all, even merely overwriting the bootloaders in an OTA (see the Captivate Froyo and T-Mobile Nexus S ICS upgrade fiascos) can often kill a device, and here we have to wipe the chip, power-cycle the eMMC chip from the kernel and reinitialize the MMC subsystem, repartition the chip, then rewrite the bootloaders.
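The "all in one go" constraint can be shown as a strict pipeline. Here is a minimal Python sketch. The step names and run_repair() are hypothetical placeholders, not anything Samsung has described. The point is only that the steps must run in this order, and that a failure anywhere after the wipe leaves the device without bootloaders.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical step names, in the order described in the post. A real tool
# would drive the kernel and the eMMC; here each step is a callable returning
# True on success, so only the all-or-nothing control flow is shown.
STEPS = ["wipe", "power_cycle_and_reinit", "repartition", "write_bootloaders"]


def run_repair(actions: Dict[str, Callable[[], bool]]) -> Tuple[bool, Optional[str]]:
    """Run every step in order and stop at the first failure.

    Returns (completed, failed_step). Steps after a failed one are never run.
    Once "wipe" has run, a later failure leaves the chip without bootloaders,
    i.e. a device that has to be recovered via JTAG.
    """
    for name in STEPS:
        if not actions[name]():
            return False, name
    return True, None
```

Nothing can be retried from a booted system once the wipe succeeds and a later step fails. That is why the post expects many devices to end up needing JTAG.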

    The little bit of good news is that at least these devices will become recoverable via JTAG (which they currently are not). The bad news is that a reliable solution will not be deployable to end users - a partial unreliable one will be, but I fully expect it to fail and require JTAGging for many.

    I'm really pushing for more, but it's not looking good.

    So the summary right now:
    If your device boots but is missing chunks of storage, we will possibly be able to fully restore the device
    Devices in the category above that fail to fully repair will have all storage space back, but will require JTAG to boot again
    Devices that do not boot will still require JTAG, however it should become possible for JTAG shops to repair the damage
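The summary above reads as a small decision table. Here is a hypothetical Python encoding of it. Outcome and expected_outcome() are illustrative names. "Fully restored" is only the hoped-for result, since the post says "possibly".

```python
from enum import Enum


class Outcome(Enum):
    FULL_RESTORE = "possibly fully restored"
    STORAGE_BACK_NEEDS_JTAG = "all storage back, but JTAG needed to boot"
    NEEDS_JTAG = "needs JTAG (should become repairable by JTAG shops)"


def expected_outcome(boots: bool, repair_succeeded: bool) -> Outcome:
    """Map a superbricked device's state to the outcome listed in the summary.

    boots: the device still boots (but is missing chunks of storage).
    repair_succeeded: the one-shot repair completed without failing.
    """
    if not boots:
        # Devices that do not boot cannot run the repair at all.
        return Outcome.NEEDS_JTAG
    if repair_succeeded:
        return Outcome.FULL_RESTORE
    return Outcome.STORAGE_BACK_NEEDS_JTAG
```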
    An update: Our conference call last night went very well. I'm now 90% certain we will, at the very least, be able to turn "superbricks" into JTAG-recoverable "traditional hardbricks". Everyone is hopeful that some developers will be receiving this information sometime next week (do note that it is going to take some time to develop this information into a deployable solution - how long really depends on what I will discuss below.)

    This is of course not ideal, since it means the device is still bricked (but at least repairable using known methods), but we've made our case to Samsung why it is highly beneficial to everyone involved if the bits and pieces required to make Unbrickable Mod available for Exynos4 devices are provided. With that information, we will not only be able to fully recover "superbricked" devices to operational state, but will be able to recover many traditionally hardbricked devices to an operational state too. (Unfortunately, depending on the nature of traditional hardbricks, some may require resistor modifications to start the process. The "superbrick" recovery process fully wipes everything, so the device will USB-boot without any resistor mods.)

    One caveat: Right now, the process of recovering from a superbrick will ALSO nuke your EFS partition. EVERYONE who is able to should make a backup of their EFS partition to their PC (or at a bare minimum, an external SD card) ASAP! Those who have lost their EFS partition to superbrick damage are in a unique and unfortunate situation. There are some possible paths for recovery here, though, that the ERD community, XDA admins, and a few others are looking into. The problem is that regenerating the EFS can change a phone's IMEI, which presents unique ethical and legal challenges.
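The superbrick recovery wipes EFS too, so the backup is worth verifying. Here is a minimal Python sketch that copies a partition image in chunks and records a SHA-256, so the copy on your PC can be checked later. The source path is an assumption you must supply yourself, because the EFS block device differs between models and is not given here. Reading it normally requires root.

```python
import hashlib


def backup_partition(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> str:
    """Copy src_path to dst_path in chunks; return the SHA-256 of what was read."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash an existing file, e.g. to re-check a backup after copying it off."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Keep the returned hash next to the image. Later, `sha256_of("efs.img") == saved_hash` tells you whether the copy is still intact.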

    As to being stuck on GB - as long as you use a safe kernel (such as Speedmod, Franco's kernel, or the CM9 kernel) you should be fine.
    At least on the hardware side, Samsung keeps replacing bricked mobos with the same vulnerable eMMC chip (firmware 0x19), as they did in my case. They replaced mine under warranty.

    Is there any possibility at all, if Samsung develops it, to update the eMMC firmware?!
    To my knowledge, not without fully wiping the chip, including the bootloaders. This will turn your phone from "superbrick" to "traditional hardbrick" - which is at least JTAG-recoverable. However Samsung will not release the information required to do this, nor will they give Mr. Sumrall permission it seems.

    At the moment, there is a mystery around the extremely low failure rate on a virgin stock ROM vis-a-vis a rooted ROM.

    Entropy512 is working on unraveling it. Hopefully he gets a concrete lead on it. Patience till then.
    Yeah. I have the explanation why, the problem is I have been asked not to share it by the source. I'm in the process of drafting a response to said source that includes a bit of yelling... It explains why the damage rate is different between stock recovery and CWM - however the damage rate for stock recovery, while lower, is nonzero.


    Yes, you are right, my friend. NO ONE HAS bricked on official NON-ROOTED ICS.
    When I ask for a citation to back up their claim that stock non-rooted ICS is buggy, I always get a blank response.


    PRESENCE OF THE su BINARY IN SYSTEM DOES NOT MATTER. IT DOES *NOTHING* WHEN YOU ARE IN RECOVERY. ONLY KERNEL AND INITRAMFS MATTER THEN.

    You obviously don't even remotely understand how recovery works, such as the fact that everything in recovery runs directly from an initramfs image that is packed into the kernel itself. What is present in /system simply does not matter - if a brick happened with the stock kernel and Samsung stock recovery, that is enough to prove that configuration is dangerous, because the contents of /system do not matter. Especially not the presence of an su binary, considering that RECOVERY ALWAYS RUNS WITH ROOT PRIVILEGES!!!

    First, forgive my English, which is far from perfect and may not convey my real thoughts...

    The problem is that even if one knows there is an eMMC bug, nobody knows for sure where it's located and how it works, so the effect cannot be reproduced...
    I've tried to read all the information and posts about this bug, but there are so many scattered threads that it's hard to follow them all. ^^
    Bull****. We know where it's located and how it's triggered. It's in the flash chip, it's triggered by ERASE commands sent to the flash chip.

    I may not be an expert on eMMC, but I am at least an "old" dev (but not yet retired ;) ) (among other occupations), and this is the first time I have encountered a bug that behaves so intermittently. :D
    Considering even the Google engineer that described the bug said it's intermittent, this isn't a surprise.

    It was first explained that the MMC_CAP_ERASE flag (with CONFIG_MMC_DISCARD, CONFIG_MMC_DISCARD_MERGE, and CONFIG_MMC_DISCARD_MOVINAND set) could be the true cause of the bug, but that doesn't explain why under the same conditions GB doesn't cause bricked or corrupted devices, nor why removing the MMC_CAP_ERASE flag is not enough to solve the problem (except maybe with ICS for the i9100, but why ICS and not GB?)???
    Please read before spouting utter and total bull****. It has already been explained by Mr. Sumrall why Gingerbread doesn't cause bricks - Gingerbread recoveries and update-binaries don't attempt to do wipe commands. That's how Chainfire has rendered recent CF-Root releases "safer" to use with affected kernels, and how CM9 has taken measures to reduce the chances of damage when flashing CM9 now that there is much more information about the bug.

    And you clearly are completely uninformed if you think MMC_CAP_ERASE removal doesn't protect from damage - there has not been a single incident of the bug triggering from someone confirmed to actually be running a "safe" kernel. The one person who bricked after flashing Franco R3 with Mobile Odin never verified the flash before dropping into recovery - and MO is rather notorious for occasionally failing to flash.

    It was also explained that erasing large files could also brick or corrupt some devices (only with internal storage, or with external storage too?). That's why I asked Entropy, in another thread (Epic 4G Touch), why the MMC_CAP_ERASE flag was not removed from sdhci.c in the latest ICS source code for the i9100, and yet there was no problem... I know sdhci.c deals with the external SD card, and I haven't had time to develop my explanation or to study the whole source code, but I was wondering whether it used a completely different instruction set to do the erase/trim operation, or whether at some point it also called into the eMMC and therefore the incriminated firmware... Does that also mean there are two totally different code bases to deal with internal and external storage??? ^^
    It was never "explained" that erasing large files could also brick - it was just speculated in the early days before Mr. Sumrall provided valuable information about the bug. Erasing large files is safe because none of our file systems are mounted with the "discard" flag. (Yeah, I need to update the PSA in this regard, it's a bit outdated now...)
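To see whether anything is mounted with that flag, look at the options column of /proc/mounts. Here is a minimal Python sketch. discard_mounts() is a hypothetical helper, and the sample mount lines in the test are made up for illustration. Options are matched exactly, so "nodiscard" is not mistaken for "discard".

```python
def discard_mounts(mounts_text: str) -> list:
    """Return mount points whose option list contains exactly 'discard'.

    Expects /proc/mounts format: device, mount point, fs type, options, ...
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "discard" in options:
            hits.append(mount_point)
    return hits
```

On a device, `discard_mounts(open("/proc/mounts").read())` should return an empty list if, as stated above, none of the file systems are mounted with "discard".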

    There are not two totally different source code bases to deal with internal and external storage - it's 90% common code. However the host driver is where there are differences, and the SDHCI host driver is only used for the external SD card and the wifi chip (SDIO). The internal storage is the MSHCI driver. The capabilities flags are in the source code for the host driver.
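Why MMC_CAP_ERASE in sdhci.c says nothing about the internal eMMC can be modeled in a few lines: the capability is checked on the host that owns the card. This Python sketch rests on stated assumptions. The bit values and the HOSTS table are illustrative only, since the real MMC_CAP_* constants live in the kernel's include/linux/mmc/host.h. can_erase() only loosely mirrors the kind of check the MMC core makes: the host advertises MMC_CAP_ERASE and the card reports a non-zero erase size.

```python
from enum import IntFlag


class HostCaps(IntFlag):
    # Illustrative bit values only; NOT the kernel's real MMC_CAP_* values.
    MMC_CAP_4_BIT_DATA = 1 << 0
    MMC_CAP_ERASE = 1 << 1


# Hypothetical capability sets for a "safe" kernel: the external SD/SDIO host
# (SDHCI) may keep the flag, the internal eMMC host (MSHCI) does not set it.
HOSTS = {
    "sdhci": {"serves": ["external SD card", "Wi-Fi (SDIO)"],
              "caps": HostCaps.MMC_CAP_4_BIT_DATA | HostCaps.MMC_CAP_ERASE},
    "mshci": {"serves": ["internal eMMC"],
              "caps": HostCaps.MMC_CAP_4_BIT_DATA},
}


def can_erase(host: str, erase_size: int) -> bool:
    """Erase is only possible if the card's own host driver advertises it."""
    caps = HOSTS[host]["caps"]
    return bool(caps & HostCaps.MMC_CAP_ERASE) and erase_size > 0
```

In this model the flag on "sdhci" never affects the "mshci" host, so grepping sdhci.c cannot answer whether the internal chip can receive ERASE.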

    This is just to explain that since "we" can't find a quick and reliable solution through the code "yet", at least "we" could try to find indirect explanations by other means...
    You're severely misinformed if you think we can't find a quick and reliable protection solution... Preventing ERASE commands from reaching the chip is semi-reliable if done in userspace, and 100% reliable if done by blocking ERASE attempts in the kernel. (I consider failures to flash a safe kernel in the first place not to be a failure of the kernel itself.)
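The two approaches differ in where the filter sits. A recovery or updater that avoids wipe commands only protects that one program. A check in the kernel's request path sees every erase-type request, whoever issued it. Here is a minimal Python model of such a kernel-side gate. EraseBlockingFilter, Request, and the op names are hypothetical. On eMMC, trim and discard are sent with the same ERASE command (CMD38, with a different argument), which is why the sketch treats them alike.

```python
from dataclasses import dataclass

# Hypothetical op names for requests that end up as eMMC ERASE (CMD38).
ERASE_LIKE = {"erase", "trim", "discard", "secure_erase"}


@dataclass
class Request:
    op: str      # e.g. "read", "write", "erase"
    sector: int
    count: int


class EraseBlockingFilter:
    """Model of a gate in front of the eMMC driver that refuses erase-type
    requests when the chip is on the affected list."""

    def __init__(self, chip_affected: bool):
        self.chip_affected = chip_affected
        self.blocked = 0

    def submit(self, req: Request) -> bool:
        """Return True if the request would be forwarded to the chip."""
        if self.chip_affected and req.op in ERASE_LIKE:
            self.blocked += 1
            return False  # refused here, so it never reaches the chip
        return True
```

Because every caller goes through `submit()`, no userspace tool can bypass the check. That is the property the post calls 100% reliable.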