garwynn
Retired Forum Mod / Inactive Recognized Developer
Analysis/Opinion
OK, now that you've seen the comments from the Android team, have links to the code and we have confirmed that our devices are affected, let's try and walk this through:
Linux File System Terms:
inode:
Also known as metadata, it's the data about the data. Out of the many articles out there, I thought this might help without going too much into the technical side: http://www.linux-mag.com/id/8658/
Block Bitmap:
In order to account for the usage of the blocks on the filesystem, the ext2 filesystem includes a block bitmap. This keeps track of blocks that have been used and those that are free. Each bit in the Block Bitmap denotes an integral number of fragments. So if a bit is allocated to a file and marked as used, then an entire set of fragments is allocated to it.
The Block Bitmap is a clever way to keep track of which blocks are free and which are in use. In order to look for a block, one needs to check the group to which the file belongs. Then the Block Bitmap of the appropriate 'Group' is selected and searched for the required block.
(Source: http://freeos.com/node/41)
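To make the block bitmap idea a bit more concrete, here is a minimal Python sketch of one. It is only a toy model and not the actual ext2/ext4 on-disk layout: the class name `BlockBitmap`, its methods, and the one-bit-per-block simplification are all my own assumptions for illustration.

```python
class BlockBitmap:
    """Toy block bitmap (hypothetical, not the real ext2 structure).

    One bit per block: 1 means the block is in use, 0 means it is free.
    """

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # Round up so every block has a bit, 8 bits per byte.
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_used(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def mark_free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def find_free(self):
        """Return the lowest-numbered free block, or None if all are used."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                return block
        return None
```

For example, after marking blocks 0 and 1 as used, `find_free()` returns 2. The point is simply that the filesystem trusts these bits: if they are wrong, it will hand out blocks that already hold data.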
Q: Can inserting 32 KB of zeros corrupt a file system?
Certainly - as mentioned by Mr. Sumrall, it just depends on where the insertion was made.
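As a rough illustration of why the location matters, the Python sketch below builds a small in-memory "image" with a pretend bitmap region and then zeros out 32 KB at different offsets. Everything here is an assumption made up for the example: the layout constants, the helper names, and the choice to model the bug as an overwrite of existing bytes. It does not reflect the real eMMC firmware behavior or the real partition layout.

```python
ZERO_RUN = 32 * 1024        # 32 KB of zeros, the size discussed above

# Hypothetical toy layout, not a real filesystem layout.
IMAGE_SIZE = 256 * 1024     # total size of the pretend partition
BITMAP_OFFSET = 4096        # where the pretend block bitmap starts
BITMAP_LEN = 4096           # size of the pretend block bitmap in bytes


def make_image():
    """Build an image whose bitmap marks every block as used."""
    image = bytearray(IMAGE_SIZE)
    image[BITMAP_OFFSET:BITMAP_OFFSET + BITMAP_LEN] = b"\xff" * BITMAP_LEN
    return image


def overwrite_with_zeros(image, offset, length=ZERO_RUN):
    """Zero out `length` bytes starting at `offset`, clipped to the image."""
    end = min(offset + length, len(image))
    image[offset:end] = bytes(end - offset)


def used_blocks(image):
    """Count the bits still set in the pretend bitmap region."""
    bitmap = image[BITMAP_OFFSET:BITMAP_OFFSET + BITMAP_LEN]
    return sum(bin(byte).count("1") for byte in bitmap)
```

In this toy model, zeroing 32 KB somewhere in the data area leaves the bitmap untouched, while zeroing 32 KB starting at offset 0 wipes the bitmap and makes every block look free. That is the sense in which the same-sized write can be harmless or fatal depending on where it lands.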
Q: Can this corruption cause an I/O error?
Again, yes - under the conditions described by Mr. Sumrall.
Q: Can this particular bug be repaired?
Mr. Sumrall says no, and that sounds awfully similar to what we've seen with our bricks.
Q: Why couldn't JTAG, with its bit blast, at least reset the values and allow it to go ahead?
The answer, as far as I can tell, is simply that support for this type of operation is not available at the driver and/or eMMC controller level. Either the embedded controller chip simply cannot do it, or the driver has not yet been designed to use those instructions. As a result, all it can do is throw an error in frustration. It might be possible down the road, but not now. So instead of waiting for that support, Samsung implemented a workaround in the wear-leveling (WL) logic to keep it from corrupting the filesystem.
I'm still stumped as to why we saw this particularly on the /data portion of the filesystem. It's the partition that sees file changes most often, so perhaps the wear-leveling logic kicked in there first. It's also interesting to note that bypassing that block restores the file system to something stable, as tested by drnull. But if this truly is the bug, skipping the bad blocks does not solve the problem; it only extends the device's life at the cost of possibly corrupting the file system further. I'm optimistic that it may be possible in the future to save a device even after it's bricked - filesystem corruption is not physical damage, so I consider it within the realm of possibility. Whether it is practical or cost-effective is up to Samsung - they may even have a solution already available, just not for end users.
Initial Summary:
Based on available information, this does have significant credibility as the bug in question, and the fix looks like a rather clever attempt to work around the issue short of eMMC replacement. It should be tested for verification by a willing member of the community, so long as they can afford to brick and replace the device if necessary. If verified, the solution may not save an already bricked device at the moment, but it may avoid future bricks of this nature. It would also mean that, with high probability, Android 4.0.4_r1.1 (the first standard build with the fix) should be the minimum version for any device with this eMMC, if it can be supported; earlier versions should be avoided.
*Disclaimer*
Comments and summary in this post, unless otherwise specified, should not be considered the definitive conclusion for this topic. Instead, it is a summary of my observations - as such, it should be reviewed and critiqued by others for possible improvement before the community comes to a conclusion.