[DEV] Debate: ext2 vs ext3 vs ext4

By xan, Retired Recognized Developer on 5th October 2010, 06:07 PM
There is a significant need for detailed information about filesystems here; the info is scattered across many topics.
My goal is to gather the whole filesystem debate into this thread.
Developers are putting a lot of thought into choosing the right one.

My idea is to start a detailed, strictly technical discussion about the most suitable RFS replacement for the SGS (Samsung Galaxy S).

To start, I'll point to a blog post by Theodore Ts'o, one of the ext4 developers (if not the main one), who tested and compared the write overhead of ext2, ext3 and ext4.

1. Thoughts by Ted - SSD’s, Journaling, and noatime/relatime

As a second reference, here is the product info for Samsung's moviNAND™ (used in the SGS):
2. SAMSUNG Semiconductor - Products - Fusion Memory - MoviNAND
The third one is the RFS product info:
3. SAMSUNG Semiconductor - Products - Flash - Flash Software

The first link should be the most interesting one. It contains results on the overhead caused by journaling, and compares how much data each of the three filesystems writes when executing the same operations.
The second and third should justify (or rule out!) the use of 'general purpose' filesystems on the SGS.
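Ts'o's comparison comes down to measuring how much data actually reaches the device for the same workload on each filesystem. A minimal sketch of one way to repeat that on Linux, assuming the kernel's /sys/block/<dev>/stat file is available (its seventh field counts sectors written, always in 512-byte units). The helper names are hypothetical, and this is not necessarily how Ts'o took his own measurements.

```python
import os
import subprocess
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts sectors in 512-byte units


def write_sectors(stat_line: str) -> int:
    """Return the 'sectors written' counter (7th field) of a /sys/block/<dev>/stat line."""
    return int(stat_line.split()[6])


def bytes_written(before: int, after: int) -> int:
    """Convert two sectors-written snapshots into the bytes written in between."""
    return (after - before) * SECTOR_BYTES


def measure(dev: str, cmd: list[str]) -> int:
    """Run cmd and return how many bytes the kernel sent to block device dev meanwhile."""
    stat = Path(f"/sys/block/{dev}/stat")
    os.sync()  # flush older dirty data so it is not charged to the workload
    start = write_sectors(stat.read_text())
    subprocess.run(cmd, check=True)
    os.sync()  # push the workload's dirty pages to the device before reading the counter
    end = write_sectors(stat.read_text())
    return bytes_written(start, end)
```

For example, `measure("mmcblk0", ["tar", "xf", "linux.tar"])` run once per filesystem (device name and command are only examples) gives comparable byte counts. The counter is device-wide, so other processes writing at the same time inflate it; passing "mmcblk0/mmcblk0p2" (an example partition name) reads a single partition's counter instead.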
The Following User Says Thank You to xan For This Useful Post.
5th October 2010, 06:31 PM |#2  
Senior Member
Nice topic! Hopefully we can get some useful information out of this that can relate back to better lagfixes!

I'd first like to say: EXT3 doesn't make any sense. It has no advantages other than being a 'middle ground', and a fairly poor one at that. I don't think EXT3 is really relevant here.

So that leaves EXT2 and EXT4. In effect, EXT4 is EXT2 with extra optional features, so it comes down to 'EXT simple' (EXT2) versus 'EXT features' (EXT4). The features add overhead (how much? we need some way to test it!), and journaling in particular also reduces speed. The purpose of journaling is to remove the need for full filesystem checks after an unclean shutdown, which matters to a lot of users who don't want their phone sitting at a black screen for minutes while booting.
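The "EXT2 plus optional features" view can be checked directly: ext2, ext3 and ext4 share the same on-disk superblock, and what distinguishes a given volume is largely which feature flags are set. A minimal sketch that decodes a few of those flags, based on the documented ext4 superblock layout (primary superblock at byte 1024, magic 0xEF53 at offset 0x38, compat/incompat/ro_compat masks at 0x5C). Only a handful of flags are decoded, and the function and table names are hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes into the volume
EXT_MAGIC = 0xEF53

# A small subset of the feature bits defined in the ext2/3/4 on-disk format.
COMPAT = {0x0004: "has_journal", 0x0020: "dir_index"}
INCOMPAT = {0x0002: "filetype", 0x0040: "extents", 0x0080: "64bit", 0x0200: "flex_bg"}
RO_COMPAT = {0x0001: "sparse_super", 0x0002: "large_file", 0x0008: "huge_file"}


def decode(mask: int, table: dict) -> list:
    """Names of the flags in `table` whose bits are set in `mask`."""
    return [name for bit, name in table.items() if mask & bit]


def ext_features(superblock: bytes) -> list:
    """Return the known feature flags set in a 1024-byte ext2/3/4 superblock."""
    (magic,) = struct.unpack_from("<H", superblock, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    compat, incompat, ro_compat = struct.unpack_from("<III", superblock, 0x5C)
    return decode(compat, COMPAT) + decode(incompat, INCOMPAT) + decode(ro_compat, RO_COMPAT)


def features_of(path: str) -> list:
    """Read the superblock from a filesystem image or block device (needs read access)."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return ext_features(f.read(1024))
```

A plain ext2 volume shows no has_journal or extents flag, while a freshly made ext4 volume usually reports has_journal, extents and flex_bg among others. `dumpe2fs -h` from e2fsprogs prints the same "Filesystem features" line.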

As for the Samsung references: Samsung recommends RFS. Personally, I would dismiss Samsung's recommendations at this point as clueless, so unless other sources back up something Samsung says, it should probably be disregarded.

As for the Ts'o blog post, it is about wear on the SSD, not speed. EXT4 both increases wear (up to 40% in a 'make clean'; how much does it add to normal Android operations, such as writing SQLite databases?) and decreases speed (certain apps run slower). What the blog post says is that nothing bad will come from using EXT4, which is great and lets us choose, since it doesn't say anything bad will come from using EXT2 either, apart from the need for filesystem checks.
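The SQLite question could be answered empirically by running the same small-transaction database workload on each candidate filesystem and comparing the bytes the kernel reports as written to the device. A minimal sketch of such a workload using Python's built-in sqlite3 module; the table name, row count and payload size are arbitrary, and committing after every row is an assumption meant to mimic apps that save state often.

```python
import os
import sqlite3


def sqlite_workload(db_path: str, rows: int = 1000, payload: int = 200) -> int:
    """Insert `rows` rows, committing after each one, and return the final database size in bytes."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS kv (id INTEGER PRIMARY KEY, value BLOB)")
        conn.commit()
        for _ in range(rows):
            conn.execute("INSERT INTO kv (value) VALUES (?)", (os.urandom(payload),))
            conn.commit()  # one small transaction per row: the pattern that stresses journaling
    finally:
        conn.close()
    return os.path.getsize(db_path)
```

Dividing the device-level bytes written by the returned file size gives a rough amplification figure per filesystem. SQLite's own journal mode (rollback journal vs. WAL) also changes the result, so it should be kept identical across runs.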

As for fragmentation: the underlying MoviNAND constantly shuffles data around by itself, so fragmentation is something of an unknown at this point in all cases.

So I'm gonna conclude (and hopefully someone will prove this wrong and give us more information):

EXT2: faster, less wear on the SSD and less CPU overhead, but requires disk checking on boot.

EXT4: slightly slower, slightly more wear and slightly more CPU, but very fast boots.
The Following 9 Users Say Thank You to RyanZA For This Useful Post.
5th October 2010, 07:09 PM |#3  
Senior Member
I personally don't worry about flash wear. It takes many years for the flash to run out of write cycles, and by then you will have switched to a new phone two or three times. That is the case for all SSDs under normal use; the only exceptions would be server loads (but then you would be using premium SLC NAND anyway) or intentionally running a heavy simulated load 24/7 just to kill the NAND.
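The "many years" claim can be sanity-checked with back-of-the-envelope arithmetic: total endurance is roughly capacity × program/erase cycles, divided by what actually reaches the flash each day (host writes × write amplification). A minimal sketch assuming perfect wear leveling; every number in the example call is a hypothetical input chosen for illustration, not a MoviNAND specification.

```python
def flash_lifetime_years(capacity_gb: float, pe_cycles: int,
                         host_gb_per_day: float, write_amplification: float) -> float:
    """Estimate years until the rated P/E cycles are used up, assuming perfect wear leveling."""
    total_endurance_gb = capacity_gb * pe_cycles              # data the cells can absorb in total
    flash_gb_per_day = host_gb_per_day * write_amplification  # data that actually hits the cells daily
    return total_endurance_gb / flash_gb_per_day / 365


# Hypothetical figures: 8 GB of MLC rated for 3,000 cycles, 2 GB/day of host writes, WAF of 4.
years = flash_lifetime_years(8, 3000, 2, 4)
print(f"{years:.1f} years")  # 8.2 years
```

Even with fairly heavy inputs the result comes out in years. The estimate breaks down if wear leveling is poor, because then a few hot blocks can wear out long before the average does.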

As for EXT: as long as it is not crappy RFS, ext2 or ext4 is ideal in my opinion.
5th October 2010, 07:27 PM |#4  
OP Retired Recognized Developer
Quote:
Originally Posted by RyanZA

As far as fragmentation issues - the underlying MoviNAND is constantly shuffling around data by itself, so fragmentation is a bit of an unknown at this point in all cases.

We should distinguish two levels of fragmentation:
1) Filesystem fragmentation (external and internal),
2) MoviNAND fragmentation, which is present, but whose nature and extent are unknown.

In the context of the ext2/3/4 choice, I must disagree to some extent.
The nature of the first one is well known, and ext4 was designed to reduce external filesystem fragmentation compared to ext2/3. This makes ext4 the more viable option at this point, which should be noted.
There is no information about fragmentation in the MoviNAND layer, nor about its wear-leveling methods. My conclusion (based on a study of wear-leveling methods:
Micron TN-29-42: Wear-Leveling Techniques in NAND Flash Devices Introduction
) is that there may be a significant amount of fragmentation in the underlying layer, because the filesystem's layout does not map 1:1 onto the physical media.
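Filesystem-level (external) fragmentation is easy to quantify from a file's physical block numbers: each break in contiguity starts a new extent. A minimal sketch with a hypothetical helper; on a real system the block list could come from `filefrag -v` (e2fsprogs) or the FIEMAP ioctl, but here it is just a Python list. This only measures what the filesystem believes; the MoviNAND layer may map those blocks anywhere.

```python
def extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Group a file's physical block numbers (in file order) into (start, length) extents."""
    runs: list[tuple[int, int]] = []
    for b in blocks:
        if runs and b == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # block continues the current extent
        else:
            runs.append((b, 1))  # first block, or a gap: a new extent starts here
    return runs


# A file whose 6 blocks were allocated in three separate pieces.
print(extents([100, 101, 102, 500, 501, 90]))  # [(100, 3), (500, 2), (90, 1)]
```

A file stored as a single extent is unfragmented from the filesystem's point of view, and keeping that count low is what ext4's allocator aims for.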

We may need more detailed information about this. The only mention of built-in wear leveling in MoviNAND that I have found is:
Quote:

NAND technology changes invisible to the host
[...]
- Variation of NAND flash feature NAND block sizes, page sizes, planes, new features, MLC vs. SLC, wear leveling and ECC requirements

This is quite short.

Quote:
Originally Posted by RyanZA

EXT4 adds both increased wear (up to 40% in a 'make clean' - how much does it add to normal Android operations, such as writing sqlite databases?) and decreases the speed (certain apps run slower).

This seems to be the worst-case scenario. Could anyone knowledgeable explain the nature of the 'make clean' operation?
Based on this study, the average additional wear caused by the amount of data written (only!) is no more than 15% at the filesystem level.
We should also consider write amplification, whose level is unknown. Writing one byte to a file causes a whole low-level block to be rewritten. Does this rewrite the whole filesystem block? If so, every MoviNAND block holding data for that filesystem block has to be rewritten too.
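The one-byte question can be made concrete with a simple model: every write is rounded out to whole filesystem blocks, and every flash unit containing a touched block is rewritten in full. A minimal sketch of that naive read-modify-write model with no caching or merging in the controller; the function name is hypothetical, and the 4 KiB block and 128 KiB unit sizes are illustrative guesses, not known MoviNAND parameters.

```python
def physical_bytes(offset: int, length: int, fs_block: int = 4096,
                   flash_unit: int = 128 * 1024) -> int:
    """Bytes the device rewrites for one write of `length` bytes at `offset`, under a
    naive read-modify-write model: each touched flash unit is rewritten in full."""
    if length <= 0:
        return 0
    # Round the write out to whole filesystem blocks first...
    fs_start = (offset // fs_block) * fs_block
    fs_end = -(-(offset + length) // fs_block) * fs_block  # ceiling division
    # ...then to the whole flash units that contain those blocks.
    unit_start = (fs_start // flash_unit) * flash_unit
    unit_end = -(-fs_end // flash_unit) * flash_unit
    return unit_end - unit_start


one_byte = physical_bytes(offset=5000, length=1)
print(one_byte, "bytes rewritten for 1 byte written")  # 131072 bytes rewritten for 1 byte written
```

A write that straddles a unit boundary doubles the cost under this model. Whether MoviNAND really behaves like this is exactly the unknown; a controller that buffers or remaps small writes would do much better.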

We know that wear comes from the write count per block, not the amount written.

There are several improvements in ext4 that lower the write count compared to ext2/3. Here are some of them:
- Persistent pre-allocation, which avoids writing zeros to the whole preallocated space;
- The multiblock allocator, which allocates multiple blocks in a single operation;
- Allocate-on-flush (delayed allocation), which delays writes.

Together these add up to fewer writes.
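The combined effect of allocate-on-flush and multi-block allocation can be illustrated with a toy model of two apps appending to two files one block at a time. With immediate block-by-block allocation, the files' blocks interleave and every block costs one allocation request; with delayed allocation, pending blocks are summed and each file gets one contiguous run at flush time. This is a simplified toy, not ext4's actual mballoc: it ignores ext2's own reservation-window heuristics, periodic writeback and metadata, and the function name is hypothetical.

```python
def allocate(appends: list[tuple[str, int]], delayed: bool) -> tuple[dict, int]:
    """Toy allocator: `appends` is a sequence of (file, n_blocks) writes from interleaved
    writers. Returns each file's physical block list and the number of allocation requests."""
    next_free = 0
    layout: dict[str, list[int]] = {}
    requests = 0
    if delayed:
        # Allocate-on-flush: sum each file's pending blocks, then give each one contiguous run.
        pending: dict[str, int] = {}
        for name, n in appends:
            pending[name] = pending.get(name, 0) + n
        for name, n in pending.items():
            layout[name] = list(range(next_free, next_free + n))
            next_free += n
            requests += 1  # one multi-block request per file at flush time
    else:
        # Immediate allocation: every block is allocated as soon as it is written.
        for name, n in appends:
            for _ in range(n):
                layout.setdefault(name, []).append(next_free)
                next_free += 1
                requests += 1  # one request per block
    return layout, requests


writes = [("a.log", 1), ("b.db", 1)] * 4  # two apps appending one block at a time, interleaved
print(allocate(writes, delayed=False))  # ({'a.log': [0, 2, 4, 6], 'b.db': [1, 3, 5, 7]}, 8)
print(allocate(writes, delayed=True))   # ({'a.log': [0, 1, 2, 3], 'b.db': [4, 5, 6, 7]}, 2)
```

The toy counts allocator requests and resulting contiguity, not bytes sent to the flash, so it supports the fragmentation half of the argument more directly than the wear half.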


//EDIT: Sorry for constantly editing this post, but I want to be as precise and clear as I can.
The Following User Says Thank You to xan For This Useful Post.
5th October 2010, 07:36 PM |#5  
Senior Member
Shouldn't we stop worrying about filesystem fragmentation, since it is only a big deal for spinning disks because of their access times? On SSD/NAND, access time isn't a real concern, is it?
5th October 2010, 07:41 PM |#6  
Senior Member
Quote:
Originally Posted by xan

We should notice two fragmentation levels:
1) File system fragmentation (external and internal),
2) MoviNAND fragmentation, which is present, but nature and level is unknown.

In the context of ext2/3/4 choice I must disagree to some extent.
Nature of first one is well-known, and ext4 has been designed with lowering external file system fragmentation in mind, compared to ext2/3. This makes ext4 more viable option at this point, which should be noted.
There is no info given about MoviNAND layer fragmentation, nor wear-leveling methods. My conclusion (based on study of wear-leveling methods:
Micron TN-29-42: Wear-Leveling Techniques in NAND Flash Devices Introduction
) is that there might be significant amount of underlying layer fragmentation present, caused by file system not being 1:1 with physical media.

We might need to get more detailed information about this. The one and only mention about built-in wear leveling in MoviNAND I personally found is:
This is quite short.

You can't just decouple filesystem fragmentation from the underlying data fragmentation, I think. Or at least, the issue isn't all that cut and dried.

The major fragmentation change in EXT4 over EXT2 is ensuring that a file's blocks are contiguous. But does this mean anything at all on top of MoviNAND?

NAND is random access, so a 2-block file with one block at the start of the disk and one at the end should perform identically to a 2-block file with the two blocks next to each other. However, this is where practice comes in: if the MoviNAND has more than one chip on different channels or similar, then putting block 1 on chip 1 and block 2 on chip 2 may actually be faster than having both blocks on chip 1.
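The multi-chip point can be illustrated with a toy timing model: reads on the same chip are serialized, different chips work in parallel, so the time to read a set of blocks is set by the busiest chip. A minimal sketch; the chip count, the per-block time and the striping rule (chip = block % chips) are all hypothetical, since MoviNAND's internal layout is unknown.

```python
from collections import Counter


def read_time(physical_blocks: list[int], chips: int = 2, per_block_us: float = 50.0) -> float:
    """Time to read the given blocks if each chip serves one block at a time and chips run
    in parallel. Blocks are assumed (hypothetically) to be striped: chip = block % chips."""
    if not physical_blocks:
        return 0.0
    load = Counter(block % chips for block in physical_blocks)
    return max(load.values()) * per_block_us  # the busiest chip finishes last


print(read_time([10, 11]))  # 50.0: the two blocks sit on different chips
print(read_time([10, 12]))  # 100.0: both blocks on chip 0, read one after the other
```

Under this model, contiguity matters only through how the device spreads blocks across chips, so without knowing the controller's mapping, "fragmented" and "contiguous" cannot be ranked.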

Basically, fragmentation on the MoviNAND is not necessarily the same as fragmentation on spinning media! And the MoviNAND isn't going to put data where the filesystem tells it to: it presents a 'virtual table' that the filesystem sees, but unlike on an old spinning disk, that table isn't what's actually going on.

So, I'm gonna say that We Know Nothin!
5th October 2010, 07:46 PM |#7  
Senior Member
^my point, better explained
5th October 2010, 08:02 PM |#8  
Member
I don't know a lot about filesystems, but as explained here:
maenad.net/geek/di8k-debian/node29.html (I think that applies to ext4 as well)


Quote:

Q: What is journaling?
A: It means you don't have to fsck after a crash. Basically.
This is useful, because it means that every time your screen whites out and crashes while choosing the right video card (Section 1.2.1), you don't have to sit through an entire filesystem check of every inode. The filesystem still fscks itself every X mounts or Y days, but doesn't put you through the entire wait every time you crash it.
To convert partitions to the ext3 filesystem, you need to cleanly unmount them, boot something else (like the Debian CD you installed from -- see Section 6.2 on how to do this), and then, on a console, do:
tune2fs -j /dev/hdaX
wherein /dev/hdaX is the partition you want to add journaling to (hence the `-j' flag).

So if I'm correct, we can safely use ext2, lose the overhead, and accept the fsck.
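The "every X mounts or Y days" rule quoted above is driven by counters in the ext2 superblock: mount count, maximum mount count, last-check time, check interval, and a state field recording whether the filesystem was cleanly unmounted. A simplified sketch of that decision as a pure function, so it can be tried without a device. It follows the rule as described in the quote and is not e2fsck's actual code; treating a maximum mount count of zero or less, and an interval of zero, as "disabled" is the assumption made here.

```python
import time
from typing import Optional

EXT2_VALID_FS = 0x0001  # state bit: filesystem was cleanly unmounted
EXT2_ERROR_FS = 0x0002  # state bit: errors were detected


def fsck_reason(state: int, mnt_count: int, max_mnt_count: int,
                last_check: int, check_interval: int,
                now: Optional[float] = None) -> Optional[str]:
    """Return why a boot-time check would run, or None if it could be skipped."""
    now = time.time() if now is None else now
    if not state & EXT2_VALID_FS:
        return "not cleanly unmounted"  # e.g. after a crash or battery pull
    if state & EXT2_ERROR_FS:
        return "errors recorded"
    if max_mnt_count > 0 and mnt_count >= max_mnt_count:
        return "maximum mount count reached"
    if check_interval > 0 and now - last_check >= check_interval:
        return "check interval elapsed"
    return None
```

The first branch is the crux of the ext2 trade-off discussed here: it fires after every unclean shutdown, which is exactly the case journaling is meant to make cheap.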

Is there a RAID solution available for the external/internal SD cards?
5th October 2010, 08:27 PM |#9  
Senior Member
Another point to consider:

Are both the internal SD (mmcblk0) and the STL partitions MoviNAND? Or...

Are only the STL partitions OneNAND, with the internal SD being a separate device (MoviNAND)? Maybe RFS is necessary for OneNAND but not for MoviNAND?
5th October 2010, 08:40 PM |#10  
OP Retired Recognized Developer
Quote:
Originally Posted by RyanZA

You can't just de-couple filesystem fragmentation from the underlying data fragmentation, I think. Or at least, the issue isn't all that cut and dried.

Yes, this is correct. Furthermore, I believe that filesystem fragmentation may lead to increased wear, depending on how the filesystem block size relates to MoviNAND's internal block size.

This actually depends on MoviNAND's wear-leveling algorithm, which has yet to be revealed.

Quote:
Originally Posted by RyanZA

The major fragmentation changes in EXT4 over EXT2 is to ensure that the same file shares contiguous blocks. But, does this mean anything at all on top of MoviNAND?

If it does mean something, we might actually be adding unneeded writes. If not, both filesystems should cause the same amount of fragmentation-related MoviNAND wear. We might be terribly wrong and introduce additional writes. Why take the risk?

Quote:
Originally Posted by RyanZA

NAND is random access - so 2 block file which has one block at the start of the disk, and one at the end, should perform identically to a 2 block file with the two blocks next to each other.

In my opinion, that's too speculative. If it is true, it might increase sequential operation speeds. There is a way to test it: write a large file and monitor MB/s over short time intervals. If two or more distinct performance levels show up, parts of the file are being written simultaneously, and your suspicion may be right. Still, fragmenting the filesystem for the sake of exploiting this behavior is not a good idea.
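The proposed test (write a large file and watch MB/s over short intervals) can be sketched in a few lines. A minimal sketch, assuming a POSIX system where os.fsync forces data to the device; the chunk size, interval and file size are arbitrary, the function name is hypothetical, and the path should point to a location on the partition under test. Clearly separated clusters of per-interval rates would be the "two or more performance levels" to look for.

```python
import os
import time


def throughput_samples(path: str, total_mb: int = 256, chunk_kb: int = 1024,
                       interval_s: float = 0.5) -> list[float]:
    """Write total_mb of data to path, fsync-ing each chunk, and return MB/s per interval."""
    chunk = os.urandom(chunk_kb * 1024)  # random data, so compression cannot inflate the rate
    samples: list[float] = []
    written_in_interval = 0
    with open(path, "wb") as f:
        window_start = time.monotonic()
        for _ in range(total_mb * 1024 // chunk_kb):
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make the data reach the device, not just the page cache
            written_in_interval += len(chunk)
            elapsed = time.monotonic() - window_start
            if elapsed >= interval_s:
                samples.append(written_in_interval / (1024 * 1024) / elapsed)
                written_in_interval = 0
                window_start = time.monotonic()
    # Report whatever was written in the last, possibly shorter, interval too.
    elapsed = time.monotonic() - window_start
    if written_in_interval and elapsed > 0:
        samples.append(written_in_interval / (1024 * 1024) / elapsed)
    os.remove(path)
    return samples
```

Interpreting the result still needs care: controller-side housekeeping could also produce uneven rates.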

What is certain is that this puts an excessive workload on the controller, which should always be avoided.

Quote:
Originally Posted by RyanZA

Basically, fragmentation on the MoviNAND is not necessary the same as fragmentation on spinning media! And the MoviNAND isn't going to listen to where the filesystem tells it to put data, it provides a 'virtual table' that the filesystem sees, but unlike an old spinning disk, the table isn't what's actually going on.

Of course it isn't. But remember: every additional write cycle at the filesystem level brings an unknown number of additional write cycles on the MoviNAND. Also, every single MoviNAND write brings a full block (128KB? no idea here) read, ECC check, ECC recalculation and rewrite (unless MoviNAND uses copy-on-write, which is doubtful since it might increase wear).

What comes to mind are the first SSDs that appeared on the market some time ago. They were flawed: their random write times were unacceptable. Random writes might not be as fast on MoviNAND as we expect.
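The random-write worry can be checked the way the early SSDs were exposed: time small synchronous writes at sequential offsets versus random offsets inside a file. A minimal sketch, assuming a POSIX system (os.pwrite is Unix-only) where os.fsync reaches the device; the 4 KiB write size, file size and write count are arbitrary, and the function name is hypothetical.

```python
import os
import random
import time


def small_write_rates(path: str, file_mb: int = 64, writes: int = 500,
                      block: int = 4096) -> dict:
    """Return synchronous small writes per second at sequential vs random offsets of one file."""
    blocks_in_file = file_mb * 1024 * 1024 // block
    data = os.urandom(block)
    # Fill the file first so both runs overwrite already-allocated blocks.
    with open(path, "wb") as f:
        for _ in range(blocks_in_file):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    orders = {
        "sequential": list(range(writes)),
        "random": random.sample(range(blocks_in_file), writes),
    }
    rates = {}
    fd = os.open(path, os.O_WRONLY)
    try:
        for name, order in orders.items():
            start = time.monotonic()
            for index in order:
                os.pwrite(fd, data, index * block)
                os.fsync(fd)  # one durable write at a time, like a small database commit
            rates[name] = writes / (time.monotonic() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return rates
```

A large gap between the two rates on the phone's partition would indicate the same weakness the early SSDs had.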

I think fragmentation should be avoided, because in the worst case we neither lose nor gain anything by having less fragmentation.

Quote:
Originally Posted by RyanZA

So, I'm gonna say that We Know Nothin!

This, however, is a genuine statement.
5th October 2010, 08:42 PM |#11  
OP Retired Recognized Developer
Quote:
Originally Posted by jbrugger

So when i'm correct, we are safe to use ext2 and loose the overhead, and take the fsck for granted.

Is it worth losing all the features that lower the write count and reduce fragmentation?

It might be better to use ext4 without journaling than ext2. That depends on whether the ext4 benefits listed in one of my previous posts still work without journaling.
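For anyone who wants to try ext4 without journaling: e2fsprogs can create it directly with `mkfs.ext4 -O ^has_journal`, or drop the journal from an existing, cleanly unmounted ext4 volume with `tune2fs -O ^has_journal`. A small Python wrapper is sketched below to keep the examples in one language. The helper names are hypothetical, it only prints the command unless dry_run is turned off, and the device path is a placeholder in the style of /dev/hdaX. Running it for real erases (mkfs) or modifies (tune2fs) the target partition.

```python
import subprocess


def no_journal_cmd(device: str, existing: bool = False) -> list[str]:
    """Build an e2fsprogs command for ext4 without a journal.
    existing=False formats the device (destroys data); existing=True strips the journal
    from an existing, unmounted ext4 filesystem instead."""
    if existing:
        return ["tune2fs", "-O", "^has_journal", device]
    return ["mkfs.ext4", "-O", "^has_journal", device]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


run(no_journal_cmd("/dev/block/mmcblk0pX"))  # placeholder device; prints the command only
```

Afterwards, `dumpe2fs -h` on the device should no longer list has_journal among the filesystem features.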