XDA Developers Android and Mobile Development Forum
RyanZA
#1  
Senior Member - OP
Thanks Meter 655
Posts: 2,021
Join Date: Jan 2006
Location: JHB
Reality behind RFS Lag

This is probably missing a lot of facts that we haven't uncovered yet. When we learn more, we can update what we know here.

Background

All data is stored on an 8GB or 16GB MoviNAND chip, of which 2GB is 'system data', and the rest is for user storage. The MoviNAND is one of the first mobile 'smart SSD' chips. That means that the MoviNAND handles all operations such as data wear leveling and physical data lookup, as well as having its own internal buffers. This cleverness is both good... and very bad.

FSYNC

When writing data to disk, your system and apps will make a call to the driver to 'write some data to file X'. This data will then be placed into kernel filesystem buffers and streamed off as commands to the MoviNAND. The MoviNAND will then slowly accept these commands and place them into its own buffer, and the disk controller itself will then go about its business writing this data to disk, using lookup tables to determine where to write the data to ensure maximum NAND lifetime, etc. It does a lot of work.

The system or apps also have an extra tool, called FSYNC. When this is used, the kernel and filesystem will flush the buffer for the affected file and ensure it is written to disk. The current thread will block and wait for the fsync call to return, which signals that the data is fully written to disk. The kernel itself will wait for an event from the MoviNAND to signal that the data has been completely written.

In a 'dumb' disk, this fsync is fairly quick - the kernel buffer will be written directly to where the kernel has directed, and the round trip time (RTT) will be as long as it takes for data to be written.

In a 'very smart' desktop SSD, the fsync can return instantly - the disk controller will take the data and place it in its battery-backed cache, and then go about its wear leveling and writing in the background without bothering the system.

In the 'smart' MoviNAND, the fsync will take a very very long time to return - sometimes fsync on MoviNAND will take several seconds (confirm?) to return. This is because the MoviNAND may have a long line of housekeeping tasks waiting for it when an fsync is called, and it will complete all of its tasks before returning.
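To get a feel for the difference between a buffered write and the fsync that follows it, something like the sketch below can be used. It is only an illustration, not a measurement from the SGS: the function name `timed_fsync_write` and the 4KB payload are made up for this example, and the numbers will depend entirely on the device and filesystem it runs on.

```python
import os
import tempfile
import time


def timed_fsync_write(path, payload):
    """Write payload to path, then fsync it. Returns (write_seconds, fsync_seconds)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        os.write(fd, payload)   # data goes into kernel buffers; this normally returns quickly
        t1 = time.perf_counter()
        os.fsync(fd)            # blocks until the storage device reports the data as written
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        probe = os.path.join(tmp, "probe.bin")
        write_s, fsync_s = timed_fsync_write(probe, b"x" * 4096)
        print(f"write: {write_s * 1000:.3f} ms, fsync: {fsync_s * 1000:.3f} ms")
```

On storage where fsync is cheap the two numbers will be close together; on a device that behaves like the MoviNAND described above, the fsync time is the one expected to blow up.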

RFS

RFS has a fairly badly written driver that will call an fsync on every file close.

Basically, RFS runs in 'ultra secure' mode by default. This security may not be really needed - I personally don't want it if it means enormous slowdowns. It also doesn't help data security if the system/app is holding a file open, only if it closes the file. The MoviNAND is also fairly smart, and appears to write its cache to disk before turning off, and also appears to have capacitors to keep it alive for a little bit of time in the event of a power cut.
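The 'sync on file close' behavior can be imitated from userspace to see what it costs when many small files are written. This is a rough emulation under my own assumptions, not the RFS driver code: `write_small_files` is a hypothetical helper, and the explicit `os.fsync` before each close stands in for what the driver appears to do internally.

```python
import os
import tempfile
import time


def write_small_files(directory, count, sync_on_close):
    """Write `count` tiny files; optionally fsync each one right before it is closed."""
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i}.txt"), "wb") as f:
            f.write(b"hello")
            if sync_on_close:
                f.flush()             # push Python's own buffer into the kernel
                os.fsync(f.fileno())  # then wait for the device, as RFS seems to on close
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        plain = write_small_files(a, 50, sync_on_close=False)
        synced = write_small_files(b, 50, sync_on_close=True)
        print(f"no sync: {plain * 1000:.1f} ms, sync on close: {synced * 1000:.1f} ms")
```

Every close pays one full fsync round trip, so the cost grows with the number of files rather than with the amount of data.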

SQLite

Most Android apps use SQLite - a fairly simple database that is easy to embed. SQLite has 'transactions': the database is locked for the duration of a database write, and multiple database writes can be included in one transaction. At the end of a transaction, SQLite will call FSYNC on the database file, causing a possibly long wait while the MoviNAND does its thing. Certain applications will not bunch up writes into a single transaction, and will do all of their writes in new transactions. This means that fsync will be called again and again. This isn't really a problem on most devices, as fsync is a very fast operation. This is a problem on the SGS, because MoviNAND fsync is very slow.
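The difference between bunching writes into one transaction and committing every write separately can be sketched with Python's built-in `sqlite3` module. The table name `notes` and the helper `insert_rows` are invented for this example; Android apps go through a different API, but the commit pattern is the same idea.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(db_path, rows, one_transaction):
    """Insert rows in a single transaction, or with one commit per row."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    conn.commit()
    start = time.perf_counter()
    if one_transaction:
        with conn:  # commits once when the block exits without an error
            conn.executemany("INSERT INTO notes (body) VALUES (?)", rows)
    else:
        for row in rows:
            conn.execute("INSERT INTO notes (body) VALUES (?)", row)
            conn.commit()  # every commit ends a transaction and forces a sync to disk
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    rows = [(f"note {i}",) for i in range(200)]
    with tempfile.TemporaryDirectory() as tmp:
        _, batched = insert_rows(os.path.join(tmp, "batched.db"), rows, True)
        _, per_row = insert_rows(os.path.join(tmp, "per_row.db"), rows, False)
        print(f"one transaction: {batched:.3f} s, commit per row: {per_row:.3f} s")
```

Both variants store the same rows; the per-row variant just pays the commit (and its sync) once per row instead of once in total.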

The various fixes and why they work

Native EXT4 to replace RFS (Voodoo)

By replacing RFS with EXT4, the 'sync on fileclose' problem is removed. The EXT series of filesystems is also more efficient at allocating information into blocks than RFS/FAT32 is. This means fewer real writes to MoviNAND, which means that the MoviNAND buffer should be smaller, and when a sync is called, fewer commands have to be run. When a sync is called on EXT4, it will still be very slow, as the MoviNAND's sync is still slow.
Basically, EXT4 improves filesystem grouping, which leads to fewer commands, and does not have the broken 'sync on file close' that RFS does. It will not heavily improve SQLite database access in certain apps, as the full fsync on transaction end will still have to go through MoviNAND, and will be slow.

When pulling out the battery, there is a chance to lose data that has been written to a file but has not yet been told to sync to disk. This means that EXT4 is less secure than RFS. However, I believe the performance to be worth the risk.

Loopback EXT2 on top of RFS (OCLF)

By creating a loopback filesystem of EXT2, the 'sync on fileclose' problem is removed as well. Since the loopback file is never closed until the EXT2 is unmounted, RFS will not call fsync when a file in the EXT2 loopback is closed. Since a single large file is created on RFS instead of multiple small files, RFS is unable to mis-allocate the file, or fragment it. The actual allocation of filesystem blocks is handled by EXT2. As a note, care should be taken in making the large file on RFS - it MUST align correctly with the MoviNAND boundaries, or operations will be slowed down due to double-disk accesses for files, etc. It is unknown whether OCLF is aligning this correctly (how to determine this? 4KB block size gives double the performance of 2KB block size, so it might be aligning it correctly already).
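The 'double-disk access' from bad alignment is simple arithmetic, sketched below. The 4096-byte block size is only an assumption for the example (the real MoviNAND internal block size is not known here), and `blocks_touched` and `is_aligned` are hypothetical helper names.

```python
def is_aligned(offset, block_size):
    """True if a byte offset starts exactly on a block boundary."""
    return offset % block_size == 0


def blocks_touched(offset, length, block_size):
    """Number of physical blocks covered by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


if __name__ == "__main__":
    block = 4096  # assumed physical block size, for illustration only
    for start in (0, 512):
        n = blocks_touched(start, block, block)
        print(f"offset {start}: aligned={is_aligned(start, block)}, blocks touched={n}")
```

With the loopback file starting on a boundary, a 4KB filesystem block maps to one physical block; shifted by 512 bytes, the same write straddles two.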

Loopback also has the benefit of speeding up SQLite databases (at the expense of a transaction being lost in a power outage, as it could still be in RAM). As always, this is a tradeoff between data security when the battery is pulled out, and performance. When pulling a battery out while using the loopback filesystem, there is a chance to lose the last few seconds of database writes. In practice, this isn't a huge deal for a mobile phone - most lost data will be resynced when the phone reboots. In my opinion, the performance is worth it because of the very slow speed of a sync on MoviNAND.

Loopback EXT2 on top of EXT4

All of the above for normal loopback EXT2 applies. In addition, when the loopback flushes data, it will be flushed to EXT4 instead of RFS. This will probably be better than flushing to RFS, as the RFS driver is not as well written as the EXT4 driver. The difference should not be very large, though.

Journaling

Journaling on an SSD is not required. Your data will not be lost, your puppy will not die. Here is a post made by Theodore Tso - http://marc.info/?l=linux-ext4&m=125803982214652&w=2
Quote:
But there will be some distinct tradeoffs with
omitting the journal, including possibility that sometimes on an
unclean shutdown you will need to do a manual e2fsck pass.
Not using a journal is not a big deal, as long as you take care to do a full e2fsck pass when an unclean shutdown has occurred. This is the main reason for a journal - to prevent the need to do a full disk check. Instead, the journal can be easily read, and the full disk check avoided.

EXT2 vs EXT4

EXT2 appears to work better on the SGS than EXT4. This is because EXT4 has more CPU overhead than EXT2. Journaling is also very bad on MoviNAND. Why? It appears to be the command buffer in the MoviNAND controller. A call to update the journal will use a command slot in the MoviNAND's buffer that could otherwise have been used for a real disk write. This means that journaling on MoviNAND is a VERY expensive operation compared to journaling on a 'dumb' disk.

Well, you could technically use EXT4 and simply disable the high-CPU and other features until you are left with EXT2, since EXT4 and EXT2 are basically the same thing.

At any rate, the difference between EXT4 and EXT2 is not very large, and there's no need for flamewars over it - it comes down to a choice of 'running' performance vs 'startup' performance, with EXT2 edging out EXT4 for everyday speed, while EXT4 does not require a long disk check at boot.

Future Work

Rewrite the firmware for the MoviNAND's flash to handle fsyncs properly and not bring the system to its knees. I joke, but this is really the true solution.

Other solutions include hacking EXT's fsync method to return instantly, and ensuring that the real fsync is called when the system shuts down. Or doing nothing: fsync is there for a reason, I guess, and would be fine if MoviNAND's fsync wasn't so very slow.
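The 'return instantly, sync for real at shutdown' idea can be modelled in userspace to show the shape of it. This is purely a toy under my own assumptions: `DeferredSync` is a hypothetical class, the real change would live in the kernel filesystem code, and this version requires the caller to keep the file descriptors open until `flush_all` runs.

```python
import os
import tempfile


class DeferredSync:
    """Toy stand-in for an fsync that returns instantly and syncs later."""

    def __init__(self):
        self._pending = set()

    def fsync(self, fd):
        # Return immediately; only remember that this descriptor wants a sync.
        self._pending.add(fd)

    def pending(self):
        return len(self._pending)

    def flush_all(self):
        # Meant to be called once at shutdown: do the real fsync for each descriptor.
        synced = 0
        for fd in sorted(self._pending):
            os.fsync(fd)
            synced += 1
        self._pending.clear()
        return synced


if __name__ == "__main__":
    deferred = DeferredSync()
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "data.bin"), os.O_WRONLY | os.O_CREAT, 0o644)
        os.write(fd, b"payload")
        deferred.fsync(fd)                    # instant
        print("synced:", deferred.flush_all())  # the slow part happens here, once
        os.close(fd)
```

Anything still pending when power is lost is exactly the data at risk, which is the same tradeoff discussed for the loopback setup.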

There are probably a lot of small details missing from this writeup. They'll be updated when we learn more. Thanks for all the useful discussions and arguments, everyone!
 
psphknxp
#2  
Junior Member
Thanks Meter 2
Posts: 19
Join Date: Aug 2010
Location: HKSAR
Thanks a lot, RyanZA. It's a good thread for all SGS users to understand what we're running!

Keep on going!
 
lycan_codex
#3  
Senior Member
Thanks Meter 31
Posts: 287
Join Date: Apr 2010
Location: Bangalore
Thanks for breaking it down for large-scale consumption! Loved reading this post.
 
dakine
#4  
Senior Member
Thanks Meter 35
Posts: 711
Join Date: Nov 2006
Excellent post, it seems like you enjoy figuring this stuff out. Reading about it like this even gets me interested. Samsung would do well in hiring more people like you.
 
andrewluecke
#5  
Senior Member
Thanks Meter 15
Posts: 848
Join Date: Jul 2010
Interesting.. How did you work these behaviors out, by checking the code?
 
heloman_limboonhai
#6  
Junior Member
Thanks Meter 0
Posts: 10
Join Date: Feb 2009
Thanks RyanZA. You are an impressive coder with so much information.

Thanks for sharing, and I hope that we can get it fixed for good and get the Desire HD ROM for us.
 
dakine
#7  
Senior Member
Thanks Meter 35
Posts: 711
Join Date: Nov 2006
RFS has been around for a bit and is used on other phones. Do those phones have the same lag issues as the SGS?

Not sure if it helps but I stumbled on this:


http://www.samsung.com/global/busine...ting_Guide.pdf

http://movitool.ntd.homelinux.org/tr...itool/wiki/RFS
 
msri3here
#8  
Senior Member
Thanks Meter 48
Posts: 227
Join Date: Aug 2010
Location: Pune

 
Thanks dude...
Being a techy guy, I enjoyed reading your post, and it's very nice to know the details of the file system...
Looking forward to your future work and updates.
 
ykk_five
#9  
Recognized Developer
Thanks Meter 391
Posts: 879
Join Date: Jul 2010

 
RyanZA, you crazy guy (again!!), you did a good job. It should be clear enough for people to decide which fs is a better choice for their particular uses.

And, in fact, I've tried all of them. ext4 is far more CPU intensive, and caused a lot of lag when I was listening to mp3s while surfing the internet. ext3 is the moderate one, while ext2 is very fast at the expense of "possible data loss".

For the ext fs over loop devices, there seems to be no impact on performance, and the same goes for the noatime and nodiratime mount options, although theoretically they should increase performance a bit by skipping the atime and diratime updates.
 
cantIntoCode
#10  
Recognized Developer
Thanks Meter 2580
Posts: 1,599
Join Date: Aug 2010
Location: Glasgow
Thanks for the huge breakdown. Very informative. Hopefully someone sorts out this nonsense in the near future. Looking forward to seeing what happens.
Follow me on Twitter: http://twitter.com/cantIntoCode

 