Taken from this post here
Q. "What purposes does an i/o scheduler serve?"
Minimize hard disk seek latency.
Prioritize I/O requests from processes.
Allocate disk bandwidth for running processes.
Guarantee that certain requests will be served before a deadline.
So in the simplest of simplest form: Kernel controls the disk access using I/O Scheduler.
Q. "What goals every I/O scheduler tries to balance?"
Fairness (let every process have its share of the access to disk)
Performance (try to serve requests close to current disk head position first, because seeking there is fastest)
Real-time (guarantee that a request is serviced in a given time)
Q. "Description, advantages, disadvantages of each I/O Scheduler?"
Inserts all the incoming I/O requests to a First In First Out queue and implements request merging. Best used with storage devices that does not depend on mechanical movement to access data (yes, like our flash drives). Advantage here is that flash drives does not require reordering of multiple I/O requests unlike in normal hard drives.
Serves I/O requests with least number of cpu cycles. (Battery friendly?)
Best for flash drives since there is no seeking penalty.
Good throughput on db systems.
Reduction in number of cpu cycles used is proportional to drop in performance.
Goal is to minimize I/O latency or starvation of a request. The same is achieved by round robin policy to be fair among multiple I/O requests. Five queues are aggressively used to reorder incoming requests.
Nearly a real time scheduler.
Excels in reducing latency of any given single I/O.
Best scheduler for database access and queries.
Bandwidth requirement of a process - what percentage of CPU it needs, is easily calculated.
Like noop, a good scheduler for solid state/flash drives.
When system is overloaded, set of processes that may miss deadline is largely unpredictable.
Completely Fair Queuing scheduler maintains a scalable per-process I/O queue and attempts to distribute the available I/O bandwidth equally among all I/O requests. Each per-process queue contains synchronous requests from processes. Time slice allocated for each queue depends on the priority of the 'parent' process. V2 of CFQ has some fixes which solves process' i/o starvation and some small backward seeks in the hope of improving responsiveness.
Considered to deliver a balanced i/o performance.
Easiest to tune.
Excels on multiprocessor systems.
Best database system performance after deadline.
Some users report media scanning takes longest to complete using CFQ. This could be because of the property that since the bandwidth is equally distributed to all i/o operations during boot-up, media scanning is not given any special priority.
Jitter (worst-case-delay) exhibited can sometimes be high, because of the number of tasks competing for the disk.
Instead of time slices allocation by CFQ, BFQ assigns budgets. Disk is granted to an active process until it's budget (number of sectors) expires. BFQ assigns high budgets to non-read tasks. Budget assigned to a process varies over time as a function of it's behavior.
Believed to be very good for usb data transfer rate.
Believed to be the best scheduler for HD video recording and video streaming. (because of less jitter as compared to CFQ and others)
Considered an accurate i/o scheduler.
Achieves about 30% more throughput than CFQ on most workloads.
Not the best scheduler for benchmarking.
Higher budget assigned to a process can affect interactivity and increased latency.
Simple I/O scheduler aims to keep minimum overhead to achieve low latency to serve I/O requests. No priority quesues concepts, but only basic merging. Sio is a mix between noop & deadline. No reordering or sorting of requests.
Simple, so reliable.
Minimized starvation of requests.
Slow random-read speeds on flash drives, compared to other schedulers.
Sequential-read speeds on flash drives also not so good.
Unlike other schedulers, synchronous and asynchronous requests are not treated separately, instead a deadline is imposed for fairness. The next request to be served is based on it's distance from last request.
May be best for benchmarking because at the peak of it's 'form' VR performs best.
Performance fluctuation results in below-average performance at times.
Least reliable/most unstable.
Based on two facts
i) Disk seeks are really slow.
ii) Write operations can happen whenever, but there is always some process waiting for read operation.
So anticipatory prioritize read operations over write. It anticipates synchronous read operations.
Read requests from processes are never starved.
As good as noop for read-performance on flash drives.
'Guess works' might not be always reliable.
Reduced write-performance on high performance disks.
Q. "Best I/O Scheduler?"
A.There is nothing called "best" i/o scheduler. Depending on your usage environment and tasks/apps been run, use different schedulers. That's the best i can suggest.
However, considering the overall performance, battery, reliability and low latency, it is believed that
SIO > Noop > Deadline > VR > BFQ > CFQ, given all schedulers are tweaked and the storage used is a flash device
Explanation of Different Governors
Pulled from this page here
Default governor in almost all stock kernels. One main goal of the ondemand governor is to switch to max frequency as soon as there is a CPU activity detected to ensure the responsiveness of the system. (You can change this behavior using smooth scaling parameters, refer Siyah tweaks at the end of 3rd post.) Effectively, it uses the CPU busy time as the answer to "how critical is performance right now" question. So Ondemand jumps to maximum frequency when CPU is busy and decreases the frequency gradually when CPU is less loaded/apporaching idle. Even though many of us consider this a reliable governor, it falls short on battery saving and performance on default settings. One potential reason for ondemand governor being not very power efficient is that the governor decide the next target frequency by instant requirement during sampling interval. The instant requirement can response quickly to workload change, but it does not usually reflect workload real CPU usage requirement in a small longer time and it possibly causes frequently change between highest and lowest frequency.
Basically an ondemand with suspend/wake profiles. This governor is supposed to be a battery friendly ondemand. When screen is off, max frequency is capped at 500 mhz. Even though ondemand is the default governor in many kernel and is considered safe/stable, the support for ondemand/ondemandX depends on CPU capability to do fast frequency switching which are very low latency frequency transitions. I have read somewhere that the performance of ondemand/ondemandx were significantly varying for different i/o schedulers. This is not true for most of the other governors. I personally feel ondemand/ondemandx goes best with SIO I/O scheduler.
A slower Ondemand which scales up slowly to save battery. The conservative governor is based on the ondemand governor. It functions like the Ondemand governor by dynamically adjusting frequencies based on processor utilization. However, the conservative governor increases and decreases CPU speed more gradually. Simply put, this governor increases the frequency step by step on CPU load and jumps to lowest frequency on CPU idle. Conservative governor aims to dynamically adjust the CPU frequency to current utilization, without jumping to max frequency. The sampling_down_factor value acts as a negative multiplier of sampling_rate to reduce the frequency that the scheduler samples the CPU utilization. For example, if sampling_rate equal to 20,000 and sampling_down_factor is 2, the governor samples the CPU utilization every 40,000 microseconds.
Can be considered a faster ondemand. So more snappier, less battery. Interactive is designed for latency-sensitive, interactive workloads. Instead of sampling at every interval like ondemand, it determines how to scale up when CPU comes out of idle. The governor has the following advantages: 1) More consistent ramping, because existing governors do their CPU load sampling in a workqueue context, but interactive governor does this in a timer context, which gives more consistent CPU load sampling. 2) Higher priority for CPU frequency increase, thus giving the remaining tasks the CPU performance benefit, unlike existing governors which schedule ramp-up work to occur after your performance starved tasks have completed. Interactive It's an intelligent Ondemand because of stability optimizations. Why??
Sampling the CPU load every X ms (like Ondemand) can lead to under-powering the CPU for X ms, leading to dropped frames, stuttering UI, etc. Instead of sampling the CPU at a specified rate, the interactive governor will check whether to scale the CPU frequency up soon after coming out of idle. When the CPU comes out of idle, a timer is configured to fire within 1-2 ticks. If the CPU is very busy between exiting idle and when the timer fires, then we assume the CPU is underpowered and ramp to max frequency.
This is an Interactive governor with a wake profile. More battery friendly than interactive.
This new find from Tegrak is based on Interactive & Smartass governors and is one of the favorites.
Old Version: When workload is greater than or equal to 60%, the governor scales up CPU to next higher step. When workload is less than 60%, governor scales down CPU to next lower step. When screen is off, frequency is locked to global scaling minimum frequency.
New Version: Three more user configurable parameters: inc_cpu_load, pump_up_step, pump_down_step. Unlike older version, this one gives more control for the user. We can set the threshold at which governor decides to scale up/down. We can also set number of frequency steps to be skipped while polling up and down.
When workload greater than or equal to inc_cpu_load, governor scales CPU pump_up_step steps up. When workload is less than inc_cpu_load, governor scales CPU down pump_down_step steps down.
If current frequency=200, Every up_sampling_time Us if cpu load >= 70%, cpu is scaled up 2 steps - to 800.
If current frequency =1200, Every down_sampling_time Us if cpu load < 70%, cpu is scaled down 1 step - to 1000.
Result of Erasmux rewriting the complete code of interactive governor. Main goal is to optimize battery life without comprising performance. Still, not as battery friendly as smartassV2 since screen-on minimum frequency is greater than frequencies used during screen-off. Smartass would jump up to highest frequency too often as well.
Version 2 of the original smartass governor from Erasmux. Another favorite for many a people. The governor aim for an "ideal frequency", and ramp up more aggressively towards this freq and less aggressive after. It uses different ideal frequencies for screen on and screen off, namely awake_ideal_freq and sleep_ideal_freq. This governor scales down CPU very fast (to hit sleep_ideal_freq soon) while screen is off and scales up rapidly to awake_ideal_freq (500 mhz for GS2 by default) when screen is on. There's no upper limit for frequency while screen is off (unlike Smartass). So the entire frequency range is available for the governor to use during screen-on and screen-off state. The motto of this governor is a balance between performance and battery.
Intellidemand aka Intelligent Ondemand from Faux is yet another governor that's based on ondemand. Unlike what some users believe, this governor is not the replacement for OC Daemon (Having different governors for sleep and awake). The original intellidemand behaves differently according to GPU usage. When GPU is really busy (gaming, maps, benchmarking, etc) intellidemand behaves like ondemand. When GPU is 'idling' (or moderately busy), intellidemand limits max frequency to a step depending on frequencies available in your device/kernel for saving battery. This is called browsing mode. We can see some 'traces' of interactive governor here. Frequency scale-up decision is made based on idling time of CPU. Lower idling time (<20%) causes CPU to scale-up from current frequency. Frequency scale-down happens at steps=5% of max frequency. (This parameter is tunable only in conservative, among the popular governors )
To sum up, this is an intelligent ondemand that enters browsing mode to limit max frequency when GPU is idling, and (exits browsing mode) behaves like ondemand when GPU is busy; to deliver performance for gaming and such. Intellidemand does not jump to highest frequency when screen is off.
This governor from Ezekeel is basically an ondemand with an additional parameter min_time_state to specify the minimum time CPU stays on a frequency before scaling up/down. The Idea here is to eliminate any instabilities caused by fast frequency switching by ondemand. Lazy governor polls more often than ondemand, but changes frequency only after completing min_time_state on a step overriding sampling interval. Lazy also has a screenoff_maxfreq parameter which when enabled will cause the governor to always select the maximum frequency while the screen is off.
Lagfree is similar to ondemand. Main difference is it's optimization to become more battery friendly. Frequency is gracefully decreased and increased, unlike ondemand which jumps to 100% too often. Lagfree does not skip any frequency step while scaling up or down. Remember that if there's a requirement for sudden burst of power, lagfree can not satisfy that since it has to raise cpu through each higher frequency step from current. Some users report that video playback using lagfree stutters a little.
Lionheart is a conservative-based governor which is based on samsung's update3 source. Tweaks comes from 1) Knzo 2) Morfic. The original idea comes from Netarchy. See here. The tunables (such as the thresholds and sampling rate) were changed so the governor behaves more like the performance one, at the cost of battery as the scaling is very aggressive.
To 'experience' Lionheart using conservative, try these tweaks:
sampling_rate:10000 or 20000 or 50000, whichever you feel is safer. (transition latency of the CPU is something below 10ms/10,000uS hence using 10,000 might not be safe).
Lionheart goes well with deadline i/o scheduler. When it comes to smoothness (not considering battery drain), a tuned conservative delivers more as compared to a tuned ondemand.
LionheartX is based on Lionheart but has a few changes on the tunables and features a suspend profile based on Smartass governor.
Similar to smartassV2. More aggressive ramping, so more performance, less battery.
Another smartassV2 based governor. Achieves good balance between performance & battery as compared to brazilianwax.
Instead of automatically determining frequencies, lets user set frequencies.
Locks max frequency to min frequency. Can not be used as a screen-on or even screen-off (if scaling min frequency is too low).
Sets min frequency as max frequency. Use this while benchmarking!
So, Governors can be categorized into 3/4 on a high level:
1.a) Ondemand Based:
Works on "ramp-up on high load" principle. CPU busy-time is taken into consideration for scaling decisions. Members: Ondemand, OndemandX, Intellidemand, Lazy, Lagfree.
1.b) Conservative Based:
Members: Conservative, Lionheart, LionheartX
2) Interactive Based:
Works on "make scaling decision when CPU comes out of idle-loop" principle. Members: Interactive, InteractiveX, Lulzactive, Smartass, SmartassV2, Brazilianwax, SavagedZen.
3) Weird Category:
Members: Userspace, Powersave, Performance.