[MODULE] KTweak - Backed by evidence


lapirado

Senior Member
Aug 25, 2015
382
189
39
OnePlus 3
Xiaomi Poco F3
Kind of a noob issue: I'm using KTweak on my OP3 with the MSM ROM (Q). When I execute the script, it changes my governor to interactive. After some time (minutes/hours), when I check in FKM, the governor has changed itself to smartmax_eps. Is it possible to do something to make interactive stick?
 

lividhen99

Senior Member
Jun 19, 2018
301
49
If the kernel keeps resetting it, then that may be what's best for it. You may just want to double check that nothing else is setting it though.
 

Lehner82

Member
Nov 26, 2014
29
2
Script failed to run
balance log:
system/bin/sh: sh /data/user/0/com.draco.ktweak/files/script-balance.sh: inaccessible or not found

sh /data/user/0/com.draco.ktweak/files/script-balance.sh works if I run it in Termux
 

guijohn

Member
Oct 21, 2019
5
0
Please, can someone tell me what the "Branch" options mean? Like balance, latency and throughput.
 

hyborian

Senior Member
Oct 25, 2017
1,188
372

From Telegram KTweak channel :

- balance: This should be the default. It still has performance and latency tweaks that should help out on most phones but without being too aggressive on either end.

- latency: Prioritizes latency over throughput and power. I recommend this for people who can sacrifice some throughput for a smoother UX.

- throughput: Prioritizes throughput over latency and power. Those who desperately need more raw performance should try this.

Additionally, I updated the KTweak app to select balance as the default branch.
 

robnitro

Senior Member
Feb 21, 2010
138
39
LG G8X ThinQ
Samsung Galaxy S21
I had reboots with the latest app and the script on an Android 10 LG G8X. I tried too many times and got a "decryption unsuccessful" message that persisted, so I had to reset the device, ugh.

Removing the scheduler and the queues functions but using the sysctl tweaks works fine. I call it from tasker now.
 

Top Liked Posts

  • 94
    Another "kernel optimizer"?

    No. Well, yes. However, a "kernel optimizer" is a poor way to put it. KTweak performs kernel adjustments based on facts and evidence, unlike other optimizers with poorly written or heavily obfuscated code. For example:

    • LSpeed is almost 4000 lines long; completely unnecessary.
    • NFS Injector uses compiled binaries that are closed source... yuck. Not to mention the typos in the README. This one is hard to look at.
    • LKT sets random nonsensical build.props that likely don't even exist.
    • MAGNETAR uses (you guessed it) compiled binaries that install themselves to your /system/etc/ directory (???). Great idea, install an external closed source, compiled binary to the system partition.

    Need I go on?

    What's different about KTweak?
    Unlike other "kernel optimizers", KTweak is:

    • Concise, at around 200 lines long,
    • Entirely open source with no compiled components,
    • Backed by logic and evidence,
    • Designed by an experienced kernel developer,
    • Non-intrusive, being completely systemless.

    Benchmarks
    The following benchmarks were performed on a OnePlus 7 Pro running the stock kernel provided by the OEM on Android 10.

    hackbench -pTl 4000 (lower is better)
    Without KTweak: ~20-50 seconds on average
    With KTweak: ~4-6 seconds on average

    perf bench mem memcpy (lower is better) (average of 50 iters)
    Without KTweak: 14.01 ms
    With KTweak: 10.40 ms

    synthmark (voicemark) (higher is better)
    Without KTweak: 374.94
    With KTweak: 383.556

    synthmark (latencymark little) (lower is better)
    Without KTweak: 10
    With KTweak: 10

    synthmark (latencymark big) (lower is better)
    Without KTweak: 12
    With KTweak: 10

    The Tweaks
    In order to remain genuine, I have committed to explaining each and every kernel tweak that KTweak applies. Grab your coffee, this could take a while.


    kernel.perf_cpu_time_max_percent: 25 --> 5
    This is the maximum CPU time long perf event processing can take as a percentage. If this percentage is exceeded (meaning perf event processing used too much CPU time), the polling rate is throttled. This is reduced from 25% to 5%. We can afford inaccuracies with perf events in exchange for more time that a foreground task can use.
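
    Tunables like this one live under /proc/sys, so applying one amounts to writing a value to a file. A minimal sketch, assuming a hypothetical helper named write_tunable (it is not taken from the KTweak source) that skips nodes a given kernel does not expose:

```shell
#!/bin/sh
# write_tunable PATH VALUE
# Hypothetical helper (an assumption, not KTweak's own code): write VALUE to
# PATH only if the node exists and is writable, and report what happened.
write_tunable() {
    path=$1
    value=$2
    if [ ! -w "$path" ]; then
        echo "skip: $path"
        return 1
    fi
    if printf '%s\n' "$value" > "$path" 2>/dev/null; then
        echo "ok: $path = $value"
    else
        echo "fail: $path"
        return 1
    fi
}

# Example (needs root on a real device):
# write_tunable /proc/sys/kernel/perf_cpu_time_max_percent 5
```

    Checking for the node first is one way to keep a single script usable across kernel versions that expose different tunables.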

    kernel.sched_autogroup_enabled: 0 --> 1
    The Linux kernel scheduler (CFS) distributes timeslices to each active task. For example, if the scheduling period is 10ms and there are 5 tasks running, CFS will give each task 2ms of runtime for that scheduling cycle. However, this means that a SCHED_OTHER task may compete with a SCHED_FIFO task. Autogrouping groups task groups together during scheduling. For example, if the scheduling period is 10ms, and there are 6 SCHED_OTHER tasks running and 4 SCHED_FIFO tasks running, the SCHED_OTHER tasks will get 50% of the runtime and the SCHED_FIFO tasks will get the other 50%. Within each task group, the timeslices are divided once again: each SCHED_FIFO task will get 12.5% of the runtime and each SCHED_OTHER task will get ~8.3%. This usually offers better interactivity on multithreaded platforms.
    See scheduling priority documentation: https://man7.org/linux/man-pages/man7/sched.7.html
    See autogrouping off: https://www.youtube.com/watch?v=uk70SeGA7pg
    See autogrouping on: https://www.youtube.com/watch?v=prxInRdaNfc
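
    The percentages in the example above can be reproduced with a little arithmetic. This is only a sketch of the even-split model described in the post, with a hypothetical function name; it does not query the scheduler:

```shell
#!/bin/sh
# per_task_share GROUPS TASKS_IN_GROUP
# Percentage of the scheduling period each task receives when runtime is
# split evenly between GROUPS task groups and then evenly inside one group.
# (Hypothetical helper for illustration only.)
per_task_share() {
    LC_ALL=C awk -v g="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 / g / n }'
}

per_task_share 2 6   # the 6 SCHED_OTHER tasks: prints 8.3
per_task_share 2 4   # the 4 SCHED_FIFO tasks: prints 12.5
```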

    kernel.sched_enable_thread_grouping: 0 --> 1
    To my knowledge using the limited documentation of this tunable, this is basically autogrouping for thread groups.

    kernel.sched_child_runs_first: 0 --> 1
    When forking a child process from the parent, execute the child process before the parent process. This usually shaves down some latency on task initializations, since most of the time the child process is doing some form of heavy lifting.

    kernel.sched_downmigrate: 20 20
    Do not allow tasks to migrate back down to a lower-power CPU until the estimated CPU utilization would go below 20% on said CPU. This means tasks will stay on higher-performance CPUs for longer than usual.

    kernel.sched_upmigrate: 80 80
    Similar to the previous tunable, do not allow tasks to migrate to the higher-performance CPUs unless the utilization goes above 80%.

    kernel.sched_group_downmigrate: 20
    The same as kernel.sched_downmigrate, except for whole task groups.

    kernel.sched_group_upmigrate: 80
    The same as kernel.sched_upmigrate, except for whole task groups.
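
    Taken together, the two thresholds form a hysteresis band: a task moves up above 80% and only moves back down below 20%. The sketch below is a deliberately simplified model of that description, with a hypothetical function name and cluster labels; the real scheduler logic is more involved:

```shell
#!/bin/sh
# next_cluster CURRENT UTIL [DOWN] [UP]
# Simplified model (an assumption for illustration): CURRENT is "little" or
# "big", UTIL is the task's estimated utilization in percent.
next_cluster() {
    current=$1
    util=$2
    down=${3:-20}
    up=${4:-80}
    if [ "$current" = little ] && [ "$util" -gt "$up" ]; then
        echo big
    elif [ "$current" = big ] && [ "$util" -lt "$down" ]; then
        echo little
    else
        echo "$current"
    fi
}

next_cluster little 85   # above 80%: moves to the big cluster
next_cluster big 50      # between 20% and 80%: stays on the big cluster
next_cluster big 10      # below 20%: moves back to the little cluster
```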

    kernel.sched_tunable_scaling: 0
    This is more of a precaution than anything. Since the next few tunables will be scheduler timing related, we don't want the scheduler to scale our values for multiple CPUs, as we will be providing CPU-agnostic values.

    kernel.sched_latency_ns: 10000000 (10ms)
    Set the default scheduling period to 10ms. If this value is set too low, the scheduler will switch contexts too often, spending more time internally than executing the waiting tasks.

    kernel.sched_min_granularity_ns: 1000000 (1ms)
    Set the minimum task scheduling period to 1ms. With kernel.sched_latency_ns set to 10ms, this means that 10 tasks may execute within the 10ms scheduling period before we exceed it.
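
    The relationship between the two values can be worked through in a few lines. This sketch assumes the classic CFS rule that the period is stretched to nr_running * min_granularity once more tasks are runnable than fit in the configured latency; the function name is hypothetical:

```shell
#!/bin/sh
# effective_period_ns LATENCY_NS MIN_GRANULARITY_NS NR_RUNNING
# Illustrative only: returns the scheduling period under the assumption
# stated above, without touching any kernel interface.
effective_period_ns() {
    latency=$1
    min_gran=$2
    nr_running=$3
    nr_latency=$(( latency / min_gran ))   # tasks that fit in one period
    if [ "$nr_running" -gt "$nr_latency" ]; then
        echo $(( nr_running * min_gran ))
    else
        echo "$latency"
    fi
}

effective_period_ns 10000000 1000000 10   # 10 tasks still fit: 10000000
effective_period_ns 10000000 1000000 20   # 20 tasks stretch it: 20000000
```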

    kernel.sched_migration_cost_ns: 500000 (0.5ms) --> 1000000 (1ms)
    Increase the time that a task is considered to be cache hot. According to RedHat, increasing this tunable reduces the number of task migrations. This should reduce time spent balancing tasks and increase per-task performance. See RedHat: https://www.redhat.com/files/summit...tuning-of-Red-Hat-Enterprise-Linux-Part-1.pdf

    kernel.sched_min_task_util_for_boost: 25
    This value affects whether a task should be migrated to a higher-performance CPU when its utilization is above this amount. Allow tasks to be migrated upwards if the user is triggering a touch boost and the task is above 25% utilization.

    kernel.sched_min_task_util_for_colocation: 50
    This value is the same as the former, except it occurs when the user is not touching the screen. We shouldn't upmigrate tasks if the user isn't actively interacting with them (i.e. video streaming).

    kernel.sched_nr_migrate: 32 --> 64
    When migrating tasks between CPUs, allow the scheduler to migrate twice as many as usual. This should increase scheduling latency marginally, but increase the performance of SCHED_OTHER tasks.

    kernel.sched_schedstats: 1 --> 0
    Disable scheduler statistics accounting. This is just for debugging, but it adds overhead.

    kernel.sched_wakeup_granularity_ns: 1000000 (1ms) --> 5000000 (5ms)
    Require the current task to be surpassing the new task in vruntime by 5ms instead of 1ms before preemption occurs. This should reduce jitter due to less frequent task interruptions.

    kernel.timer_migration: 1 --> 0
    Disable the migration of timers among CPUs. Usually, when a timer is created on one CPU, it would be able to be migrated to another CPU. However, this increases realtime latencies and scheduling interrupts. It can be turned off.

    net.ipv4.tcp_ecn: 2 --> 1
    Enable Explicit Congestion Notification for incoming and outgoing negotiations. This reduces packet losses.

    net.ipv4.tcp_fastopen: 3
    Enable data transmission during the SYN exchange of the TCP handshake. This reduces connection latency. Enable it for senders and receivers.

    net.ipv4.tcp_syncookies: 1 --> 0
    This tunable, when enabled, prevents denial of service attacks by allowing connection ACKs to be tracked. However, this is more-or-less unnecessary for a mobile device. It is more applicable for servers. Disable it.

    net.ipv4.tcp_timestamps: 1 --> 0
    RedHat claims that TCP timestamps may cause performance spikes due to time accounting code on high-performance connections. Disable it. See RedHat: https://access.redhat.com/documenta...ml/tuning_guide/reduce_tcp_performance_spikes

    vm.compact_unevictable_allowed: 1 --> 0
    Do not allow compaction of unevictable pages. With this set to 1, more compactions can happen at the cost of small page fault stalls. Turn this off to compact less but avoid aforementioned stalls.

    vm.dirty_background_ratio: 5 --> 10
    Start writing back dirty pages (pages that have been modified but not yet written to the disk) asynchronously at 10% memory dirtied instead of 5%. Writing dirty pages back too early can be inefficient and overutilize the storage device.

    vm.dirty_ratio: 20 --> 30
    This tunable is the same as the former, but it is the ceiling for synchronous dirty writeback, meaning all I/O will stall until all dirty pages are written out to the disk. We usually won't need to worry about hitting this value, as the background writeback can catch up before we reach 20% memory dirtied. But as a precaution (i.e. heavy file transfers), increase this value to a 30% ceiling to prevent visible system stalls. We are sacrificing available memory in exchange for a reduced chance of a brief system stall.

    vm.dirty_expire_centisecs: 300 (3s) --> 1000 (10s)
    This is the longest that dirty pages can remain in the system before they are forcefully written out to the disk. By increasing this value, we can allow the dirty background writeback to take its time asynchronously, and avoid unnecessary writebacks that can clog the flusher thread.

    vm.dirty_writeback_centisecs: 500 (5s) --> 0 (0s)
    Do not periodically writeback data every 5 seconds. Instead, leave it to the dirty background writeback to wake up when the dirty memory of the system hits 10%. This allows the dirty pages to stay in memory for longer, possibly increasing cache locality as the page cache is still available in memory.
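
    To get a feel for what the 10% and 30% ratios mean in absolute terms, the thresholds can be estimated from an amount of memory. A rough sketch with a hypothetical function name; note that the kernel applies these ratios to available (free plus reclaimable) memory rather than total RAM, and the 4000000 kB figure is an illustrative assumption:

```shell
#!/bin/sh
# dirty_limit_kb AVAILABLE_KB RATIO_PERCENT
# Estimate how many kB may be dirty before the given ratio is reached.
# (Illustrative arithmetic only; the kernel's own accounting differs in detail.)
dirty_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

dirty_limit_kb 4000000 10   # background writeback starts around 400000 kB
dirty_limit_kb 4000000 30   # synchronous writeback ceiling around 1200000 kB
```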

    vm.extfrag_threshold: 500 --> 750
    Compact memory more often, even if the memory allocation was estimated to be due to a low-memory status. This lets us put more data into RAM at the expense of running compaction more often. This is a worthy tradeoff, as it reduces memory fragmentation, which is incredibly important for ZRAM.

    vm.oom_dump_tasks: 1 --> 0
    Do not dump debug information when (or if) we run out of memory. If we have a lot of tasks running, and are OOMing often, then this overhead can add up.

    vm.page-cluster: 3 --> 0
    Disable reading additional pages from the swap device (in most cases, ZRAM). This is the same philosophy as disabling readahead.

    vm.reap_mem_on_sigkill: 0 --> 1
    When we kill a task, clean its memory footprint to free up whatever amount of RAM it was consuming.

    vm.stat_interval: 1 --> 10
    Update /proc/stat information every 10 seconds instead of every second, reducing jitter on loaded systems.

    vm.swappiness: 100 --> 80
    Swap to ZRAM less often if we don't have to. ZRAM can become expensive due to constant compression and decompression. If we can keep some of the memory uncompressed in regular RAM, we can avoid that overhead.

    vm.vfs_cache_pressure: 100 --> 200
    This tunable controls the kernel's tendency to reclaim inodes and dentries over page cache. Inodes and dentries are information about file metadata and directory structures, while page cache is the actual cached contents of a file. By increasing this value to 200, we tell the kernel to prefer reclaiming inodes and dentries over the page cache, increasing the chance of a cache hit when referencing recently used data, while not polluting the RAM with less-important information.

    Next Buddy
    By scheduling the last woken task first, we can increase cache locality since that task is likely to touch the same data as before.

    No Strict Skip Buddy
    Usually, the scheduler will always choose to skip tasks that call yield(). However, these yielding tasks may be of higher importance than the last or next buddy that are available. Do not always skip the skip buddy if we don't have to.

    No Nontask Capacity
    The scheduler decrements the perceived CPU capacity the longer the CPU has been idle for. This means that an idle CPU may be skipped during task placement, and a task can be grouped with a busier CPU. Disable this to improve task start latency.

    TTWU Queue
    Allow the scheduler to place tasks on their origin CPU, increasing cache locality if the CPU is non-local (i.e. a cache hit would definitely have been missed).
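
    Scheduler features like the four above are typically toggled by writing a feature name, or the name prefixed with NO_, to the sched_features node in debugfs. The sketch below assumes the node is at /sys/kernel/debug/sched_features and that the tokens match the headings above (NEXT_BUDDY, NO_STRICT_SKIP_BUDDY, NO_NONTASK_CAPACITY, TTWU_QUEUE); both are assumptions, since the path and the available features vary between kernels. The function name is hypothetical:

```shell
#!/bin/sh
# set_sched_features FILE FEATURE...
# Hypothetical helper: write each feature token to FILE, one write per token,
# and report tokens the kernel rejects instead of aborting.
set_sched_features() {
    file=$1
    shift
    [ -w "$file" ] || { echo "skip: $file"; return 1; }
    for feature in "$@"; do
        printf '%s\n' "$feature" > "$file" 2>/dev/null ||
            echo "unsupported: $feature"
    done
}

# Example (needs root and a mounted debugfs; token names are assumptions):
# set_sched_features /sys/kernel/debug/sched_features \
#     NEXT_BUDDY NO_STRICT_SKIP_BUDDY NO_NONTASK_CAPACITY TTWU_QUEUE
```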

    Governor Tweaks
    • hispeed_load: 90 --> 80: Jump to a higher frequency if we are approaching the end of the frequency list, where a task may begin to starve or begin to stutter.
    • hispeed_freq: Set the "higher freq" (referencing hispeed_load) to the maximum frequency available to take advantage of Race-To-Idle.

    CAF CPU Boost Tweaks
    • input_boost_freq: 1.4 GHz (closest freq) as a generic, universal boost frequency for the little cluster.
    • input_boost_ms: 250 ms, not consuming too much power but boosting for important, interactive events such as clicking on things.

    I/O
    • iostats: 1 --> 0: Disable I/O statistics accounting, which adds overhead.
    • readahead: 0: Disable readahead, which is intended for disks with long seek times (HDD), whereas mobile devices use flash storage with effectively no seek time.
    • nr_requests: 128 --> 512: Allow more I/O requests to be issued before flushing the queue, slightly increasing latencies but allowing more requests to be executed before being put to sleep.
    • noop / none: Use a scheduler with little CPU overhead to reduce I/O latencies, which is essential for fast flash storage (eMMC & UFS).
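
    The I/O settings listed above map onto per-device files under /sys/block/<device>/queue. A minimal sketch, assuming the standard queue attribute names (iostats, read_ahead_kb, nr_requests, scheduler) and hypothetical function names; it is not taken from the KTweak source:

```shell
#!/bin/sh
# put FILE VALUE: write only if the node is writable, never abort the loop.
put() {
    if [ -w "$1" ]; then
        echo "$2" > "$1" 2>/dev/null || true
    fi
}

# tune_block_queues [SYSFS_BLOCK_DIR]
# Hypothetical helper: apply the listed I/O settings to every block device.
tune_block_queues() {
    root=${1:-/sys/block}
    for queue in "$root"/*/queue; do
        [ -d "$queue" ] || continue
        # Pick noop if offered, otherwise none; the active one is in [brackets].
        avail=$(cat "$queue/scheduler" 2>/dev/null)
        for sched in noop none; do
            case " $avail " in
                *" $sched "*|*" [$sched] "*)
                    put "$queue/scheduler" "$sched"
                    break
                    ;;
            esac
        done
        put "$queue/iostats" 0
        put "$queue/read_ahead_kb" 0
        put "$queue/nr_requests" 512
    done
}

# Example (needs root on a real device):
# tune_block_queues /sys/block
```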

    ZRAM
    ZRAM reduces disk wear by reducing disk writes, and also increases cache locality by allowing more data to fit in RAM at once. KTweak configures ZRAM to take up at most half of the available RAM on the system, which is a good ratio of RAM to ZRAM for a mobile device.
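
    As an illustration of the "half of RAM" sizing, the value could be derived from MemTotal in /proc/meminfo. This sketch assumes that "half of the available RAM" means half of MemTotal and uses a hypothetical function name; the K suffix is used so the result can be written to a zram disksize node without large-number arithmetic:

```shell
#!/bin/sh
# zram_disksize [MEMINFO_FILE]
# Hypothetical helper: print half of MemTotal as a size string such as 4000000K.
zram_disksize() {
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}")
    echo "$(( total_kb / 2 ))K"
}

# Example (needs root, and the device must not be in use):
# zram_disksize > /sys/block/zram0/disksize
```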


    Other Notes
    You should know that KTweak applies after 60s of uptime to prevent Android's init from overwriting any values.
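
    One way to implement such a delay is to read /proc/uptime and sleep for whatever remains of the 60 seconds. This is a sketch under that assumption, with a hypothetical function name; KTweak's actual mechanism may differ:

```shell
#!/bin/sh
# seconds_until_uptime TARGET_SECONDS [UPTIME_FILE]
# Print how many whole seconds remain until the system has been up for
# TARGET_SECONDS, or 0 if it already has. (Hypothetical helper.)
seconds_until_uptime() {
    target=$1
    # /proc/uptime looks like "12.34 56.78"; keep the integer part of field 1.
    up=$(cut -d. -f1 "${2:-/proc/uptime}")
    if [ "$up" -ge "$target" ]; then
        echo 0
    else
        echo $(( target - up ))
    fi
}

# Example: wait until 60s of uptime, then apply the tweaks.
# sleep "$(seconds_until_uptime 60)"
```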

    Contact
    You can find me on telegram at @tytydraco. Feel free to email me at tylernij@gmail.com.

    Downloads
    All releases and the entire source code for KTweak are available on GitHub.

    XDA:DevDB Information
    KTweak, Tool/Utility for all devices (see above for details)

    Contributors
    tytydraco
    Source Code: https://github.com/tytydraco/ktweak


    Version Information
    Status: Stable
    Current Stable Version: v1.0.7
    Stable Release Date: 2020-08-16

    Created 2020-08-16
    Last Updated 2020-08-16
    15
    The goal of the module is battery or performance?

    It's definitely not tuned for battery, but it isn't tuned to be overly performance hungry. That means I'm not increasing CPU frequencies to achieve better performance, I'm tuning the scheduler (balancing tasks across CPUs) for efficiency at the cost of less debugging, and generally less "stability". By less stability, I mean I tune things with a low cost but a high return, if that makes much sense. Basically, I attempt efficiency without artificially inflating frequencies.
    13
    What are the requirements to use this? Root with Magisk is a given - but Linux kernel version, Android OS version, device, etc?

    The script adjusts only the tweaks that are compatible with your version. It contains tweaks for EAS, HMP, and supports 3.18 and above in testing. It likely supports even lower. Otherwise, it's totally universal.
    11
    I don't want profiles like battery etc...
    I like it how it is. I find the concept, a little "performance boost" (kernel tweaks based on facts) without losing battery, very good.
    For those who want battery, etc., use another kernel and tweak it in a kernel app.
    But for me (Stock kernel) this here is perfect.
    That is my opinion.
    Thank you very much @tytydraco. Keep up the good work. :good:
    9
    KTweak now has an official Telegram channel for release information and changelogs: @ktweak