[COMMIT] [AOSP] JustArchi's ArchiDroid Optimizations V4.1 - Unleash the power!


JustArchi

Inactive Recognized Developer
Mar 7, 2013
8,740
38,810
Warsaw
Hello dear developers.

I'd like to share with you the result of nearly 300 hours spent trying to optimize Android and push it to its limits.

In general, you should already be experienced in setting up your buildbox, using git, building AOSP/CyanogenMod/OmniROM from source and cherry-picking things from review/gerrit. Being able to solve git conflicts would also be nice. If you don't know how to build your own ROM from source, this is not something you can apply to your ROM. Also, as you have probably noticed, this is not something you can apply to an already prebuilt ROM (stock), as these optimizations are applied during compilation, so only AOSP ROMs self-compiled from source can use this masterpiece.

So, what is it about? As we know, Android contains a bunch of low-level C/C++ code, which is compiled and acts as a backend for the Java frontend and Android apps. Unfortunately, Google didn't put much effort into optimization, so as a result we're still using the same old flags set back in 2006 for Android Donut or anything which existed back then. As you can guess, in 2006 we didn't have devices as powerful as today's; we had to sacrifice performance for smaller code size, to fit on our little devices and run well with a very low amount of memory. However, this is no longer the case, and by using the newest compilers and setting the flags properly, we can achieve something great.

You may have heard of some developers claiming to use "O3 Flags" in their ROMs. Well, while this may be true, they've applied them only to low-level ARM code, mostly used during kernel compilation. Additionally, this only overrides the O2 flag, which is already fast, so as you may guess, it is more likely a placebo effect that disappears right after you change the kernel. Take a look at the most cherry-picked "O3 Flags commit". See the big "-Os" in "TARGET_thumb_CFLAGS"? This is what I'm talking about.

However, the commit I'm about to present is not a placebo, as it applies flags to everything that is compiled and, most importantly, to target THUMB, which is about 90% of Android.

Now I'll tell you some facts. We have three interesting optimization levels: Os, O2 and O3. O2 enables all optimizations that do not involve a space-speed tradeoff. Os is similar to O2, but it disables all flags that increase code size. It also performs further optimizations to reduce code size to the minimum. O3 enables all O2 optimizations plus some extra optimizations that tend to increase code size and may, or may not, increase performance. If you want to ask whether there's something more, like an O4, there is: Ofast. However, it breaks the IEEE standard and doesn't work with Android, as e.g. sqlite3 is not compatible with Ofast's -ffast-math flag. So it's a no go for us.
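To make the -ffast-math remark a bit more concrete: among other things, that flag lets GCC reassociate floating-point operations, which strict IEEE 754 arithmetic does not allow, because rounding makes addition non-associative. The snippet below is only a small Python sketch of that effect (Python floats are IEEE 754 doubles). The function names are made up for illustration, and it does not reproduce the actual sqlite3 problem.

```python
def sum_left_to_right(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c


def sum_reassociated(a: float, b: float, c: float) -> float:
    """Evaluate a + (b + c), an order a reassociating compiler may pick."""
    return a + (b + c)


if __name__ == "__main__":
    a, b, c = 1e20, -1e20, 1.0
    # The two large terms cancel first, so the 1.0 survives.
    print(sum_left_to_right(a, b, c))
    # The 1.0 is absorbed by -1e20 before the large terms cancel.
    print(sum_reassociated(a, b, c))
```

Code that relies on the exact IEEE result can therefore change behaviour once the compiler is allowed to reorder such expressions.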

Now here comes the fun part. Android by default is compiled with the O2 flag for target ARM (about 10% of Android, mostly low-level parts) and the Os flag for target THUMB (about 90% of Android, nearly everything apart from the low-level parts). Some guys think that Os is better than O2 or O3 because smaller code is faster due to fitting better in the CPU cache. Unfortunately, I've proven that this is a myth. Os was good back in 2006, as only with this flag was Google able to compile Dalvik and its virtual machine while keeping a good amount of free memory and space on eMMC cards. As of now, we have plenty of space, plenty of RAM, plenty of CPU power and still the good old Os flag for 90% of Android.

I've run countless tests to find out what is most efficient in terms of GCC optimization. Here are two selected tests.

[Image 3j9IcCO.png: whetstone benchmark results for Os, O2 and O3]

As you may have noticed, I compiled the whetstone.c benchmark using three different optimization flags: Os, O2 and O3. I set the CPU to performance, maximum frequency, and repeated each test two additional times, just to make sure that Android doesn't lie to me. The source code of this test is available here; you may download it, compile it for our beloved Android and try it yourself. As you can see, O3 > O2 >> Os: Os performs about 2.5 times worse than O2, and about 3 times worse than O3.

But, of course, Android is not a freaking benchmark, it's an operating system. We can't tell whether things are getting better or worse from a simple benchmark alone. I kept that in mind and provided the community with JustArchi's Mysterious Builds for testing. I released both mysterious builds without telling my users what the mysterious change was. Both builds were compiled with the same toolchain, same version, same commits. The one and only mysterious change was that every component compiled as target thumb (the major portion of Android) was optimized for speed (O3) in build #1 (experimental), and optimized for size (Os) in build #2 (normal behaviour). Check the poll yourself: 9 votes for build #1 in terms of performance, and 1 vote for build #2. I decided that this, together with the benchmark, is enough to tell that O2/O3 for target thumb is something that we want.

Now it doesn't matter that much whether you use O2 or O3, but here is a comparison:
1. A kernel compiled with O2 is 4902 KB, with O3 4944 KB, so O3 is 42 KB bigger.
2. A ROM compiled with O3 is 3 MB larger than with O2 after zip compression. Quick overview: 97 binaries in /system/bin and 2 binaries in /system/xbin + 283 libraries in /system/lib and other files, about 400 files in total. 3 MB / 400 = about 7.5 KB size increase per file.
3. It's unlikely that code working properly at the O2 level will break at the O3 level; most issues appear in the Os <-> O2 step.
4. If it doesn't cause any issues and speeds up a binary by a little bit, why not use it?
5. The only real reason not to use O3 is potentially higher memory usage due to larger binaries.
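
For anyone who wants to redo the size arithmetic from the list above with their own build, here is a minimal Python sketch. The helper names are hypothetical, the input numbers are the ones quoted above, and 3 MB is taken as 3000 KB because that is what yields the 7.5 KB figure.

```python
from typing import Tuple


def kernel_growth(o2_kb: int, o3_kb: int) -> Tuple[int, float]:
    """Return (absolute growth in KB, relative growth in percent)."""
    delta = o3_kb - o2_kb
    return delta, 100.0 * delta / o2_kb


def per_file_growth_kb(total_growth_kb: float, file_count: int) -> float:
    """Average size increase per file, in KB."""
    return total_growth_kb / file_count


if __name__ == "__main__":
    delta, pct = kernel_growth(4902, 4944)
    print(f"kernel: +{delta} KB ({pct:.2f}%)")
    # 3 MB treated as 3000 KB, spread over roughly 400 files.
    print(f"ROM: about {per_file_growth_kb(3000, 400):.1f} KB per file")
```

In relative terms the O3 kernel is well under 1% larger than the O2 one, which is why the size argument against O3 is a weak one.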

In general, I doubt that this extra chunk of code causes any significant memory usage or slower performance. I suggest using O3 if it doesn't cause you any issues compared to O2, but older devices may use O2 purely to save on code size, similar to the way Google did it back in 2006 using the Os flag.

[SIZE="+1"]Now let's get down to business[/SIZE].

Here is a list of important improvements:
- Optimized all instructions, both ARM and THUMB, even further for speed (-O3)
- Also optimized for speed the parts which are compiled with Clang (-O3)
- Turned off all debugging code (lack of -g)
- Eliminated redundant loads that come after stores to the same memory location, both partial and full redundancies (-fgcse-las)
- Ran a store motion pass after global common subexpression elimination. This pass attempts to move stores out of loops (-fgcse-sm)
- Enabled the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. We can then check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator ISL, like index splitting and dead code elimination in loops (-fgraphite -fgraphite-identity)
- Performed interprocedural pointer analysis and interprocedural modification and reference analysis (-fipa-pta)
- Performed induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees (-fivopts)
- Didn't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions (-fomit-frame-pointer)
- Attempted to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers (-frename-registers)
- Tried to reduce the number of symbolic address calculations by using shared “anchor” symbols to address nearby objects. This transformation can help to reduce the number of GOT entries and GOT accesses on some targets (-fsection-anchors)
- Performed tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job (-ftracer)
- Performed loop invariant motion on trees. It also moved operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion (-ftree-loop-im)
- Created a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily (-ftree-loop-ivcanon)
- Assumed that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations even if the loop optimizer itself cannot prove that these assumptions are valid (-funsafe-loop-optimizations)
- Moved branches with loop invariant conditions out of the loop (-funswitch-loops)
- Constructed webs as commonly used for register allocation purposes and assigned each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover (-fweb)
- Sorted the common symbols by alignment in descending order. This is to prevent gaps between symbols due to alignment constraints (-Wl,--sort-common)
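
Two of the loop optimizations listed above, loop invariant motion (-ftree-loop-im) and loop unswitching (-funswitch-loops), are easy to picture when done by hand. The following is only a conceptual Python illustration with made-up function names. GCC applies these transformations to its own intermediate representation of C/C++ code, not to source like this.

```python
def scale_naive(values, factor, offset, clamp):
    """Loop as written: invariant work and an invariant branch inside it."""
    out = []
    for v in values:
        k = factor * factor + offset   # does not depend on v
        if clamp:                      # does not depend on v either
            out.append(min(v * k, 255))
        else:
            out.append(v * k)
    return out


def scale_transformed(values, factor, offset, clamp):
    """Same loop after hoisting the invariant and unswitching the branch."""
    k = factor * factor + offset       # hoisted: computed once
    if clamp:                          # unswitched: one loop per branch
        return [min(v * k, 255) for v in values]
    return [v * k for v in values]


if __name__ == "__main__":
    data = [1, 2, 100]
    print(scale_naive(data, 3, 1, True))
    print(scale_transformed(data, 3, 1, True))
```

Both versions compute the same result. The second one simply does less work per iteration, at the cost of duplicating the loop body, which is the kind of speed-for-size trade these flags make.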

Sounds cool, doesn't it? Head over to my ArchiDroid project and see for yourself how people react after switching to my ROM. Take a look at just one small example, or another one :). No bullsh*t guys, this is not a placebo.

However, please read my commit carefully before you decide to cherry-pick it. You must understand that Google's flags haven't been touched for 7 years and nobody can assure you that the new flags will work properly for your ROM and your device. You may need to experiment with them a bit to find out whether they cause conflicts or other issues.

I can assure you that my ArchiDroid based on CM compiles fine with the suggested steps written in the commit itself. Just don't forget to clean ccache (rm -rf /home/youruser/.ccache or rm -rf /root/.ccache) and make clean/clobber.

You can use, modify and share my commit any way you want, just please keep proper credits in changelogs and in the repo itself. If you feel generous, you may also buy me a coke for the massive amount of hours put into those experiments.

Now go ahead and show your users how things should be done :cool:.

Cherry-picking time!

Android "Lollipop" (5.1.1 & 5.0.2 tested)
JustArchi's ArchiDroid Optimizations V4.1 for CyanogenMod (latest)

A set of commits you may want to pick to fix O3-related issues:
external_bluetooth_bluedroid | hardware_qcom_display | libcore | frameworks_av #1 | frameworks_av #2

Older entries are provided for reference only. I suggest using only the latest commit above.

Android "Lollipop" (5.1.1 & 5.0.2 tested)
JustArchi's ArchiDroid Optimizations V4 for CyanogenMod

Android "Kitkat" 4.4.4:
JustArchi's ArchiDroid Optimizations V3 for CyanogenMod
JustArchi's ArchiDroid Optimizations V3 for OmniROM
JustArchi's ArchiDroid Optimizations V2
JustArchi's ArchiDroid Optimizations V1

AFTER applying the above commit and AFTER EVERY CHANGE regarding flags, ALWAYS make clean/clobber AND empty ccache (rm -rf ~/.ccache)
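
If you keep forgetting this step, it can be scripted. Below is a minimal Python sketch of the same two actions. The function names are hypothetical, it assumes ccache lives in the default ~/.ccache mentioned above (a custom CCACHE_DIR would need a different path), and it assumes you run it from the source root with the build environment already set up.

```python
import shutil
import subprocess
from pathlib import Path


def empty_ccache(cache_dir: Path) -> bool:
    """Delete the ccache directory; return True if something was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


def clobber(source_root: Path) -> None:
    """Run `make clobber` inside the source tree and fail loudly on error."""
    subprocess.run(["make", "clobber"], cwd=source_root, check=True)


if __name__ == "__main__":
    empty_ccache(Path.home() / ".ccache")
    clobber(Path.cwd())
```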


Q: How to properly change the toolchains used in the local manifest?
From your source root directory, open .repo/local_manifests/roomservice.xml (or create one). Here is a sample manifest that replaces the default 4.8 toolchains (both eabi and androideabi) with custom ones from the ArchiDroid/Toolchain repo (in the revisions shown, ArchiToolchain 5.2 and UBER 4.9):
Code:
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
    <remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" />
    <project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" remote="github" revision="architoolchain-5.2-arm-linux-gnueabihf" />
    <remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" />
    <project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" remote="github" revision="uber-4.9-arm-linux-androideabi" />
</manifest>

This is only an example; you should use the toolchains that suit you best. My ArchiDroid/Toolchain GitHub repo is a good place to start testing various toolchains and decide which one you like the most, or which one causes the fewest problems for you. I do not recommend any other magic tricks to include custom toolchains: putting your selected one in the proper path is enough, so avoid magic android_build modifications.
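
If you switch toolchains often, the manifest can also be generated instead of edited by hand. The sketch below is just one possible way to do it in Python. The function name and the dictionary layout are my own assumptions, while the project name, paths, remote and revisions are the ones from the sample manifest above.

```python
import xml.etree.ElementTree as ET

PREBUILTS = "prebuilts/gcc/linux-x86/arm"


def toolchain_manifest(replacements):
    """Build a local manifest from {toolchain directory: revision} pairs."""
    root = ET.Element("manifest")
    for directory, revision in replacements.items():
        path = f"{PREBUILTS}/{directory}"
        # Drop the default project first, then map ours onto the same path.
        ET.SubElement(root, "remove-project", name=f"platform/{path}")
        ET.SubElement(root, "project", name="ArchiDroid/Toolchain",
                      path=path, remote="github", revision=revision)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(toolchain_manifest({
        "arm-eabi-4.8": "architoolchain-5.2-arm-linux-gnueabihf",
        "arm-linux-androideabi-4.8": "uber-4.9-arm-linux-androideabi",
    }))
```

The output has no XML declaration and no indentation, so prepend the `<?xml ...?>` line yourself if you want the file to look like the hand-written one.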


[size=+1]Troubleshooting[/size]

Q: Compiler error:
Code:
(...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libcloog-isl.so.4: cannot open shared object file: No such file or directory

This error can be fixed by installing the missing library. libcloog-isl.so.4 is provided by the libcloog-isl4 package, so on Debian-like OSes you should be able to fix it with:
Code:
apt-get install libcloog-isl4

Q: Compiler error:
Code:
(...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libisl.so.13: cannot open shared object file: No such file or directory

This error is very similar to the one above, but concerns a different shared library. libisl.so.13 is provided by the libisl13 package. The problem is that this package is in testing/sid, so we'll need to install it from there.

Add the following entries to your /etc/apt/sources.list:
Code:
deb http://ftp.debian.org/debian testing main contrib non-free
deb-src http://ftp.debian.org/debian testing main contrib non-free

Then apt-get update && apt-get install libisl13.
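
To check up front whether your buildbox has these libraries, before a long build dies on them, something like the following Python sketch can help. The helper name is hypothetical, and it assumes your Python interpreter has the same architecture as the toolchain binaries; if they differ, a failure here is only a hint, not proof.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for soname in ("libcloog-isl.so.4", "libisl.so.13"):
        status = "ok" if can_load(soname) else "MISSING"
        print(f"{soname}: {status}")
```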


The issues below are for older commits and should be used for reference only.


Kitkat THUMB O2+ errors?
These are the most common issues.
* Change the -O3 flag in TARGET_thumb_CFLAGS back to -Os, make clean/clobber, empty ccache and try again. This fixes most of the issues.
* RIL problems for the Exynos 4210 family? Add -fno-tree-vectorize to TARGET_thumb_CFLAGS.
* Broken exFAT -> https://github.com/JustArchi/android_external_fuse/commit/78ebbc4404de260862dca5f0454bffccee650e0d

Errors caused by toolchain?
1. Try Google's GCC 4.8 if you used Linaro 4.8 or SaberMod 4.8
2. Fall back to Google's GCC 4.7 if the above didn't help (change TARGET_GCC_VERSION back to 4.7)

Errors caused by GCC 4.8+?
* ART Fix (bootloop) -> https://github.com/JustArchi/android_art/commit/71a0ca3057cc3865bd8e41dcb94443998d028407
* Not booting kernel -> https://github.com/JustArchi/androi...mmit/9262707f4ea471acf40baa43ffe4bfb3cff64de9 and https://github.com/JustArchi/androi...mmit/41a70abcdad746d9415f3ee40f90528feb0c9bdd

Errors caused by GCC 4.9+?
* Graphical glitches in PlayStore -> https://github.com/JustArchi/android_external_webp/commit/36c6201fbb108d6b757f860e2cd57f3982191662

Errors caused by Linaro?
* error: unknown CPU architecture -> https://github.com/JustArchi/androi...mmit/5e5e158a7c147725beae1eeb6785174baacecb03 (Keep in mind that this is a sample fix for the smdk4412 kernel; you may need a similar solution in your own case. Also, this error happens only with the Linaro toolchain, not with Google's GCC)

Other errors?
* error: undefined reference to 'memmove' -> https://github.com/XperiaSTE/androi...mmit/679a4e571ef77f47892a785e852d8219c1e6807a


[size=+1]Credits[/size]
@IAmTheOneTheyCallNeo - For inspiration and first steps
@metalspring - For some nice commits
@sparksco - For SaberMod, some nice commits and support for the optimization idea
 

Codename13

Senior Member
Jun 22, 2012
935
1,305
Can I use a 4.7 toolchain with the ArchiDroid optimizations? I've tried compiling my kernel with different 4.8 toolchains before, and it's always resulted in boot loops.
 

ayoubij

Senior Member
Jan 28, 2013
2,565
751
Tripoli
great work archi !! also after porting a fully working 4.4.2 touchwiz to i9300 is it possible to make aosp for i9300 more stable now? :)
 

dragonnn

Senior Member
Oct 16, 2011
1,136
861
@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too weak for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles ;)
 

pianistaPL

Senior Member
Feb 15, 2012
2,301
1,249
Poznań - Poland
[QUOTE="dragonnn"]@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too weak for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles ;)[/QUOTE]

Did someone mention Poles? :D
Great work, @JustArchi :highfive:
 

JustArchi

Inactive Recognized Developer
Mar 7, 2013
8,740
38,810
Warsaw
[QUOTE="dragonnn"]@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too weak for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles ;)[/QUOTE]

If you have the whole ROM tree, cherry-pick this commit, lunch your target and make bootimage. That is enough.

If you have a standalone kernel, take a look at the main Makefile :).
 

Top Liked Posts

  • There are no posts matching your filters.
  • 695
    Hello dear developers.

    I'd like to share with you effect of nearly 300 hours spent on trying to optimize Android and push it to the limits.

    In general. You should be already experienced in setting up your buildbox, using git, building AOSP/CyanogenMod/OmniROM from source and cherry-picking things from review/gerrit. Solving git conflicts would also be nice. If you don't know how to build your own ROM from source, this is not a something you can apply to your ROM. Also, as you probably noticed, this is not a something you can apply to already prebuilt ROM (stocks), as these optimizations are applied during compilation, so only AOSP roms, self-compiled from source may use this masterpiece.

    So, what is it about? As we know, Android contains a bunch of low-level C/C++ code, which is compiled and acts as a backend for our java's frontend and android apps. Unfortunately, Google didn't put their best at focusing on optimization, so as a result we're using the same old flags set back in 2006 for Android Donut or anything which existed back then. As you guess, in 2006 we didn't have as powerful devices as now, we had to sacrifice performance for smaller code size, to fit to our little devices and run well on very low amount of memory. However, this is no longer a case, and by using newest compilers and properly setting flags, we can achieve something great.

    You probably may heard of some developers claiming using of "O3 Flags" in their ROMs. Well, while this may be true, they've applied only to low-level ARM code, mostly used during kernel compilation. Additionally it overwrites O2 flag, which is already fast, so as you may guess, this is more likely a placebo effect and disappears right after you change the kernel. Take a look at the most cherry-picked "O3 Flags commit". You see big "-Os" in "TARGET_thumb_CFLAGS"? This is what I'm talking about.

    However, the commit I'm about to present you is not a placebo effect, as it applies flags to everything what is compiled, and mostly important - target THUMB, about 90% of an Android.

    Now I'll tell you some facts. We have three interesting optimization levels. Os, O2, O3. O2 enables all optimizations that do not involve a space-speed tradeoff. Os is similar to O2, but it disables all flags that increase code size. It also performs further optimizations to reduce code size to the minimum. O3 enables all O2 optimizations and some extra optimizations that like to increase code size and may, or may not, increase performance. If you want to ask if there's something more like O4, there is - Ofast, however it breaks IEEE standard and doesn't work with Android, as i.e. sqlite3 is not compatible with Ofast's -ffast-math flag. So no go for us.

    Now here comes the fun part. Android by default is compiled with O2 flag for target ARM (about 10% of Android, mostly low-level parts) and Os flag for target THUMB (about 90% of Android, nearly everything apart from low-level parfts). Some guys think that Os is better than O2 or O3 because smaller code size is faster due to better fitting in cpu cache. Unfortunately, I proven that it is a myth. Os was good back in 2006, as only with this flag Google was able to compile Dalvik and it's virtual machine while keeping good amount of free memory and space on eMMC cards. As or now, we have plenty of space, plenty of ram, plenty of CPU power and still good old Os flag for 90% of Android.

    I've made countless tests to find out what is the most efficient in terms of GCC optimization, two selected tests I am about to present you right now.

    3j9IcCO.png

    As you may noticed, I compiled whetstone.c benchmark using three different optimization flags - Os, O2 and O3. I set CPU to performance, maximum frequency, and I repeated each test additional two times, just to make sure that Android doesn't lie to me. Source code of this test is available here and you may download it, compile for our beloved Android and try yourself. As you can see O3 > O2 >> Os, Os performs about 2.5x times worse than O2, and about 3.0x times worse than O3.

    But, of course. Android is not a freaking benchmark, it's operating system. We can't tell if things are getting better or worse according to a simple benchmark. I kept that in mind and provided community with JustArchi's Mysterious Builds for test. I gave both mysterious builds and didn't tell my users what is the mysterious change. Both builds have been compiled with the same toolchain, same version, same commits. The one and only mysterious change was the fact that every component compiled as target thumb (major portion of an android) has been optimized for speed (O3) in build #1 (experimental), and optimized for size (Os) in build #2 (normal behaviour). Check poll yourself, 9 votes on build 1 in terms of performance, and 1 vote on build 2. I decided that this and benchmark is enough to tell that O2/O3 for target thumb is something that we want.

    Now it doesn't matter that match if you wish to use O2 or O3, but here is some comparison:
    1. Kernel compiled with O2 has 4902 KB, with O3 4944 KB, so O3 is 42 KB bigger.
    2. ROM compiled with O3 is 3 MB larger than O2 after zip compression. Fast overview: 97 binaries in /system/bin and 2 binaries in /system/xbin + 283 libraries in /system/lib and other files, about 400 files in total. 3 MB / 400 = 7,5 KB per file size increase.
    3. It's unlikely that code working properly with O2 level might break on O3 level, most issues are on the Os <-> O2 part.
    4. If it doesn't cause any issues, and speeds up a binary by a little bit, why not use it?
    5. The only real reason to not use O3 is potential higher memory usage due to oversized binaries.

    In general, I doubt that this extra chunk of code may cause any significant memory usage or slower performance. I suggest to use O3 if it doesn't cause any issues to you compared to O2, but older devices may use O2 purely for saving on code size, similar way Google did it back in 2006 using Os flag.

    [SIZE="+1"]Now let's get down to business[/SIZE].

    Here is a list of important improvements:
    - Optimized for speed yet more all instructions - ARM and THUMB (-O3)
    - Optimized for speed also parts which are compiled with Clang (-O3)
    - Turned off all debugging code (lack of -g)
    - Eliminated redundant loads that come after stores to the same memory location, both partial and full redundancies (-fgcse-las)
    - Ran a store motion pass after global common subexpression elimination. This pass attempts to move stores out of loops (-fgcse-sm)
    - Enabled the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. We can then check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator ISL, like index splitting and dead code elimination in loops (-fgraphite -fgraphite-identity)
    - Performed interprocedural pointer analysis and interprocedural modification and reference analysis (-fipa-pta)
    - Performed induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees (-fivopts)
    - Didn't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions (-fomit-frame-pointer)
    - Attempted to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers (-frename-registers)
    - Tried to reduce the number of symbolic address calculations by using shared “anchor” symbols to address nearby objects. This transformation can help to reduce the number of GOT entries and GOT accesses on some targets (-fsection-anchors)
    - Performed tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job (-ftracer)
    - Performed loop invariant motion on trees. It also moved operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion (-ftree-loop-im)
    - Created a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily (-ftree-loop-ivcanon)
    - Assumed that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations even if the loop optimizer itself cannot prove that these assumptions are valid (-funsafe-loop-optimizations)
    - Moved branches with loop invariant conditions out of the loop (-funswitch-loops)
    - Constructed webs as commonly used for register allocation purposes and assigned each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover (-fweb)
    - Sorted the common symbols by alignment in descending order. This is to prevent gaps between symbols due to alignment constraints (-Wl,--sort-common)

    Sound cool, doesn't it? Head over to my ArchiDroid project and see yourself how people react after switching to my ROM. Take a look at just one small example, or another one :). No bullsh*t guys, this is not a placebo.

    However, please read my commit carefully before you decide to cherry-pick it. You must understand that Google's flags weren't touched since 7 years and nobody can assure you that they will work properly for your ROM and your device. You may experiment with them a bit to find out if they're not causing conflicts or other issues.

    I can assure you that my ArchiDroid based on CM compiles fine with suggested steps written in the commit itself. Just don't forget to clean ccache (rm -rf /home/youruser/.ccache or rm -rf /root/.ccache) and make clean/clobber.

    You can use, modify and share my commit anyway you want, just please keep proper credits in changelogs and in the repo itself. If you feel generous, you may also buy me a coke for massive amount of hours put into those experiments.

    Now go ahead and show your users how things should be done :cool:.

    Cherry-picking time!

    Android "Lollipop" (5.1.1 & 5.0.2 tested)
    JustArchi's ArchiDroid Optimizations V4.1 for CyanogenMod (latest)

    A set of commits you may want to pick to fix O3-related issues:
    external_bluetooth_bluedroid | hardware_qcom_display | libcore | frameworks_av #1 | frameworks_av #2

    Older entries are provided for reference only. I suggest using only latest commit above.

    Android "Lollipop" (5.1.1 & 5.0.2 tested)
    JustArchi's ArchiDroid Optimizations V4 for CyanogenMod

    Android "Kitkat" 4.4.4:
    JustArchi's ArchiDroid Optimizations V3 for CyanogenMod
    JustArchi's ArchiDroid Optimizations V3 for OmniROM
    JustArchi's ArchiDroid Optimizations V2
    JustArchi's ArchiDroid Optimizations V1

    AFTER applying above commit and AFTER EVERY CHANGE regarding flags, ALWAYS make clean/clobber AND empty ccache (rm -rf ~/.ccache)


    Q: How to properly change toolchains used in local manifest?
    Open from your source rootdir .repo/local_manifests/roomservice.xml (or create one). Here is a sample manifest that replaces default 4.8 toolchain (both eabi and androideabi) with 4.8 SaberMod and 4.9 ArchiToolchain:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <manifest>
    <remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" />
    <project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" remote="github" revision="architoolchain-5.2-arm-linux-gnueabihf" />
    <remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" />
    <project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" remote="github" revision="uber-4.9-arm-linux-androideabi" />
    </manifest>

    This is only an example, you should use the toolchains that suit you best. My ArchiDroid/Toolchain github repo is a good start to test various different toolchains and decide which one you like the most, or which one causes the least problems for you. I do not suggest any other magic tricks to include custom toolchains, putting your selected one in proper path is enough, avoid magic android_build modifications.


    [size=+1]Troubleshooting[/size]

    Q: Compiler errror:
    Code:
    (...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libcloog-isl.so.4: cannot open shared object file: No such file or directory

    This error can be fixed by installing missing library. libcloog-isl.so.4 is provided by libcloog-isl4 package, so on debian-like OSes, you should be able to fix it with:
    Code:
    apt-get install libcloog-isl4

    Q: Compiler errror:
    Code:
    (...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libisl.so.13: cannot open shared object file: No such file or directory

    This error is very similar to above, but considers other shared library. libisl.so.13 is provided by libisl13 package. Now the problem is that this package is in testing/sid, so we'll need to install it from there.

    Add to your /etc/apt/sources.list following entries:
    Code:
    deb http://ftp.debian.org/debian testing main contrib non-free
    deb-src http://ftp.debian.org/debian testing main contrib non-free

    Then apt-get update && apt-get install libisl13.


    Issues below are for older commits and should be used for reference only


    Kitkat THUMB O2+ errors?
    These are the most common issues.
    * Change -O3 flag from TARGET_thumb_CFLAGS back to -Os, make clean/clobber, empty ccache and try again. This fixes most of the issues.
    * RIL problems for for the Exynos 4210 family? Add -fno-tree-vectorize to TARGET_thumb_CFLAGS.
    * Broken exFAT -> https://github.com/JustArchi/android_external_fuse/commit/78ebbc4404de260862dca5f0454bffccee650e0d

    Errors caused by toolchain?
    1. Try Google's GCC 4.8 if you used Linaro 4.8 or SaberMod 4.8
    2. Fall back to Google's GCC 4.7 if the above didn't help (change TARGET_GCC_VERSION back to 4.7)

    Errors caused by GCC 4.8+?
    * ART Fix (bootloop) -> https://github.com/JustArchi/android_art/commit/71a0ca3057cc3865bd8e41dcb94443998d028407
    * Kernel not booting -> https://github.com/JustArchi/androi...mmit/9262707f4ea471acf40baa43ffe4bfb3cff64de9 and https://github.com/JustArchi/androi...mmit/41a70abcdad746d9415f3ee40f90528feb0c9bdd

    Errors caused by GCC 4.9+?
    * Graphical glitches in PlayStore -> https://github.com/JustArchi/android_external_webp/commit/36c6201fbb108d6b757f860e2cd57f3982191662

    Errors caused by Linaro?
    * error: unknown CPU architecture -> https://github.com/JustArchi/androi...mmit/5e5e158a7c147725beae1eeb6785174baacecb03 (Keep in mind that this is a sample fix for the smdk4412 kernel; you may need a similar solution in your own case. Also, this error happens only with the Linaro toolchain, not with Google's GCC.)

    Other errors?
    * error: undefined reference to 'memmove' -> https://github.com/XperiaSTE/androi...mmit/679a4e571ef77f47892a785e852d8219c1e6807a


    [size=+1]Credits[/size]
    @IAmTheOneTheyCallNeo - For inspiration and first steps
    @metalspring - For some nice commits
    @sparksco - For SaberMod, some nice commits and support for the optimization idea
    Hello dear developers.

    I'd like to share with you the result of nearly 200 hours spent trying to optimize Android and push it to its limits.

    In general, you should already be experienced in setting up your buildbox, using git, building AOSP/CyanogenMod/OmniROM from source and cherry-picking things from review/gerrit. If you don't know how to build your own ROM from source, this is not something you can apply to your ROM. Also, as you have probably noticed, this is not something you can apply to every ROM: these optimizations are applied during compilation, so only AOSP ROMs self-compiled from source can use this masterpiece.

    So, what is it about? As we know, Android contains a bunch of low-level C/C++ code, which is compiled and acts as a backend for the Java frontend and Android apps. Unfortunately, Google didn't do their best on optimization, so as a result we're using the same old flags set back in 2006 for Android Donut or whatever existed back then. As you can guess, in 2006 we didn't have devices as powerful as today's; we had to sacrifice performance for smaller code size, to fit on our little devices and run well with a very small amount of memory. However, this is no longer the case, and by using the newest compilers such as GCC 4.8 and setting flags properly, we can achieve something I call "Android in 2014".

    You may have heard of some developers claiming to use "O3 Flags" in their ROMs. Well, while this may be true, they've applied them only to low-level ARM code, mostly used during kernel compilation. Additionally, this only overwrites the O2 flag, which is already fast, so as you may guess, this is more likely a placebo effect that disappears right after you change the kernel. Take a look at the most cherry-picked "O3 Flags commit". You see the big "-Os" in "TARGET_thumb_CFLAGS"? This is what I'm talking about.

    However, the commit I'm about to present is not a placebo effect, as it applies flags to everything that is compiled and, most importantly, to target THUMB, about 90% of Android.

    Now I'll tell you some facts. We have three interesting optimization levels: Os, O2 and O3. O2 enables nearly all optimizations that do not involve a space-speed tradeoff. Os is similar to O2, but it disables the optimizations that typically increase code size and performs further optimizations designed to reduce code size. O3 enables all O2 optimizations plus some extra ones that tend to increase code size and may, or may not, increase performance. If you want to ask whether there's something more, like an O4, there is: Ofast. However, it breaks IEEE compliance and doesn't work with Android, as e.g. sqlite3 is not compatible with Ofast's -ffast-math flag. So it's a no-go for us.

    Now here comes the fun part. By default, Android is compiled with the O2 flag for target ARM (about 10% of Android, mostly kernel) and the Os flag for target THUMB (about 90% of Android, nearly everything apart from the kernel). Some guys think that Os is better than O2 or O3 because smaller code is faster thanks to fitting better in the CPU cache. Unfortunately, I have proven that this is a myth. Os was good back in 2006, as only with this flag was Google able to compile Dalvik and its virtual machine while keeping a good amount of free memory and space on eMMC cards. As of now, we have plenty of space, plenty of RAM, plenty of CPU power, and still the good old Os flag for 90% of Android.

    Now you should ask: where is your proof? Here it is:

    Screenshot_2014-03-07-23-54-52.png

    As you may have noticed, I compiled the whetstone.c benchmark using three different optimization flags: Os, O2 and O3. I repeated each test two more times, just to make sure that Android doesn't lie to me. The source code of this test is available here, and you may download it, compile it for our beloved Android and try it yourself. As you can see, O3 > O2 >> Os: Os performs about 2.5 times worse than O2, and about 3.0 times worse than O3.

    But of course, Android is not a freaking benchmark; it's an operating system. We can't tell whether things are getting better or worse from a simple benchmark. I kept that in mind and provided the community with JustArchi's Mysterious Builds for testing. I gave testers both mysterious builds and didn't tell them what the mysterious change was. Both builds were compiled with the same toolchain, same version, same commits. The one and only mysterious change was that every component compiled as target thumb (the major portion of Android) was optimized for speed (O3) in build #1 (experimental) and for size (Os) in build #2 (normal behaviour). Check the poll yourself: 9 votes for build 1 in terms of performance, and 1 vote for build 2. I decided that this and the benchmark are enough to say that O2/O3 for target thumb is something we want.

    Now the battle is: O2 or O3? This is a tough choice. Here are some facts:
    1. A kernel compiled with O2 is 4902 KB, with O3 4944 KB, so O3 is 42 KB bigger.
    2. A ROM compiled with O3 is 3 MB larger than with O2 after zip compression. Quick overview: 97 binaries in /system/bin and 2 binaries in /system/xbin + 283 libraries in /system/lib and other files, about 400 files in total. 3 MB / 400 = 7.5 KB size increase per file.
    3. No issues
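
    The per-file figure in point 2 is plain division. Here is a quick sketch that reproduces it, assuming 1 MB is counted as 1000 KB (with 1024 KB per MB the result comes out closer to 7.7 KB). per_file_kb is just an illustrative name:

```shell
# Average size increase per file, in KB.
# $1 = total increase in MB, $2 = number of files, $3 = KB per MB (default 1000)
per_file_kb() {
    awk -v mb="$1" -v n="$2" -v k="${3:-1000}" 'BEGIN { printf "%.1f\n", mb * k / n }'
}

per_file_kb 3 400        # decimal megabytes, prints 7.5
per_file_kb 3 400 1024   # binary megabytes, prints 7.7
```

    Either way this is only an average over roughly 400 files; individual binaries and libraries will grow by different amounts.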

    In general, I doubt that this extra chunk of code causes any significant increase in memory usage or any slowdown. I suggest using O3 if it doesn't cause you any issues compared to O2, but older devices may use O2 purely to save on code size, similar to what Google did back in 2006 with the Os flag.

    [SIZE="+1"]Now let's get down to business[/SIZE].
    Here is a link to the commit -> https://github.com/JustArchi/android_build/commit/8e1b82c082a8de9160e6c0fc3ded37b591c3e517

    And here is a list of important improvements:


    Looks badass? It is badass. Head over to my ArchiDroid 2.X project and see for yourself how people react after switching to my ROM. Take a look at just one small example, or another one :). No bullsh*t guys, this is the future.

    However, please read my commit carefully before you decide to cherry-pick it. You must understand that Google's flags haven't been touched for 7 years, and nobody can assure you that the new ones will work properly for your ROM and your device. You may need to experiment with them a bit to find out whether they cause conflicts or other issues.

    I can assure you that my OmniROM build compiles fine with some fixes mentioned in the commit itself. Just don't forget to clean ccache (rm -rf /home/youruser/.ccache or rm -rf /root/.ccache) and make clean/clobber.

    You can use, modify and share my commit any way you want; just please keep proper credits in changelogs and in the repo itself. If you feel generous, you may also buy me a coke for the massive amount of hours put into those experiments.

    Now go ahead and show your users how things should be done :cool:.

    I do agree that the most cherry-picked commit you linked is terrible, and while this is some nice work you've done, it's misleading and not entirely true.

    Android is NOT 90% thumb. I'd give it a maximum of 50%, and that's being extremely generous. On Linaro's website I recall reading about potential future experiments for them, and one of them was to see what else could be improved by using thumb. If Android is already 90% thumb, that'd be a useless task and you might as well compile everything as thumb, and looking at my build logs there's plenty more things compiled as ARM than there are thumb.
    I agree that the people saying usage of -Os is good are way out of date and stuck in the past. With older ARM chips it was a good idea, but anything Cortex-A9 (if not Cortex-A8) and newer is plenty good enough for at least -O2.

    The kernel's compiler flags are all defined in the kernel itself. None of the flags in the build section of Android end up in the kernel.

    There are a number of "changes" you've made which actually didn't do anything:
    1. -DNDEBUG was already defined in TARGET_RELEASE_CFLAGS, which if I remember correctly is just all appended or prepended (doesn't matter which it is in this case) to your usual TARGET_arm_CFLAGS and TARGET_thumb_CFLAGS, so debugging code was already disabled.
    2. The -fgcse-after-reload and -funswitch-loops flags are already enabled by using -O3, so it is not necessary nor helpful to add it by hand.
    3. The -frename-registers flag was already enabled in TARGET_RELEASE_CFLAGS, as was -fomit-frame-pointer, so again these additions have no effect or change.
    4. The -frerun-cse-after-loop is already enabled by using -O3 and it's even enabled at as low as -Os, so this is unnecessary as well and even in the unmodified build flags would already be there.
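
    Claims like the ones in this list can be checked directly, because GCC can print which optimizations a given -O level turns on via -Q --help=optimizers. Here is a rough sketch, assuming the usual two-column output of that option. flag_state is a made-up helper name, and the compiler binary name depends on your toolchain:

```shell
# Print the state ([enabled] / [disabled]) of one optimization flag.
# Reads the output of `gcc -Q --help=optimizers ...` on stdin.
# (flag_state is a hypothetical helper name.)
flag_state() {
    awk -v flag="$1" '$1 == flag { print $2 }'
}

# Example usage (substitute your cross compiler for gcc):
#   gcc -O3 -Q --help=optimizers | flag_state -fgcse-after-reload
#   gcc -Os -Q --help=optimizers | flag_state -frerun-cse-after-loop
```

    This only shows what the -O level itself implies. Flags that arrive through other variables such as TARGET_RELEASE_CFLAGS still have to be checked in the build files.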

    There is a flag which you've stated is broken, but actually works just fine and can be extremely beneficial: -flto (Link-Time Optimization).
    I've been using LTO for at least 5 months now.
    You're also missing out on using C++11, this allows for some extra optimizations as well.

    There is a change you've made which is EXTREMELY DANGEROUS and could result in crashes or maybe even worse:
    You've used the "-Wno-error=strict-aliasing" flag. strict-aliasing is a fatal error by default for a reason!! You should definitely give this a read: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
    Fixing those violations is extremely time-consuming, and resulted in me having to fork over 30 git repos to do so, but if I didn't do that I could end up being responsible for the loss of data on a thousand or more devices. I don't want something like that on my conscience.
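
    To get a feel for how much of this work a given tree needs, the strict-aliasing diagnostics can be counted per source file from a captured build log. This is a minimal sketch, assuming GCC's usual "file:line:col: warning: ... [-Wstrict-aliasing]" message format. The function name aliasing_offenders and the build.log file name are placeholders:

```shell
# Count strict-aliasing diagnostics per source file.
# Reads a build log on stdin; prints "<count> <file>", most affected first.
# Matches both warnings and -Werror=strict-aliasing errors.
aliasing_offenders() {
    grep -E '\[-W(error=)?strict-aliasing\]' \
        | cut -d: -f1 \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Example usage (build.log is a placeholder for your captured build output):
#   make 2>&1 | tee build.log
#   aliasing_offenders < build.log
```

    A list like this only shows where the compiler noticed a problem; it says nothing about violations GCC could not detect.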


    Now, I've never been one to try and advertise my work or say my work is better than other people's, but I've been hearing about stuff like this for a long time now and I'm getting sick and tired of it.
    My GitHub: https://github.com/MWisBest/
    My ROM's thread: http://forum.xda-developers.com/galaxy-nexus/verizon-develop/rom-fml-fork-life-01-09-2014-t2427087/
    If you take a look at my thread you'll see I'm extremely humble. Heck, if somebody posts that they've moved on to/liked a different ROM better, I'm actually happy that they've found something they like! All I hope for is that people find a ROM they like. If it's mine, great! If it's not, that's just dandy too! I don't boast about the amount of users my ROM has, I don't brag about (let alone bring up) being the first person to get a KitKat ROM on the VZW Galaxy Nexus, etc. I'm not in this for fame and internet points. My expectation when I released my ROM was for it to just fall back page by page and have just a few users.
    I don't know how long you've been working on this stuff, but I've been working on it for 9 months and I'm constantly refining and improving it. I don't claim to be an expert at it, but compared to the people that get attention for similar work I'd say what I've done is better. To put what I've just said into perspective: That is the only time I've ever said anything I've done is better than somebody else here.

    I've been sitting here debating whether or not to press the "Submit Reply" button; in the end I decided that the risks of people using the work posted here outweigh any potential negative outcome from me submitting this post.
    Bravo! I see flags in here that I am yet to try. Thanks for all your work and dedication to this effort.
    Going to do a comparison this week against my "neofy" initiative :)

    Thanks!

    -Neo
    Forum Moderator
    I'm working on CM12 for my beloved i9300, it's in alpha stage, but it works correctly in terms of Android, so I think that I'll start working on V4 for Lollipop soon. But it will take a few weeks more, I need to be sure that there are no upstream bugs before starting to... generate them :).
    Just because some of you are wondering.

    Yes, I'm testing how M works with my optimizations and with UBERTC toolchains, and testing which newly introduced or previously disabled (for compatibility reasons) flags may work again. If you need a simple copy-pasta that should work properly, you can find it here, and don't forget to click thanks. In the meantime I'm working on V5 for M, which will come at some point, once I've tested that everything works on at least 2, preferably 3, of my devices, including an ARM64 one: the OnePlus 2, which is now my main device.

    As I've said many times, it's kind of a long process, because every flag has to be carefully tested and, because it's applied globally, handled with caution. Full builds take time, and so do tests. This commit is already "too device-based" and may require additional fixes for you, so the point now is to require as few fixes as possible while being compatible with as many devices as possible out of the box.

    So yeah, it will come out, eventually.