[ROM] Kryo-AICP [Heavily optimized] [O3] [ODEX] [DTC] [POLLY] [ThinLTO] [MORE] !

Rampage

Senior Member
Jul 5, 2012
1,137
632
0
Bremen

Manav Bhagia

Senior Member
May 1, 2015
158
462
0
Mumbai
Why, did you even start already? :)
I'm just getting started.
Dragon TC Clang caused a lot of errors and broke the new build again and again, so I decided to build with ArchiDroid optimizations, which are more stable. I will build with DTC once it is more stable.

Here is a list of important improvements with next build :
Quote:
- Optimized all instructions, both ARM and Thumb, even more aggressively for speed (-O3)
- Also optimized for speed the parts that are compiled with Clang (-O3)
- Turned off generation of debugging information (no -g)
- Eliminated redundant loads that come after stores to the same memory location, both partial and full redundancies (-fgcse-las)
- Ran a store motion pass after global common subexpression elimination. This pass attempts to move stores out of loops (-fgcse-sm)
- Enabled the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. We can then check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator ISL, like index splitting and dead code elimination in loops (-fgraphite -fgraphite-identity)
- Performed interprocedural pointer analysis and interprocedural modification and reference analysis (-fipa-pta)
- Performed induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees (-fivopts)
- Didn't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions (-fomit-frame-pointer)
- Attempted to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers (-frename-registers)
- Tried to reduce the number of symbolic address calculations by using shared “anchor” symbols to address nearby objects. This transformation can help to reduce the number of GOT entries and GOT accesses on some targets (-fsection-anchors)
- Performed tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job (-ftracer)
- Performed loop invariant motion on trees. It also moved operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion (-ftree-loop-im)
- Created a canonical counter for the number of iterations in loops for which determining the number of iterations requires complicated analysis. Later optimizations may then determine the number easily (-ftree-loop-ivcanon)
- Assumed that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations even if the loop optimizer itself cannot prove that these assumptions are valid (-funsafe-loop-optimizations)
- Moved branches with loop invariant conditions out of the loop (-funswitch-loops)
- Constructed webs as commonly used for register allocation purposes and assigned each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover (-fweb)
- Sorted the common symbols by alignment in descending order. This is to prevent gaps between symbols due to alignment constraints (-Wl,--sort-common)
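The GCC descriptions quoted above are fairly abstract. As a rough illustration, here is a hand-written C sketch of the source-level equivalent of what -ftree-loop-im (hoisting a loop-invariant computation) and -funswitch-loops (moving a loop-invariant branch out of the loop) aim to do. The function names are hypothetical; GCC performs these transformations internally on its intermediate representation, and whether it actually rewrites a given loop depends on its own heuristics.

```c
#include <stddef.h>

/* Original form: the test on `scale` and the product `a * b` do not change
   between iterations, yet both are evaluated on every pass through the loop. */
long sum_before(const int *v, size_t n, int scale, int a, int b) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (scale)                          /* loop-invariant condition */
            total += (long)v[i] * (a * b);  /* loop-invariant product */
        else
            total += v[i];
    }
    return total;
}

/* Hand-transformed form: the invariant product is computed once (loop
   invariant motion) and the invariant branch selects one of two loops
   (loop unswitching). Results are identical to sum_before. */
long sum_after(const int *v, size_t n, int scale, int a, int b) {
    long total = 0;
    if (scale) {
        const long k = (long)(a * b);       /* hoisted out of the loop */
        for (size_t i = 0; i < n; i++)
            total += v[i] * k;
    } else {
        for (size_t i = 0; i < n; i++)
            total += v[i];
    }
    return total;
}
```

Both functions return the same value for any input; the second simply does less work per iteration, which is the kind of gain these flags target.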

So hang on, the new build WILL be up the day after tomorrow.
Lots more to come with Kryo-AICP!
Also, I'm responsible for optimizations on Atomic-ROM, so be sure to check it out if you want to go AOSP. It will be up in a week or two!
 

Naman Bhalla

Senior Member
Jan 1, 2014
1,166
3,051
0
22
Sorry, I don't want to sound rude. But all of these are compiler-related (there's no ArchiDroid in what you mentioned, only GCC stuff: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), and at least the things you mentioned won't have any effect on users. In fact, these optimizations might even be a bad thing, as they aren't up to date for the msm8996. Again, the things you mentioned will mostly affect you while building and have almost zero effect on users. Been there, done that.
Just a friendly word: focus on device tree and kernel optimizations, as they are the biggest things that affect users.
And again, all these optimizations are known to have zero impact on other SoCs.
Beyond that, it is your ROM, but believe me, you will get zero results with these and good results with device tree improvements.

Regards, and enjoy!

Sent from my OnePlus2 using XDA Labs
 

Manav Bhagia

Senior Member
May 1, 2015
158
462
0
Mumbai
-O3 and Graphite are known to improve performance on a lot of devices. We'll see how well they work on the msm8996, but I think they may give us a significant performance bump! I'll also compile with Dragon TC once it's more stable!
 

KaszasM

Senior Member
Nov 8, 2013
1,186
559
143
Buttenheim
Hi mate, I have to say @Naman Bhalla is a cool dev; I've been checking his git for a few weeks because of RR ROM, and I "remember" yours for the OPO, but Naman is right too: these opts were basically meant for "low"-resource devices. Just my 5 cents: on the OP3 the problem is that the trees/ROMs are not using the existing hardware resources, not that they need opts to squeeze more out of already outdated resources.
So I hope you're cooking something smooth for us again; the result needs testing first anyway. I really hope it's worth fixing all the bugs, and that you won't need to start over if there are significant changes (future source releases can, could, and sometimes did bring those :) )
 

MeeGz

Senior Member
Mar 3, 2012
58
16
0
I can't wait for OP to release their sources for DASH... That's the only thing holding me back from non-OOS ROMs... We're getting close to the end of July, so hopefully they'll finally provide the code. Until then I'll just have to wait, because lately the battery has been draining way too fast and DASH is an absolute necessity and godsend. There's no way I'm getting a full day of moderate usage on a full charge...
 

KaszasM

Senior Member
Nov 8, 2013
1,186
559
143
Buttenheim

Jakkomo77

Senior Member
Jun 8, 2014
276
55
0
Hate on me all you want, but why are you guys asking so much for ROM updates!!!?
Asking after a long time is OK, but not like this, guys.
And DASH only works on a stock base.
Great ROM


Sent from my ONEPLUS A3003 using Tapatalk
 

Manav Bhagia

Senior Member
May 1, 2015
158
462
0
Mumbai
Because I had many problems with Dragon TC, and learned that the ArchiDroid optimizations aren't up to date for the OnePlus 3 and might not work well, I have decided to go with more up-to-date optimizations for Clang and GCC alike. Here are the new optimizations included in the next build:

- Clang -O3
- Graphite (no Polly, as it is not stable yet)
- Strict aliasing
- Removed some unneeded debugging
- Pipe opts
- Mem sanitize
- Cortex tunings
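Strict aliasing is the one item in this list whose safety depends on the source code itself. A minimal C sketch of what it means (function names are hypothetical): with GCC's -fstrict-aliasing, which GCC enables at -O2 and above, the compiler may assume that an int pointer and a float pointer never refer to the same object, so code that reinterprets memory through a mismatched pointer type can break. The memcpy idiom below stays well-defined; float_bits assumes a 32-bit IEEE 754 float, as on ARM targets.

```c
#include <stdint.h>
#include <string.h>

/* float_bits relies on float and uint32_t having the same size. */
_Static_assert(sizeof(uint32_t) == sizeof(float), "float is expected to be 32 bits");

/* Under strict aliasing an int lvalue and a float lvalue are assumed not to
   overlap, so the compiler may return the constant 1 here without re-reading
   *a after the store through b. Correct as long as a and b really are distinct. */
int store_and_reload(int *a, float *b) {
    *a = 1;
    *b = 2.0f;
    return *a;
}

/* Undefined under strict aliasing (do not do this):
       uint32_t bits = *(uint32_t *)&f;
   Well-defined alternative: copy the object representation with memcpy. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code that only uses the memcpy pattern for type punning keeps the same behavior with or without strict aliasing, which is what makes the flag safe to enable.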