Originally Posted by Khaon
OK, but you did enable those flags for the whole build before; that's why I pointed out to you that it was still in UNKNOWN (but I didn't know you had moved them to a config file).
1/ By default vectorization uses quad words, so I don't think you need to activate it.
2/ It seems you misunderstood the difference between double and quad; it is not related to the number of cores.
No, I never enabled quad and double vectorization at the same time; that would've been the dumbest thing I could do.
Yes, you're right - -mvectorize-with-neon-quad is the default choice. Adding it explicitly is intentional, since it may not remain the default at some point in the future.
Also, we have found that specifying the -mvectorize-with-neon-quad option
gives slightly better overall results (about 1%) than the default double-integer vectorization
Now you're comparing doubleword registers and quadword registers (the current default). You see, quadword vectorization is NOT always faster than doubleword vectorization. It depends mostly on how big the L1/L2/L3 CPU caches are, and how much vectorized data has to fit in them. On low-budget phones, especially with 2 cores or fewer, doubleword vectorization was around 2-3% faster for me than quadword vectorization. That test was done on an Xperia M with only 2 cores available. On a Galaxy S3 with an Exynos 4412 and 4 cores, quadword vectorization gives around 1-2% better results than doubleword.
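To make the per-device choice concrete, here's a minimal sketch of how one might switch the NEON vectorization width in an Android.mk-style fragment. The device codenames and the exact variable (TARGET_DEVICE, LOCAL_CFLAGS) are illustrative assumptions, not taken from my actual commit; both -mvectorize-with-neon-double and -mvectorize-with-neon-quad are real GCC ARM options.

```make
# Hypothetical make fragment: pick the NEON vectorization width per
# device. Codenames are illustrative ("nicki" = Xperia M, "i9300" =
# Galaxy S3); adapt to your build system's actual device variable.
ifeq ($(TARGET_DEVICE),nicki)
  # 2 cores, small caches: doubleword vectorization was ~2-3% faster
  LOCAL_CFLAGS += -mvectorize-with-neon-double
else ifeq ($(TARGET_DEVICE),i9300)
  # Exynos 4412, 4 cores: quadword vectorization was ~1-2% faster
  LOCAL_CFLAGS += -mvectorize-with-neon-quad
endif
```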
Mastering optimizations is not done on a per-Android basis, but on a per-device one. Something as simple as telling the compiler how big the target device's CPU caches are
improves its decisions, especially at high optimization levels such as -O3. If you enable -O3 and don't help the compiler, your code will most likely perform worse than it would at -O2 or -Os. That's why I enable -O3 in my commit, but I also describe my target device to the compiler as accurately as I possibly can.
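As a sketch of what "describing the target device" can look like: GCC exposes cache geometry through --param knobs that feed its -O3 heuristics (prefetching, loop transformations). The flag names are real GCC options; the values below are illustrative for a Cortex-A9-class SoC, not copied from any datasheet, so check your device's specs before using them.

```make
# Sketch: give GCC real numbers for the target CPU and its caches.
# l1-cache-size / l2-cache-size are in kilobytes,
# l1-cache-line-size is in bytes (all GCC --param options).
LOCAL_CFLAGS += -O3 \
    -mcpu=cortex-a9 -mfpu=neon \
    --param l1-cache-size=32 \
    --param l1-cache-line-size=32 \
    --param l2-cache-size=1024
```

With these set, passes like software prefetch generation stop guessing at generic cache sizes and tune themselves to the actual device.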