What are flags?
For compilers such as GCC, flags are essentially options. A flag can enable or disable an option or feature that is used when compiling (building) code.
What are optimizations?
Optimizations, in the context of compiler flags, are flags that improve some aspect of the compiled code, such as its size, speed, memory use, or ease of debugging.
General Optimizations
These are basic GCC flags that projects typically use to improve some aspect of the final compiled code.
-O1: Optimization level 1, very basic optimizations, rarely used.
-O2: Optimization level 2, basic optimizations, most commonly used.
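-O3: Optimization level 3, everything in -O2 plus more aggressive optimizations such as extra inlining and loop transformations. Often faster, but it can increase code size and is not guaranteed to beat -O2. It can also expose latent bugs in code that relies on undefined behavior.
-Os: Optimize for size. Enables the -O2 optimizations except those that tend to increase code size.
-Ofast: All optimizations from -O3, plus -ffast-math and other options that disregard strict standards compliance. It can speed up floating-point-heavy code, but it may change numerical results and break programs that depend on exact IEEE or ISO behavior.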
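-Og: Optimize for the debugging experience. Enables only the optimizations that do not interfere with debugging, so code runs faster than at -O0 while staying easy to step through in a debugger. It does not change compiler errors or warnings.
-g0: Produces no debugging information. This shrinks the output files but does not change the speed of the generated code.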
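-fomit-frame-pointer: Omits the frame pointer in functions that don't need one, which frees a register and saves the instructions that set it up and restore it.
-fipa-sra: Performs interprocedural scalar replacement of aggregates: removes unused parameters and replaces parameters passed by reference with parameters passed by value.
-fkeep-inline-functions: Emits static functions declared inline into the object file even when every call to them has been inlined. This is not an optimization and usually increases code size.
-fmodulo-sched: Performs swing modulo scheduling just before the first scheduling pass, reordering instructions in innermost loops so that iterations overlap better.
-fmodulo-sched-allow-regmoves: A more aggressive -fmodulo-sched that optimizes loops further by allowing register moves.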
-fgcse-sm: Moves stores out of loops to decrease the workload of loops.
-fgcse-las: Removes redundant loads after a store to reduce the workload.
-fgcse-after-reload: Removes redundant loads after a reload.
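-funsafe-loop-optimizations: Lets the loop optimizer assume that loop indices do not overflow and that loops with a nontrivial exit condition are not infinite. This enables more loop optimizations but can miscompile loops that break those assumptions.
-fira-hoist-pressure: Uses the register allocator (IRA) to evaluate register pressure when deciding whether to hoist expressions. Usually gives smaller code but can slow compilation.
-fira-loop-pressure: Uses IRA to evaluate register pressure in loops when deciding whether to move loop invariants. Usually gives smaller and faster code on machines with large register files, but can slow compilation.
-DNDEBUG: Defines the NDEBUG preprocessor macro, which turns assert() from <assert.h> into a no-op. This is a preprocessor definition rather than an optimization flag.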
-flto: Enables link time optimizations (LTO) for improved library and executable performance.
Graphite Optimizations
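Graphite is a framework within GCC that uses the Integer Set Library (ISL) to optimize loops and improve memory access patterns. GCC releases before 5 also required the Chunky Loop Generator (CLooG). These flags only take effect if GCC was built with ISL support.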
-fgraphite: Performs basic graphite loop and memory optimizations.
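-floop-interchange: Swaps the order of two nested loops, which can improve cache behavior when the inner loop strides across memory.
-floop-strip-mine: Splits a loop into two nested loops, where the outer loop steps in strips of a fixed length and the inner loop iterates within each strip.
-floop-block: Strip-mines each loop in a loop nest (blocking, also called tiling) so that the data touched by the inner loops fits in the caches.
-fgraphite-identity: Converts each eligible code region to Graphite's polyhedral representation and back without transforming it, which helps measure the cost or benefit of the round trip. ISL's code generator still applies a few minimal optimizations, such as dead code elimination in loops.
-floop-nest-optimize: Optimizes the structure of loop nests for data locality and parallelism. This flag is experimental.
-floop-unroll-and-jam: Enables unroll and jam (unrolling an outer loop and fusing the resulting copies of the inner loop) for the ISL loop optimizer.
-floop-parallelize-all: Uses Graphite's data dependence analysis to find loops without loop-carried dependences and parallelizes all of them, without checking whether doing so is profitable.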
Multithreading Optimizations
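These flags make code run in multiple threads so that it can use more than one core of a multicore CPU.
-ftree-parallelize-loops=n: Runs parallelized loops in n threads.
-pthread: Enables POSIX threads support, defining the necessary preprocessor macros and linking against the pthread library.
-fopenmp: Enables OpenMP #pragma omp directives for multithreading and links against GCC's OpenMP runtime (libgomp). On GCC this implies -pthread, since libgomp is itself built on POSIX threads.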
Sanitizer Flags
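These flags use libsanitizer to instrument the program so that it detects memory and threading errors at run time. They are debugging aids, not optimizations: instrumented programs run slower and use more memory.
-fsanitize=leak: Enables LeakSanitizer, which reports memory that was allocated but never freed.
-fsanitize=address: Enables AddressSanitizer, which detects out-of-bounds accesses and use-after-free bugs.
-fsanitize=thread: Enables ThreadSanitizer, which detects data races between threads. Only supported on 64-bit targets.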
Hardware Optimizations
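These flags tune code generation for the target's hardware. The examples below are for ARM targets; other architectures have their own -m options.
-marm: Generates code for the ARM instruction set (ARM state). ARM code is typically larger than Thumb code but can be faster.
-mthumb: Generates code for the Thumb instruction set (Thumb-2 where the target supports it), which usually produces smaller code.
-mthumb-interwork: Generates code that supports calls between ARM and Thumb code.
-march=X: Generates code for the given architecture, such as armv6 or armv7-a.
-mcpu=X: Generates and tunes code for a specific CPU, such as cortex-a15 or cortex-a53.
-mtune=X: Tunes scheduling and other performance choices for a specific CPU without changing which instructions may be used. Takes the same CPU names as -mcpu.
-mfpu=X: Selects the floating-point hardware to generate code for, such as vfpv3, vfpv4, or neon.
-mabi=X: Selects the ABI to generate code for. Valid values depend on the target, for example aapcs-linux on ARM, lp64 on AArch64, or 32 and 64 on MIPS.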