The below information was based on how things should have been. Reality is the Exynos 5410 has some serious issues with its cache coherent interconnect / CCI which cripples the chip to only cluster migration, effectively making the major parts of the big.LITTLE operating scheme useless. This is an issue in silicon which cannot be solved.
What's it all about?
The Exynos Octa or Exynos 5410 is a big.LITTLE design engineered by ARM and is the first consumer implementation of this technology. Samsung was their lead partner in terms of bringing this to market first. Reneseas is the other current chip designer who has publicly announced a big.LITTLE design.
- Misconception #1: Samsung didn't design this, ARM did. This is not some stupid marketing gimmick.
The point of the design is to meld the advantages of the A7 processor architectures, with its extreme power efficiency, with the A15 architecture, with extreme performance at a cost of power consumption. The A7 cores are slightly slower than an A9 equivalent, but using much less power. The A15 cores are in another ballpark in terms of performance but their power consumption is also extreme on this current manufacturing generation.
The effective goal is to achieve the best of both worlds. Qualcomm on the other does this by using their own architecture which is similar in some design aspects to the A15 architecture, but compromises on feature and performance to achieve higher power efficiency. The end result is for the user can be expressed in 2 measurements: IPC (Instrucitons per clock), and Perf/W (Performance per Watt).
In terms of IPC, the A15 leads the pack by quite a margin, followed by Krait 400, Krait 300, Krait 200, A9, A7, and A8 cores, in that order.
In terms of Perf/W, the A7 leads by a margin, followed by A9's and the Krait cores, with the A15 at a distant last in terms of efficiency.
Of course, the Exynos Octa is the first to use this:
Currently, the official word seems to be that the A7 cluster is configured to run from 200 to 1200MHz, and the A15 cluster from 200 to 1600MHz.
There are several use-cases of how the design can be used, and it is purely limited by software, as the hardware configuration is completely flexible.
In-Kernel Switcher (IKS)
This is what most of us will see this in our consumer products this year; Effectively, you only have a virtual quad-core processor. The A15 cores are paired up with the A7 core clusters. Each A15 has a corresponding A7 "partner". Hardware wise, this pair-up has no physical representation as provided by an actual die-shot of the Exynos Octa.
The IKS does the same thing as a CPU governor. But instead of switching CPU frequency depending on the load, it will switch between CPUs.
Effecively, you are jumping from one performance/power curve to another: And that's it. Nothing more, nothing less.
The actual implementation is a very simple driver on the side of the kernel which measures load and acts much like a CPU governor.
The above is a demonstration; you can see how at most times the A7 cores are used for video playback, simple tasks, and miscellaneous computations. The A15 cores will kick in when there is more demanding load being processed, and then quickly drop out again to the A7 cores when it's not doing much anymore.
- Misconception #2: You DON'T need to have all 8 cores online, actually, only maximum 4 cores will ever be online at the same time.
- Misconception #3: If the workload is thread-light, just as we did hot-plugging on previous CPUs, big.LITTLE pairs will simply remain offline under such light loads. There is no wasted power with power-gating.
- Misconception #4: As mentioned, each pair can switch independently of other pairs. It's not he whole cluster who switches between A15 and A7 cores. You can have only a single A15 online, together with two A7's, while the fourth pair is completely offline.
- Misconception #5: The two clusters have their own frequency planes. This means A15 cores all run on one frequency while the A7 cores can be running on another. However, inside of the frequency planes, all cores run at the same frequency, meaning there is only one frequency for all cores of a type at a time.
Heterogeneous Multi-Processing (HMP)
This is the other actual implemented function mode of a big.LITTLE CPU. In this case, all 8 cores can be used simultaneously by the system.
This is a vastly more complex working mechanism, and its implementation is also an order of magnitude more sophisticated. It requires the kernel scheduler to actually be aware of the differentiation of between the A7 and A15 cores. Currently, the Linux kernel is not capable of doing this and treats all CPUs as equals. This is a problem since we do not want to use the A15 cores when a task can simply me processed on an A7 core with a much lower power cost.
The Linaro working-group already finished the first implementation of the HMP design as a series of patches to be applied against the Linux 3.8 kernel. What they did is to make the scheduler smart enough to be able to track the load of single process entities, and with that information to schedule the threads smartly on either the A7 cores or the A15 cores. This achieves much lower latency in terms of switching workloads, or better said, switching the environments (CPUs) to the respective work-loads, and exposes the full processing capabilities of the silicon as all cores can be used at once.
You can follow the advancements of this in the publications of the Linaro Connect summits that happen every few months. The code was only published in the middle of February this year for the first working implementation equivalent in power consumption to the IKS.
- Misconception #6: Yes the CPU is a true 8-core processor. It's just not being used a such in its initial software implementations