To access the VFP register set, look in vfphw.S; it has the vfp_get_float/vfp_put_float and vfp_get_double/vfp_put_double functions, and you can take the asm from there.
The code certainly looks like each core has its own FPU! And I always suspected SMP implied that.
Yeah. Each core has a completely independent set of registers. However, at a certain level, I believe there is shared cache. Linus loves to rant about broken cache implementations.
Edit: Fun little piece of trivia. In some cases (specifically, Intel HyperThreading), each core actually has two independent sets of registers. This allows the core to behave logically as if it were two cores, eliminating context switch/state change penalties when switching between the two threads. This gives the CPU's instruction scheduler more flexibility to resolve dependency issues. If one thread has a pipeline stall, the other thread can keep the core's execution resources busy. In this case by "thread" I mean "logical core". I believe this is one of the primary reasons HT disappeared for a few generations - it was needed to get acceptable performance from the Pentium 4's exceptionally deep pipeline, but then Intel moved to microarchitectures that benefited far less from it by having shallower pipelines.
A very good hint that this is the intended behavior is that the current Linux kernel also calls cpu_pm_enter() when entering the AFTR state (I didn't see the same for LPA (Low Power Audio?), though; maybe it is Android-specific); see here. For real confirmation, one would probably have to look at the Exynos documentation, if any is publicly accessible.
LPA means something other than Low Power Audio here. Exactly what, I don't know; I've never found any documentation on it. I'm going to need to spend some time over vacation rereading through all this. It explains why CONFIG_LOCAL_TIMERS might have altered things - that may have blocked entry into some idle states. In fact, it fully explains why this issue has been so timing sensitive - you probably had to enter a deeper idle state during a pread64(), and another running task could easily stop that from happening.
This is consistent with the claim that this bug cropped up in ICS but didn't affect Gingerbread - Samsung made MAJOR improvements to cpuidle in ICS (as in, I don't believe one of those states was even accessible in GB; I forget which one, as it's been close to 3 years...). I recall backporting some of the cpuidle changes from somewhere into a GB kernel, and it gave MASSIVE improvements in standby power consumption - but it seems it may have introduced a bug too.
So if you disable some of the idle states, I'd be very wary of causing power consumption issues. There was one point in time (back in the GB days I think?) where the cpuidle backport dropped idle power consumption of a wakelocked I777 from 5%/hour to around 1.5%/hour.
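To put those drain figures in perspective, here is a rough back-of-the-envelope sketch. It assumes drain is linear from a full battery, which is a simplification, and hours_to_empty is a made-up helper name for illustration only. Under that assumption, 5%/hour works out to about 20 hours of wakelocked idle and 1.5%/hour to roughly 67 hours.

```c
#include <assert.h>

/* Hypothetical helper: hours until a full battery is empty at a constant
 * drain rate given in percent per hour. Assumes linear drain. */
static double hours_to_empty(double percent_per_hour)
{
    assert(percent_per_hour > 0.0);
    return 100.0 / percent_per_hour;
}
```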
You are indeed correct that, apart from the most basic state (WFI, I think), none of the idle states can be entered if more than one core is online (some of that cache stuff Linus likes to rant about, IIRC...). In later kernels, though, Samsung moved to a "coupled idle" approach where some states could be entered with more than one core online, but only if BOTH cores were ready to enter the same state. WFI can be entered since I believe all it does is clock-gate the cores without power-gating anything. AFTR and LPA cause varying degrees of power-gating, which takes time to save/restore state, which is why the kernel will only enter them if it expects to be idle for a certain period of time (those residency parameters...).
IIRC, LPA was the deepest, and AFTR was an intermediary state. In GB, it was basically impossible (or completely impossible?) to enter AFTR. This meant to go past WFI, the kernel had to expect a VERY long idle time. AFTR saves less power, but requires a far lower residency period.
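The selection logic described above can be sketched roughly like this. It is only an illustration of the idea (pick the deepest state whose residency requirement and core-count constraint are both met), not the actual Exynos cpuidle code. The struct, the function name and the residency numbers are all made up, and it models the older non-coupled behavior where anything past WFI needs a single online core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical idle-state descriptor; field names are illustrative and do
 * not match the real kernel structures. */
struct idle_state {
    const char *name;
    unsigned int target_residency_us; /* minimum idle time that makes entry worthwhile */
    bool needs_single_core;           /* state is blocked while other cores are online */
};

/* Ordered shallowest to deepest. Residency values are invented for the demo. */
static const struct idle_state demo_states[] = {
    { "WFI",  1,      false },
    { "AFTR", 10000,  true  },
    { "LPA",  100000, true  },
};

/* Return the index of the deepest state that is allowed right now.
 * Index 0 (the shallowest state) is always the fallback. */
static size_t pick_idle_state(const struct idle_state *states, size_t n,
                              unsigned int predicted_idle_us,
                              unsigned int cores_online)
{
    assert(states != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (states[i].needs_single_core && cores_online > 1)
            continue; /* blocked: another core is still online */
        if (states[i].target_residency_us > predicted_idle_us)
            continue; /* not worth it: expected idle time is too short */
        best = i;
    }
    return best;
}
```

With these invented numbers, a predicted 20 ms idle on a single online core selects AFTR, while the same prediction with two cores online falls back to WFI. Raising AFTR's residency requirement close to LPA's would mimic the GB situation where AFTR was effectively never entered.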
You know, LONG ago there was some issue where one of the cpuidle patches did bad things to the video codec (look through siyahkernel in mid-2012 I think?), I wonder if the MFC HALs use the FP registers at all? Of interest back then, you could never catch the issue when connected via ADB to debug - an ADB shell would cause enough interrupts to block the offending idle state.
Edit: I'd have to do a lot of digging/comparisons, but I wonder if https://github.com/Entropy512/linux_kernel_sgh-i777/commit/2257adfc9cf3d82641b447cf120d660c93afb302 is relevant.
Original backport:
https://github.com/Entropy512/linux_kernel_sgh-i777/commit/be3c8e0d8b901ac73843b3a7dc86c3427c94fd33
Edit: If my memory is correct AND enable_mask is still present in newer kernels, you can enable/disable AFTR entry at runtime.