Hi,
first of all, thank you, GreyLeshy especially, for your hard work (I can really imagine the time it took, as I'm now attempting similar work), and thanks to all the contributors and testers of your kernel. I'm now cherry-picking from your repo and trying to create an alternative kernel for suzuran on LOS 14.1, inspired by your kernel. First I synchronized the suzuran LOS sources with upstream, f2fs for 3.10 and CAF. Then, around November last year, I began synchronizing with your sources. Unfortunately, my kernel is still plagued by one problem which I'm unable to solve.
This always happens only on an idle phone with the screen off, about once every 4 days on average:
Code:
<6>[16663.139532] PM: suspend of devices complete after 24.348 msecs
<6>[16663.142508] PM: late suspend of devices complete after 2.953 msecs
<6>[16663.144838] PM: noirq suspend of devices complete after 2.318 msecs
<6>[16663.144858] Disabling non-boot CPUs ...
<1>[16663.151820] Unable to handle kernel paging request at virtual address ffffffc001a14719
<1>[16663.151831] pgd = ffffffc035306000
<1>[16663.151844] [ffffffc001a14719] *pgd=0000000000000000
<0>[16663.151867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
<6>[16663.151896] CPU: 2 PID: 851 Comm: kworker/2:1H Tainted: G W 3.10.108-g3cd60421ddcd-06126-g0cb59086392d #1
<6>[16663.151902] Hardware name: SoMC Suzuran-ROW (DT)
<6>[16663.151941] Workqueue: events_highpri wq_barrier_func
<6>[16663.151952] task: ffffffc0586bcb40 ti: ffffffc04b4e0000 task.ti: ffffffc04b4e0000
<6>[16663.151978] PC is at test_and_set_bit+0x14/0x40
<6>[16663.151996] LR is at check_for_migration+0x400/0x428
I think it happens during suspend, right after the "Disabling non-boot CPUs" log line. I have to say it was a surprise to me that the Android kernel uses some sort of frequent suspend-to-RAM procedure to save power, but I now consider it a hardware-specific feature of ARM SoCs that is far from the usual OS suspend-to-RAM (I wasn't able to find more documentation about this functionality). Anyway, from the log I suspect the bug is most probably in this part of the power suspend or wakeup code (as a side note, I still don't understand the purpose of the powersuspend patch and its adreno-idler dependency; it's disabled, though, I know). My second suspect would be power-efficient workqueues or something around tickless/nohz, or some unfortunate combination.

I was only able to narrow the root cause down to somewhere within the initial 300 commits I cherry-picked at the beginning from green kernel. Because the problem manifests only once every 4 days on average, I started paying attention only when I was at least 3k commits further on, when at some revision the bug suddenly triggered within several hours. I was then able to go back a little, to a point where the crash still occurs about once every 4 days on average, and tried reverting the patches I suspected of causing it. To make a long story short, I didn't succeed: I would revert 4-5 patches, the phone would reboot after approximately 4 days, and I repeated this procedure for a whole month (the last thing I reverted was the state notifier, but the phone rebooted after day 4 with a different crash caused by some Android userspace component responsible for power management; I can't remember its name, but it is part of the Android sources). I have given up, reverted the changes and am continuing in a different way. I'm posting this here just in case someone is able to provide any useful info or advice on the matter, but I really don't expect that.
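As a side note for anyone reading the log above: the number in "Internal error: Oops: 96000021" is the exception syndrome value, and it can be decoded by hand. Below is a minimal sketch in Python, assuming the standard ARMv8-A ESR_ELx field layout (exception class in bits 31:26, data fault status code in the low 6 bits, write-not-read in bit 6). The function name and the lookup tables are made up for illustration and only cover a few cases.

```python
# Sketch: decode the syndrome value from "Internal error: Oops: 96000021".
# Assumes the ARMv8-A ESR_ELx layout; the name tables below are partial
# and purely illustrative.

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort from the current exception level",
}

DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
    0x0D: "permission fault, level 1",
    0x0E: "permission fault, level 2",
    0x0F: "permission fault, level 3",
    0x21: "alignment fault",
}


def decode_esr(esr):
    """Split a 32-bit syndrome value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    iss = esr & 0x1FFFFFF        # instruction-specific syndrome, bits 24:0
    dfsc = iss & 0x3F            # data fault status code, bits 5:0
    is_write = bool((iss >> 6) & 1)  # WnR: set for a write, clear for a read
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "unknown"),
        "is_write": is_write,
    }


info = decode_esr(0x96000021)          # value from the oops line
fault_addr = 0xFFFFFFC001A14719        # faulting virtual address from the log

print(info)
print("offset of the address within an 8-byte word:", fault_addr % 8)
```

If this reading is correct, the syndrome describes an alignment fault on a read, and the faulting address is indeed not 8-byte aligned. That would point towards test_and_set_bit being handed a bogus or corrupted pointer (the arm64 bit operations use exclusive load/store, which need an aligned address) rather than towards an unmapped page. This is only an interpretation of the log, not a confirmed diagnosis.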
I'm also posting this as a more detailed explanation for others from the suzuran LOS thread here on xda, and I'm going to post another update about the kernel development in Berni's thread now, so you can check it there if you're interested.