FORUMS
Remove All Ads from XDA

[Bug report] Music player stops playing music from sd after some time

1,639 posts
Thanks Meter: 460
 
By pilgrim011, Senior Member on 15th January 2014, 09:45 PM
Post Reply Email Thread
20th December 2014, 05:41 PM |#821  
Lanchon's Avatar
Senior Member
Thanks Meter: 4,277
 
Donate to Me
More
Quote:
Originally Posted by zeitferne

Afte a lot of try & error I finally managed to read the s0 (lower? half of d0) register in assembler, and printed a message when they do not match (and forced reloading of the registers (even when the message is not printed)). I also added printks to the hotplug and cpu_pm notifiers (also merged the two commits I mentioned). Then I ran the FPBug detector and my message got printed far more often than I expected (it also got printed before a few times).

Here are all relevant files: the patches, the kmsg and also a translation from assembler to ("pseudo")-C of the vfp_support_entry procedure, that I manually made (read: may contain errors).

As soon as the screen is turned off, it seems that the FPU register values of CPU0 (!) always get clobbered between the invocation of the vfp_support_entries. Noob question: Do the cores have separate FPU registers?

to access the VFP reg set look into vfphw.S, there are vfp_get/put_float/double functions. you can get the asm from there.

the code certainly looks like each core has an fpu! and i always suspected smp implied that
 
 
20th December 2014, 06:11 PM |#822  
Senior Member
Flag Upper Austria
Thanks Meter: 687
 
More
Quote:
Originally Posted by Lanchon

to access the VFP reg set look into vfphw.S, there are vfp_get/put_float/double functions. you can get the asm from there.

Yes I initially got the code from there, but got stuck in a bootloop. The solution was then to use p11 instead of p10 (this specifies the coprocessor to use and is also used in VFPFSTMIA macro, that is used for reloading the state).

Meanwhile I have two more insights:
1. I compared the i9300_defconfig with the i9100 one and noticed that CONFIG_EXYNOS_PM_HOTPLUG is disabled there. The i9100 also does not exhibit the bug in this configuration.

2. I found a vfp_enable implementation and calls in mach-exynos/cpuidle-exynos4.c. Commenting them out did not change anything though.
Having become suspicious of the file, I added prinks at the beginning of the exynos4_enter_core0_aftr and the exynos4_enter_core0_lpa functions that called vfp_enable. Indeed, when I get my flood of "HW state (=%x) for %p on CPU0 not in sync" messages, I also get a flood of messages from "exynos4_enter_core0_aftr". Since this function is only called from exynos4_enter_lowpower, I disabled the CONFIG_EXYNOS4_LOWPWR_IDLE configuration option. Result: The FPU corruption seems to be gone (and my phone is still able to wake up this time )
The Following 7 Users Say Thank You to zeitferne For This Useful Post: [ View ] Gift zeitferne Ad-Free
20th December 2014, 06:18 PM |#823  
Lanchon's Avatar
Senior Member
Thanks Meter: 4,277
 
Donate to Me
More
Quote:
Originally Posted by Lanchon

to access the VFP reg set look into vfphw.S, there are vfp_get/put_float/double functions. you can get the asm from there.

the code certainly looks like each core has an fpu! and i always suspected smp implied that

zei, this weekend ill try to get ahold of an S3 for a few minutes and verify that it's really not affected. then ill post patches that can be merged in soc-shared trees. i got 200+ kernel downloads yesterday and no probs reported so i think its ready. i wanted to merge the two commits u found though, they sure look relevant, but i dont want to push anything untested. did u find any problems?

anyway we should look for the cause but i cant do much without a device in the meantime i think its better to merge asap and then reenable the restore state optimization when the cause is found.
20th December 2014, 06:51 PM |#824  
Senior Member
Flag Upper Austria
Thanks Meter: 687
 
More
Quote:
Originally Posted by Lanchon

zei, this weekend ill try to get ahold of an S3 for a few minutes and verify that it's really not affected. then ill post patches that can be merged in soc-shared trees. i got 200+ kernel downloads yesterday and no probs reported so i think its ready. i wanted to merge the two commits u found though, they sure look relevant, but i dont want to push anything untested. did u find any problems?.

Maybe. Android sometimes nearly hangs, it gets so slugish that it causes a soft reboot but even the boot animation is slugish and I have to restart. It seems to affect just userland though and I can get a kmsg (looks normal) or logcat (constant stream of messages, could be normal though) via adb. I'm not sure though if this is cause by these commits, or if I did something wrong with my debugging messages, but for now it would be better not to merge these two commits.
The Following User Says Thank You to zeitferne For This Useful Post: [ View ] Gift zeitferne Ad-Free
20th December 2014, 07:09 PM |#825  
Lanchon's Avatar
Senior Member
Thanks Meter: 4,277
 
Donate to Me
More
Quote:
Originally Posted by zeitferne

Maybe. Android sometimes nearly hangs, it gets so slugish that it causes a soft reboot but even the boot animation is slugish and I have to restart. It seems to affect just userland though and I can get a kmsg (looks normal) or logcat (constant stream of messages, could be normal though) via adb. I'm not sure though if this is cause by these commits, or if I did something wrong with my debugging messages, but for now it would be better not to merge these two commits.

thank you. ill follow your advice. plus, i didnt have time to look into those commits yet but at first glance it looks like more are needed anyway. its better if leave kernel updating to people that can test anyway.

speaking of which, i got an invite a month back and bought a oneplus one as a present for a friend who lives in barcelona. and soon he'll be sending me his old S2 with a friend that is coming to visit (i live in argentina), so ill be able to test
The Following 8 Users Say Thank You to Lanchon For This Useful Post: [ View ] Gift Lanchon Ad-Free
20th December 2014, 07:57 PM |#826  
AndDiSa's Avatar
Senior Member
Flag Heidelberg
Thanks Meter: 2,009
 
More
@zeitferne @Lanchon i've merged those two commits into my kernel. I am running it since yesterday but I did not realize those performance issues yet.
20th December 2014, 08:22 PM |#827  
Senior Member
Flag Upper Austria
Thanks Meter: 687
 
More
Quote:
Originally Posted by AndDiSa

@zeitferne @Lanchon i've merged those two commits into my kernel. I am running it since yesterday but I did not realize those performance issues yet.

The performance cost seems to be really marginal. If you want to measure it, an ideal benchmark would look something like that:
1. In one thread, do FPU operations. The speed of this thread is the benchmark's result.
2. Setup other threads that do not use the FPU (make sure to compile with -msoft-float) but which do use the CPU, so that as many context switches as possible occur for the first thread.
Without this patch, Linux would notice that the other threads did not use the FPU and therefore will skip writing the data it saved when it switched away from that thread to the FPU registers again, since it is (or should be; that's the bug) already there. With the patch, it writes the data to the registers anyway (but on the other hand, the patch saves the checking if it has to write the data).

Quote:
Originally Posted by zeitferne

[...] I disabled the CONFIG_EXYNOS4_LOWPWR_IDLE configuration option. Result: The FPU corruption seems to be gone (and my phone is still able to wake up this time )

Now I have reverted to stock CM kernel and applied only this single configuration change for which there is one single #ifdef in mach-exynos/cpuidle-exynos4.c (and no references in any makefile) and the bug is still gone. If my thinking is right, this means, our bug can be summarized succinctly as:

FPU registers on CPU0 get corrupted in Low Power Idle State. EDIT3: More specifically in AFTR (sic!) mode. Googling for "Exynos 4210 AFTR" reveals interesting results. [/EDIT3]

There are quite a few branches on soc_is_exynos4210() in the related functions, which would be a good explanation why this happens only on these devices.
EDIT2: Since this Low Power Idle Mode can only be entered if only one CPU is online (see this line) also explains why disabling hotplug gets rid of the bug.

EDIT: A file that is properly unifdefed for i9100 is here (ca. 200 lines ~ 10% less; all remaining #if-branches are active).
The Following 9 Users Say Thank You to zeitferne For This Useful Post: [ View ] Gift zeitferne Ad-Free
20th December 2014, 08:28 PM |#828  
Lanchon's Avatar
Senior Member
Thanks Meter: 4,277
 
Donate to Me
More
Quote:
Originally Posted by AndDiSa

@zeitferne @Lanchon i've merged those two commits into my kernel. I am running it since yesterday but I did not realize those performance issues yet.

top performance loss is one fpu context load per timeslice (only the load itself, not the whole trap). its probably under 1%.

you can try antutu before and after, but be sure to run it many times: variation will probably occlude the real impact.
21st December 2014, 12:11 AM |#829  
Senior Member
Flag Upper Austria
Thanks Meter: 687
 
More
Alternative fix, one level deeper
After my investigations in my post above, I just had to put the pieces together:
Quote:
Originally Posted by Lanchon

in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows)


I based my kernel on @Lanchon's (this one), removed the last commit that always reloads the FPU and also merged three other commits related to CPU_PM. Here is the gist with the patches:
https://gist.github.com/9fb17c20e635bbffcb7f

I hope the corruption does not come back after a second restart but so far the FPBug app displayed nothing.

I don't know if these low power states are supposed to corrupt the FPU registers, but if they are, this fix should actually adress the root cause.

EDIT: I have added a flashable ZIP with this kernel (MD5: 55b591535854b66ef0de3245183eb33e).
EDIT2: Updated the ZIP to include driver modules, so that WiFi now works. New MD5: 4d9cd2b9021a4652f7d14ea5a101291d.
EDIT3: Fixed a potential bug. Since the kernel was running fine for (almost?) everyone, users don't need to update, but kernel developers should take a look at the change: https://gist.github.com/Oberon00/9fb...disabled-patch. New MD5: 4852de8e7c3b77878290a72519b8004d

EDIT4: Added a flashable ZIP for Note1/N7000. MD5: df4fb01a03f10ac88309616cddc787ce. WARNING: As I do not own this phone, I cannot test the kernel or the flash-procedure there. Better make a backup before you flash it!

For a full patchset (includes Lanchon's required changes), see here: http://forum.xda-developers.com/show...&postcount=833.
The Following 36 Users Say Thank You to zeitferne For This Useful Post: [ View ] Gift zeitferne Ad-Free
21st December 2014, 12:12 PM |#830  
Senior Member
Flag Großenstein
Thanks Meter: 188
 
More
Pro: "FP-PM-fixed-kernel-CM11-i9100.zip" runs FPBug.app with no corruption.
Con: WiFi is not working anymore on my s2.
It was and is working on Lanchon's latest.

Fixed with EDIT2 above from Zeitferne. Great Work, many thanks.
The Following 2 Users Say Thank You to grzwolf For This Useful Post: [ View ] Gift grzwolf Ad-Free
21st December 2014, 12:34 PM |#831  
Senior Member
Flag Upper Austria
Thanks Meter: 687
 
More
Quote:
Originally Posted by grzwolf

Pro: "FP-PM-fixed-kernel-CM11-i9100.zip" runs FPBug.app with no corruption.
Con: WiFi is not working anymore on my s2.
It was and is working on Lanchon's latest.

OK, sorry, now I know what this magic.patch was for ^^. The kernel refuses to load the WiFi driver which lies in a separately compiled module, because it has a different kernel version. Let's see if I can quickly fix this…

EDIT: I updated the ZIP, WiFi should work now.
The Following 3 Users Say Thank You to zeitferne For This Useful Post: [ View ] Gift zeitferne Ad-Free
Post Reply Subscribe to Thread

Guest Quick Reply (no urls or BBcode)
Message:
Previous Thread Next Thread
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes