[Bug report] Music player stops playing music from sd after some time

Lanchon

Senior Member
Jun 19, 2011
2,751
4,487
You could be a great maintainer for the i9100, because you have proved you can fix things without having an i9100 device. :D Imagine what other great things you could do if you received the device :D

"Fun" for the i9100 (and the Exynos4 platform) is nowhere near its end, buddy. ;) @GeeckoDev, who owns an N7000, is trying to implement hwc, gralloc and the Mali driver (r3p2-01rel0 with dmabuf, not UMP) from the Insignal source. The HWC for our platform seems to do nothing at the moment, so his work would make Project Butter a reality, and our devices should be smoother than ever before! :D

Look here:
http://xdaforums.com/showpost.php?p=57565566&postcount=810

wow that really is amazing work geecko is doing!!! but unfortunately i dont know even half of the acronyms you mention lol :) i dont know the first thing about the linux kernel, arm arch or android for that matter, so dont look at me for maintaining any ports, sorry about that! :)

---------- Post added at 05:01 PM ---------- Previous post was at 04:56 PM ----------

@Lanchon @zeitferne @Entropy512 I love you guys.

Just a question though, I applied the initial patchset that disables lazy restore, but didn't enable CONFIG_CPU_PM. Tested on Touchwiz 4.1, fix seems to be working. Should I enable it anyway? :)

i can only say that my initial fix is not sensitive to any configuration options but the rest of the kernel might be. so if you disabled that option to try to fix this bug, you can reenable it with my fix in place. otherwise your answer lies elsewhere.
 

snow

Senior Member
Oct 14, 2010
213
310
30
www.mozilla.ro
wow that really is amazing work geecko is doing!!! but unfortunately i dont know even half of the acronyms you mention lol :) i dont know the first thing about the linux kernel, arm arch or android for that matter, so dont look at me for maintaining any ports, sorry about that! :)

I just hope you will not leave the S2 area. :D Like the FPBug fix, Geecko's work will help Exynos4 devices work properly and as smoothly as other, newer devices. :) The hardware composer (HWC) saves battery by using dedicated hardware instead of the GPU for screen composition. Look here for more: https://wiki.mozilla.org/Platform/GFX/hwcomposer :D
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
to access the VFP reg set look into vfphw.S, there are vfp_get/put_float/double functions. you can get the asm from there.

the code certainly looks like each core has an fpu! and i always suspected smp implied that
Yeah. Each core has a completely independent set of registers. However, at a certain level, I believe there is shared cache. Linus loves to rant about broken cache implementations.

Edit: Fun little piece of trivia. In some cases (specifically, Intel HyperThreading), each core actually has two independent sets of registers. This allows the core to behave logically as if it were two cores, eliminating context switch/state change penalties when switching between the two threads. This gives the CPU's instruction scheduler more flexibility to resolve dependency issues. If one thread has a pipeline stall, the other thread can keep the core's execution resources busy. In this case by "thread" I mean "logical core". I believe this is one of the primary reasons HT disappeared for a few generations - it was needed to get acceptable performance from the Pentium 4's exceptionally deep pipeline, but then Intel moved to microarchitectures that benefited far less from it by having shallower pipelines.

A very good hint that this is supposed to be that way is that the current Linux kernel also calls cpu_pm_enter() when entering the AFTR state (didn't see the LPA (Low Power Audio?) though, maybe it is Android-specific); see here. For a real confirmation, one would probably have to look into the Exynos documentation, if there is any (publicly accessible).
LPA means something other than Low Power Audio here. Exactly what, I don't know. There isn't any documentation on this I've ever found. I'm going to need to spend some time over vacation rereading through all this. It explains why CONFIG_LOCAL_TIMERS might have altered things - that may have blocked entry into some idle states. It in fact fully explains why this issue has been so timing sensitive - you probably had to enter a deeper idle state during a pread64(), and another task running could easily stop that from happening.

This is consistent with the claim that this bug cropped up in ICS but didn't affect Gingerbread - Samsung made MAJOR improvements to cpuidle (as in, I don't believe one of those states was even accessible in GB, I forget which one as it's been close to 3 years...) in ICS. I recall backporting some of the cpuidle changes from somewhere into a GB kernel, and it had MASSIVE improvements in standby power consumption - but apparently it seems that it might have introduced a bug too.

So if you disable some of the idle states, I'd be very wary of causing power consumption issues. There was one point in time (back in the GB days I think?) where the cpuidle backport dropped idle power consumption of a wakelocked I777 from 5%/hour to around 1.5%/hour.

You are indeed correct that all but the most basic (WFI I think) state can't be entered if more than one core is online (some of that cache stuff Linus likes to rant about IIRC...). Although in later kernels, Samsung moved to a "coupled idle" approach where some states could be entered if more than one core was online, but only if BOTH cores were ready to enter the same state. WFI can be entered since I believe all it does is clock-gate the cores but does not power-gate anything. AFTR and LPA will cause varying degrees of power-gating, which requires time to save/restore states, which is why the kernel will only enter them if the kernel expects to be idle for a certain period of time (those residency parameters...)

IIRC, LPA was the deepest, and AFTR was an intermediary state. In GB, it was basically impossible (or completely impossible?) to enter AFTR. This meant to go past WFI, the kernel had to expect a VERY long idle time. AFTR saves less power, but requires a far lower residency period.
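The residency rule described above can be sketched in a few lines of C. This is only a userspace model, not the Samsung cpuidle driver: the residency numbers are placeholders, the bitmask is a hypothetical stand-in for the enable_mask mentioned later in this post, and the rule "deeper states need a single online core" is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three idle states discussed in the thread. */
struct idle_state {
    const char *name;
    unsigned target_residency_us;  /* placeholder values, not Samsung's */
    bool single_core_only;         /* needs all other cores offline */
};

/* Ordered from shallowest to deepest. */
static const struct idle_state idle_states[] = {
    { "WFI",  0,     false },  /* clock-gating only, always allowed */
    { "AFTR", 5000,  true  },  /* intermediate state, some power-gating */
    { "LPA",  50000, true  },  /* deepest state, longest residency */
};

#define N_IDLE_STATES (sizeof idle_states / sizeof idle_states[0])

/*
 * Return the index of the deepest state that is enabled, allowed with the
 * current number of online cores, and whose target residency fits into the
 * predicted idle time. Falls back to index 0 (WFI).
 */
int pick_idle_state(unsigned predicted_idle_us, int cores_online,
                    unsigned enabled_mask)
{
    assert(cores_online >= 1);
    int best = 0;
    for (size_t i = 1; i < N_IDLE_STATES; i++) {
        if (!(enabled_mask & (1u << i)))
            continue;  /* state disabled at runtime */
        if (idle_states[i].single_core_only && cores_online > 1)
            continue;  /* a second core is still online */
        if (idle_states[i].target_residency_us > predicted_idle_us)
            continue;  /* not idle long enough to pay for save/restore */
        best = (int)i;
    }
    return best;
}

const char *idle_state_name(int index)
{
    return idle_states[index].name;
}
```

With these placeholder numbers, a predicted idle time of 10 ms on a single online core selects AFTR, while the same idle time with two cores online falls back to WFI. It also hints at why an ADB shell could hide the bug: frequent interrupts keep the predicted idle time short, so the deeper states are never chosen.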

You know, LONG ago there was some issue where one of the cpuidle patches did bad things to the video codec (look through siyahkernel in mid-2012 I think?), I wonder if the MFC HALs use the FP registers at all? Of interest back then, you could never catch the issue when connected via ADB to debug - an ADB shell would cause enough interrupts to block the offending idle state.

Edit: I'd have to do a lot of digging/comparisons, but I wonder if https://github.com/Entropy512/linux_kernel_sgh-i777/commit/2257adfc9cf3d82641b447cf120d660c93afb302 is relevant

Original backport: https://github.com/Entropy512/linux_kernel_sgh-i777/commit/be3c8e0d8b901ac73843b3a7dc86c3427c94fd33

Edit: If my memory is correct AND enable_mask is still present in newer kernels, you can enable/disable AFTR entry at runtime.
 

FrodgE

Senior Member
Aug 27, 2010
85
62
Melbourne
Unfortunately I've been pretty busy the past 6 months so I couldn't help with testing, but for what it's worth I added @zeitferne's patch set from post 833 to my local omni clone and have been running it all day without any issues :) I am however still using exFAT on my sdcards. As expected, the FPbug detector apps @Lanchon posted in 714 didn't pick up any problems, although I must admit I didn't try to reproduce the problem on stock omni. From reading the posts, though, it sounds like it was easy to reproduce, so I'll assume my testing steps were adequate and it passed.

All of the kernel downloads that I've seen listed target CM11 (kinda ironic given this is an omni thread), other than @Gustavo_s's kernel, which I don't think contains @zeitferne's last update yet.

For the benefit of other omni users I'd post the kernel I built so it could be tested by more people. Unfortunately my EXTREME n00b status prevents me from doing so (I make the average n00b look like Stephen Hawking). I actually have no idea how to compile and release the kernel independently of the entire ROM build. My googling brings me to guides for downloading the source and configuring the toolchains from scratch, I'd assume that if I have a complete git clone and working build environment many of these steps can be skipped. If anyone can tell me otherwise or point me to a guide I'll upload my build, unless of course someone credible beats me to it.

@Lanchon @zeitferne you guys are awesome ! Thanks heaps for your hard work.
 

zeitferne

Senior Member
Jul 15, 2014
153
684
Upper Austria
oberon00.github.io
I've compiled my Touchwiz kernel with the newer patchset from @zeitferne but I'm always getting a freeze after turning the screen off and on. :(
I assume you mean this Note 1 (N7000) kernel: http://xdaforums.com/showthread.php?t=2397000.

First question(s): Does it only affect the Touchwiz version of your kernel? And do I understand correctly that it is based directly on the Samsung N7000 sources and not on any AOSP (CyanogenMod, OmniROM, …) kernel? Did the patches even apply cleanly there?

sorry to hear that. did you also try my patchset or just zei's?
That would have been my second question, but looking at https://github.com/libcg/raw_kernel_tw/commits/master, it seems you did. I assume it did work?

Third: Try some debugging yourself! If you can answer all the questions above with "yes", you might first want to try independently reverting this (a) and this (b) (and maybe this (c)). If reverting (a) fixes the problem, simply leave it out; this was a "just to be sure" cherry-pick. For (c), check with the FPBug app whether the bug is still fixed; if so, you can leave that one out too. If reverting (b) fixes the problem, add some printks around my cpu_pm_enter/exit calls: where exactly does the freeze occur?
 

grzwolf

Senior Member
Mar 7, 2009
276
840
Großenstein
...
For the benefit of other omni users I'd post the kernel I built so it could be tested by more people. Unfortunately my EXTREME n00b status prevents me from doing so (I make the average n00b look like Stephen Hawking). I actually have no idea how to compile and release the kernel independently of the entire ROM build. My googling brings me to guides for downloading the source and configuring the toolchains from scratch, I'd assume that if I have a complete git clone and working build environment many of these steps can be skipped. If anyone can tell me otherwise or point me to a guide I'll upload my build, unless of course someone credible beats me to it.

@Lanchon @zeitferne you guys are awesome ! Thanks heaps for your hard work.

Since you already have a complete ROM:
boot.img --> flashable zip
Concept:
http://android.stackexchange.com/questions/36265/flash-boot-img-without-using-fastboot-usb
Look at answer 1: graph and script only, skip the rest.
You might start from a proven, working flashable kernel zip for your device.
Any tool that can zip files will do the job.
Forget about signing the package on a rooted ROM.
 

FrodgE

Senior Member
Aug 27, 2010
85
62
Melbourne
Since you already have a complete ROM:
boot.img --> flashable zip
Concept:
http://android.stackexchange.com/questions/36265/flash-boot-img-without-using-fastboot-usb
Look at answer 1: graph and script only, skip the rest.
You might start from a proven, working flashable kernel zip for your device.
Any tool that can zip files will do the job.
Forget about signing the package on a rooted ROM.

Thank you! I'll take a look. Can you check the URL? It's broken.

---------- Post added at 08:58 PM ---------- Previous post was at 08:42 PM ----------

Never mind, think I found it.

EDIT: Link all sorted, quote updated to avoid confusion.
 

FrodgE

Senior Member
Aug 27, 2010
85
62
Melbourne
i9100 kernel build for OmniROM

For anyone running OmniROM on their i9100 who would like to try the kernel updates from @Lanchon and @zeitferne, I have attached my own build. This build is based on stock Omni with only the additional commits as described by zeitferne in post 833. This is only for the adventurous, it is working for me but I can't make any guarantees it will work for you. Normal disclaimers apply. Ideally you should wait for an official implementation within Omni.

As a test I have flashed this directly over the normal nightly build. Android settings should show the version as "NIGHTLY" and the kernel as built on my own OpenSUSE box.



I've appended my sig to the filename purely to identify the build and certainly not to attempt to claim any ownership.
md5: 63a0495f1da93dbe5c78895cd31f3a86


grzwolf said:
You might start with a proven functional zipable kernel for your device.
You make a very good point. Based upon your link and some other flashable kernels I've seen, I've put this together. I'd still be keen to learn how to build each section of the ROM independently. XDA University has a good article, but it misses the kernel, which is the bit I wanted!
 

Attachments

  • 20141223_omni_i1900_FPbug_zeitferne_FrodgE.zip (6.2 MB)
  • kernel_screenshot_50.png (37.4 KB)


notabenem

Senior Member
Jul 17, 2009
118
75
besides my kernels, i think only gustavos has trim. he made the changes this week. we didnt talk about it, so i really cant vouch for its safety.

I am using the latest build of Gustavo's kernel. Since the 21/12 build of Gustavo's kernel I am having random Baseband issues (phone would not recognize the Baseband after restart, and About phone would show "Baseband: unknown"). Subsequent restarts usually help and once the Baseband is recognized, connectivity is stable.
 

Top Liked Posts

  • 55
    CM11-M10 Music Bug Fix

    this is just sdcard.c compiled with "-mfloat-abi=soft" instead of "-mfloat-abi=softfp". disassembly confirms that general-purpose registers, rather than FP registers, are now used to temporarily hold the variable "unique".

    sdcard.c is usually left alone in roms. so although this fix is for CM11-M10, it probably works fine on all kitkat roms.
    @GidiK, could you please test? just flash from recovery.
    thank you!
    42
    thanks guys, this is great!! pheww... that was a *****

    i'll clean up and post everything later cause i'm busy now, but here's what did it:
    on SMP arm (eg: our case, multicore), the FPU state is saved eagerly on context switch out and restored lazily on context switch in. a given time slice starts with the FPU disabled, and only if the process later touches the FPU during the slice, the kernel restores the state in a trap (and only then it will have to save the state when the slice is up).

    the state is saved in ram but also left there in the disabled FPU registers. it may happen that when the kernel decides to restore the FPU state saved in ram to a given disabled FPU, by chance the state it wants to load is already present in there. to take advantage of this lucky situation, the kernel tracks the leftover state in the disabled FPUs and optimizes the load away when it can prove that it's not needed. (the trap is still needed to enable the FPU, so the time saved is really not that much.)

    in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows), the wrong decision is made and the state is not restored when it's actually needed. FR has a bunch of mainline fixes for FPU corruption cases (same as MG8), but the change that did away with this problem is simply disabling this state restore optimization and always loading the state from ram.

    this looks like a full fix. it does have a performance impact, but only for processes that use the FPU, and only when they are running FPU-uncontested in a core. in any case, i'd guess the performance impact on affected processes is probably less than 1%.
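The mechanism described in this post can be modelled in plain userspace C. The sketch below is not kernel code and every name in it is hypothetical: it models eager save on switch-out, lazy restore in a trap, the "leftover state" tracking, and a power-gating event that wipes the registers without updating the tracking (one of the suspected causes named above). Task migration between cores is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NCPU 2
#define NREGS 4

/* All names are hypothetical; this is a model, not the ARM VFP code. */
struct fpu_regs { double d[NREGS]; };

struct task { struct fpu_regs saved; };  /* FPU state kept in RAM */

struct cpu {
    struct fpu_regs hw;     /* contents of the FPU register file */
    bool fpu_enabled;       /* false until the task first touches the FPU */
    struct task *leftover;  /* task whose state is believed to be in hw */
    struct task *current;   /* task running in this time slice */
};

struct cpu cpus[NCPU];

void model_reset(void) { memset(cpus, 0, sizeof cpus); }

/* Context switch in: the slice starts with the FPU disabled. */
void switch_in(int c, struct task *t)
{
    assert(c >= 0 && c < NCPU && t != NULL);
    cpus[c].current = t;
    cpus[c].fpu_enabled = false;
}

/*
 * First FPU use in a slice traps here. The reload from RAM is skipped when
 * the tracking says the registers already hold this task's state, unless
 * always_reload is set (the fix described in the post).
 * Returns true if the state was reloaded from RAM.
 */
bool fpu_trap(int c, bool always_reload)
{
    struct cpu *cpu = &cpus[c];
    bool reload = always_reload || cpu->leftover != cpu->current;
    if (reload)
        cpu->hw = cpu->current->saved;
    cpu->leftover = cpu->current;
    cpu->fpu_enabled = true;
    return reload;
}

void fpu_write(int c, int reg, double v)
{
    assert(cpus[c].fpu_enabled);
    cpus[c].hw.d[reg] = v;
}

double fpu_read(int c, int reg)
{
    assert(cpus[c].fpu_enabled);
    return cpus[c].hw.d[reg];
}

/* Context switch out: save eagerly, but only if the FPU was used. */
void switch_out(int c)
{
    struct cpu *cpu = &cpus[c];
    if (cpu->fpu_enabled)
        cpu->current->saved = cpu->hw;
    cpu->fpu_enabled = false;
    cpu->current = NULL;
}

/* Modelled fault: registers are lost but the tracking is left untouched. */
void power_gate(int c)
{
    memset(&cpus[c].hw, 0, sizeof cpus[c].hw);
}
```

In this model, a task that writes a register, is switched out, and is switched back in after power_gate() reads back zero when fpu_trap() is called with always_reload set to false, because the tracking still claims the registers are valid. With always_reload set to true the value survives, at the cost of one extra copy per trap.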


    now i'd really like someone with a 4412 device (eg: S3 international) to properly test FPBug2.apk (screen off) and verify that it's truly not affected.

    ---------- Post added at 02:22 PM ---------- Previous post was at 02:16 PM ----------

    I must be doing something wrong. Flashed the kernel, but I am stuck at the boot animations screen (rest of the system is still Omni, no data-wipe)

    this is a CM kernel, no reason it should work in Omni

    This only patches the sdcard daemon, and thus doesn't affect testing with Lanchon's FPBug App, but it would of course affect tests that depend on playing music/copying files.

    that's right!

    ---------- Post added at 02:28 PM ---------- Previous post was at 02:22 PM ----------

    i'm curious as to the performance impact, see if it's worth the effort to restore the optimization.

    if somebody wants to test:
    after reboot please run antutu 2 or 3 times with each kernel (FR and MG8) and report
    thanks!!!
    36
    If you want to have a workaround until this bug is properly fixed, try the attached flashable zip.

    Warning: tested on i9100 variant only.
    Warning2: since we are dealing with an obscure memory/variable corruption bug, it is possible that something else goes wrong with the workaround.
    To go back to original, flash your rom zip again (it will overwrite the sdcard binary)

    For me, this workaround fixes any issues and the file copy testcase I have used to debug this bug has now been passing several times and across reboots

    What this workaround does:
    Code:
    $ git diff sdcard.c
    diff --git a/sdcard/sdcard.c b/sdcard/sdcard.c
    index 989ca00..67e5910 100644
    --- a/sdcard/sdcard.c
    +++ b/sdcard/sdcard.c
    @@ -1214,9 +1214,10 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
             const struct fuse_in_header* hdr, const struct fuse_read_in* req)
     {
         struct handle *h = id_to_ptr(req->fh);
    -    __u64 unique = hdr->unique;
    +    volatile __u64 vars64[2];
    +    vars64[0] = hdr->unique;
         __u32 size = req->size;
    -    __u64 offset = req->offset;
    +    vars64[1] = req->offset;
         int res;
     
         /* Don't access any other fields of hdr or req beyond this point, the read buffer
    @@ -1224,15 +1225,15 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
          * saves us 128KB per request handler thread at the cost of this scary comment. */
     
         TRACE("[%d] READ %p(%d) %u@%llu\n", handler->token,
    -            h, h->fd, size, offset);
    +            h, h->fd, size, vars64[1]);
         if (size > sizeof(handler->read_buffer)) {
             return -EINVAL;
         }
    -    res = pread64(h->fd, handler->read_buffer, size, offset);
    +    res = pread64(h->fd, handler->read_buffer, size, vars64[1]);
         if (res < 0) {
             return -errno;
         }
    -    fuse_reply(fuse, unique, handler->read_buffer, res);
    +    fuse_reply(fuse, vars64[0], handler->read_buffer, res);
         return NO_STATUS;
     }

    update: made the update script a bit safer
    35
    Alternative fix, one level deeper

    After my investigations in my post above, I just had to put the pieces together:
    in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows)

    I based my kernel on @Lanchon's (this one), removed the last commit that always reloads the FPU and also merged three other commits related to CPU_PM. Here is the gist with the patches:
    https://gist.github.com/9fb17c20e635bbffcb7f

    I hope the corruption does not come back after a second restart ;) but so far the FPBug app displayed nothing.

    I don't know if these low power states are supposed to corrupt the FPU registers, but if they are, this fix should actually address the root cause.
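The idea behind this approach can be illustrated with a small userspace model. It is only a sketch under the assumption that the deep idle states do lose the FPU registers; the names below are hypothetical and are not the kernel's cpu_pm API. The point it shows is that the idle path has to call the power-management notifiers, so that the FPU code gets a chance to mark its leftover state as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PM_NCPU 2
#define PM_MAX_NOTIFIERS 4

/* Hypothetical model of a power-management notifier chain. */
enum pm_event { PM_ENTER, PM_EXIT };
typedef void (*pm_notifier_fn)(enum pm_event ev, int cpu);

static pm_notifier_fn pm_notifiers[PM_MAX_NOTIFIERS];
static size_t pm_count;

/* Per-CPU owner of whatever the FPU registers hold; NULL means unknown. */
const void *fpu_owner[PM_NCPU];

bool model_pm_register(pm_notifier_fn fn)
{
    if (fn == NULL || pm_count == PM_MAX_NOTIFIERS)
        return false;
    pm_notifiers[pm_count++] = fn;
    return true;
}

static void pm_notify(enum pm_event ev, int cpu)
{
    for (size_t i = 0; i < pm_count; i++)
        pm_notifiers[i](ev, cpu);
}

/* Idle path: notify before and after the (modelled) power-gated period. */
void model_enter_deep_idle(int cpu)
{
    assert(cpu >= 0 && cpu < PM_NCPU);
    pm_notify(PM_ENTER, cpu);
    /* real hardware would power-gate here; registers are assumed lost */
    pm_notify(PM_EXIT, cpu);
}

/* FPU-side notifier: after a deep idle state, trust nothing in the FPU. */
void fpu_pm_notifier(enum pm_event ev, int cpu)
{
    if (ev == PM_EXIT)
        fpu_owner[cpu] = NULL;
}

/* Lazy-restore check: reload unless the registers are known to be ours. */
bool fpu_needs_reload(int cpu, const void *task)
{
    return fpu_owner[cpu] != task;
}
```

Without a registered notifier, fpu_needs_reload() keeps returning false after model_enter_deep_idle(), which is the stale-tracking situation. Once fpu_pm_notifier is registered, the same call sequence forces a reload, so the lazy-restore optimization can stay enabled.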

    EDIT: I have added a flashable ZIP with this kernel (MD5: 55b591535854b66ef0de3245183eb33e).
    EDIT2: Updated the ZIP to include driver modules, so that WiFi now works. New MD5: 4d9cd2b9021a4652f7d14ea5a101291d.
    EDIT3: Fixed a potential bug. Since the kernel was running fine for (almost?) everyone, users don't need to update, but kernel developers should take a look at the change: https://gist.github.com/Oberon00/9f...xynos-call-pm-notifiers-w-irqs-disabled-patch. New MD5: 4852de8e7c3b77878290a72519b8004d

    EDIT4: Added a flashable ZIP for Note1/N7000. MD5: df4fb01a03f10ac88309616cddc787ce. WARNING: As I do not own this phone, I cannot test the kernel or the flash-procedure there. Better make a backup before you flash it!

    For a full patchset (includes Lanchon's required changes), see here: http://xdaforums.com/showpost.php?p=57654708&postcount=833.