[Bug report] Music player stops playing music from sd after some time


zeitferne

Senior Member
Jul 15, 2014
153
684
Upper Austria
oberon00.github.io
I must be doing something wrong. I flashed the kernel, but I am stuck at the boot animation screen (the rest of the system is still Omni, no data wipe).

I just flashed from recovery over my existing (CM default) kernel, rebooted and everything was fine. Maybe your download is damaged? MD5 of the ZIP should be d41190d1378d7e0f7f2cb8b6ba6ca1aa.
 

notabenem

Senior Member
Jul 17, 2009
118
75
TC says it's all right.
Code:
FR-kernel-cm-11-20141115-SNAPSHOT-M12-Lanchon-20141218-i9100.md5:
OK:  FR-kernel-cm-11-20141115-SNAPSHOT-M12-Lanchon-20141218-i9100.zip

Errors: 0
OK: 1, not found: 0, read error: 0, wrong checksum: 0

The Android robot has been wiggling its ears for more than 20 minutes now. And blinking...

Edit: A kernel wipe did not help; I had to revert to the previous kernel (Gustavo). I'll do a full backup and then a factory reset.
 

cdrivex4

Senior Member
Jan 13, 2014
113
45
Tested on all three kernels, happens everywhere (1 process version).

What did you change in these kernels?

So far, I could not reproduce the issue with this kernel, even after a restart. But I won't become euphoric before others confirm :)

That makes 3 so far with no errors, including me. Ah ah ah. FR kernel + FPBug2 is goood: with the screen on, there are none of the errors that used to pop up within 30 seconds. Unplugging/plugging power after a short (5 seconds) or long (3 minutes) wait: nothing, even if the screen goes off. Uptime is 25 minutes so far with no error, whereas before you were looking at seconds for the error to pop up. Time for some serious audio playback testing...
 

Lanchon

Senior Member
Jun 19, 2011
2,751
4,487
thanks guys, this is great!! pheww... that was a *****

i'll clean up and post everything later cause i'm busy now, but here's what did it:
on SMP arm (e.g. our case, multicore), the FPU state is saved eagerly on context switch out and restored lazily on context switch in. a given time slice starts with the FPU disabled, and only if the process touches the FPU during the slice does the kernel restore the state, in a trap (and only then will it have to save the state when the slice is up).

the state is saved in ram but also left there in the disabled FPU registers. it may happen that when the kernel decides to restore the FPU state saved in ram to a given disabled FPU, by chance the state it wants to load is already present in there. to take advantage of this lucky situation, the kernel tracks the leftover state in the disabled FPUs and optimizes the load away when it can prove that it's not needed. (the trap is still needed to enable the FPU, so the time saved is really not that much.)
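A toy user-space model of the mechanism described in the two paragraphs above may help. It is not kernel code: struct fpu_state, struct cpu, fpu_trap(), switch_out() and lazy_restore_demo() are hypothetical names, and the per-core hw_owner pointer plus the per-task last_cpu field are only an assumed stand-in for the kernel's tracking (which, as far as I know, lives around vfp_current_hw_state in arch/arm/vfp/vfpmodule.c).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NCPUS 2
#define NREGS 4

/* Hypothetical model of a task's FPU context saved in RAM. */
struct fpu_state {
    double regs[NREGS];
    int last_cpu;               /* CPU whose hardware last held this state, -1 = none */
};

/* Hypothetical model of one core's FPU. */
struct cpu {
    double hw_regs[NREGS];      /* contents are left behind when the FPU is disabled */
    bool fpu_enabled;
    struct fpu_state *hw_owner; /* whose state the kernel believes is in hw_regs */
};

static struct cpu cpus[NCPUS];
static int reloads;             /* number of RAM -> hardware copies performed */

/* Switch out: save eagerly if the FPU was used in this slice, then disable it. */
static void switch_out(int c, struct fpu_state *t)
{
    if (cpus[c].fpu_enabled) {
        memcpy(t->regs, cpus[c].hw_regs, sizeof t->regs);
        t->last_cpu = c;
        cpus[c].hw_owner = t;   /* the hardware still holds a copy of t's state */
    }
    cpus[c].fpu_enabled = false;
}

/* First FPU instruction of a slice traps here: enable, and restore lazily. */
static void fpu_trap(int c, struct fpu_state *t)
{
    assert(c >= 0 && c < NCPUS && !cpus[c].fpu_enabled);
    /* The optimisation: skip the copy if the tracker says the state is already there.
     * last_cpu matters after migration: this core's copy may be older than the RAM one. */
    bool still_loaded = cpus[c].hw_owner == t && t->last_cpu == c;
    if (!still_loaded) {
        memcpy(cpus[c].hw_regs, t->regs, sizeof t->regs);
        cpus[c].hw_owner = t;
        t->last_cpu = c;
        reloads++;
    }
    cpus[c].fpu_enabled = true; /* the trap is still needed to enable the FPU */
}

/* Two tasks sharing CPU 0; returns how many reloads were needed. */
static int lazy_restore_demo(double *a_reg0)
{
    for (int i = 0; i < NCPUS; i++)
        cpus[i] = (struct cpu){ .fpu_enabled = false };
    reloads = 0;

    struct fpu_state a = { .last_cpu = -1 }, b = { .last_cpu = -1 };

    fpu_trap(0, &a); cpus[0].hw_regs[0] = 1.5; switch_out(0, &a); /* a computes 1.5 */
    fpu_trap(0, &a); switch_out(0, &a);   /* a again, hardware untouched: copy skipped */
    fpu_trap(0, &b); switch_out(0, &b);   /* b overwrites the hardware: reload */
    fpu_trap(0, &a); switch_out(0, &a);   /* so a needs a real reload this time */
    *a_reg0 = a.regs[0];
    return reloads;
}
```

With this sequence lazy_restore_demo() should report 3 reloads for 4 traps: the second trap of task a is the one the optimization skips, and task a still ends up with 1.5 in its saved register.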

in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows), the wrong decision is made and the state is not restored when it's actually needed. FR has a bunch of mainline fixes for FPU corruption cases (same as MG8), but the change that did away with this problem is simply disabling this state restore optimization and always loading the state from ram.
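To show why a tracking failure is so nasty and why always reloading cures it, here is a second toy model built on a loudly stated assumption: the hypothetical lossy_power_cycle() wipes the FPU registers but does not reset the tracker. Nobody in the thread knows the exact failure path, so this only stands in for "the tracking fails". The always_reload flag models the FR change of ignoring the tracker; all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREGS 4

/* Hypothetical task FPU context saved in RAM. */
struct fpu_state { double regs[NREGS]; int last_cpu; };

/* Hypothetical model of one core's FPU. */
struct cpu {
    double hw_regs[NREGS];
    bool fpu_enabled;
    const struct fpu_state *hw_owner;  /* tracker: whose state we think hw_regs holds */
};

/* Lazy restore on the first FPU trap of a slice.
 * always_reload = false: trust the tracker (the optimisation).
 * always_reload = true:  copy from RAM every time (the FR change). */
static void fpu_trap(struct cpu *cpu, int c, struct fpu_state *t, bool always_reload)
{
    if (always_reload || cpu->hw_owner != t || t->last_cpu != c) {
        memcpy(cpu->hw_regs, t->regs, sizeof t->regs);
        cpu->hw_owner = t;
        t->last_cpu = c;
    }
    cpu->fpu_enabled = true;
}

/* Eager save on switch out, then disable the FPU. */
static void switch_out(struct cpu *cpu, int c, struct fpu_state *t)
{
    if (cpu->fpu_enabled) {
        memcpy(t->regs, cpu->hw_regs, sizeof t->regs);
        t->last_cpu = c;
        cpu->hw_owner = t;
    }
    cpu->fpu_enabled = false;
}

/* Assumed failure: the FPU loses its contents but nobody tells the tracker. */
static void lossy_power_cycle(struct cpu *cpu)
{
    memset(cpu->hw_regs, 0, sizeof cpu->hw_regs);
    /* missing on purpose: cpu->hw_owner = NULL; */
}

/* Returns what the task sees in its first FPU register after the power event. */
static double value_after_power_event(bool always_reload)
{
    struct cpu cpu0 = { .hw_owner = NULL };
    struct fpu_state task = { .last_cpu = -1 };

    fpu_trap(&cpu0, 0, &task, always_reload);
    cpu0.hw_regs[0] = 42.0;              /* the task computes something */
    switch_out(&cpu0, 0, &task);         /* saved to RAM */
    assert(task.regs[0] == 42.0);

    lossy_power_cycle(&cpu0);            /* e.g. a low-power state while idle */

    fpu_trap(&cpu0, 0, &task, always_reload);
    return cpu0.hw_regs[0];
}
```

Here value_after_power_event(false) returns 0.0 (the task silently resumes with wiped registers), while value_after_power_event(true) returns the 42.0 it saved, at the cost of one copy per trap.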

this looks like a full fix. it does have a performance impact, but only for processes that use the FPU, and only when they are running FPU-uncontested in a core. in any case, i'd guess the performance impact on affected processes is less than 1%.


now i'd really like someone with a 4412 device (eg: S3 international) to properly test FPBug2.apk (screen off) and verify that it's truly not affected.

---------- Post added at 02:22 PM ---------- Previous post was at 02:16 PM ----------

I must be doing something wrong. I flashed the kernel, but I am stuck at the boot animation screen (the rest of the system is still Omni, no data wipe).

this is a CM kernel, there's no reason it should work in Omni

This only patches the sdcard daemon, and thus doesn't affect testing with Lanchon's FPBug App, but it would of course affect tests that depend on playing music/copying files.

that's right!

---------- Post added at 02:28 PM ---------- Previous post was at 02:22 PM ----------

i'm curious about the performance impact, to see if it's worth the effort to restore the optimization.

if somebody wants to test:
after a reboot, please run AnTuTu 2 or 3 times with each kernel (FR and MG8) and report the results
thanks!!!
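For whoever runs the comparison, a small sketch of the arithmetic: average the runs per kernel and express the FR result relative to MG8. The function names are hypothetical, it assumes AnTuTu's usual convention that a higher total score is better, and no real scores are implied.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n benchmark runs. */
static double mean(const double *runs, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}

/* Percentage by which the FR mean score is below the MG8 mean
 * (a negative result means FR scored higher). */
static double fr_slowdown_percent(const double *mg8, size_t n_mg8,
                                  const double *fr, size_t n_fr)
{
    double base = mean(mg8, n_mg8);
    return 100.0 * (base - mean(fr, n_fr)) / base;
}
```

A positive result is the percentage by which FR scored lower than MG8, which is the number to hold against the under-1% guess.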
 

cdrivex4

Senior Member
Jan 13, 2014
113
45
I realise FR was just for the music bug hunt, but has anyone else noticed the phone network being dropped completely after the flash? Or is it just me? Have to say it's been cool listening to music again without jetAudio dying; I didn't even notice I wasn't getting calls or texts.

What happens next for the CM11 nightlies?

sent from my gt-i9100
 

zeitferne

Senior Member
Jul 15, 2014
153
684
Upper Austria
oberon00.github.io
I realise FR was just for the music bug hunt, but has anyone else noticed the phone network being dropped completely after the flash? Or is it just me? Have to say it's been cool listening to music again without jetAudio dying; I didn't even notice I wasn't getting calls or texts.

Funny, my phone lost the network connection yesterday and I had to restart (could not even *enter* flight mode) but right now, with the FR kernel, all is well.

EDIT: "Just" to fix the music bug hunt?! :D That was a hell of a hunt that Lanchon might just have ended here!

Why might? I'm playing devil's advocate here: so we now know that the vfp module missed some important power-state notification, and we have a good workaround for this. But maybe this is not the fault of the vfp module, but of some other module (hotplug?) that fails to send the notification, or sends it at the wrong time. If that were the case, other modules that also depend on such notifications could be affected too.

EDIT2: Thinking about it, it's almost definitely not the vfp module's fault, since the same module works fine on the 4412. It could still be that, instead of wrong notifications, the FPU on the 4210 does not meet the vfp module's assumptions, i.e. it loses state when it shouldn't.
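The devil's-advocate scenario can be sketched as a tiny notification chain: a VFP-side subscriber forgets its cached "state still in hardware" pointer when a low-power entry is announced, and a power driver that skips the announcement leaves that pointer stale. Everything below is hypothetical user-space code (pm_register(), pm_notify(), enter_low_power() and the PM_ENTER/PM_EXIT events are made-up names). The real kernel mechanism is, as far as I know, the CPU PM notifier chain (cpu_pm_register_notifier() with CPU_PM_ENTER/CPU_PM_EXIT), and the real VFP handler also saves an enabled FPU's state, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pm_event { PM_ENTER, PM_EXIT };          /* hypothetical event names */
typedef void (*pm_callback)(enum pm_event ev);

#define MAX_SUBSCRIBERS 4
static pm_callback subscribers[MAX_SUBSCRIBERS];
static size_t n_subscribers;

static void pm_register(pm_callback cb)
{
    assert(n_subscribers < MAX_SUBSCRIBERS);
    subscribers[n_subscribers++] = cb;
}

static void pm_notify(enum pm_event ev)
{
    for (size_t i = 0; i < n_subscribers; i++)
        subscribers[i](ev);
}

/* Toy VFP tracker: non-NULL means "the hardware still holds this task's state". */
static const void *hw_owner;

static void vfp_pm_callback(enum pm_event ev)
{
    if (ev == PM_ENTER)
        hw_owner = NULL;  /* registers may be lost: force the next trap to reload */
}

/* Idle path of a hypothetical power driver. send_notifications = false models a
 * driver that forgets to announce the low-power state. */
static void enter_low_power(bool send_notifications)
{
    if (send_notifications)
        pm_notify(PM_ENTER);
    /* ... core powers down, FPU register contents are lost ... */
    if (send_notifications)
        pm_notify(PM_EXIT);
}

/* Returns true if the tracker would (wrongly) let the next trap skip the reload. */
static bool stale_after_idle(bool send_notifications)
{
    static bool registered;
    static int task;                   /* stands in for one task's saved state */

    if (!registered) {
        pm_register(vfp_pm_callback);
        registered = true;
    }
    hw_owner = &task;                  /* task's state was saved and left in hardware */
    enter_low_power(send_notifications);
    return hw_owner == &task;
}
```

stale_after_idle(true) returns false because the notification cleared the pointer; stale_after_idle(false) returns true, i.e. the next FPU trap would skip a reload it actually needs.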
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Funny, my phone lost the network connection yesterday and I had to restart (could not even *enter* flight mode) but right now, with the FR kernel, all is well.

EDIT: "Just" to fix the music bug hunt?! :D That was a hell of a hunt that Lanchon might just have ended here!

Why might? I'm playing devil's advocate here: so we now know that the vfp module missed some important power-state notification, and we have a good workaround for this. But maybe this is not the fault of the vfp module, but of some other module (hotplug?) that fails to send the notification, or sends it at the wrong time. If that were the case, other modules that also depend on such notifications could be affected too.
Yeah. Whether his fix is the final one, I don't know.

But he's made a LOT of progress towards finding root cause and is closer than anyone has ever been. At least from an Omni perspective, Lanchon's work means "there's a chance of 5.0 if a maintainer shows up" as opposed to "even if a maintainer shows up, no 5.0 nightlies until this gets fixed"
 

zeitferne

Senior Member
Jul 15, 2014
153
684
Upper Austria
oberon00.github.io
But he's made a LOT of progress towards finding root cause and is closer than anyone has ever been. At least from an Omni perspective, Lanchon's work means "there's a chance of 5.0 if a maintainer shows up" as opposed to "even if a maintainer shows up, no 5.0 nightlies until this gets fixed"

Yeah, definitely! :)

Thank you @Lanchon! The patched sdcard was already really helpful from a pragmatic point of view, and now it seems like I can finally start to trust my phone again. Even if maybe not the final one, this kernel is definitely a breakthrough!
 

Entropy512

Senior Recognized Developer
Aug 31, 2007
14,088
25,086
Owego, NY
Good to read this.
And just in case no one volunteers, what would be the best choice out of the current flagship phones for running Omni (Lollipop), something like a "long-term good choice for Omni"?

Right now... Oppo devices are our best supported devices and I expect that to continue. Sony devices are hit-and-miss - many of the Omni team love Sony devices, but their 6-month product cycle combined with inconsistent availability in many regions has hurt them. There's also the fact that honestly, their stock firmwares are really damn solid. My daily driver for the past month has been a bone-stock (no root, no bootloader unlock) Z3. At some point I'll work on Omni for it, but I don't have the time right now. (I need to sort out some fundamental things in my life that are non-Android related first.)

Although I'm not sure if I'd go for the N3, that device is... meeeeeh. It's the first Oppo I've put ZERO effort into daily-drivering. (I at least used the R819 for a week, I would've used it for longer if not for MTK.)
 

notabenem

Senior Member
Jul 17, 2009
118
75
Patch collection

Zeitferne and Lanchon,
For the sake of completeness, could both of you please post all the patches/cherry-picked Git commits (with hyperlinks) that you applied to the kernel (even though they did not fix the FPU corruption)? Apparently there are a lot of things that were just lurking undetected, with who knows what kind of side effects (fixed or unfixed). Hopefully kernel/device maintainers will find this knowledge valuable.

---------- Post added at 10:21 PM ---------- Previous post was at 10:18 PM ----------

Right now... Oppo devices are our best supported devices and I expect that to continue. Sony devices are hit-and-miss ...
Any experiences with the OnePlus One?
 

Lanchon

Senior Member
Jun 19, 2011
2,751
4,487
hi and sorry for the absence

this evening i kicked off my build bot and now i'm uploading kernels here:
https://www.androidfilehost.com/?w=files&flid=22854

but first things first: the source patches are there too. my changes are extremely simple, but i made them on top of MG8 and unfortunately they depend on it. i guess it'd be trivial to rebase them onto the standard kernels, but i don't see the point since the MG8 changes are welcome anyway. i won't do it, but you can.

the now unused hardware state tracking machinery could be disabled and optimized away, but i don't know the kernel and i don't want to risk introducing a bug. there are even a couple of assembler instructions i could've eliminated around my edit, but since i don't know a word of arm assembler i chose not to make any unnecessary changes that could have unexpected side effects; unexpected for me at least.

---------- Post added at 05:49 AM ---------- Previous post was at 05:27 AM ----------

Zeitferne and Lanchon,
For the sake of completeness, could both of you please post all the patches/cherry-picked Git commits (with hyperlinks) that you applied to the kernel (even though they did not fix the FPU corruption)? Apparently there are a lot of things that were just lurking undetected, with who knows what kind of side effects (fixed or unfixed). Hopefully kernel/device maintainers will find this knowledge valuable.

---------- Post added at 10:21 PM ---------- Previous post was at 10:18 PM ----------


Any experiences with the OnePlus One?

but i did that already. it's here:
http://xdaforums.com/showpost.php?p=57577303&postcount=775
 

Top Liked Posts

  • 55
    CM11-M10 Music Bug Fix

    this is just sdcard.c compiled with "-mfloat-abi=soft" instead of "-mfloat-abi=softfp". the disassembly confirms that regular (core) registers instead of FP registers are used to temporarily save the unique variable.

    sdcard.c is usually left alone in roms. so although this fix is for CM11-M10, it probably works fine on all kitkat roms.
    @GidiK, could you please test? just flash from recovery.
    thank you!
    36
    If you want to have a workaround until this bug is properly fixed, try the attached flashable zip.

    Warning: tested on i9100 variant only.
    Warning2: since we are dealing with an obscure memory/variable corruption bug, it is possible that something else goes wrong with the workaround.
    To go back to original, flash your rom zip again (it will overwrite the sdcard binary)

    For me, this workaround fixes all the issues, and the file-copy test case I used to debug this bug has now passed several times, across reboots.

    What this workaround does:
    Code:
    $ git diff sdcard.c
    diff --git a/sdcard/sdcard.c b/sdcard/sdcard.c
    index 989ca00..67e5910 100644
    --- a/sdcard/sdcard.c
    +++ b/sdcard/sdcard.c
    @@ -1214,9 +1214,10 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
             const struct fuse_in_header* hdr, const struct fuse_read_in* req)
     {
         struct handle *h = id_to_ptr(req->fh);
    -    __u64 unique = hdr->unique;
    +    volatile __u64 vars64[2];
    +    vars64[0] = hdr->unique;
         __u32 size = req->size;
    -    __u64 offset = req->offset;
    +    vars64[1] = req->offset;
         int res;
     
         /* Don't access any other fields of hdr or req beyond this point, the read buffer
    @@ -1224,15 +1225,15 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
          * saves us 128KB per request handler thread at the cost of this scary comment. */
     
         TRACE("[%d] READ %p(%d) %u@%llu\n", handler->token,
    -            h, h->fd, size, offset);
    +            h, h->fd, size, vars64[1]);
         if (size > sizeof(handler->read_buffer)) {
             return -EINVAL;
         }
    -    res = pread64(h->fd, handler->read_buffer, size, offset);
    +    res = pread64(h->fd, handler->read_buffer, size, vars64[1]);
         if (res < 0) {
             return -errno;
         }
    -    fuse_reply(fuse, unique, handler->read_buffer, res);
    +    fuse_reply(fuse, vars64[0], handler->read_buffer, res);
         return NO_STATUS;
     }

    update: made the update script a bit safer
    35
    Alternative fix, one level deeper

    After my investigations in my post above, I just had to put the pieces together:
    in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows)

    I based my kernel on @Lanchon's (this one), removed the last commit that always reloads the FPU and also merged three other commits related to CPU_PM. Here is the gist with the patches:
    https://gist.github.com/9fb17c20e635bbffcb7f

    I hope the corruption does not come back after a second restart ;) but so far the FPBug app displayed nothing.

    I don't know if these low-power states are supposed to corrupt the FPU registers, but if they are, this fix should actually address the root cause.

    EDIT: I have added a flashable ZIP with this kernel (MD5: 55b591535854b66ef0de3245183eb33e).
    EDIT2: Updated the ZIP to include driver modules, so that WiFi now works. New MD5: 4d9cd2b9021a4652f7d14ea5a101291d.
    EDIT3: Fixed a potential bug. Since the kernel was running fine for (almost?) everyone, users don't need to update, but kernel developers should take a look at the change: https://gist.github.com/Oberon00/9f...xynos-call-pm-notifiers-w-irqs-disabled-patch. New MD5: 4852de8e7c3b77878290a72519b8004d

    EDIT4: Added a flashable ZIP for Note1/N7000. MD5: df4fb01a03f10ac88309616cddc787ce. WARNING: As I do not own this phone, I cannot test the kernel or the flash-procedure there. Better make a backup before you flash it!

    For a full patchset (includes Lanchon's required changes), see here: http://xdaforums.com/showpost.php?p=57654708&postcount=833.
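The sdcard.c workaround in the liked post with the handle_read() diff relies on a simple pattern: copy the request fields out of the shared buffer before pread64() overwrites it, and keep the copies in volatile storage so the compiler has to store them in memory and read them back from there instead of caching them in registers (the CM11-M10 Music Bug Fix post suggests the softfp build kept unique in FP registers). A self-contained sketch of that pattern follows; struct request, buffer, fake_pread() and fake_reply() are hypothetical stand-ins, not the real FUSE structures or sdcard.c helpers. On healthy hardware it behaves the same with or without volatile; the qualifier only changes where the copies live.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request layout; not the real FUSE structs. */
struct request { uint64_t unique; uint64_t offset; uint32_t size; };

/* One buffer serves as both request storage and read buffer. */
static union { struct request req; unsigned char data[64]; } buffer;

static uint64_t replied_unique;    /* what the stubbed reply carried */

/* Stub for pread64(): fills dst, overwriting the request fields. */
static void fake_pread(unsigned char *dst, uint32_t size, uint64_t offset)
{
    assert(size <= sizeof buffer.data);
    memset(dst, (int)(offset & 0xff), size);
}

/* Stub for fuse_reply(): records the request id it was given. */
static void fake_reply(uint64_t unique) { replied_unique = unique; }

static void handle_read_sketch(void)
{
    /* Snapshot into volatile storage before the buffer is reused: every access
     * to vars64 goes to memory rather than to a cached register copy. */
    volatile uint64_t vars64[2];
    vars64[0] = buffer.req.unique;
    vars64[1] = buffer.req.offset;
    uint32_t size = buffer.req.size;

    fake_pread(buffer.data, size, vars64[1]);  /* request fields are gone after this */
    fake_reply(vars64[0]);
}
```

As the thread makes clear, this only moves the values out of the FPU's reach; the kernel-side fixes are what address the corruption itself.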