[Bug report] Music player stops playing music from sd after some time

zeitferne

Senior Member
Jul 15, 2014
153
684
Upper Austria
oberon00.github.io
i think the best way to proceed would be to make a native synthetic workload that reliably triggers and detects the bug (see -fstack-protector-all) independent of android to test stock kernels and the old abandoned 4210 cm kernel.

The stack protector was actually a great idea! :good: CM smdk4412 kernel on my SGS2 i9100 with the stack protector enabled (the normal kernel option CONFIG_CC_STACKPROTECTOR=y) crashes right while booting (full(er) last_kmsg):
Code:
<0>[   26.412257] c1 Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: bf013a2c
<0>[   26.412315] c1 
<4>[   26.412334] c1 Backtrace: 
<4>[   26.412382] c1 [<c064e5b8>] (dump_backtrace+0x0/0x10c) from [<c0b91e6c>] (dump_stack+0x18/0x1c)
<4>[   26.412439] c1  r6:e211e820 r5:c0ed4760 r4:c0f5c940 r3:271aed5c
<4>[   26.412496] c1 [<c0b91e54>] (dump_stack+0x0/0x1c) from [<c0b92204>] (panic+0x80/0x1ac)
<4>[   26.412561] c1 [<c0b92184>] (panic+0x0/0x1ac) from [<c0684be0>] (init_oops_id+0x0/0x58)
<4>[   26.412613] c1  r3:271aed5c r2:271aed00 r1:bf013a2c r0:c0cb8880
<4>[   26.412663] c1  r7:e273bc32
<4>[   26.412742] c1 [<c0684bc4>] (__stack_chk_fail+0x0/0x1c) from [<bf013a2c>] (dhd_write_macaddr+0x2e4/0x310 [dhd])
<4>[   26.412864] c1 [<bf013748>] (dhd_write_macaddr+0x0/0x310 [dhd]) from [<bf01a554>] (dhd_bus_start+0x1a4/0x2e0 [dhd])
<4>[   26.412985] c1 [<bf01a3b0>] (dhd_bus_start+0x0/0x2e0 [dhd]) from [<bf020558>] (dhdsdio_probe+0x4a4/0x72c [dhd])
<4>[   26.413097] c1 [<bf0200b4>] (dhdsdio_probe+0x0/0x72c [dhd]) from [<bf00c0ec>] (bcmsdh_probe+0xf8/0x150 [dhd])
<4>[   26.413206] c1 [<bf00bff4>] (bcmsdh_probe+0x0/0x150 [dhd]) from [<bf00e038>] (bcmsdh_sdmmc_probe+0x54/0xbc [dhd])
<4>[   26.413304] c1 [<bf00dfe4>] (bcmsdh_sdmmc_probe+0x0/0xbc [dhd]) from [<c09a7fe8>] (sdio_bus_probe+0xfc/0x108)
<4>[   26.413368] c1  r5:e2d97000 r4:e2d97008
<4>[   26.413414] c1 [<c09a7eec>] (sdio_bus_probe+0x0/0x108) from [<c0896764>] (driver_probe_device+0x94/0x1a8)
<4>[   26.413474] c1  r8:00000000 r7:bf067414 r6:e2d9703c r5:c0f6ddb8 r4:e2d97008
<4>[   26.413531] c1 r3:c09a7eec
<4>[   26.413563] c1 [<c08966d0>] (driver_probe_device+0x0/0x1a8) from [<c089690c>] (__driver_attach+0x94/0x98)
<4>[   26.413624] c1  r7:e2e631e0 r6:e2d9703c r5:bf067414 r4:e2d97008
<4>[   26.413683] c1 [<c0896878>] (__driver_attach+0x0/0x98) from [<c0895678>] (bus_for_each_dev+0x4c/0x94)
<4>[   26.413742] c1  r6:c0896878 r5:bf067414 r4:00000000 r3:c0896878
<4>[   26.413799] c1 [<c089562c>] (bus_for_each_dev+0x0/0x94) from [<c0896428>] (driver_attach+0x24/0x28)
<4>[   26.413857] c1  r6:c0f02af0 r5:bf067414 r4:bf067414
<4>[   26.413904] c1 [<c0896404>] (driver_attach+0x0/0x28) from [<c08960c8>] (bus_add_driver+0x180/0x250)
<4>[   26.413970] c1 [<c0895f48>] (bus_add_driver+0x0/0x250) from [<c0896e14>] (driver_register+0x80/0x150)
<4>[   26.414037] c1 [<c0896d94>] (driver_register+0x0/0x150) from [<c09a8128>] (sdio_register_driver+0x2c/0x30)
<4>[   26.414131] c1 [<c09a80fc>] (sdio_register_driver+0x0/0x30) from [<bf00e250>] (sdio_function_init+0x3c/0x8c [dhd])
<4>[   26.414244] c1 [<bf00e214>] (sdio_function_init+0x0/0x8c [dhd]) from [<bf00c19c>] (bcmsdh_register+0x1c/0x24 [dhd])
<4>[   26.414311] c1  r5:00000004 r4:bf06a3c4
<4>[   26.414398] c1 [<bf00c180>] (bcmsdh_register+0x0/0x24 [dhd]) from [<bf027990>] (dhd_bus_register+0x24/0x48 [dhd])
<4>[   26.414515] c1 [<bf02796c>] (dhd_bus_register+0x0/0x48 [dhd]) from [<bf07618c>] (init_module+0x18c/0x284 [dhd])
<4>[   26.414610] c1 [<bf076000>] (init_module+0x0/0x284 [dhd]) from [<c06448f8>] (do_one_initcall+0x128/0x1a8)
<4>[   26.414683] c1 [<c06447d0>] (do_one_initcall+0x0/0x1a8) from [<c06b9710>] (sys_init_module+0xdf8/0x1b1c)
<4>[   26.414756] c1 [<c06b8918>] (sys_init_module+0x0/0x1b1c) from [<c064a8c0>] (ret_fast_syscall+0x0/0x30)
<2>[   26.414861] c0 CPU0: stopping
<4>[   26.414886] c0 Backtrace: 
<4>[   26.414920] c0 [<c064e5b8>] (dump_backtrace+0x0/0x10c) from [<c0b91e6c>] (dump_stack+0x18/0x1c)
<4>[   26.414977] c0  r6:c0d54000 r5:c0eb5d08 r4:00000006 r3:271aed5c
<4>[   26.415039] c0 [<c0b91e54>] (dump_stack+0x0/0x1c) from [<c06444bc>] (do_IPI+0x258/0x29c)
<4>[   26.415102] c0 [<c0644264>] (do_IPI+0x0/0x29c) from [<c064a340>] (__irq_svc+0x80/0x130)
<4>[   26.415156] c0 Exception stack(0xc0d55ef0 to 0xc0d55f38)
<4>[   26.415197] c0 5ee0:                                     3b9ac9ff 540deacd 01c99e53 00072679
<4>[   26.415258] c0 5f00: c0f5a468 00000000 c0d54000 00000000 c1b540a8 412fc091 00000000 c0d55f64
<4>[   26.415317] c0 5f20: 540deacd c0d55f38 c06aa768 c065bd78 20000013 ffffffff
<4>[   26.415380] c0 [<c065bd3c>] (exynos4_enter_idle+0x0/0x174) from [<c099a890>] (cpuidle_idle_call+0xa4/0x120)
<4>[   26.415442] c0  r7:00000000 r6:00000001 r5:c0f815ac r4:c1b540b8
<4>[   26.415498] c0 [<c099a7ec>] (cpuidle_idle_call+0x0/0x120) from [<c064bd40>] (cpu_idle+0xc4/0x100)
<4>[   26.415554] c0  r8:4000406a r7:c0ba09a8 r6:c0f59ec4 r5:c0ebd8c4 r4:c0d54000
<4>[   26.415610] c0 r3:c099a7ec
<4>[   26.415641] c0 [<c064bc7c>] (cpu_idle+0x0/0x100) from [<c0b83238>] (rest_init+0x8c/0xa4)
<4>[   26.415694] c0  r7:c1b51180 r6:c0f59e00 r5:00000002 r4:c0d54000
<4>[   26.415752] c0 [<c0b831ac>] (rest_init+0x0/0xa4) from [<c00089c4>] (start_kernel+0x2dc/0x330)
<4>[   26.415807] c0  r5:c063d944 r4:c0eb5d34
<4>[   26.415845] c0 [<c00086e8>] (start_kernel+0x0/0x330) from [<40008044>] (0x40008044)
 

Lanchon

Senior Member
Jun 19, 2011
2,751
4,487
lol, amazing that android boots at all!

still, a synthetic workload would be better imho. we could run it on stock and other roms and see if any is not affected. i dont have a 4210 device so i cant test anything. please advise if you'd like help coming up with a suitable workload.
 

HippyTed

Senior Member
Jul 17, 2012
112
45
Nottingham
Sorry if I'm teaching a grandmother how to suck eggs! :cool:
I see dhd_write_macaddr() comes from drivers/net/wireless/bcmdhd/dhd_custom_sec.c, which ends up in the dhd.ko kernel module.

Trying to run
prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/arm-eabi/bin/objdump -d out/target/product/i9100/system/lib/modules/dhd.ko | less

doesn't seem to tie up for me. Maybe I'm using a different compiler than you, and/or the stack protector has changed the addresses.
Have you managed to see why it derps? I don't really know ARM assembly language much, anyway. :cool:
 

Lanchon

the failure is probably non deterministic and happens when a hardware interrupt is triggered. by the way @zeitferne, can you confirm that the kernel dies in a random place each time?
 

zeitferne

the failure is probably non deterministic and happens when a hardware interrupt is triggered. by the way @zeitferne, can you confirm that the kernel dies in a random place each time?

Sorry, this is a very deterministic and probably unrelated bug. In drivers/net/wireless/bcmdhd/dhd_custom_sec.c:11032 replace
Code:
char buf[18]   = {0};
with
Code:
char buf[sizeof("00:11:22:33:44:55\n")]   = {0};
(where the sizeof expression evaluates to 19).
This was "just" an off-by-one error in the wireless driver initialization, causing the sprintf in line 1110 to write a rogue null byte.
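
To make the arithmetic concrete, here is a small user-space sketch (not the driver code). The function name format_mac and the format string are assumptions; the thread only shows the buffer declaration and says that sprintf wrote one byte too many. The sketch uses snprintf, whose return value is the number of characters the full string needs, so a buffer that is one byte short gets truncated instead of overflowed:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a MAC address as "xx:xx:xx:xx:xx:xx\n".
 * Returns the number of characters the full string needs, excluding
 * the terminating NUL (the usual snprintf convention). */
int format_mac(char *buf, size_t size, const unsigned char mac[6]) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%02x:%02x:%02x:%02x:%02x:%02x\n",
                    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

The string is 17 characters of address, one newline and one NUL, so it needs 19 bytes. With char buf[18] the call returns 18 and the newline is cut off; with the sizeof-based declaration everything fits.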

After changing this, the kernel boots fine, but still exhibits the sdcard bug. Even enabling -fstack-protector-all (which required changing the option from -fstack-protector in arch/arm/Makefile:41 and adding -fno-stack-protector at arch/arm/boot/compressed/Makefile:103 to avoid linker errors) does not find any more stack corruptions, even while triggering the sdcard bug.
 

Lanchon

well the kernel might not be the best place to stack-protect. i guess building sdcard.c with stack protect should trigger the protection fault. but sdcard is not portable, we should find a better workload.
 

zeitferne

well the kernel might not be the best place to stack-protect. i guess building sdcard.c with stack protect should trigger the protection fault. but sdcard is not portable, we should find better workload.

Well, I don't know very much about Android or Linux kernel development but I can certainly test things and probably also build them. If you come up with something, I would be happy to try it. If I should also test it on Stock, it would be best in APK form and without requiring root, so that I can test it on my brother's SGS2 w/o flashing around ;) (I can't get nandroid to restore one ROM from the other).

But I also think that more important than checking if the bug appears on Stock or on the 4210 kernel is actually finding the bug. There are so many differences between these kernels that I don't know if this would be very much help in locating the bug. Still interesting to know though.

EDIT: I submitted the wireless driver fix to CyanogenMod's Gerrit: http://review.cyanogenmod.org/#/c/72657/
 

baldaz

Member
Sep 6, 2012
5
0
Since we are in the OmniROM section of the forum, why not also submit this patch to OmniROM's Gerrit?
 

zeitferne

If we are in the section of the forum regarding the OMNI ROM why not submit this patch also in Gerrit OMNI?

I would need to download the whole repo which takes hours with my bandwidth. Also I haven't tested with OmniROM (you can never be sure, although this change should be more than safe). But anyone can commit this change to OmniROM, git commit even has a --author option for giving proper credit (although this change does not hold much original value ;) ).
 

Lanchon

I also think that more important than checking if the bug appears on Stock or on the 4210 kernel is actually finding the bug. There are so many differences between these kernels that I don't know if this would be very much help in locating the bug.

if we have source without the bug (say, sammy's) then *maybe* we could find the bug in the current kernel. if all kernels have the bug, then this is probably a hardware issue for which no config or toolchain workaround (aka errata) was ever developed and could even be impossible to develop. think defective hardware, and since sammy is unlikely to replace old S2s, class action :)
 

Lanchon

Well, I don't know very much about Android or Linux kernel development but I can certainly test things and probably also build them. If you come up with something, I would be happy to try it. If I should also test it on Stock, it would be best in APK form and without requiring root, so that I can test in on my brothers SGS2 w/o flashing around ;) (I can't get nandroid to restore one ROM from the other).

hi, i completely forgot about this!

just wrote the silliest workload. it is single threaded. it doesnt invoke the kernel nor causes usb, eMMC nor sdcard transfers, so in order to provide the interruptions needed for corruption something must be going on concurrently: a usb transfer? an eMMC benchmark? dd if=/dev/<emmc_partition> of=/dev/null bs=...? i leave it up to you! :)

it does check for stack corruption, but -fstack-protector-all would be welcome too. here it is:

stack-test.c:
Code:
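#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

/* Remaining recursion depth; main() raises it by one each round. */
int depth;

typedef unsigned long long T;

/* Running value: advanced by ti on entry to f() and restored on exit. */
T tv = 0x3f0d20d623bae889ULL;
const T ti = 0x2f3fd9ec7013ad35ULL;

void f(bool first) {
    if (depth) {
        /* Keep a local copy of the expected value across both calls. */
        T t = (tv += ti);
        depth--;
        f(first);
        if (first) printf(".");
        f(false);
        depth++;
        /* Both calls have restored tv, so a mismatch means t (or tv)
           was corrupted in the meantime. */
        if (t != tv) {
            printf("\nCORRUPTION\n");
            exit(1);
        }
        tv -= ti;
    }
}

int main(void) {
    setbuf(stdout, NULL);  /* unbuffered, so progress dots show up at once */
    for (;;) {
        f(true);
        printf("%i\n", depth);
        depth++;
    }
}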

try it if you can and tell us if it triggers the bug and what you do concurrently to trigger the bug. thanks!
 

GidiK

Senior Member
Jan 8, 2013
113
103
Best
try it if you can and tell us if it triggers the bug and what you do concurrently to trigger the bug. thanks!

Unfortunately, this didn't reproduce the problem on my device. But then, I could never reliably reproduce it with playing music for a long time either.

Anyway, can people who can reproduce the problem on a i9100, are willing to test and can build, please cherrypick and test with this: https://gerrit.omnirom.org/#/c/9615/ (as always at your own risk). It reverts some changes I found that have been made over the years which could be influencing the behaviour we see. It survived my stress tests while running the above program, but then, so does the normal build occasionally.
 

zeitferne

Anyway, can people who can reproduce the problem on a i9100, are willing to test and can build, please cherrypick and test with this: https://gerrit.omnirom.org/#/c/9615/ (as always at your own risk).
I tested it on my i9100 (manually applied the changes to Cyanogenmod's smdk4412 kernel) and sadly the problem seems not to be fixed, as I was able to reproduce the problem with rsync again. But I just checked KERNEL_OBJ/include/generated/autoconf.h and it seems that CONFIG_FORCE_MAX_ZONEORDER was reset to 12. Removing the
Code:
default "12" if ARCH_EXYNOS
line in arch/arm/Kconfig:1641 seems to have fixed this. I am currently compiling again. EDIT: Nope, does not fix the problem either.

Maybe one could also try to remove the cortex-a9 flags from the build of the sdcard daemon, though according to Wikipedia the smdk4210 board is cortex-a9 based.
 

Lanchon

Unfortunately, this didn't reproduce the problem on my device. But then, I could never reliably reproduce it with playing music for a long time either.

Anyway, can people who can reproduce the problem on a i9100, are willing to test and can build, please cherrypick and test with this: https://gerrit.omnirom.org/#/c/9615/ (as always at your own risk). It reverts some changes I found that have been made over the years which could be influencing the behaviour we see. It survived my stress tests while running the above program, but then, so does the normal build occasionally.

before trying to revert changes to some samsung kernel itd be worth it to verify that that kernel is not affected; otherwise we are going around blind. we need some way to test this on all kernels.
 

GidiK

I tested it on my i9100 (manually applied the changes to Cyanogenmod's smdk4412 kernel) and sadly the problem seems not to be fixed, as I was able to reproduce the problem with rsync again. But I just checked KERNEL_OBJ/include/generated/autoconf.h and it seems that CONFIG_FORCE_MAX_ZONEORDER was reset to 12. Removing the
Code:
default "12" if ARCH_EXYNOS
line in arch/arm/Kconfig:1641 seems to have fixed this. I am currently compiling again. EDIT: Nope, does not fix the problem either.

Maybe one could also try to remove the cortex-a9 flags from the build of the sdcard daemon, though according to Wikipedia the smdk4210 board is cortex-a9 based.

That's a shame. It was a long shot anyway. Thanks for testing!

You're right about the Kconfig thing; I should have changed the default as well.

The board does indeed have Cortex A9 cores, so the compile flag is nothing weird. What I thought was suspicious was that I could not find it being used in any officially released kernel sources for any exynos4 device (phones, tablets, dev board). So I thought maybe they know something we don't...

There's no point in me removing the cortex-a9 flag from the ROM build, since I can't reproduce the problem anyway... And the kernel is currently running fine with the modification; subjectively it feels better already...

Edit: by the way, are you sure the rsync test reveals the same bug? I tried to do it your way, but I don't have rsync, so I used cp. I had the same experience: it went ahead fine and then it just got stuck, and cp used loads of CPU without anything real happening. However, here's the difference: it was the cp process that was using the CPUs, not sdcard (fuse) as in the bug. Since you used the paths that go via fuse, you'd expect fuse to freeze up, not rsync/cp. Also, while it was frozen, I was able to access files on the card quite fine and even play music. And testing this long enough without breaking it off, I noticed that the cp operation eventually finished and all files were even accounted for. So, weird behaviour, because it shouldn't get that slow, but not actually stuck like in this bug. I did the same test using the paths that bypass fuse (so cp was taking the role of fuse, proven by the fact that cp -a produced no chmod error messages), but the effect was the same: very slow, but it got there in the end. When you do the rsync test over the fuse-mounted paths, is it sdcard (fuse) that gets stuck or rsync?

I'm currently doing the same test on the official nightly. It is even slower... probably really stuck: 30 minutes in and just 15 files copied... of which one twice? I'm giving up...

---------- Post added at 11:21 PM ---------- Previous post was at 10:46 PM ----------

before trying to revert changes to some samsung kernel itd be worth it to verify that that kernel is not affected; otherwise we are going around blind. we need some way to test this on all kernels.

Agreed. But how?

I've used stock firmware for more than two years, from GB 2.3.4 to JB 4.1.2, and I've never run into this problem. Loads of others, but not this. I've also used Dori's kernel under JB for a while and I don't think the problem happened there either. It just happened occasionally under this kernel and under KK (because of fuse). So, if we would find a way to reproduce this, and want to test the stock kernel as well, we would have to run it under KK, or in some other way with fuse. Which I assume is impossible...

Or the problem could just be because my phone is really old now (3.5 years of abuse) and the memory chips start to wear down.

Thinking about it... there have been some stock firmwares with the media scanner issue. If during its scan after startup it encountered 'a bad file', it would heat up the cpu and drain the battery. This got fixed by wiping the sdcard, getting rid of the so-called 'bad file'. I've never seen anybody explaining what such a 'bad file' looked like... Maybe that was actually the media scanner getting in the same bother that fuse is getting into now? Does that make sense? I always assumed that was a different issue.
 

Lanchon

dram getting old? no such thing.

4210: bug
4212: no bug
this is a SoC issue for sure

sdcard.c:
-triggers interrupts (reads eMMC)
-sensitive (crashes on corruption)
-does not heal (no restart)
-very noticeable (sdcard access gone for all apps)

other than that, there is nothing special about it. we can make a workload, a non root app preferably, to reproduce the bug.

i made a first try at a simple workload. but we need a proper app if we want people to test on stock. and i dont have any of the affected devices, so i cant test. unless someone tests, we cant progress. running my workload while reading an emmc partition would be a good start. (directly reading, not thru fuse, that way it can be done in all versions of android.)
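
A minimal sketch of the "read something concurrently" half of such a workload, assuming plain stdio is good enough to keep the storage busy. The name drain_stream is made up for illustration; the thread does not say how the reads should be issued. It reads a stream to the end in fixed-size blocks and throws the data away, much like dd to /dev/null:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a stream to EOF in block_size chunks and
 * discard the data. Returns the number of bytes read, or -1 if the
 * buffer cannot be allocated or a read error occurs. */
long long drain_stream(FILE *in, size_t block_size) {
    assert(in != NULL && block_size > 0);
    unsigned char *buf = malloc(block_size);
    if (buf == NULL) return -1;
    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, block_size, in)) > 0)
        total += (long long)n;
    int failed = ferror(in);
    free(buf);
    return failed ? -1 : total;
}
```

It would have to run in a second process or thread next to stack-test, on a stream opened with fopen. Opening a raw eMMC partition normally needs root, so a non-root app would probably have to read large ordinary files instead.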
 

CRXed

Senior Member
Jan 7, 2010
1,980
1,255
I used to be able to record video on CM11 on my S2 just fine, but lately it's impossible. Any app that records audio or video hangs immediately on pressing the record button and just creates a 0 kB file.
A soft reboot happens shortly after, since sdcard0 is locked at this point.

Could it be this same bug?

Here is a log, maybe you guys see anything unusual.

Code:
V/CAM_VideoModule( 4987): startVideoRecording
D/CameraStorage( 4987): External storage state=mounted
V/CAM_VideoModule( 4987): initializeRecorder
D/exynos_camera(  670): exynos_camera_recording_enabled(0x40d178f8)
V/CAM_VideoModule( 4987): New video filename: /storage/sdcard0/DCIM/Camera/VID_20140926_231246.mp4.tmp
W/StagefrightRecorder(  670): Target file size (12026711936 bytes) too larger than supported, clip to 4GB
D/exynos_camera(  670): exynos_camera_get_parameters(0x40d178f8)
D/exynos_camera(  670): exynos_camera_put_parameters(0x40d178f8)
D/exynos_camera(  670): exynos_camera_set_parameters(0x40d178f8, cam_mode=1;camera-mode=0;effect=none;effect-values=none,mono,negative,sepia,aqua;exposure-compensation=0;exposure-compensation-step=0.5;flash-mode=off;flash-mode-values=off,auto,on,torch;focal-length=4.03;focus-areas=(0,0,0,0,0);focus-distances=0.15,1.20,Infinity;focus-mode=continuous-video;focus-mode-values=auto,infinity,macro,fixed,facedetect,continuous-video;full-video-snap-supported=false;gps-timestamp=1411765942;horizontal-view-angle=60.5;iso=auto;iso-values=auto,ISO50,ISO100,ISO200,ISO400,ISO800;jpeg-quality=90;jpeg-thumbnail-height=240;jpeg-thumbnail-quality=100;jpeg-thumbnail-size-values=320x240,400x240,0x0;jpeg-thumbnail-width=320;max-exposure-compensation=4;max-num-focus-areas=1;max-zoom=30;min-exposure-compensation=-4;picture-format=jpeg;picture-format-values=jpeg;picture-size=3264x2448;picture-size-values=3264x2448,3264x1968,2048x1536,2048x1232,1280x960,800x480,640x480;preferred-preview-size-for-video=640x480;preview-format=yuv420sp;preview-format-values=y
D/exynos_camera(  670): exynos_camera_params_apply: Preview size: 1280x720, picture size: 3264x2448, recording size: 1920x1080
D/exynos_camera(  670): exynos_camera_get_parameters(0x40d178f8)
D/exynos_camera(  670): exynos_camera_put_parameters(0x40d178f8)
D/exynos_camera(  670): exynos_camera_store_meta_data_in_buffers(0x40d178f8, 0)
E/exynos_camera(  670): exynos_camera_store_meta_data_in_buffers: Cannot disable meta-data in buffers!
D/exynos_camera(  670): exynos_camera_store_meta_data_in_buffers(0x40d178f8, 1)
D/OMXCodec(  670): Successfully allocated OMX node 'OMX.SEC.AVC.Encoder'
E/OMXCodec(  670): [OMX.SEC.AVC.Encoder] Found supported color format: 2130706434
W/OMXCodec(  670): Use baseline profile instead of 8 for AVC recording
D/TinyALSA-Audio Hardware(  670): audio_hw_get_input_buffer_size(0x4001f8b0, 48000, 1, 2)++
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_get_input_props(0x4001f960)
D/TinyALSA-Audio Hardware(  670): audio_hw_get_input_buffer_size(0x4001f8b0, 48000, 1, 2)--
D/AudioPolicyManagerBase(  670): getInput() inputSource 5, samplingRate 48000, format 1, channelMask c, acoustics 0
D/TinyALSA-Audio Input(  670): audio_hw_open_input_stream(0x4001f8b0, -2147483644, 0x412d6868, 0x412d6854)
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_get_input_props(0x4001f960)
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_set_input_state(1)
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_set_state(direction=1, state=1)++
D/TinyALSA-Audio Mixer(  670): Current state is already: 1
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_set_device(80000004)
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_set_route(card=0,device=-2147483644)++
D/TinyALSA-Audio Mixer(  670): tinyalsa_mixer_set_route(card=0,device=-2147483644)--
D/Yamaha-MC1N2-Audio(  670): yamaha_mc1n2_audio_set_route(80000004)
 

Twiq

Senior Member
If you're using a FAT32 filesystem, the maximum file size is about 4 GB. Your log shows the recorder clipping the target file size from roughly 12 GB down to 4 GB; nothing else looks unusual. Maybe you should post the whole log.
 

CRXed

Senior Member
Jan 7, 2010
1,980
1,255
If you're using a FAT32 filesystem, the maximum file size is about 4 GB. Your log shows the recorder clipping the target file size from roughly 12 GB down to 4 GB; nothing else looks unusual. Maybe you should post the whole log.

This is the whole log... after this, nothing else related to the camera shows up in the log until a soft reboot.

Makes no sense to make a 4GB file when I press the record button! No file was 12GB, I have no idea where it gets this number from....
 

Lanchon

Senior Member
Jun 19, 2011
2,751
4,487

i made an app. it has a native stacktest.c compiled with -fstack-protector-all inside it and should work without root.

so now people have to test. some other workload that generates interrupts has to be run concurrently: disk activity, disk benchmark, usb activity, something; get creative.
 

Attachments

  • StackTest.apk
    795.5 KB · Views: 16

Top Liked Posts

  • 55
    CM11-M10 Music Bug Fix

    this is just sdcard.c compiled with "-mfloat-abi=soft" instead of "-mfloat-abi=softfp". disassembly confirms that regular registers, instead of FP registers, are now used to temporarily save unique.

    sdcard.c is usually left alone in roms. so although this fix is for CM11-M10, it probably works fine on all kitkat roms.
    @GidiK, could you please test? just flash from recovery.
    thank you!
    42
    thanks guys, this is great!! pheww... that was a *****

    i'll clean up and post everything later cause i'm busy now, but here's what did it:
    on SMP arm (eg: our case, multicore), the FPU state is saved eagerly on context switch out and restored lazily on context switch in. a given time slice starts with the FPU disabled, and only if the process later touches the FPU during the slice, the kernel restores the state in a trap (and only then it will have to save the state when the slice is up).

    the state is saved in ram but also left there in the disabled FPU registers. it may happen that when the kernel decides to restore the FPU state saved in ram to a given disabled FPU, by chance the state it wants to load is already present in there. to take advantage of this lucky situation, the kernel tracks the leftover state in the disabled FPUs and optimizes the load away when it can prove that it's not needed. (the trap is still needed to enable the FPU, so the time saved is really not that much.)

    in our case the tracking somehow fails (across power management state changes, or during CPU migration of tasks, or who knows), the wrong decision is made, and the state is not restored when it's actually needed. FR has a bunch of mainline fixes for FPU corruption cases (same as MG8), but the change that did away with this problem is simply disabling this state restore optimization and always loading the state from ram.

    this looks like a full fix. it does have a performance impact, but only for processes that use the FPU, and only when they are running FPU-uncontested on a core. in any case, i'd guess the performance impact on affected processes is probably less than 1%.


    now i'd really like someone with a 4412 device (eg: S3 international) to properly test FPBug2.apk (screen off) and verify that it's truly not affected.

    ---------- Post added at 02:22 PM ---------- Previous post was at 02:16 PM ----------

    I must be doing something wrong. Flashed the kernel, but I am stuck at the boot animations screen (rest of the system is still Omni, no data-wipe)

    this is a CM kernel, no reason it should work in Omni

    This only patches the sdcard daemon, and thus doesn't affect testing with Lanchon's FPBug App, but it would of course affect tests that depend on playing music/copying files.

    that's right!

    ---------- Post added at 02:28 PM ---------- Previous post was at 02:22 PM ----------

    i'm curious as to the performance impact, see if it's worth the effort to restore the optimization.

    if somebody wants to test:
    after reboot please run antutu 2 or 3 times with each kernel (FR and MG8) and report
    thanks!!!
    36
    If you want to have a workaround until this bug is properly fixed, try the attached flashable zip.

    Warning: tested on i9100 variant only.
    Warning2: since we are dealing with an obscure memory/variable corruption bug, it is possible that something else goes wrong with the workaround.
    To go back to the original, flash your ROM zip again (it will overwrite the sdcard binary).

    For me, this workaround fixes all issues: the file copy test case I used to debug this bug has now passed several times and across reboots.

    What this workaround does:
    Code:
    $ git diff sdcard.c
    diff --git a/sdcard/sdcard.c b/sdcard/sdcard.c
    index 989ca00..67e5910 100644
    --- a/sdcard/sdcard.c
    +++ b/sdcard/sdcard.c
    @@ -1214,9 +1214,10 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
             const struct fuse_in_header* hdr, const struct fuse_read_in* req)
     {
         struct handle *h = id_to_ptr(req->fh);
    -    __u64 unique = hdr->unique;
    +    volatile __u64 vars64[2];
    +    vars64[0] = hdr->unique;
         __u32 size = req->size;
    -    __u64 offset = req->offset;
    +    vars64[1] = req->offset;
         int res;
     
         /* Don't access any other fields of hdr or req beyond this point, the read buffer
    @@ -1224,15 +1225,15 @@ static int handle_read(struct fuse* fuse, struct fuse_handler* handler,
          * saves us 128KB per request handler thread at the cost of this scary comment. */
     
         TRACE("[%d] READ %p(%d) %u@%llu\n", handler->token,
    -            h, h->fd, size, offset);
    +            h, h->fd, size, vars64[1]);
         if (size > sizeof(handler->read_buffer)) {
             return -EINVAL;
         }
    -    res = pread64(h->fd, handler->read_buffer, size, offset);
    +    res = pread64(h->fd, handler->read_buffer, size, vars64[1]);
         if (res < 0) {
             return -errno;
         }
    -    fuse_reply(fuse, unique, handler->read_buffer, res);
    +    fuse_reply(fuse, vars64[0], handler->read_buffer, res);
         return NO_STATUS;
     }

    update: made the update script a bit safer
    35
    Alternative fix, one level deeper

    After my investigations in my post above, I just had to put the pieces together:
    in our case somehow the tracking fails (across power management state changes, or during CPU migration of tasks, or who knows)

    I based my kernel on @Lanchon's (this one), removed the last commit that always reloads the FPU and also merged three other commits related to CPU_PM. Here is the gist with the patches:
    https://gist.github.com/9fb17c20e635bbffcb7f

    I hope the corruption does not come back after a second restart ;) but so far the FPBug app displayed nothing.

    I don't know if these low power states are supposed to corrupt the FPU registers, but if they are, this fix should actually address the root cause.

    EDIT: I have added a flashable ZIP with this kernel (MD5: 55b591535854b66ef0de3245183eb33e).
    EDIT2: Updated the ZIP to include driver modules, so that WiFi now works. New MD5: 4d9cd2b9021a4652f7d14ea5a101291d.
    EDIT3: Fixed a potential bug. Since the kernel was running fine for (almost?) everyone, users don't need to update, but kernel developers should take a look at the change: https://gist.github.com/Oberon00/9f...xynos-call-pm-notifiers-w-irqs-disabled-patch. New MD5: 4852de8e7c3b77878290a72519b8004d

    EDIT4: Added a flashable ZIP for Note1/N7000. MD5: df4fb01a03f10ac88309616cddc787ce. WARNING: As I do not own this phone, I cannot test the kernel or the flash-procedure there. Better make a backup before you flash it!

    For a full patchset (includes Lanchon's required changes), see here: http://xdaforums.com/showpost.php?p=57654708&postcount=833.