Post Reply

[Bug report] Music player stops playing music from sd after some time

Yesterday, 04:45 AM   |  #551  
Member
Flag Best
Thanks Meter: 45
 
40 posts
Join Date:Joined: Jan 2013
More
I must stand corrected: I believed a few posts ago that the rsync-method was not a good method of reproducing the problem. But I guess I was wrong. Copying music files around did cause other effects, related to the media scanner, but the reading problem was in there too. To isolate the reading problem, I used the following method: in one 'adb shell':

Code:
tar cv /storage/sdcard1/TWRP > /dev/null
while looking with another 'adb shell' at:

Code:
iostat -k 5
After a few GBytes, iostat shows that reading hangs. If you try again, it only reads a few bytes and hangs again.

However if you avoid fuse, by doing in a root shell:

Code:
tar cv /mnt/media_rw/sdcard1/TWRP > /dev/null
then it invariably succeeds.

Whether this is the same problem as the sound-stops problem, I don't know, but the solution to it can be found in sdcard.c, see https://gerrit.omnirom.org/#/c/9648/. My reasoning for the problem is as follows:

In handle_read some values from the request buffers are put into local variables, which usually reside on the stack. Then, due to a 'union' (and noted by a comment), the read buffer for pread64 overwrites these original request buffers, intentionally. However, the request buffers are marked as 'const struct *', and the compiler feels free to not actually create a local variable, but use the value/address from the received structure (I assume). Then, when pread64 returns, the local variable for unique is used; but its value at the address in the request buffer is already overwritten. Usually this is ok, because it will reside in a cpu register (I assume). But if an interrupt has occurred, this may have changed. I'm quite vague on why the interrupt would not have saved all registers on stack, but maybe some assumption has crept in that it didn't need to do this, because the value was 'guaranteed' to be available somewhere already?

I tried removing the const from the request buffers, or putting a dummy buffer before the (smaller) read buffer in the union, but that didn't work. So, that's a point against the reasoning above.

However, with the submitted change, the problem is actually solved, judged by the measurement above. I'm not happy about the solution though, so if somebody has a more elegant way of solving it, please do!
The Following 7 Users Say Thank You to GidiK For This Useful Post: [ View ]
Yesterday, 12:32 PM   |  #552  
Senior Member
Thanks Meter: 206
 
423 posts
Join Date:Joined: Jun 2011
Quote:
Originally Posted by GidiK

I must stand corrected: I believed a few posts ago that the rsync-method was not a good method of reproducing the problem. But I guess I was wrong. Copying music files around did cause other effects, related to the media scanner, but the reading problem was in there too. To isolate the reading problem, I used the following method: in one 'adb shell':

Code:
tar cv /storage/sdcard1/TWRP > /dev/null
while looking with another 'adb shell' at:

Code:
iostat -k 5
After a few GBytes, iostat shows that reading hangs. If you try again, it only reads a few bytes and hangs again.

However if you avoid fuse, by doing in a root shell:

Code:
tar cv /mnt/media_rw/sdcard1/TWRP > /dev/null
then it invariably succeeds.

Whether this is the same problem as the sound-stops problem, I don't know, but the solution to it can be found in sdcard.c, see https://gerrit.omnirom.org/#/c/9648/. My reasoning for the problem is as follows:

In handle_read some values from the request buffers are put into local variables, which usually reside on the stack. Then, due to a 'union' (and noted by a comment), the read buffer for pread64 overwrites these original request buffers, intentionally. However, the request buffers are marked as 'const struct *', and the compiler feels free to not actually create a local variable, but use the value/address from the received structure (I assume). Then, when pread64 returns, the local variable for unique is used; but its value at the address in the request buffer is already overwritten. Usually this is ok, because it will reside in a cpu register (I assume). But if an interrupt has occurred, this may have changed. I'm quite vague on why the interrupt would not have saved all registers on stack, but maybe some assumption has crept in that it didn't need to do this, because the value was 'guaranteed' to be available somewhere already?

I tried removing the const from the request buffers, or putting a dummy buffer before the (smaller) read buffer in the union, but that didn't work. So, that's a point against the reasoning above.

However, with the submitted change, the problem is actually solved, judged by the measurement above. I'm not happy about the solution though, so if somebody has a more elegant way of solving it, please do!


it seems like the same bug.

its an interesting start. i was so sold on the kernel bug idea, the bug being there before kitkat, and sdcard.c being ok in other SoCs, that i didn't even look for bugs in sdcard.c.

however i now did, and i don't see a bug.

the hypothesis:
the declaration "const struct fuse_in_header* hdr" lets the compiler assume *hdr is constant, and thus can cache its value or delay reading its value any way it sees fit.

the compiler optimizes:
__u64 unique = hdr->unique;
pread64(...)
fuse_reply(...unique...)

...to:
pread64(...)
fuse_reply(...hdr->unique...)

...across the pread64() call because *hdr is constant.

problems:
1) this would be a compiler bug, not an sdcard.c bug. the compiler cannot assume *hdr is constant just because hdr is declared const T*. there could be non-const aliases, other ways to access *hdr (by name, references or pointers) that permit its modification. see here:
http://www.compileroptimizations.com..._qualifier.htm

pread64() cannot be inlined, so it must be assumed that it can potentially modify *hdr.
even if handle_read and its callers are inlined, the compiler cannot prove that *hdr is constant simply because it is not: hdr is not initialized to point to a const object.

2) more importantly, handle_read compiled code would not be 'designed' to function differently if there is an interrupt. the code would always read unique from *hdr, always getting a clobbered value. and that is not happening.

well... that is, unless most of the reads are too short to clobber. but this is not the case as we see here:
http://forum.xda-developers.com/show...php?p=51237856


so the hypothesis is wrong. something else is going on that i dont understand. your fix could be working just by chance.

for starters, lets see if simply resizing the struct kills the bug.

change:
/* To save memory, we never use the contents of the request buffer and the read
* buffer at the same time. This allows us to share the underlying storage. */
union {
__u8 request_buffer[MAX_REQUEST_SIZE];
__u8 read_buffer[MAX_READ];
};

to:
/* To save memory, we never use the contents of the request buffer and the read
* buffer at the same time. This allows us to share the underlying storage. */
union {
__u8 request_buffer[MAX_REQUEST_SIZE];
__u8 read_buffer[MAX_READ];
};
__u8 dummy_read_buffer[MAX_READ];

and see if the bug is still there.
(sorry i dont have a device to test.)

---------- Post added at 08:32 AM ---------- Previous post was at 08:27 AM ----------

i don't have the tools.

could someone please disassemble handle_read so we can see where unique is kept across pread64()?
The Following 6 Users Say Thank You to Lanchon For This Useful Post: [ View ]
Yesterday, 06:04 PM   |  #553  
Junior Member
Thanks Meter: 25
 
15 posts
Join Date:Joined: Jul 2014
if someone could read ARM assembler
Now it would be really helpful if someone could read ARM assembler… Or at least show a way to produce a decently readable listing.
Without being able to do so I can only speculate: A compiler bug is not totally unlikely, especially since flags such as -mtune=cortex-a9 are used which might lead to optimizations being made that are not that well tested. Still, interrupt handling should be the same for FAT32 and ext4 (maybe not for exFAT which supposedly uses another FUSE layer) so why would using ext4 make this bug go away? (That being said, I've only tested exFAT myself, not ext4.)

Quote:
Originally Posted by GidiK

I'm not happy about the solution though, so if somebody has a more elegant way of solving it, please do!

If it's a compiler bug, the most elegant solution would of course be to get the GCC devs to fix this.
Or when you're more concerned with efficiency than with elegance, since both buffers have the same type, simply use only one name for the buffer instead of two, removing the union (warning, this is totally just from the top of my head), i.e. #define MAX_HANDLER_BUF_SIZE (maximum of the old MAX_REQUEST_SIZE and MAX_READ) and replace request_buffer and read_buffer with a simply generically named buffer and use that everywhere in place of the old names. Should be semantically equivalent.
Yesterday, 06:13 PM   |  #554  
AndDiSa's Avatar
Senior Member
Flag Heidelberg
Thanks Meter: 336
 
1,588 posts
Join Date:Joined: Dec 2009
There is one interesting thing to note: comparing sdcard.c from omnirom/cm with sdcard.c from AOSP, there is is the following difference:
Code:
...
 static bool int_equals(void *keyA, void *keyB) {
@@ -232,7 +232,7 @@ struct fuse_handler {
      * buffer at the same time.  This allows us to share the underlying storage. */
     union {
         __u8 request_buffer[MAX_REQUEST_SIZE];
-        __u8 read_buffer[MAX_READ];
+        __u8 read_buffer[MAX_READ + PAGESIZE];
     };
 };
...
probably the buffersize is just too small.
Yesterday, 06:23 PM   |  #555  
Junior Member
Thanks Meter: 25
 
15 posts
Join Date:Joined: Jul 2014
Quote:
Originally Posted by AndDiSa

There is one interesting thing to note: comparing sdcard.c from omnirom/cm with sdcard.c from AOSP, there is is the following difference:
probably the buffersize is just too small.

Corresponding commit (from 2014-02-14) also does some more (partly) obscure changes: sdcard: direct I/O file access fix:
Quote:
Originally Posted by Commit message

If a file is opened in direct I/O mode (with O_DIRECT flag),
the read buffer addess must be aligned to memory page size
boundary. The Direct I/O is not needed for normal files,
however, some special hardware access (e.g. smart SD cards)
will not work without it.

It seems that there have been more changes to sdcard last month, as the log indicates.

"will not work" is not the most detailed description, but with some fantasy… I'm wondering, however, if O_DIRECT is used in the problematic cases. If so, this could very well be related. The next commit's message seems to indicate that it is a well behaved "will not work", namely writes returning EINVAL.
Last edited by zeitferne; Yesterday at 06:36 PM.
Yesterday, 07:24 PM   |  #556  
AndDiSa's Avatar
Senior Member
Flag Heidelberg
Thanks Meter: 336
 
1,588 posts
Join Date:Joined: Dec 2009
It's not only since last month. I replaced sdcard.c from cm/omni by sdcard.c from AOSP on my builds already in March.

Sent from my Nexus 7 using XDA Free mobile app
Post Reply Subscribe to Thread
Previous Thread Next Thread
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes