
[KERNEL] Performance can be gained from a patch.

OP zodttd

27th December 2010, 03:38 PM   |  #1  
OP Junior Member
Thanks Meter: 46
5 posts
Join Date: Jun 2010
Hi,

I am the author of psx4droid, a PSX emulator that uses a dynarec (dynamic recompiler). Because it generates ARM code at runtime, I have to invalidate the instruction cache on these Android devices' ARM processors, just as Yong must do in GameBoid.
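For anyone wondering why a dynarec has to touch the instruction cache at all, here is a minimal sketch in C. It is not psx4droid's actual code: `jit_emit` is a hypothetical helper, and it assumes a GCC or Clang toolchain, whose `__builtin___clear_cache` builtin ends up in the cacheflush syscall on ARM Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from psx4droid): copy freshly generated machine
 * code into a JIT buffer, then make it visible to instruction fetch.
 * Returns 0 on success, -1 if the code does not fit in the buffer. */
static int jit_emit(unsigned char *buf, size_t buf_len,
                    const unsigned char *code, size_t code_len)
{
    assert(buf != NULL && code != NULL);
    if (code_len > buf_len)
        return -1;

    /* The new instructions are written through the data side of the CPU. */
    memcpy(buf, code, code_len);

    /* Ask for ONLY the range [buf, buf + code_len) to be synchronized.
     * GCC/Clang builtin; on ARM Linux it leads to the cacheflush syscall. */
    __builtin___clear_cache((char *)buf, (char *)buf + code_len);
    return 0;
}
```

A real dynarec would emit into an executable mapping and then jump to it. The point here is only that every emitted block is followed by a ranged flush request, so the cost of that request is paid constantly.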

I noticed a performance loss on the Evo 4G, MyTouch 4G and potentially others, as have some people running GameBoid (though that emulator leans heavily on frame limiting because it runs faster than 60 FPS, so the loss isn't noticed as much).

Both of these architectures (ARCH_QSD8X50 for the Evo 4G and ARCH_MSM7X30 for the MyTouch 4G) have an oddity when it comes to flushing the icache.

As seen in these platforms' kernel sources at ./arch/arm/mm/cache-v7.S, the *ENTIRE* icache is flushed on each cacheflush syscall!

The fix for my performance loss, and others', is to invalidate only the range specified in a userland cacheflush call. While this may not help much beyond apps that call cacheflush a lot, such as emulators, it will help those apps greatly.
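To put rough numbers on the difference, here is a small C model. The cache geometry (32 KiB icache, 32-byte lines) is an assumption for illustration, not something measured on these devices, and the function names are made up. The range loop mirrors the align-down-then-step loop in v7_coherent_user_range.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry, for illustration only; not taken from the thread. */
#define ICACHE_BYTES 32768u
#define LINE_BYTES   32u

/* Lines touched by a ranged flush of [start, end). Like the kernel loop,
 * it aligns start down to a line boundary and handles at least one line. */
static uint32_t lines_touched_by_range(uint32_t start, uint32_t end,
                                       uint32_t line_bytes)
{
    /* line size must be a non-zero power of two for the mask to work */
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);

    uint32_t addr = start & ~(line_bytes - 1);
    uint32_t n = 0;
    do {
        n++;
        addr += line_bytes;
    } while (addr < end);
    return n;
}

/* Lines thrown away when the whole icache is invalidated instead. */
static uint32_t lines_dropped_by_full_invalidate(void)
{
    return ICACHE_BYTES / LINE_BYTES;
}
```

With these assumed numbers, flushing a 256-byte block of generated code touches 8 lines, while a full invalidate discards all 1024, and everything discarded has to be fetched again later.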

My hope is someone can release ROM(s) for the Evo 4G and/or the MyTouch 4G with this fixed. Thanks!

Here's the offending code. Note that "mcr p15, 0, r0, c7, c5, 0" (ICIALLU) invalidates the entire icache, and typically the branch predictor along with it.


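ENTRY(v7_coherent_user_range)
 UNWIND(.fnstart )
    dcache_line_size r2, r3             @ r2 = D-cache line size in bytes
    sub r3, r2, #1
    bic r0, r0, r3                      @ align start address down to a cache line
1:
 USER( mcr p15, 0, r0, c7, c11, 1 )     @ clean this D-cache line (DCCMVAU)
    add r0, r0, r2
2:
    cmp r0, r1
    blo 1b
    dsb
    mov r0, #0
    mcr p15, 0, r0, c7, c5, 0           @ invalidate the ENTIRE I-cache (ICIALLU)
    dsb
    isb
    mov pc, lr

9001:                                   @ fault fixup: resume at the next page
    mov r0, r0, lsr #12
    mov r0, r0, lsl #12
    add r0, r0, #4096
    b 2b
 UNWIND(.fnend )
ENDPROC(v7_coherent_kern_range)
ENDPROC(v7_coherent_user_range)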
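
The 9001 block at the end of that routine is only page arithmetic: if the clean operation faults on an unmapped address, the low 12 bits of r0 are cleared and 4096 is added, so the loop resumes at the start of the next page. A sketch of the same computation in C, with a hypothetical function name and the 4 KiB page size that the shift by 12 implies:

```c
#include <stdint.h>

/* Same arithmetic as the 9001 fixup: round the faulting address down to
 * its 4 KiB page, then step to the start of the following page. */
static uint32_t next_page_after_fault(uint32_t addr)
{
    addr = (addr >> 12) << 12;   /* mov r0, r0, lsr #12 ; mov r0, r0, lsl #12 */
    return addr + 4096u;         /* add r0, r0, #4096 */
}
```

An address that is already page-aligned still moves forward by a full page, which is what lets the loop make progress past an unmapped page.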



Here's a faster version of this function from the Samsung Fascinate (Galaxy S) that clears a range as it's supposed to:

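ENTRY(v7_coherent_user_range)
    dcache_line_size r2, r3             @ r2 = D-cache line size in bytes
    sub r3, r2, #1
    bic r0, r0, r3                      @ align start address down to a cache line
1:  mcr p15, 0, r0, c7, c11, 1          @ clean this D-cache line (DCCMVAU)
    dsb
    mcr p15, 0, r0, c7, c5, 1           @ invalidate only this I-cache line (ICIMVAU)
    add r0, r0, r2
    cmp r0, r1
    blo 1b
    mov r0, #0
    mcr p15, 0, r0, c7, c5, 6           @ invalidate the branch predictor (BPIALL)
    dsb
    isb
    mov pc, lr
ENDPROC(v7_coherent_kern_range)
ENDPROC(v7_coherent_user_range)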
The Following 32 Users Say Thank You to zodttd For This Useful Post
27th December 2010, 04:32 PM   |  #2  
Senior Member
Thanks Meter: 15
299 posts
Join Date: Aug 2010
Huh... please post an "Evo performance for dummies" version.

Sent from my PC36100 using XDA App
27th December 2010, 04:44 PM   |  #3  
Member
Thanks Meter: 7
83 posts
Join Date: Jul 2007
Quote:
Originally Posted by HTCRALEIGHFAN

Huh... please post an "Evo performance for dummies" version.

Sent from my PC36100 using XDA App

This fix isn't for end users; it's for devs to put into their ROMs for you.
The Following 2 Users Say Thank You to lordofokra For This Useful Post
27th December 2010, 05:11 PM   |  #4  
whoamanwtf
Senior Member
Thanks Meter: 88
624 posts
Join Date: Nov 2008
ZodTTD, as a long-time user of your various emus on various platforms, I would just like to say thank you for your contributions. You have gone above and beyond here by hunting down this problem for us Evo users and posting it here, even finding a solution!

You are awesome, man. Keep up the good work; no matter how often you hear the thanks, your work is appreciated!
The Following User Says Thank You to whoamanwtf For This Useful Post
27th December 2010, 05:17 PM   |  #5  
Unknownforce
Recognized Developer
Thanks Meter: 4,289
2,033 posts
Join Date: Nov 2008
Wow, good find! I'm curious to see how many "daily" apps use this function and could be sped up by putting this patch in...
27th December 2010, 05:17 PM   |  #6  
OP Junior Member
Thanks Meter: 46
5 posts
Join Date: Jun 2010
Thanks!

To continue this discussion: I found out this kernel might be using a VIPT (Virtually Indexed, Physically Tagged) instruction cache. With a VIPT icache, the entire icache has to be invalidated.

Can we disable the VIPT icache on this kernel without too much trouble?

Noted on ARM's site:

Example 7.1. A situation where software must be aware of the Instruction cache tagging strategy

Two processes, P1 and P2, share some code and have separate virtual mappings to the same region of instruction memory. P1 changes this region, for example as a result of a JIT, or some other self-modifying code operation. P2 must see the modified code.

As part of its self-modifying code operation, P1 must invalidate the changed locations from the instruction cache. If this invalidation is performed by MVA, and the instruction cache is a VIPT cache, then P2 might continue to see the old code. For more information, see the ARM Architecture Reference Manual.

In this situation, if the instruction cache is a VIPT cache, after the code modification the entire instruction cache must be invalidated to ensure P2 observes the new version of the code.
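
The situation ARM describes can be shown with a toy model in C. This is a sketch under loud assumptions, not a description of the actual Scorpion cache: it models a direct-mapped, 16 KiB, virtually indexed and physically tagged icache with 32-byte lines, so the set index uses virtual address bits above the 4 KiB page offset. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED toy geometry: 32-byte lines, 512 sets (16 KiB, direct-mapped).
 * The index therefore uses VA bits [13:5], including bits above the page
 * offset, which is what allows two aliases to land in different sets. */
#define LINE_SHIFT 5u
#define NUM_SETS   512u

struct line   { bool valid; uint32_t ptag; };
struct icache { struct line set[NUM_SETS]; };

/* Virtually indexed: the set is chosen from the virtual address. */
static uint32_t set_index(uint32_t va) { return (va >> LINE_SHIFT) % NUM_SETS; }

/* Physically tagged: the tag is taken from the physical address. */
static void ic_fill(struct icache *c, uint32_t va, uint32_t pa)
{
    struct line *l = &c->set[set_index(va)];
    l->valid = true;
    l->ptag = pa >> LINE_SHIFT;
}

static bool ic_hit(const struct icache *c, uint32_t va, uint32_t pa)
{
    const struct line *l = &c->set[set_index(va)];
    return l->valid && l->ptag == (pa >> LINE_SHIFT);
}

/* Invalidate by MVA: only the set selected by THIS virtual address is checked. */
static void ic_inval_mva(struct icache *c, uint32_t va, uint32_t pa)
{
    struct line *l = &c->set[set_index(va)];
    if (l->valid && l->ptag == (pa >> LINE_SHIFT))
        l->valid = false;
}

/* Invalidate all: every line goes, whichever alias filled it. */
static void ic_inval_all(struct icache *c) { memset(c, 0, sizeof *c); }
```

In this model, if P1 maps a physical line at 0x10000000 and P2 maps the same line at 0x20001000, both fills land in different sets. Invalidating by P1's address leaves P2's copy valid, so P2 would keep running the old code; only the full invalidate removes both.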
27th December 2010, 05:23 PM   |  #7  
OP Junior Member
Thanks Meter: 46
5 posts
Join Date: Jun 2010
Quote:
Originally Posted by Unknownforce

Wow, good find! I'm curious to see how many "daily" apps use this function and could be sped up by putting this patch in...

Besides some emulators such as GameBoid and psx4droid... perhaps not many. Mono uses it, so games written with Unity 3D that use Mono could benefit. Basically any app that uses a JIT / dynarec will benefit, but those are few and far between.
27th December 2010, 05:44 PM   |  #8  
xlGmanlx
Senior Member
Thanks Meter: 544
6,709 posts
Join Date: Jul 2010
Thanks for the post. I took the liberty of posting it in the kernel forums with a link to your thread.
The Following User Says Thank You to xlGmanlx For This Useful Post
27th December 2010, 05:53 PM   |  #9  
vin255764
Senior Member
Seattle, WA
Thanks Meter: 2,348
2,841 posts
Join Date: Dec 2008
Gonna have to subscribe to this thread. Can't lose this precious.
27th December 2010, 05:56 PM   |  #10  
cmart4
Senior Member
Prosper, TX
Thanks Meter: 138
1,291 posts
Join Date: Nov 2007
Thanks for posting this... hopefully our awesome devs can whip this up into their next batches of goodness!


Tags
cache, evo 4g, icache, kernel, mytouch 4g