A good OMAP 3640 vs snapdragon vs humming bird article

Search This thread

jul644

Senior Member
Jul 22, 2009
955
200
London Ontario
www.lmgtfy.com
CPU performance from the new TI OMAP 3640 (yes, they’re wrong again, its 3640 for the 1 GHz SoC, 3630 is the 720 MHz one) is surprisingly good on Quadrant, the benchmarking tool that Taylor is using. In fact, as you can see from the Shadow benchmarks in the first article, it is shown outperforming the Galaxy S, which initially led me to believe that it was running Android 2.2 (which you may know can easily triple CPU performance). However, I’ve been assured that this is not the case, and the 3rd article seems to indicate as such, given that those benchmarks were obtained using a Droid 2 running 2.1.
Now, the OMAP 3600 series is simply a 45 nm version of the 3400 series we see in the original Droid, upclocked accordingly due to the reduced heat and improved efficiency of the smaller feature size.
If you need convincing, see TI’s own documentation: http://focus.ti.com/pdfs/wtbu/omap3_pb_swpt024b.pdf
So essentially the OMAP 3640 is the same CPU as what is contained in the original Droid but clocked up to 1 GHz. Why then is it benchmarking nearly twice as fast clock-for-clock (resulting in a nearly 4x improvement), even when still running 2.1? My guess is that the answer lies in memory bandwidth, and that evidence exists within some of the results from the graphics benchmarks.
We can see from the 3rd article that the Droid 2’s GPU performs almost twice as fast as the one in the original Droid. We know that the GPU in both devices are the same model, a PowerVR SGX 530, except that the Droid 2’s SGX 530 is, as is the rest of the SoC, on the 45 nm feature size. This means that it can be clocked considerably faster. It would be easy to assume that this is reason for the doubled performance, but that’s not necessarily the case. The original Droid’s SGX 530 runs at 110 MHz, substantially less than its standard clock speed of 200 MHz. This downclocking is likely due to the memory bandwidth limitations I discussed in my Hummingbird vs Snapdragon article, where the Droid original was running LPDDR1 memory at a fairly low bandwidth that didn’t allow for the GPU to function at stock speed. If those limitations were removed by adding LPDDR2 memory, the GPU could then be upclocked again (likely to around 200 MHz) to draw even with the new memory bandwidth limit, which is probably just about twice what it was with LPDDR1.
So what does this have to do with CPU performance? Well, it’s possible that the CPU was also being limited by LPDDR1 memory, and that the 65 nm Snapdragons that are also tied down to LPDDR1 memory share the same problem. The faster LPDDR2 memory could allow for much faster performance.
Lastly, since we know from the second article at the top that the Galaxy S performs so well with its GPU, why is it lacking in CPU performance, only barely edging past the 1 GHz Snapdragon?
It could be that the answer lies in the secret that Samsung is using to achieve those ridiculously fast GPU speeds. Even with LPDDR2 memory, I can’t see any way that the GPU could achieve 90 Mtps; the required memory bandwidth is too high. One possibility is the addition of a dedicated high-speed GPU memory cache, allowing the GPU access to memory tailored to handle its high-bandwidth needs. With this solution to memory bandwidth issues, Samsung may have decided that higher speed memory was unnecessary, and stuck with a slower solution that remains limited in the same manner as the current-gen Snapdragon.
Lets recap: TI probably dealt with the limitations to its GPU by dropping in higher speed system RAM, thus boosting overall system bandwidth to nearly double GPU and CPU performance together.
Samsung may have dealt with limitations to the GPU by adding dedicated video memory that boosted GPU performance several times, but leaving CPU performance unaffected.
This, I think, is the best explanation to what I’ve seen so far. It’s very possible that I’m entirely wrong and something else is at play here, but that’s what I’ve got.
CPU Performance

Before I go into details on the Cortex-A8, Snapdragon, Hummingbird, and Cortex-A9, I should probably briefly explain how some ARM SoC manufacturers take different paths when developing their own products. ARM is the company that owns licenses for the technology behind all of these SoCs. They offer manufacturers a license to an ARM instruction set that a processor can use, and they also offer a license to a specific CPU architecture.
Most manufacturers will purchase the CPU architecture license, design a SoC around it, and modify it to fit their own needs or goals. T.I. and Samsung are examples of these; the S5PC100 (in the iPhone 3GS) as well as the OMAP3430 (in the Droid) and even the Hummingbird S5PC110 in the Samsung Galaxy S are all SoCs with Cortex-A8 cores that have been tweaked (or “hardened”) for performance gains to be competitive in one way or another. Companies like Qualcomm however will build their own custom processor architecture around a license to an instruction set that they’ve chosen to purchase from ARM. This is what the Snapdragon’s Scorpion processor is, a completely custom implementation that shares some similarities with Cortex-A8 and uses the same ARMv7 instruction set, but breaks away from some of the limitations that the Cortex-A8 may impose.
Qualcomm’s approach is significantly more costly and time consuming, but has the potential to create a processor that outperforms the competition. Through its own custom architecture configuration, (which Qualcomm understandably does not go into much detail regarding), the Scorpion CPU inside the Snapdragon SoC gains an approximate 5% improvement in instructions per clock cycle over an ARM Cortex-A8. Qualcomm appeals to manufacturers as well by integrating features such as GPS and cell network support into the SoC to reduce the need of a cell phone manufacturer having to add additional hardware onto the phone. This allows for a more compact phone design, or room for additional features, which is always an attractive option. Upcoming Snapdragon SoCs such as the QSD8672 will allow for dual-core processors (not supported by Cortex-A8 architecture) to boost processing power as well as providing further ability to scale performance appropriately to meet power needs. Qualcomm claims that we’ll see these chips in the latter half of 2010, and rumor has it that we’ll begin seeing them show up first in Windows Mobile 7 Series phones in the Fall. Before then, we may see a 45 nm version of the QSD8650 dubbed “QSD8650A” released in the Summer, running at 1.3 GHz.
You might think that the Hummingbird doesn’t stand a chance against Qualcomm’s custom-built monster, but Samsung isn’t prepared to throw in the towel. In response to Snapdragon, they hired Intrinsity, a semiconductor company specializing in tweaking processor logic design, to customize the Cortex-A8 in the Hummingbird to perform certain binary functions using significantly less instructions than normal. Samsung estimates that 20% of the Hummingbird’s functions are affected, and of those, on average 25-50% less instructions are needed to complete each task. Overall, the processor can perform tasks 5-10% more quickly while handling the same 2 instructions per clock cycle as an unmodified ARM Cortex-A8 processor, and Samsung states it outperforms all other processors on the market (a statement seemingly aimed at Qualcomm). Many speculate that it’s likely that the S5PC110 CPU in the Hummingbird will be in the iPhone HD, and that its sister chip, the S5PV210, is inside the Apple A4 that powers the iPad. (UPDATE: Indications are that the model # of the SoC in the Apple iPad’s A4 is “S5L8930”, a Samsung part # that is very likely closely related to the S5PV210 and Hummingbird. I report and speculate upon this here.)
Lastly, we really should touch upon Cortex-A9. It is ARM’s next-generation processor architecture that continues to work on top of the tried-and-true ARMv7 instruction set. Cortex-A9 stresses production on the 45 nm scale as well as supporting multiple processing cores for processing power and efficiency. Changes in core architecture also allow a 25% improvement in instructions that can be handled per clock cycle, meaning a 1 GHz Cortex-A9 will perform considerably quicker than a 1 GHz Cortex-A8 (or even Snapdragon) equivalent. Other architecture improvements such as support for out-of-order instruction handling (which, it should be pointed out, the Snapdragon partially supports) will allow the processor to have significant gains in performance per clock cycle by allowing the processor to prioritize calculations based upon the availability of data. T.I. has predicted its Cortex-A9 OMAP4440 to hit the market in late 2010 or early 2011, and promises us that their OMAP4 series will offer dramatic improvements over any Cortex-A8-based designs available today.
GPU performance

There are a couple problems with comparing GPU performance that some recent popular articles have neglected to address. (Yes, that’s you, AndroidAndMe.com, and I won’t even go into a rant about bad data). The drivers running the GPU, the OS platform it’s running on, memory bandwidth limitations as well as the software itself can all play into how well a GPU runs on a device. In short: you could take identical GPUs, place them in different phones, clock them at the same speeds, and see significantly different performance between them.
For example, let’s take a look at the iPhone 3GS. It’s commonly rumored to contain a PowerVR SGX 535, which is capable of processing 28 million triangles per second (Mt/s). There’s a driver file on the phone that contains “SGX535” in the filename, but that shouldn’t be taken as proof as to what it actually contains. In fact, GLBenchmark.com shows the iPhone 3GS putting out approximately 7 Mt/s in its graphics benchmarks. This initially led me to believe that the iPhone 3GS actually contained a PowerVR SGX 520 @ 200 MHz (which incidentally can output 7 Mt/s) or alternatively a PowerVR SGX 530 @ 100 MHz because the SGX 530 has 2 rendering pipelines instead of the 1 in the SGX 520, and tends to perform about twice as well. Now, interestingly enough, Samsung S5PC100 documentation shows the 3D engine as being able to put out 10 Mt/s, which seemed to support my theory that the device does not contain an SGX 535.
However, the GPU model and clock speed aren’t the only limiting factors when it comes to GPU performance. The SGX 535 for example can only put out its 28 Mt/s when used in conjunction with a device that supports the full 4.2 GB per second of memory bandwidth it needs to operate at this speed. Assume that the iPhone 3GS uses single-channel LPDDR1 memory operating at 200 MHz on a 32-bit bus (which is fairly likely). This allows for 1.6 GB/s of memory bandwidth, which is approximately 38% of what the SGX 535 needs to operate at its peak speed. Interestingly enough, 38% of 28 Mt/s equals just over 10 Mt/s… supporting Samsung’s claim (with real-world performance at 7 Mt/s being quite reasonable). While it still isn’t proof that the iPhone 3GS uses an SGX 535, it does demonstrate just how limiting single-channel memory (particularly slower memory like LPDDR1) can be and shows that the GPU in the iPhone 3GS is likely a powerful device that cannot be used to its full potential. The GPU in the Droid likely has the same memory bandwidth issues, and the SGX 530 in the OMAP3430 appears to be down-clocked to stay within those limitations.
But let’s move on to what’s really important; the graphics processing power of the Hummingbird in the Samsung Galaxy S versus the Snapdragon in the EVO 4G. It’s quickly apparent that Samsung is claiming performance approximately 4x greater than the 22 Mt/s the Snapdragon QSD8650’s can manage. It’s been rumored that the Hummingbird contains a PowerVR SGX 540, but at 200 MHz the SGX 540 puts out 28 Mt/s, approximately 1/3 of the 90 Mt/s that Samsung is claiming. Either Samsung has decided to clock an SGX 540 at 600 MHz, which seems rather high given reports that the chip is capable of speeds of “400 MHz+” or they’ve chosen to include a multi-core PowerVR SGX XT solution. Essentially this would allow 3 PowerVR cores (or 2 up-clocked ones) to hit the 90 Mt/s mark without having to push the GPU past 400 MHz.
Unfortunately however, this brings us right back to the memory bandwidth limitation argument again, because while the Hummingbird likely uses LPDDR2 memory, it still only appears to have single-channel memory controller support (capping memory bandwidth off at 4.2 GB/s), and the question is raised as to how the PowerVR GPU obtains the large amount of memory bandwidth it needs to draw and texture polygons at those high speeds. If the PowerVR SGX 540 (which, like the SGX 535 performs at 28 Mt/s at 200 MHz) requires 4.2 GB/s of memory bandwidth, drawing 90 Mt/s would require over 12.6 GB/s of memory bandwidth, 3 times what is available. Samsung may be citing purely theoretical numbers or using another solution such as possibly increasing GPU cache sizes. This would allow for higher peak speeds, but it’s questionable if it could achieve sustainable 90 Mt/s performance.
Qualcomm differentiates itself from most of the competition (once again) by using its own graphics processing solution. The company bought AMD’s Imageon mobile-graphics division in 2008, and used AMD’s Imageon Z430 (now rebranded Adreno 200) to power the graphics in the 65 nm Snapdragons. The 45 nm QSD8650A will include an Adreno 205, which will provide some performance enhancements to 2D graphics processing as well as hardware support for Adobe Flash. It is speculated that the dual-core Snapdragons will utilize the significantly more powerful Imageon Z460 (or Adreno 220), which apparently rivals the graphics processing performance of high-end mobile gaming systems such as the Sony PlayStation Portable. Qualcomm is claiming nearly the same performance (80 Mt/s) as the Samsung Hummingbird in its upcoming 45 nm dual-core QSD8672, and while LPDDR2 support and a dual-channel memory controller are likely, it seems pretty apparent that, like Samsung, something else must be at play for them to achieve those claims.
While Samsung and Qualcomm tend to stay relatively quiet about how they achieve their graphics performance, T.I. has come out and specifically stated that its upcoming OMAP4440 SoC supports both LPDDR2 and a dual-channel memory controller paired with a PowerVR SGX 540 chip to provide “up to 2x” the performance of its OMAP3 line. This is a reasonable claim assuming the SGX 540 is clocked to 400 MHz and requires a bandwidth of 8.5 GB/s which can be achieved using LPDDR2 at 533 MHz in conjunction with the dual-channel controller. This comparatively docile graphics performance may be due to T.I’s rather straightforward approach to the ARM Cortex-A9 configuration.
Power Efficiency

Moving onward, it’s also easily noticeable that the next generation chipsets on the 45 nm scale are going to be a significant improvement in terms of performance and power efficiency. The Hummingbird in the Samsung Galaxy S demonstrates this potential, but unfortunately we still lack the power consumption numbers we really need to understand how well it stacks up against the 65 nm Snapdragon in the EVO 4G. It can be safely assumed that the Galaxy S will have overall better battery life than the EVO 4G given the lower power requirements of the 45 nm chip, the more power-efficient Super AMOLED display, as well as the fact that both phones sport equal-capacity 1500mA batteries. However it should be noted that the upcoming 45 nm dual-core Snapdragon is claimed to be coming with a 30% decrease in power needs, which would allow the 1.5 GHz SoC to run at nearly the same power draw of the current 1 GHz Snapdragon. Cortex-A9 also boasts numerous improvements in efficiency, claiming power consumption numbers nearly half that of the Cortex-A8, as well as the ability to use multiple-core technology to scale processing power in accordance with energy limitations.
While it’s almost universally agreed that power efficiency is a priority for these processors, many criticize the amount of processing power these new chips are bringing to mobile devices, and ask why so much performance is necessary. Whether or not mobile applications actually need this much power is not really the concern however; improved processing and graphics performance with little to no additional increase in energy needs will allow future phones to actually be much more efficient in terms of power. This is because ultimately, power efficiency relies in a big part on the ability of the hardware in the phone to complete a task quickly and return to an idle state where it consumes very little power. This “burst” processing, while consuming fairly high amounts of power for very short periods of time, tends to be more economical than prolonged, slower processing. So as long as ARM chipset manufacturers can continue to crank up the performance while keeping power requirements low, there’s nothing but gains to be had.
http://alienbabeltech.com/main/?p=19309
http://alienbabeltech.com/main/?p=17125

its a good read for noobs like me, also read the comments as there is lots of constructive criticism [that actually adds to the information in the article]
 
Kind of wild to come across people quoting me when I'm just Googling the web for more info. :p

I'd just like to point out that I was probably wrong on the entire first part about the 3640. I can't post links yet, but Google "Android phones benchmarked; it's official, the Galaxy S is the fastest." for my blog article on why.

And the reason I'm out here poking around for more information is because AnandTech.com (well known for their accurate and detailed articles) just repeatedly described the SoC in the Droid X as a OMAP 3630 instead of the 3640.
EDIT - I've just found a blog on TI's website that calls it a 3630. I guess that's that! I need to find a TI engineer to make friends with for some inside info. ;)

Anyhow, thanks for linking my work! :)
 
Last edited:

SpeeDemon

Senior Member
Dec 13, 2009
97
2
Abbotsford
Make no mistake, OMAP 3xxx series get left in the dust by the Hummingbird.

Also, I wouldn't really say that Samsung hired Intrinsity to make the CPU - they worked together. Intrinsity is owned by Apple, the Hummingbird is the same core as the A4, but with a faster graphics processor - the PowerVR SGX 540.

There was a bug in the Galaxy S unit they tested, which was later confirmed in the authors own comments later on.