As of June 14, 2011, it appears the only viable way to solve this problem is to reheat the CPU with specialized equipment. I am currently testing whether this can be done in a standard gas/electric oven (NOT A MICROWAVE!!). If successful, this method will solve the problem, and the following guide (CPU cooling system) may help prevent it from happening again. Building this cooling system alone is not enough and will not solve your problems.
Hello, dear members of XDA,
First of all, please excuse my English; I will try to explain myself as well as I can. It will be a long post. It could be boring, it could be scary, or whatever you like, but bear with me on this one.
So... I got myself an HTC (T-Mobile) HD2. A bit late in the game, but hell, it's still a good phone. I'd dreamed of having one but couldn't afford it. Anyway, I finally got one. A second-hand broken one, damn it.
As I found out, the problem is pretty common: the damn thing restarts itself. It's thermally related, the old CPU overheating problem. Searching the net, I found that it's common with several HTC models. The HD2 has it, the Desire has it, the Nexus One has it, hell, even some Xperia models have it... From what I've read, about half of the devices powered by anything from the Snapdragon series could have it.
The problem can be easily described as: the phone hangs, restarts, gives the dreaded sequence of 7-8 short vibrations, locks up, etc.
Mine was worse than anything I've seen on the forums or with other people. It locked itself up for just about every reason I could think of: taking pictures, browsing the menu, using GPS, the browser, 3G or Wi-Fi, watching a movie... All of them ended in restarts or lock-ups after a couple of minutes. I found that keeping the phone at 4-5 degrees Celsius would solve my problems in most cases, but anything above 10-15 degrees would make the thing go crazy.
Well, I'm passionate about electronics, development in this area, solving problems and things like that. I'm also experienced with heat- and semiconductor-related problems. I once had a MacBook Air that suffered from core shutdowns because of overheating (also a well-known problem with the MBA rev 1.0), and I managed to design an alternative cooling system that solved it. So I gave it a shot. I know many users have similar problems, and although I don't explicitly recommend making these hacks to their phones, this is one way to solve the problem if you bought your unit second-hand or don't have some form of warranty.
So here we go.
Big fat warning!!! Don't attempt these things with your phone unless you are familiar with the concepts and the tools involved in the process. Also, there is a real risk of permanently damaging your phone. Not just real... big, if you get something wrong.
The first step is to run some simple tests to determine the cause of the problem and how far it extends.
So, I used a multimeter with a K-type thermocouple probe to measure the temperature of various components of the phone during intensive use.
This is the back of the mainboard of my HD2. If you look closely, HTC placed a bluish thermal pad over one of the metal shields covering the components on the back. I don't know its purpose, as the back casing in that area is made of plastic, so there's little or no heat dissipation. Anyway, it's a good place for my probe. Some tape held the probe in position. Because we don't have perfect mechanical contact between the probe tip and the casing or chips, I expect the real temperature to be 1-2 degrees Celsius higher than each measurement I describe later.
I then placed the battery over the back of the phone and secured it with some more tape and some toothpicks.
We're at 19.3 degrees. That's where we'll start from.
There's a useful little app that lets users overclock or stress-test their phone's CPU. I found it here on XDA; I'll use it to generate some heat.
As you can see, we're already at 25.8 degrees after 5 minutes of testing, and that's without counting the primary heat-making suspect, the Qualcomm chipset, which is on the other side. When the reading reached 29.5 degrees, the phone locked itself up. I repeated the experiment 2 more times and got exactly the same result. At least the readings were consistent.
OK, I then removed the motherboard to take some readings from the actual CPU.
Same procedure, next readings: at around 33.4-34.2 degrees (it varies) on the CPU itself, the phone will either restart or lock itself up. So you see how serious my problem is. Summer is coming, so I won't be able to use my phone...
Measures have to be taken.
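Because of the imperfect probe contact mentioned earlier, each reading is probably 1-2 degrees below the real temperature. Here is a small Python sketch that turns the failure readings above into a range of likely true temperatures. The readings and the offset come from these tests; the helper name `corrected_range` is made up for illustration.

```python
# Failure readings and the 1-2 °C probe offset come from the tests above;
# the function name corrected_range is hypothetical.
PROBE_OFFSET_C = (1.0, 2.0)  # true temperature is 1-2 °C above the reading


def corrected_range(readings_c, offset=PROBE_OFFSET_C):
    """Return (low, high): the likely range of true temperatures at which
    the failures happened, after adding the probe-contact offset."""
    low = min(readings_c) + offset[0]
    high = max(readings_c) + offset[1]
    return low, high


shield_failures = [29.5, 29.5, 29.5]  # back shield, three identical runs
cpu_failures = [33.4, 34.2]           # CPU, lowest and highest failure readings

print("shield:", corrected_range(shield_failures))
print("cpu:   ", corrected_range(cpu_failures))
```

With these numbers, the CPU itself probably gave up somewhere between about 34.4 and 36.2 degrees.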
Let's start with a short introduction to heat in semiconductors.
Well, simply put, a conductor (semiconductors behave the same way) generates some heat when an electric current passes through it. This is because the electrons moving along the conductor (which, put simply, is what an electric current is) occasionally collide with the atoms of the material they are passing through. In each collision the electron loses some energy, and that energy becomes heat. Heat itself can be described at the atomic level as the random thermal motion (vibration) of the atoms. The more agitated they are, the hotter the material, and the more likely they are to be hit by passing electrons. So a hot conductor tends to get even hotter. There is a point where the heat generated makes the conductor's atoms prone to more and more hits from passing electrons, in a kind of geometric progression. That's called thermal runaway. It tends to destroy electronics by overheating, melting, or burning them up.
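To make that feedback loop a bit more concrete, here is a toy lumped model in Python, assuming resistance rises linearly with temperature, R(T) = R0(1 + alpha(T - T_amb)), and heat escapes through a single thermal resistance R_th. All parameter values are hypothetical, and this is not a model of the actual Snapdragon chip; it only shows the two regimes. Under these assumptions a steady state exists only while I²·R0·alpha·R_th < 1.

```python
# Toy lumped model of the feedback loop described above: resistance rises with
# temperature, so a hotter conductor dissipates more power. Every parameter
# value here is hypothetical; this is not a model of the Snapdragon chip.


def runaway_predicted(r_th, i=1.0, r0=1.0, alpha=0.004):
    """A steady state exists only while the loop gain I^2 * R0 * alpha * R_th < 1."""
    return i * i * r0 * alpha * r_th >= 1.0


def simulate(r_th, i=1.0, r0=1.0, alpha=0.004, c=2.0, t_amb=20.0,
             dt=0.1, t_end=2000.0, t_limit=150.0):
    """Forward-Euler integration of C * dT/dt = I^2 * R(T) - (T - T_amb) / R_th,
    with R(T) = R0 * (1 + alpha * (T - T_amb)).

    r_th is the thermal resistance to ambient in K/W (lower = better cooling).
    Returns the final temperature in °C, or None if it passed t_limit.
    """
    t = t_amb
    for _ in range(int(t_end / dt)):
        heat_in = i * i * r0 * (1.0 + alpha * (t - t_amb))  # Joule heating, W
        heat_out = (t - t_amb) / r_th                       # loss to ambient, W
        t += dt * (heat_in - heat_out) / c
        if t > t_limit:
            return None  # treated as runaway
    return t


for r_th in (20.0, 50.0, 300.0):
    print(f"R_th={r_th:5.0f} K/W  runaway predicted: {runaway_predicted(r_th)}"
          f"  final temp: {simulate(r_th)}")
```

With a small R_th (good cooling) the temperature settles at an equilibrium; with a large one the same heat source never finds one and keeps climbing. That is roughly the intuition behind trying a heat sink later on.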
Back to our phones now. The CPU produces heat, because of the same effect described above. In this case, the heat will either melt or crack the small solder "balls" that make up the BGA (ball grid array) matrix the chip is mounted on. The balls will either melt (extreme cases) or expand with increasing temperature. However, it seems most of the newer processors used by HTC are mounted in an epoxy resin whose expansion and melting points are both higher than those of the flux and solder used to attach these chips. So the chip itself tends to stay fixed in one position, unable to expand or contract with temperature changes, while the balls of the BGA matrix underneath it expand and contract with those changes. This can lead to a case where at least one of the balls (a couple hundred in total) becomes "loose" or out of position, breaking the electrical contact it should make. Hence our problems. At first, a lot of heat must be applied to actually break the bond between the CPU and the board, but once it's broken, the tiny joints are very sensitive to temperature changes and expand or contract freely.
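One common way to put a rough number on this kind of stress is the differential thermal expansion between bonded materials: displacement = (difference in CTE) × distance from the centre × temperature change. The Python sketch below uses typical handbook coefficients of thermal expansion (approximate assumptions, not measured HTC or Qualcomm data), and the 5 mm distance is a hypothetical package size. It is a simple linear estimate; real BGA joints see more complex stresses.

```python
# Back-of-the-envelope thermal-mismatch estimate. The CTE numbers are typical
# handbook values (approximate assumptions), not measured HTC/Qualcomm data.
CTE_PPM_PER_K = {
    "silicon": 2.6,        # the die
    "fr4_in_plane": 16.0,  # common PCB laminate, in-plane
    "sac_solder": 22.0,    # lead-free tin-silver-copper solder
}


def mismatch_um(material_a, material_b, distance_mm, delta_t_k):
    """Relative in-plane displacement, in micrometres, between two materials
    bonded together, at distance_mm from the centre, for a swing of delta_t_k."""
    d_alpha = abs(CTE_PPM_PER_K[material_a] - CTE_PPM_PER_K[material_b]) * 1e-6
    return d_alpha * (distance_mm * 1000.0) * delta_t_k


# Hypothetical case: a corner ball 5 mm from the chip centre, heated from
# room temperature (19.3 °C) to the ~34 °C failure point measured above.
shift = mismatch_um("silicon", "fr4_in_plane", 5.0, 34.0 - 19.3)
print(f"{shift:.2f} um of relative displacement per heating cycle")
```

The number per cycle is small, but it repeats every time the phone heats up and cools down, which fits the "degenerative" behavior described next.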
Most users notice that at its core the problem seems related to overheating (in the beginning), but over time its effects are degenerative: phones seem to restart for no apparent reason. It's still overheating, but things keep getting worse as the chip and its connections become more sensitive to temperature changes. Now even small changes produce these problems. My CPU restarts at 34 degrees... that sucks.
So my only option was to reheat the CPU in an attempt to partially melt the broken "balls" in the BGA matrix and hopefully, I repeat, HOPEFULLY, make them reconnect with the mainboard. Reballing this chip is not possible: the resin HTC placed around it doesn't melt at the temperature at which I could normally remove the chip, and heating it even higher would risk killing the CPU long before the resin melts. A strange move by HTC to build things like this.
Anyway... here goes nothing...
I placed the usual aluminium foil to protect the surrounding components from the hot air the rework station uses to heat up the CPU.
I preheated the CPU for about 10 minutes from both sides of the board, then switched to heating it at 360 degrees Celsius. Once it was heated, I applied even pressure on top of it to close the gap between it and the board, just a little bit. THIS IS VERY RISKY. It's normally not recommended because of the risk of damaging the BGA. In this case the resin prevents the chip from moving too much, so it's less risky. Not safe... but less risky.
I let the board cool on its own for half an hour and repeated the temperature monitoring tests.
The maximum CPU temperature before a restart went up from 34 degrees to about 42. It's not much, but it's a start. However, above that temperature the phone still locks up or restarts.
I went for another round of reheating with the hot air station. After this, I got slightly better results: 2-3 degrees more. My lucky break came when I suspected thermal runaway in the CPU. So I tried to make a sort of heat sink for the chip using some mica insulator pads meant for TO-220 transistors, some thermal grease and a bunch of aluminum and copper foil. My theory was that heat build-up accelerates above a specific level, the point from which thermal runaway occurs. In my initial tests, even after the phone locked itself up and I manually restarted it (battery out, battery in), the temperature kept rising even faster, although the phone wasn't doing anything intensive.
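That "keeps rising even faster" observation is the thing to look for in a temperature log. Below is a minimal Python sketch of one way to flag it, assuming readings are taken at a fixed interval: check whether the second difference (the change in the rate of rise) stays positive for a few samples in a row. The function name and both sample logs are hypothetical, not data from my phone.

```python
# Hypothetical helper: flags a temperature log whose rise keeps speeding up,
# which is the runaway signature described above. Sample data is made up.


def is_accelerating(samples, min_run=3):
    """samples: temperatures (°C) logged at a fixed interval.
    Returns True if the second difference (change in the rate of rise) stays
    positive for at least min_run consecutive points."""
    rate = [b - a for a, b in zip(samples, samples[1:])]
    accel = [b - a for a, b in zip(rate, rate[1:])]
    run = 0
    for d2 in accel:
        run = run + 1 if d2 > 0 else 0
        if run >= min_run:
            return True
    return False


settling = [25.0, 28.0, 30.0, 31.0, 31.5, 31.7]  # heats up, then levels off
runaway = [30.0, 30.5, 31.5, 33.0, 35.0, 37.5]   # each step bigger than the last
print(is_accelerating(settling), is_accelerating(runaway))
```

A real multimeter log is noisy, so in practice you'd probably want to average a few readings before taking differences.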
The role of my "heat sink" would be to dissipate heat faster and, to some extent, to press the CPU against the board.
I placed the mica pads directly on top of the CPU with thermal grease above and below them, then mounted the metal shield back over that area. On it, I placed some more silicone paste and a thick copper foil (salvaged from some broken laptops I have over here). It looks ugly but... worth a shot:
After that, I began making the rest of the heat sink from aluminum foil. I folded about 12 layers with... more thermal grease between each of them, and at layer 6-7, another mica pad.
Here's the aluminum foil
I then pressed the foils very hard between two flat surfaces to squeeze out the excess thermal grease.
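Why squeeze out the grease? For heat flowing straight through the stack, the layers add up in series: R = sum of t / (k × A) over all layers. The Python sketch below follows the layer count and order described above (12 foils, grease between them, one mica pad around layer 6-7), but every thickness, conductivity and the contact area are rough assumptions for illustration only.

```python
# Through-thickness thermal resistance of the layered foil "heat sink":
# layers in series add up, R = sum(t_i / (k_i * A)).
# Layer count and order follow the post; every thickness, conductivity and the
# contact area are rough assumptions for illustration only.
K_AL, K_GREASE, K_MICA = 237.0, 1.0, 0.6   # W/(m*K), approximate
FOIL_M = 16e-6                             # typical household foil, metres


def foil_stack(n_foils=12, grease_um=50.0, mica_um=50.0, mica_after=6):
    """Build (name, thickness_m, k) layers: foil, grease between foils,
    plus one mica pad after foil number mica_after."""
    layers = []
    for n in range(1, n_foils + 1):
        layers.append(("aluminium foil", FOIL_M, K_AL))
        if n == mica_after:
            layers.append(("mica pad", mica_um * 1e-6, K_MICA))
        if n < n_foils:
            layers.append(("thermal grease", grease_um * 1e-6, K_GREASE))
    return layers


def stack_resistance(layers, area_m2):
    """Series sum of t / (k * A) in K/W."""
    return sum(t / (k * area_m2) for _, t, k in layers)


AREA = 1e-4  # 1 cm^2, assumed contact area
for grease_um in (50.0, 10.0):  # before vs after squeezing out excess grease
    r = stack_resistance(foil_stack(grease_um=grease_um), AREA)
    print(f"grease {grease_um:4.0f} um -> {r:.2f} K/W")
```

Under these assumptions the aluminium barely contributes and the grease and mica layers dominate, so thinner grease lines help. Note this only models heat going straight through the stack; the foil's job of spreading heat sideways across the back cover isn't captured here.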
I "anodized" the first layer (the one in contact with the CPU shield) with some ferric chloride. Before that, the board looked like this:
After the logic board was mounted back, I reconnected everything and, after some preliminary tests, put the phone back together. It now looks like this:
I only have to reattach the plastic sticker with the serial number and IMEI.
Of course, I then ran tests. I heated up the phone with a hair dryer to simulate a hot summer day, about 40 degrees, just to be sure. I then ran CPU stress tests and a full DivX movie (impossible in the past). In preliminary testing, I had indications that I had avoided thermal runaway: the CPU now runs stable at 24 degrees (with 19.3 degrees ambient room temperature). No more heating up by itself to about 40 degrees and then restarting.
In the final testing, with the phone put together, I heated it with the hair dryer to 40 degrees. I started it and ran stress tests. No more lock-ups or restarts, not even a single one. However, with the phone assembled I can't measure the temperature of its internal components. As far as I can feel, it gets warmer, it heats up to some degree, but now the heat is spread across its whole surface. For whatever reason, it doesn't restart anymore.
I then tried a CPU stress test, a WLAN connection, a PC connection and browsing the net, all at the same time. NO RESTART. I watched a full 1 hour 30 minute movie at the maximum playable quality; the phone was really hot (43-44 degrees at its surface) but there were still no problems.
It appears that, for the moment, I've saved the phone. However, its future behavior is still to be determined.
I'll get back with more testing in the following days, and eventually I hope to devise a general method for building heat sinks for phones (yeah, ridiculous...) using combinations of metal and thermally conductive crystals. The idea is to find out whether reheating the chip with a hot air station can be avoided (that's the step with the most risk). But the start is promising. By the time warranties expire and phones like the new Droids or WinMo 7 devices start to break from thermal problems, maybe I'll have a more user-friendly solution.
EDIT, JUNE 04, 2011
Since I have a dead HD2 motherboard here, I tried to remove the CPU to expose the BGA soldering. Just for fun; there's no chance of a BGA reball, as there aren't any tools available for this particular chip. The resin prevents a proper removal: at about 450 degrees Celsius it was still quite hard, so I had to forcefully remove the chip and broke some of the BGA. The chip is very thin, kind of like a microSD card. It heats up very quickly; the solder points underneath it melted in about 2-3 minutes at 370 degrees Celsius.
Here's how it looks.
This is the motherboard without the chip. The BGA matrix is broken; some balls were simply ripped out when I forcefully removed the chip.
This is the actual chip compared with a miniSD and a standard SD card.
...and this is the underside of the chip. Believe it or not, the chip is actually alive and its pins are OK. It can't be used because it can't be properly soldered to a board. Guess I'm gonna punch a hole through it and put it on my key chain, next to a laptop CPU that's already there.
In the following days I will experiment with the solder points and materials to try to come up with a safer method for reheating future boards with thermal problems. It seems this board died because of overheating and a short circuit in the center of the array, made by 3 solder balls that came into contact once they melted.