One of the hottest trends in enterprise computing right now is virtualization. As I'm sure you can imagine, it takes a fairly beefy server to handle the task of hosting multiple virtual servers. Therefore, most of the machines I've been using as host servers have multiple CPUs and internal, high-performance disk arrays. One side effect I hadn't counted on, though, is that cramming all of these components into a case generates a whole lot of heat.
To help cope with the problem of excessive heat, I have invested in special cases that are specifically designed for high-end systems. Although these cases work well, I wondered if they were really dissipating heat effectively. After all, excessive heat dramatically shortens the lifespan of many computer components.
In order to find out if my servers were running too hot, I turned to thermography. In case you are unfamiliar with the term, thermography involves using a special type of camera to render a picture based on heat. Unlike a normal camera, a thermal imaging camera does not photograph visible light. In fact, the image produced by a thermal imaging camera will be exactly the same whether the lights are turned on or off.
To give you an idea of what I'm talking about, take a look at Figure A. In this figure you can see a handprint. To get this image, I simply held my hand against the wall for a few seconds, and then removed it. Although my hand was gone, my body heat remained on the wall, and I was able to photograph the heat signature.
Keep in mind that this is not a true photograph, as a thermal imaging camera does not photograph heat. Instead, it works on a principle similar to that of an infrared, non-contact thermometer. The basic idea is that every object gives off infrared energy. An infrared thermometer passively collects that energy and then mathematically converts what was collected into a temperature.
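As a rough illustration of the "mathematical conversion" step, here is a minimal sketch based on the Stefan-Boltzmann law, which relates the total energy a surface radiates to its absolute temperature. This is a textbook simplification, not how any particular thermometer is implemented: real instruments measure only part of the infrared band and compensate for ambient conditions. The emissivity of 0.95 and the 1 square meter area are assumed values chosen for the example.

```python
# Simplified model: total radiated power P = emissivity * sigma * area * T^4
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 * K^4)


def radiated_power(temp_k, emissivity=0.95, area_m2=1.0):
    """Power in watts radiated by a surface at temp_k kelvin (assumed emissivity and area)."""
    return emissivity * SIGMA * area_m2 * temp_k ** 4


def temperature_from_power(power_w, emissivity=0.95, area_m2=1.0):
    """Invert the law: recover the surface temperature in kelvin from radiated power."""
    return (power_w / (emissivity * SIGMA * area_m2)) ** 0.25


# Round trip: a surface at 300 K radiates some power, and that power
# converts back to approximately 300 K.
power = radiated_power(300.0)
recovered = temperature_from_power(power)
print(f"{power:.1f} W radiated, recovered temperature {recovered:.1f} K")
```

Note that the result depends on the emissivity you assume, which is one reason shiny or reflective surfaces can give misleading readings.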
A thermal imaging camera takes this principle a step further. Instead of having one temperature sensor, it has an array of sensors. In my camera, the sensor array consists of 320 x 200 sensors. Each sensor measures the temperature of a specific point on the target. When the values have all been calculated, the camera's operating system assigns the color white to the highest temperature recorded and black to the lowest temperature. The various shades of gray are assigned to the temperatures in between, relative to the hottest and coldest recorded temperatures.
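The white-to-black assignment described above can be sketched as a simple linear scaling. This is only an illustration of the idea, not the camera's actual firmware; the function name, the 0-255 gray scale, and the sample readings are all assumptions made for the example.

```python
def to_grayscale(temps):
    """Map a 2D grid of temperatures to gray levels from 0 (black) to 255 (white).

    The coldest reading in the frame becomes 0 and the hottest becomes 255;
    everything else is scaled linearly between those two extremes.
    """
    flat = [t for row in temps for t in row]
    coldest, hottest = min(flat), max(flat)
    span = hottest - coldest
    if span == 0:
        # Every sensor reported the same temperature, so there is no contrast.
        return [[128 for _ in row] for row in temps]
    return [[round(255 * (t - coldest) / span) for t in row] for row in temps]


# Hypothetical 2 x 3 frame of readings (a real sensor array is far larger).
frame = [
    [83.0, 85.0, 90.0],
    [83.0, 94.0, 105.0],
]
gray = to_grayscale(frame)
for row in gray:
    print(row)
```

Because the scale is relative to each frame, the same shade of gray can stand for different temperatures in two different images.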
Now that we've seen how thermal imaging cameras work, let's turn our attention to the task at hand. In Figure B, I used a secondary lens on the thermal imaging camera to take a visible light image (not a thermal image) of one of my servers. I am including this image for reference because without a visible image to compare them against, thermal images can be tough to decipher.
I then switched the camera to thermal mode and took another picture. You can see the results in Figure C.
In this image, lighter areas are hotter, while darker areas are cooler. You can see a rectangular area that is pretty hot, and a round hot area above it. If you compare this image to Figure B, you can see that these hot spots are vents in the computer case. I thought that this was fairly interesting since the breeze coming from these vents feels cool to the touch.
There are some other areas of the case that appear to be fairly hot too, but this is an illusion. Like light, heat can be reflected. The other hot spots are heat reflections from other things in the room.
The problem with the thermal image above is that it shows the temperature of the computer's case, not the temperature of the components inside the computer. To get a better feel for the temperature inside, I removed the side panel, and snapped the image shown in Figure D.
I actually expected the hottest thing in the case to be the CPU, but it isn't. In fact, if you look at the image, you can see that the CPU fan isn't even spinning at that moment. There are some really hot areas inside the case though.
Unfortunately, you can't tell by looking at the image exactly which components are running hot. The thermal imaging camera will show you hot areas, but it won't show you the same details that you would get from a visible light image. To figure out where the hot spots were, I printed out the image and compared it to my server. The "dots" of heat you see in the image are my server's video card. The long vertical line on the right side of the image that is giving off so much heat is the server's memory.
Sometimes when you are working with a thermal imaging camera, you can get a better feel for the temperatures being recorded by reversing the camera's polarity. In this case, reversing the polarity means making the hot spots black and the cool spots white. Figure E shows a reverse polarity image.
In this image, the hot spots are very clearly defined in black. I took the photo at a slightly different angle than the previous image, and it includes the hard drives, shown on the lower right portion of the figure. Notice that the bottom drive is hot, but the other drives are relatively cool.
On this particular system, the bottom drive contains the Windows operating system, and the other drives contain virtual hard drives. Obviously, the main operating system drive is working harder than the other drives.
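Reversing the polarity, as described above, amounts to flipping each gray level around the middle of the scale. The sketch below assumes an image stored as rows of gray levels from 0 (black) to 255 (white); that representation and the function name are assumptions for illustration, not the camera's internal format.

```python
def reverse_polarity(gray):
    """Invert a grayscale image so that hot (white) areas become black and vice versa."""
    return [[255 - level for level in row] for row in gray]


# Hypothetical row of pixels: cool, warm, hottest.
normal = [[0, 64, 255]]
reversed_image = reverse_polarity(normal)
print(reversed_image)
```

Applying the function twice returns the original image, which is why switching polarity on the camera loses no information.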
A reality check
As you can see, a thermal imaging camera allows you to easily see whether or not your servers are overheating. The problem is that these cameras tend to be really expensive. I looked around on the Internet and the cheapest thermal imaging camera I could find was about $4,000, but prices went up to about $37,000 for cameras with higher resolution, color and built-in temperature reporting.
If you don't have thousands of dollars to spend on a thermal imaging camera, or don't have the time or the electronics background to build one, there are alternatives. Infrared thermometers work on the same basic principle as thermal imaging cameras, but measure the surface temperature at one specific point rather than at thousands of points as a thermal camera does. You can get an infrared thermometer at any hardware store for about $50.
Since the thermal image identified my server's memory as its hottest component, I decided to use an infrared thermometer to see just how hot the memory really was. In Figure F, you can see that the memory is running at about 105 degrees Fahrenheit. By way of comparison, the ambient temperature inside the case was about 83 degrees. In case you are wondering, the laser spot in the figure simply shows you what the thermometer is pointed at.
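Many component data sheets quote temperature limits in Celsius, so it can help to convert the readings. The sketch below assumes the readings above are in degrees Fahrenheit; the function name is made up for the example.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


memory_f = 105   # memory reading from the infrared thermometer (assumed Fahrenheit)
ambient_f = 83   # ambient reading inside the case (assumed Fahrenheit)

memory_c = fahrenheit_to_celsius(memory_f)
ambient_c = fahrenheit_to_celsius(ambient_f)

# A temperature difference converts with the 5/9 factor only (no 32 offset).
rise_f = memory_f - ambient_f
rise_c = rise_f * 5 / 9

print(f"Memory:  {memory_c:.1f} C")
print(f"Ambient: {ambient_c:.1f} C")
print(f"Rise over ambient: {rise_f} F ({rise_c:.1f} C)")
```

The rise over ambient is often more useful than the absolute reading, since it shows how much heat the component itself is adding.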
What if the CPU had been too hot?
In this particular case, my computer really wasn't running very hot (only about 93 degrees Fahrenheit). If it had been overheating, however, there are a few ways of dealing with it. One option is to check the server's BIOS for temperature-related settings. Most newer machines monitor the CPU temperature and the ambient temperature inside the case. On these machines, you can enable a BIOS option that shuts the server down if the temperature becomes dangerously high. Nobody wants to have their server shut down on them, but it beats risking damage.
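The logic behind a thermal shutdown setting can be sketched in a few lines. This is only an illustration of the idea; an actual BIOS does this in firmware, and the function name and threshold values here are hypothetical. Check your hardware documentation for real limits.

```python
def thermal_action(reading, warn_at, shutdown_at):
    """Decide what to do with one temperature reading.

    Returns "ok", "warn", or "shutdown". All three values must use the same unit.
    """
    if warn_at >= shutdown_at:
        raise ValueError("warn_at must be lower than shutdown_at")
    if reading >= shutdown_at:
        return "shutdown"
    if reading >= warn_at:
        return "warn"
    return "ok"


# Hypothetical thresholds, NOT vendor recommendations.
WARN_AT = 140.0
SHUTDOWN_AT = 160.0

for reading in (93.0, 145.0, 165.0):
    print(reading, thermal_action(reading, WARN_AT, SHUTDOWN_AT))
```

Having a warning level below the shutdown level gives you a chance to investigate before the server goes down.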
Depending on the case design, you may also be able to add additional fans or larger fans to the server. If possible, you should try to position the fans close to the hot spots.
One last option is to invest in liquid-cooled cases. They have dropped dramatically in price in recent years, and it's now possible to get one for a few hundred dollars.
ABOUT THE AUTHOR
Brien M. Posey, MCSE, has received Microsoft's Most Valuable Professional Award four times for his work with Windows Server, IIS and Exchange Server. He has served as CIO for a nationwide chain of hospitals and healthcare facilities, and was once a network administrator for Fort Knox. You can visit his personal Web site at www.brienposey.com.