I spent most of today dealing with server emergencies. Last night severe thunderstorms pummeled the Dallas-Fort Worth metro area with high winds, and there were even a few tornado alerts. No tornadoes were officially spotted in the city, but the winds and lightning were strong enough to do some damage to the power grid. The servers were still working normally overnight (I was up at 4am, watching the weather radar), but by morning the dedicated servers I manage were unreachable. A quick call to CI Host’s tech support produced no help: a busy tone. Dialing repeatedly for the next half hour didn’t make any difference, so there didn’t seem to be any support available today. According to the recorded “current network status” at the company’s main phone number, there were “no current network outages or other issues”. Yeah, right. Being only 20 minutes or so away from the facility, I decided to go investigate.
At the hosting company’s Bedford facility (“CDC-01”), chaos reigned supreme. All the doors were open, diesel generators were spewing fumes into the air (while being cooled by rigged-up water hoses), and a mixture of technicians and concerned-looking nerds were running around. Being one of the nerds, I joined in. There was no usual security; I strolled into the lobby and chatted with one of CI Host’s admins. Mains power was down, as I had gathered from the diesel generators running outside the building. Since I was there, I decided to take a look at the co-located servers on two different floors. The elevators were not working, of course, so it was up the stairs. Approaching the 2nd-floor server room, the temperature rose with every step: the generators were able to provide electricity for the servers, but not for the A/C! Inside the room, the thermometer on the wall read 90°F (32°C), but someone who had been there for several hours working on their server swore the thermometer was pegged and couldn’t go over the 90°F mark. My server’s internal sensors were reporting a case temperature of 43°C.
After a few moments I decided to shut down the servers to prevent hardware damage: the CPU temperatures were reasonable, but the hard drives were running rather hot. Normally the server room is some 30-40 degrees (C) cooler.
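The shutdown call above boils down to a threshold check on sensor readings. As a minimal sketch (the sensor names, limits, and safety margin here are hypothetical, not values from the actual servers; in practice the readings would come from something like lm-sensors or smartctl):

```python
# Hedged sketch of a shutdown decision based on temperature readings.
# Limits and margin are illustrative assumptions, in degrees Celsius;
# drives generally tolerate less heat than CPUs.
LIMITS = {"cpu": 70, "case": 45, "hdd": 50}

def should_shut_down(readings, limits=LIMITS, margin=5):
    """Return True if any known sensor is within `margin` C of its limit."""
    return any(temp >= limits[sensor] - margin
               for sensor, temp in readings.items() if sensor in limits)

# A case temperature of 43 C, as observed that morning, is within a
# 5 C margin of the assumed 45 C case limit, so this check would
# recommend powering down.
readings = {"cpu": 55, "case": 43, "hdd": 48}
print(should_shut_down(readings))  # → True
```

The margin matters because in a room with no A/C the temperature is still climbing; by the time a sensor actually hits its limit, it is too late to shut down gracefully.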
After shutting down the servers I was ready to leave, and picked up the phone to have someone come let me out. Line busy! Was I trapped in the sauna? No… I had forgotten there was no security today; all the doors were unlocked. So I decided to pay a visit to the third-floor co-lo room, where the A/C was supposed to be running and where another of the servers I manage is located. Once I made it there (via the staircase), I found just another hot room full of concerned nerds and their baking computers. I switched off the server there, too, and left.
According to the case temperature sensors, the A/C started working again around 10:30 in the evening. I brought the servers back online through remote access.
With the dust settled, I’m starting to look for alternative co-lo facilities. While the power outage was not CI Host’s fault, their level of disaster preparedness (or lack thereof) is disheartening. First, it is very irresponsible to let clients’ servers run in that kind of “torture test” environment; in my view, they should not provide electricity to the servers if there is no electricity for the A/C. The exact same thing happened a few years back after a major storm, but in early summer rather than spring, so the temperatures were even higher. Clearly there has been no improvement in the emergency power setup since then.
The strongest contender at the moment is Colo4Dallas. I’m going to tour their facility in the next few days, and likely start planning a move there.