I’ve had quite a bit of off-hours work, but not much off-hours time in which to do it (mostly because of house issues; apartment dwellers should forgive me for envying them a fair portion of the time). The end result is cramming several days’ worth of work into a window of a few hours.
Last night I had to patch several servers via Windows Update, upgrade the memory in one server (which required re-cabling because of a half-installed cable arm), update the MS DPM agent on two other servers, update the firmware on our Barracuda spam firewall (the second Barracuda update in a row that created more problems than it solved), and replace the batteries in our battery backup unit (which itself required carefully shutting down several servers and processes).
After everything came back up, I found a non-production virtual machine that wasn’t starting (a story for a different post), the Citrix servers were running slowly, and I was having trouble getting a database process to start correctly. After wrestling with this host of issues for an hour and resolving most of them, I left while I was ahead, or so I thought.
In my rush to wrap up, I forgot my cardinal rule when touching anything to do with e-mail: send a test message outbound and have it replied back. That night my diagnosis of Exchange consisted of making sure Outlook wasn’t popping up an error in the tray before I got bogged down in the other issues. To make matters worse, a user e-mailed me to let me know it wasn’t working, but he sent that message while the server was down for the battery replacement, so I thought nothing more of it and told him it should be working (when I had only tested the OWA splash page). I admit that I also improperly relied on my Windows Mobile phone for testing, an unreliable device even when everything is working properly.
In the end, the fault was caused by the Exchange 2003 server booting up before any of the domain controllers (probably by only seconds), and as a result most of the Exchange services did not start. It’s worth noting that my near-production Exchange 2007 server did not experience this fault. In the long term I should have a more reliable test mechanism (this has happened before, after an extended power outage), but most of all I just need to remember to perform my diagnostic procedures before attempting to fix the first issue that grabs my attention.
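One piece of a more reliable test mechanism could be a post-maintenance script that confirms the domain controllers are answering before Exchange is trusted. What follows is a minimal Python sketch. It assumes that TCP reachability is a useful first pass: LDAP on port 389 for a domain controller, and SMTP on 25 and HTTPS on 443 for Exchange. The hostnames dc01.example.local and exch01.example.local, along with the helper names port_open, wait_for_port and run_checks, are hypothetical placeholders. An open port does not prove that mail actually flows, since the OWA splash page can load while the services behind it are down. So this would supplement the outbound-and-reply test, not replace it.

```python
import socket
import time
from typing import List, Tuple


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and connects; closing right away is enough
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the name did not resolve
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Poll host:port until it accepts a connection or `deadline_s` seconds pass."""
    end = time.monotonic() + deadline_s
    while True:
        if port_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


# Hypothetical hostnames and labels; substitute real ones.
CHECKS: List[Tuple[str, int, str]] = [
    ("dc01.example.local", 389, "domain controller LDAP"),
    ("exch01.example.local", 25, "Exchange SMTP"),
    ("exch01.example.local", 443, "OWA over HTTPS"),
]


def run_checks(checks: List[Tuple[str, int, str]]) -> List[str]:
    """Return the labels of every check whose port did not accept a connection."""
    return [label for host, port, label in checks if not port_open(host, port)]
```

Calling run_checks(CHECKS) after a maintenance window returns the labels of anything that did not answer. An empty list means every port accepted a connection. The wait_for_port helper suggests one possible way to hold off a delayed start of the Exchange services until a domain controller responds. That use is an untested assumption, not a known fix for the Exchange 2003 boot-order problem.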