In-Depth

Hard Drive Fall Down, Go Boom!

Further investigation of a hard drive failure revealed one tiny but important tip that now lets one company's e-mail server run problem-free.

In February 2001, we brought our e-mail hosting in-house using Exchange 2000. We installed the application on an IBM server with dual Pentium III 800MHz processors, 1GB of RAM and four 10,000 RPM 36GB hard drives in a RAID 5 configuration. This server was more powerful than we needed, but we wanted room for growth.

Two months later we started having some problems with our new e-mail server. For no apparent reason it would stop responding: The monitor would go black, and no combination of keystrokes would bring it back. The only way we could bring the server back was to do a hard reboot. The first time the server stopped responding, we chalked it up to a Windows 2000 glitch. However, when the server started crashing on a regular basis, the level of concern increased exponentially.

At the time we had just 125 e-mail users, so the hardware was more than sufficient to handle the traffic. That wasn’t the problem. The server wasn’t going into sleep mode, so that wasn’t the cause either. The event logs were clear (we were logging not just Windows events but also Exchange events). As for performance monitoring, all the commonly used counters were well within acceptable ranges.

One Saturday in April our e-mail world collapsed. I got a call from the senior IT director at around 10 a.m. He was attempting to use Outlook Web Access from home and it wouldn’t respond. He decided to go into the office and check out the server. He noticed the e-mail server wasn’t responding, so he did a hard reboot. During the boot process, he received a horrible message: Inaccessible Boot Device.

I ran over to the office. We tried another hard reboot, with no luck. I immediately got on the phone with IBM support. Since the drives were in a RAID 5 configuration, we should have been able to get the server back up. We were able to determine which of the hard drives was the problem. However, the IBM technician determined that the parity stripe had become corrupt. Thus, the only thing we could do was replace the drive, reinstall the OS, reinstall Exchange and restore from backups. Since we had 24x7x4-hour support, a new hard drive was in my hands in four hours. By about 4 a.m. Sunday the server was back up, and all key employees had been notified by voice mail of the problem and told they might be missing some mail.
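For readers wondering why corrupt parity ruled out a simple rebuild, here is a rough sketch. In simplified terms, RAID 5 stores for each stripe a parity block that is the XOR of the data blocks, so any one missing block can be recomputed from the survivors. The block contents and the helper name xor_blocks below are made up for illustration; a real controller works on disk sectors and rotates parity across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (illustrative contents only);
# a fourth drive would hold the parity block for this stripe.
data = [b"MAIL", b"DATA", b"HERE"]
parity = xor_blocks(data)

# The drive holding data[1] fails: rebuild its block from the
# surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])

# If the parity block itself is corrupt, the same rebuild yields
# wrong bytes, which is why a restore from backup becomes necessary.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
rebuilt_from_bad = xor_blocks([data[0], data[2], bad_parity])
```

With good parity, rebuilt matches the lost block exactly. With the damaged parity, the first byte of rebuilt_from_bad no longer matches, and nothing in the array can tell the controller which value is right.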

The server was back up, but we still had no explanation as to why the crash occurred. We needed an answer and needed it fast, in case the problem occurred again. We felt it was absolutely a hardware issue, so we continued to work with IBM support. Finally, an extremely bright IBM technician made a discovery. Evidently, a batch of hard drives had shipped with bad microcode. We downloaded a tool from IBM to examine the microcode on our hard drives, and the three “old” hard drives in the e-mail server had the bad microcode (the new hard drive was fine). We updated the microcode on these three hard drives, and our e-mail server has now run continuously for more than a year without any problems.

About the Author

Christopher M. Roscoe, MCSA, CIW, is the senior network administrator at National Packaging Solutions Group, a manufacturer of corrugated boxes.
