Never Again

Virtual Panic

On one sunny summer day, this IT manager's network turned slow and lazy -- to his consternation.

I still vividly remember the day. I came into work in a good mood on a sunny summer morning in Vancouver, and was getting ready to do a regular check of the firewall log. As an IT Manager at the University of British Columbia, I managed a network of 400 nodes and supported applications on a variety of platforms. These ranged from large HP/Compaq and IBM enterprise servers at the top end, to Windows Server 2003, Novell NetWare 6.x, Unix and Citrix servers in the mid-range, all the way down to Windows and Macintosh desktops on the client side.

That sunny summer day started going badly when my assistant reported that she had received more than 20 calls from users in different departments (including Payroll). Users complained that they either could not log in to the Novell server or that their Microsoft Outlook e-mail was extremely slow. My assistant mentioned that she had tried to reset (delete and recreate) the Outlook profiles of a few users, but this restored normal operation for just a few minutes.

I checked both the Novell server and Exchange 2003 server -- everything was functioning properly. A review of the server log and multiple virus scans produced nothing. I turned my attention to the network traffic monitoring software, which showed that the network was unexpectedly busy.

"What could be causing it?" I wondered desperately, as I stared at the switches in the machine room. The phone calls were piling up and the situation was getting worse with pay day the next day and the payroll systems still down. I tried to hide my growing frustration as I patiently explained to managers that we were working hard on the problem.

I was approaching the point of outright panic when suddenly I remembered there had been a power outage the day before.

Shell shocked

Our network employed a gigabit backbone and high-speed switched Ethernet connections at both the core and the edge. Ethernet switches located in floor distribution wiring closets divided the network into 10 geographic sections. The result was a tree structure starting from the core switch and expanding to every floor wiring closet and eventually to each workstation.

The network, however, had been implemented with virtual local area network (VLAN) technology, to provide flexibility. By layering a logical network structure atop the physical network, client computers could participate in a departmental subnet regardless of physical location. Just as important, the virtualized structure compartmentalized traffic, preventing congestion.
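The compartmentalization described above can be illustrated with a small model. This is only a sketch under stated assumptions: the port names, VLAN IDs and function names are hypothetical and are not taken from the actual network. It treats each VLAN as one broadcast domain and shows how many ports receive a broadcast when ports are segmented, compared with when every port sits in a single VLAN.

```python
from collections import defaultdict


def broadcast_domains(port_vlans):
    """Group ports by VLAN ID; each group forms one broadcast domain."""
    domains = defaultdict(set)
    for port, vlan in port_vlans.items():
        domains[vlan].add(port)
    return dict(domains)


def broadcast_recipients(port_vlans, sender):
    """Return the ports that receive a broadcast sent from `sender`.

    Only ports in the sender's VLAN are reached; the sender is excluded.
    """
    vlan = port_vlans[sender]
    return {p for p, v in port_vlans.items() if v == vlan and p != sender}


# Hypothetical port-to-VLAN assignment: three departmental VLANs.
segmented = {
    "Fa0/1": 10, "Fa0/2": 10,
    "Fa0/3": 20, "Fa0/4": 20,
    "Fa0/5": 30, "Fa0/6": 30,
}

# The same ports with every one of them placed in a single VLAN.
collapsed = {port: 1 for port in segmented}

print(len(broadcast_recipients(segmented, "Fa0/1")))  # ports reached when segmented
print(len(broadcast_recipients(collapsed, "Fa0/1")))  # ports reached in one flat VLAN
```

In this toy model a broadcast reaches one other port when the VLANs are in place and all five other ports when they are not. On a network of hundreds of nodes the same effect is proportionally larger.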

Remembering the power outage, I quickly went through the settings for each routing switch. I soon discovered that a Cisco 2900 routing switch had lost its VLAN settings when the power failed. As a result, three VLANs had collapsed into a single default VLAN, and the unmanaged traffic was choking the network.

Once I discovered the problem, it took me 20 minutes to reconfigure the switch and restore the network to normal operation.
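A check of this kind can be automated. The following is a minimal sketch, not the procedure I actually used: it assumes a saved baseline of port-to-VLAN assignments and a current snapshot, both represented as plain dictionaries. The port names, VLAN IDs, function names and the choice of VLAN 1 as the default are all assumptions for illustration, and collecting the snapshot from a real switch is outside its scope.

```python
# Assumption: a switch that loses its settings puts every port in VLAN 1.
DEFAULT_VLAN = 1


def find_vlan_drift(baseline, current):
    """Return {port: (expected_vlan, actual_vlan)} for ports that differ.

    Ports missing from `current` are treated as being in the default VLAN.
    """
    drift = {}
    for port, expected in baseline.items():
        actual = current.get(port, DEFAULT_VLAN)
        if actual != expected:
            drift[port] = (expected, actual)
    return drift


def looks_like_config_loss(baseline, current):
    """Heuristic: every port that had a non-default VLAN is now on the default."""
    non_default = [p for p, v in baseline.items() if v != DEFAULT_VLAN]
    return bool(non_default) and all(
        current.get(p, DEFAULT_VLAN) == DEFAULT_VLAN for p in non_default
    )


# Hypothetical data: the baseline saved before the outage, and the state after it.
baseline = {"Fa0/1": 10, "Fa0/2": 10, "Fa0/3": 20, "Fa0/4": 30}
after_outage = {"Fa0/1": 1, "Fa0/2": 1, "Fa0/3": 1, "Fa0/4": 1}

for port, (expected, actual) in sorted(find_vlan_drift(baseline, after_outage).items()):
    print(f"{port}: expected VLAN {expected}, found VLAN {actual}")

if looks_like_config_loss(baseline, after_outage):
    print("All non-default VLAN assignments are gone; the switch may have lost its configuration.")
```

Running a comparison like this after any power event would point to the affected switch directly, rather than leaving it to be found by walking through each device's settings by hand.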

It was a difficult and challenging day, no doubt. From this "Never Again" experience, I learned that problems can often arise from forgotten events, and that the solutions we employ to boost productivity can fail in ways that destroy productivity.

About the Author

Hong-Lok Li, MCSE 2003, MCSA, MCDBA, MCSD, is an information technology manager at the University of British Columbia, in Vancouver, Canada.


Reader Comments:

Tue, Aug 22, 2006 Leonid Nilva Denver, CO

That was good troubleshooting. This could have been caused by many different things. I would probably be playing with packet sniffers by now.

Leo

Fri, Aug 18, 2006 Joel Havenridge Omaha, NE

We had a similar issue with 3Com switches (Baseline 2226 Plus units and another, larger model whose number I don't have handy) not holding their configurations after a power outage. They didn't hold anything: IP addresses, VLANs, name, location. The backed-up configuration didn't restore the VLAN info or much of anything else. The readme for the latest firmware revision mentioned that the backup option had been fixed, but we still had issues after installing that version. After I called 3Com and explained what was going on, they RMA'd all 5 switches with no problems. Apparently they had received a bad lot of EEPROMs that failed to hold the configuration after loss of power, I was told. It seems hard to believe something like that would make it through QA/QC either at the supplier or at 3Com, but in the end, after several frustrating rebuilds of VLAN configurations, we were back on a solid footing.
