White-Coat Computer Science

Those who test products and changes before rolling them into production stand a higher chance of continued employment. Use that technical version of Darwin’s natural selection to your advantage.

Since the release of Windows NT Service Pack 4, the engineering team had been hard at work making sure the new product would be compatible with all the current hardware standards and mission-critical applications in use throughout the enterprise. After months of compatibility testing in the lab, the project was finally passed over to the software distribution team for global deployment. Microsoft Systems Management Server pushed the job out to the workstations with little problem; it was now time to focus on the servers. The site in Bangor, Maine was the first to deploy—and thus the first to witness the “blue screen” boot failures on some of the older Compaq servers.

Another Methodical Approach
For another perspective on a methodical approach to your work, read Lee Christopher Grant’s exclusively online article “Survive Chaos.”

Efforts in the controlled safety of the lab often don’t yield the same results when we apply them to a production environment. Despite our best efforts to look at the task at hand from every angle, we tend to run into problems that keep us up all night racking our brains about where we diverged from the beaten path. When the problem is finally solved, we often find either that it was caused by some incompatibility that was well known (except to us) or that the lab configuration didn’t accurately reflect our production environment.

Many of the horror stories we hear regarding production environment failures during deployments stem not from a lack of knowledge or skill, but from some divergence from what was expected. Microsoft’s claim that Service Pack 4 (SP4) was a simple upgrade shouldn’t have freed you from having to test the product in your environment. You may have applied SP4 successfully to your desktop, but when you applied it to the file server hosting all of the executives’ home directories, you witnessed the blue screen of death. After standing in the data center scratching what remains of your hair for the balance of the night and trying everything under the sun short of voodoo, you receive the dreaded call. It’s the director of IT, asking, “Why can’t I access my home directory?” You don’t really want to tell him that you never tested it on this hardware platform, do you?

For those who don white lab coats for a living, existence is dependent not upon work done in the lab, but on the ability to repeat experiments successfully on demand. Successful scientists maintain pristine laboratories and document every step of every process they perform to assure that their results will be repeatable if success is attained. If a scientist believes she found the cure for cancer, wouldn’t it be a shame if the results were unrepeatable? Did she really find the cure if she can’t repeat the findings of the experiment?

When we explain to our peers, customers, and bosses that a procedure worked in the lab but doesn’t work as planned in the production environment, our credibility is put at stake. As technologists, we’re typically a financial liability to an organization, unless we work for a contracting firm whose business is to sell our services. We rarely make any money for the organization; instead, we must justify our existence within the enterprise through the value our work adds to existing business processes. We build solutions that enable business users to do their work more efficiently, allowing them to spend more time on the profit-generating business processes rather than on the tools needed for the job.

Avoiding TechnoDarwinism

If you prefer to fly by the seat of your pants rather than apply some basic scientific principles to your work, Darwin’s theory of natural selection will work against you within your organization. Quite simply, those who test products and changes before rolling them into production stand a higher chance of continued employment. Conversely, those who take their chances by deploying an untested product in a production environment quickly fall victim to that same selection. These are the individuals often “selected” to leave the organization after failing to grasp the importance of applying scientific principles to their work.

In any well-devised deployment plan, there should always be time reserved for research and testing. But when things run late, lab time is usually the first item to get cut. Most project managers seem to think that the week of testing you entered on your deployment project plan is merely a code word for the padding added to every project plan to accommodate our inability to accurately predict the unknown. They immediately target this seemingly bogus entry for deletion or reduction.

Inevitably, once you move your project from the development domain into production, a host of unforeseen circumstances keeps you from seeing daylight for the next few days. This prevents the project from completing anywhere near the milestone set by the project manager, raising questions as to whether or not it was truly worth it to cut out that week of pre-production testing.

All too often, the work we do is so new or unique that we can’t accurately estimate the time we’ll need or the obstacles we’ll encounter along the way. Did the NASA scientists accurately estimate the time or money required to put the first man on the moon? The moon landing proved to be an event that NASA would repeat, and inevitably, the knowledge gained from the first mission would benefit the time and resource estimates for subsequent missions. Armed with a bit of knowledge learned from our own lab experiments, we too can begin to benefit from our previous experiences.

For systems administrators, there’s often little reason why we can’t practice in a non-critical environment to prepare ourselves for the pitfalls that may lie ahead in the upgrade. Not to say that every upgrade, migration, and deployment will go smoothly if we practice it once or twice in the lab—there will always be unforeseeable problems. But generally speaking, significant amounts of practice beforehand will yield a better success ratio for our efforts than if we just give it a try and see what transpires.

The time to research incompatibility issues, test changes to the environment, and devise disaster plans isn’t after the event occurs, but long before. If you work in an environment where you feel you should be donning a fire helmet most days, you’re already familiar with the dangers of avoiding a proactive approach to problem solving. Those who are constantly in a reactive state have no time to prepare technologies that will increase competitive advantages for the enterprise. Considering the increasing role of technology in today’s super-competitive market, even entire organizations can easily fall victim to the selective nature of TechnoDarwinism.

A Few Guidelines

To help ensure that efforts in the lab are indeed useful, consider the following guidelines.

Standardize the User Environment

Too many enterprises lack strict standards for the user environment. Instead, they let machines exist with varying directory structures, office automation suites, hardware platforms, and even operating systems. Because we’re generally financial liabilities to most organizations, we must find ways to reduce the cost of supporting machines in the environment to justify our continued existence. If each machine is different, there’s no way to benefit from the economies of scale that we’d enjoy in large enterprise environments. While a discussion on the importance of enterprise standards is well outside the scope of this article, organizations that lack a strict policy on hardware and software standards are destined to drive IT support costs significantly higher than truly necessary. Without a normalized environment, we can’t reliably predict whether the results we derived in the lab will hold in production.

Research Known Incompatibilities Before Trying to Change Production Environments

The inability of certain Compaq servers to boot Windows NT successfully after installation of Service Pack 4 is well documented on Compaq’s Web site, but we most likely didn’t find that out until after the blue screen appeared. All too often, bonus-protecting managers insist that a deployment be done by some arbitrary date, leaving us with little time to perform the required testing or research. A simple visit to Compaq’s Web site could have saved us hours of downtime, which kills the manager’s bonus anyway, and kept us from having to answer the dreaded queries from senior management about how this could have happened.

By visiting the Compaq site before the upgrade, we would have learned that there’s a known incompatibility between firmware v.1.36 and below on SMART/2P and SMART/2E array controllers and Microsoft Windows NT Service Pack 4. Armed with such knowledge, we could have applied SSD 2.08 (as per the guidance of the Customer Advisory) while we had the scheduled downtime. Had we taken a single proactive step to gather more information regarding the task at hand, the SP4 installation on the server might have succeeded.
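
One way to make that research step hard to skip is to record each advisory we find as data and check the server inventory against it before scheduling the change. The following is only a minimal sketch in Python: the advisory entries are transcribed from the Compaq example above, while the server names, the "NT4 SP4" label, the firmware value 1.40, and the inventory layout are hypothetical stand-ins for whatever your own records look like.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn '1.36' into (1, 36) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    """One known incompatibility, transcribed by hand from a vendor advisory."""
    component: str            # controller model the advisory applies to
    max_bad_firmware: tuple   # firmware at or below this version is affected
    conflicts_with: str       # label for the planned change (our own naming)
    remedy: str               # what the advisory says to do first


# Entries taken from the example in the text.
ADVISORIES = [
    Advisory("SMART/2P", parse_version("1.36"), "NT4 SP4", "apply SSD 2.08 first"),
    Advisory("SMART/2E", parse_version("1.36"), "NT4 SP4", "apply SSD 2.08 first"),
]


def blocked_servers(inventory, planned_change, advisories=ADVISORIES):
    """Return (server name, remedy) pairs for servers that match an advisory."""
    hits = []
    for server in inventory:
        for adv in advisories:
            if (adv.conflicts_with == planned_change
                    and server["controller"] == adv.component
                    and parse_version(server["firmware"]) <= adv.max_bad_firmware):
                hits.append((server["name"], adv.remedy))
    return hits


# Hypothetical inventory; in practice this would come from your asset records.
inventory = [
    {"name": "bangor-fs1", "controller": "SMART/2P", "firmware": "1.36"},
    {"name": "bangor-fs2", "controller": "SMART/2P", "firmware": "1.40"},
    {"name": "bangor-db1", "controller": "SMART/2E", "firmware": "1.20"},
]

for name, remedy in blocked_servers(inventory, "NT4 SP4"):
    print(f"{name}: hold the upgrade, {remedy}")
```

The check is only as good as the advisories someone has entered, so it supplements the visit to the vendor's site rather than replacing it.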

Document All Procedures Performed in the Lab Environment

The most important way to increase the repeatability of your work in the lab is to make sure you document every step of the process, no matter how trivial it may seem. Our notes must be so detailed that a third party can easily re-create our work without our involvement.

It’s also essential that you have a peer (or a QA group, if your organization has one) review your documentation. As authors, we have a tendency to make assumptions that we may not clearly document in the text.

Create Identical Lab and Production Environments

If we hope to gain any useful data from our lab experiments, the lab must closely resemble the production environment for the task at hand. For example, if we want to simulate the interaction of an application across domain trusts, we must first establish a similar environment to what we have in production. While it’d be ideal to match every aspect of the production environment in the lab, this is often cost-prohibitive. Instead, we may be able to stand in for the 10 servers making up the domain architecture with decommissioned desktops and servers, which is enough to simulate the interaction of our product in a multi-domain environment. The same care applies when testing driver updates, hot fixes, and other system-level software changes to hardware: make sure that the firmware revisions, drivers, card locations, memory, processor count, etc. in the lab equipment match what’s being used in production.
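
A quick way to see how far a lab machine has drifted from its production counterpart is to write both configurations down in the same form and compare them attribute by attribute. The sketch below assumes we have already collected those attributes by hand or from an inventory tool. The key names and all of the values are hypothetical; the list of checked attributes simply follows the one given above.

```python
def config_drift(production, lab, keys):
    """Return {key: (production_value, lab_value)} for every key that differs."""
    drift = {}
    for key in keys:
        prod_value = production.get(key)
        lab_value = lab.get(key)
        if prod_value != lab_value:
            drift[key] = (prod_value, lab_value)
    return drift


# Attributes to compare; hypothetical names for the items listed in the text.
CHECKED = ["firmware", "driver", "card_slot", "memory_mb", "processors"]

# Hypothetical values for one production server and its lab stand-in.
production = {"firmware": "1.36", "driver": "2.04", "card_slot": 3,
              "memory_mb": 512, "processors": 2}
lab = {"firmware": "1.40", "driver": "2.04", "card_slot": 3,
       "memory_mb": 256, "processors": 2}

for key, (prod_value, lab_value) in config_drift(production, lab, CHECKED).items():
    print(f"{key}: production={prod_value} lab={lab_value}")
```

Any line this prints is a reason to doubt that a lab result will carry over, and is worth either fixing or noting in the test documentation.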

Each application installed on a machine wants to install its own DLLs in the system directory, and the latest version of MDAC installed with Office 2000 may just break the critical database application the primary user runs each day. Without significant testing in a lab that mirrors your standardized production environment, you can’t provide any assurances (beyond mere guesswork) to those who count on you that your efforts will be truly successful.
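
One way to find out in the lab which shared files an installer touches is to snapshot the directory before and after the install and compare the two. This is a rough sketch using only the Python standard library. The function names are our own, it compares file contents by hash rather than reading DLL version resources, and the directory you point it at is up to you.

```python
import hashlib
from pathlib import Path


def snapshot(directory, pattern="*.dll"):
    """Map each matching file name to the SHA-256 hash of its contents."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            result[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def compare(before, after):
    """Report which files an install added, removed, or replaced."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "replaced": sorted(name for name in before
                           if name in after and before[name] != after[name]),
    }
```

In use, we would call `snapshot()` on the lab machine's system directory, run the installer, call `snapshot()` again, and pass both results to `compare()`. Anything listed under "replaced" is a candidate for breaking an application that depended on the older file, and so belongs on the list of things to test.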

Use Scripting Methods to Improve Repeatability of Results

One of the best ways to make sure you can repeat complex operations is to write a script to perform the upgrade. Once the script runs the way you want it to, it can be easily run in the production environment to duplicate your efforts exactly. This is especially useful when trying to apply complex NTFS permissions, create users or groups, or modify the Registry. Scripts also help assure that the environment has been initialized to a known state for each test we perform, which is essential for garnering valid data from our experiments.
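
The idea of returning to a known state before every test can be sketched with nothing more than a baseline copy and an ordered list of steps. The example below is an illustration in Python, not a real upgrade script: the two step functions and the file names `settings.ini` and `old.marker` are hypothetical placeholders for actual changes such as setting permissions or editing the Registry.

```python
import shutil
from pathlib import Path


def reset_to_baseline(baseline, workdir):
    """Discard whatever the last test left behind and copy the baseline back."""
    workdir = Path(workdir)
    if workdir.exists():
        shutil.rmtree(workdir)
    shutil.copytree(baseline, workdir)


def run_steps(workdir, steps):
    """Run each (description, function) step in order; return a log of what ran."""
    log = []
    for description, step in steps:
        step(Path(workdir))
        log.append(description)
    return log


# Hypothetical stand-ins for the real upgrade actions.
def add_settings_file(workdir):
    (workdir / "settings.ini").write_text("mode=upgraded\n")


def remove_old_marker(workdir):
    (workdir / "old.marker").unlink(missing_ok=True)


STEPS = [
    ("write settings.ini", add_settings_file),
    ("remove old.marker", remove_old_marker),
]
```

Calling `reset_to_baseline()` and then `run_steps()` should leave the working directory in the same state every time, however messy the previous run was. The returned log doubles as a record of exactly which steps ran and in what order.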

Using the Active Directory Service Interfaces (ADSI) with our favorite programming language, we can perform almost any Windows NT, Windows 2000, Exchange, IIS, or Novell administrative function programmatically. This is useful not only for developing scripts that re-create our lab actions in a production environment; we can also use Visual Basic and ADSI to create powerful scripts that re-create the production user domain SAM in our lab environment.

Avoiding Extinction

To help increase your chances of success for implementing new changes in your production environment, here are some steps to follow:

  • If you’re operating in a non-standardized environment, seize the opportunity to implement standards when performing a major upgrade to the enterprise (such as Windows 2000).
  • Research potential known incompatibilities for the software or hardware you’re about to install.
  • Re-create the elements of the production environment that will be affected by your changes in a non-critical environment or isolated network.
  • Document your experiences and lab procedures with meticulous detail.
  • Script procedures in the lab environment where possible to guarantee the same procedure will be followed when it’s moved to production. Whether it’s being used to initialize the environment during the testing or to perform the actual task at hand, scripting can help assure consistent results.
  • Test the impact of a new application or system update with all critical applications. Simply logging into the client isn’t an adequate test for most deployments.
  • Have a third party validate your documentation to make sure it can be reproduced without your intervention.

The next time you avert a major system outage because you found the problem and resolution before the change was implemented in a production environment, raise a glass to the parents of scientific thought for their contribution to your success.
