This team’s mission: to move the entire Kentucky public school system to Windows Server 2003. In the meantime, they got their own kind of education.
Tim Cornett’s mad. Those who
know Tim Cornett know that’s not easily done. But he’s angry
now, really angry. He’s seeing top-level organizational units (OUs)
created by people who shouldn’t have authority to do that. He’s
discovering ordinary users that somehow have all-powerful administrative
privileges.
“I don’t get riled that often, but this just gets me mad,” Cornett seethes.
He’s ticked off because someone—someone important to this huge, multimillion dollar project—didn’t follow The Doc. “It’s like no one told him what we’re trying to do,” Cornett says, exasperation playing along his face as he ponders how serious the damage is.
Cornett, Active Directory architect for the Kentucky Office of Education Technology (OET), and his three-person team are in the middle of a gigantic undertaking. This migration from Windows NT 4.0 to Windows Server 2003 is epic in scale: A consolidation of about 4,000 servers to approximately one-tenth that number. Migration of approximately 160,000 computer accounts. The move of more than 700,000 users from the old system to the new.
And for all its mammoth complexity, in the end, the key factor for a smooth transition to the new OS comes down to one simple rule: Follow The Doc.
“The Doc” is the 11-page, 48-step document detailing the steps necessary to upgrade servers to Windows 2003 and move information from the outdated boxes to the cutting-edge ones in the 176-district Kentucky elementary school system. Following The Doc to the letter normally means clear skies. Failing to follow The Doc leads to storm clouds and rolling thunder.
Microsoft Certified Professional Magazine spent three days at the end of June in the middle of the massive migration project, with unfettered access to the people doing the job. We saw firsthand the power of The Doc.
And one of the engineers on the upgrade team, with primary responsibility for making sure the old primary domain controller (PDC) and backup domain controller (BDC) in a school district are properly migrated to the Windows 2003 machines, has failed to follow The Doc. Repercussions are sure to follow.
The Doc, although it may seem like a rather ordinary checklist (albeit one containing hundreds of items), is really the culmination of a process that started several years ago. That’s when Cornett decided there had to be a better way.
Recipe for Disaster
“We had 400 completely autonomous domains, set up
and managed by district personnel,” Cornett said. The vast bulk of his
time, 99 percent by his estimate, was spent troubleshooting problems due
to the unwieldy domain structure.
“Our distributed support and security system was failing,” said Chuck Austin, senior project manager for OET and head of the migration project. One example: One school district had eight schools, and every school had a BDC that was insecure and handling many different roles. Another example: A large number of servers had public IP addresses rather than private addressing with Network Address Translation (NAT). Many of the 176 districts were in the same situation. In a network that size, the potential for disaster loomed large.
That disaster struck in the form of the Code Red and Nimda worms. They shut down IT operations in 150 of the state’s 176 districts for four of the first eight weeks of the 2001 school year. Each of the affected districts had to unplug its WAN port to stop the carnage from spreading.
In an odd way, though, it was a blessing in disguise for OET, like having your kitchen destroyed by fire, forcing you to do that remodel you’ve always wanted. “From a marketing standpoint, [Code Red and Nimda] were the greatest things that ever happened to us,” Austin said. It allowed OET to make a strong case to the state of the need for the massive upgrade.
Once the project was green-lighted, the next step was to decide on an operating system. That was a relatively simple process. “The migration path to anything but Microsoft would require a forklift overhaul,” Austin said, noting that the state Department of Education had been using Windows NT and Exchange 5.5 for years.
But that doesn’t mean they only considered Microsoft, without any thought to the alternatives. “We looked at Sun, Linux and Novell,” said Cornett. “They didn’t give us exactly what we needed. I like the single-vendor system.”
Austin agreed. “One cost people don’t factor in is that of a tried-and-true
support mechanism. If [a Microsoft] product fails, we have a single phone
number we can call.” Another factor is that users and the tech staff in
each school were already familiar with Microsoft interfaces and processes.
It was important to maintain that comfort level.
They were glad they did. “Microsoft has stepped up to the plate in a
way we’ve never seen before,” Austin said.
Figure 1. Before. A typical school district’s layout under NT 4.0 in the state of Kentucky.
Choosing a Design—Then Scrapping It
Once the OS decision was made, the design phase kicked off. It was probably
the most frustrating, aggravating time of the entire project. Not having
anyone on staff with enterprise-level upgrade, migration and Active Directory
architecture experience meant OET spent six months in the first half of
2002 “spinning our wheels,” Austin remembered. Cornett said, “We looked
at everything, from a single forest, single domain [structure] to 176
separate forests. We looked at all design [options].”
It didn’t help that they had trouble getting consistent answers from outside consultants. After endless meetings, thousands of consulting hours and lots of fingernail chewing, they settled on a single forest, single domain model for all 176 districts. Everyone was happy. Then, yet another Microsoft consultant saw their design and gave them a hard kidney punch. “He said a single forest, single domain wouldn’t work because replication would eat our shorts,” Austin said. “I can’t tell you how emotional we were, because we felt like we were back at square one.” It had to be especially wrenching for Austin, a hard-driving, get-things-done, type-A personality who pushes his team constantly to excel.
They finally gave up trying to do it themselves and called 9-1-1. In August 2002, OET brought Hewlett-Packard and Microsoft in as lead supervisors for the project, and they soon had a workable design: A single-forest, empty-root domain with 184 child domains (the eight extra domains were for administrative purposes). Each district was to be its own domain and its own site.
During these discussions, they also decided to scratch Windows 2000 in favor of Windows 2003, as it would be the latest OS by the time they began the upgrade. Surprisingly, they had no fear of deploying a pre-service pack 1 Microsoft server OS.
“We were tickled to death with Release Candidate 2,” Austin said. Their anxiety was further eased when OET joined the Early Adopter program for Windows 2003, giving it access to the OS’ pre-deployment code, a dedicated Microsoft Consulting Services (MCS) consultant, and a week in Redmond at the Microsoft Enterprise Engineering Center. During that week, they used VMware to simulate the upgrade and find potential traps and pitfalls.
At the same time, they decided to standardize on hardware, making troubleshooting
and support significantly easier. The winner in the hardware derby was
Dell; OET went with identically configured PowerEdge 2600 servers with
single processors, 2GB RAM and 100Mbps NICs.
With the hardware decisions settled and design and infrastructure questions
answered, it was time to start building the environment. Windows 2003
was put on servers in a test lab in September 2002. The following March
saw the pilot program rolled out with about 30,000 users, including the
OET for the “eating your own dogfood” factor. Following a successful pilot
run of several months, the time came to roll out to the school districts.
To smooth the upgrade, each school district was outfitted with at least one point-to-point T1 connection, and each was made a hub site, with at least one domain controller (DC) and global catalog server (GC).
Many people in the state Department of Education—whose authority OET is under—may have been nervous about the rollout, but Cornett wasn’t among them. “After seeing [Windows 2003] beta 3 and how it worked, I was very confident it would go well,” he said.
And, for the most part, it has gone exceedingly, almost implausibly, well. That is, when The Doc is followed. “We’ve got seven different migration teams, and if they follow it to the letter, it works every time. It’s amazing,” Cornett said. “We’ve tried to stress, ‘Follow The Doc. Don’t be stressed, don’t be impatient, don’t try to figure it out. Call us, and follow The Doc.’”
But on this Wednesday, about two weeks into the full migration, someone didn’t follow The Doc. It turns out to be an employee of Nicholasville, Kentucky-based Accent Systems, OET’s technology partner responsible for building the servers and performing the physical migrations in each school district. Each Accent employee is paired with a Kentucky Education Technology System (KETS) engineer, making two-person teams in the field. Members of OET have said they’re delighted with their partnership with Accent.
Except for now. Cornett learned of the problem when the Microsoft Operations Manager (MOM) agent—the chief network monitoring tool—wouldn’t install on a server at a migration site. He checked user rights on the box and saw all kinds of stuff he shouldn’t see: top-level OUs created that no one at the district level should have the ability to create; permissions not assigned correctly in Exchange; ordinary users with Administrator privileges. In one school district, Estill, every user was a member of the Domain Administrators group.
The Accent worker had missed an important step in the process—that of removing all accounts from the Domain Administrators group except the Administrator account. A basic, critical step. And this was the third district for which he’d done the migration. Cornett was seeing red.
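The missed step is simple enough to express as a check. The sketch below is illustrative only and is not taken from The Doc: it assumes the group's membership has already been exported to a plain list of account names, and the function name `accounts_to_remove` is hypothetical.

```python
def accounts_to_remove(members, keep=("Administrator",)):
    """Return every group member that is not on the keep list.

    The comparison ignores case, since Windows account names are
    not case-sensitive.
    """
    keep_lower = {name.lower() for name in keep}
    return [m for m in members if m.lower() not in keep_lower]


# Hypothetical membership export from a freshly migrated district domain.
members = ["Administrator", "jsmith", "mjones"]
print(accounts_to_remove(members))
```

Anything the function returns is an account that should not have survived the migration with domain-wide administrative rights.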
Four upgrades were scheduled for the next day, Thursday, including the Pike district, one of the biggest in the state. “He’s supposed to do one tomorrow [Thursday],” Cornett told the erring employee’s Accent manager over the phone. “Well, he’s not doing one tomorrow.”
Cornett flies into damage control mode. Using Active Directory Users
and Computers, he starts deleting accounts on the servers that were messed
up, going district by district. By the end of the day, he’s restored order,
and only the people who are supposed to have administrative access have it.
Figure 2. After. The district layout after the migration. Note how the domain controllers are reserved for the district office only in the new system.
The migration process is complicated, but involves these basic steps:
1. Each migration team brings with it a temporary server that serves as the migration box (which OET calls TEMPMIG).
2. The migration server is built as a BDC and then promoted to a PDC on the network, automatically demoting the district PDC to a BDC.
3. The two new servers (the DC and GC) are built as member servers with DNS, WINS and DHCP already loaded.
4. TEMPMIG is then upgraded to Windows 2003 and configured for DNS.
5. DC and GC are then promoted to domain controllers.
6. FSMO roles are transferred from TEMPMIG to the DC.
That completes most of the first day’s migration work. On day two:
1. DNS records are verified as being correct.
2. The DC and GC are pointed to themselves for DNS.
3. Sysvol is checked for login scripts.
4. A custom-developed script is run that configures OUs for the domain.
5. District administrators are re-permissioned.
6. NTFS permissions are set.
7. TEMPMIG is demoted through DCPromo.
8. TEMPMIG is removed from the domain, to be taken to the next district migration.
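The value of a checklist like this lies in its ordering: each step assumes the ones before it are done. As a rough sketch of that idea (not OET's actual tooling; the step labels are paraphrased from the list above and `next_step` is a hypothetical helper), the day-two sequence could be enforced like this:

```python
DAY_TWO_STEPS = [
    "Verify DNS records",
    "Point DC and GC to themselves for DNS",
    "Check Sysvol for login scripts",
    "Run OU configuration script",
    "Re-permission district administrators",
    "Set NTFS permissions",
    "Demote TEMPMIG through DCPromo",
    "Remove TEMPMIG from the domain",
]


def next_step(steps, completed):
    """Return the first unfinished step, or None when all are done.

    Raises ValueError if a later step was completed while an earlier
    one was skipped, which is exactly the failure the checklist is
    meant to prevent.
    """
    for i, step in enumerate(steps):
        if step not in completed:
            done_out_of_order = [s for s in steps[i + 1:] if s in completed]
            if done_out_of_order:
                raise ValueError(
                    f"step {i + 1} ({step!r}) was skipped before "
                    f"{done_out_of_order[0]!r}"
                )
            return step
    return None


print(next_step(DAY_TWO_STEPS, {"Verify DNS records"}))
```

A field team that marks "Set NTFS permissions" as done without the earlier steps gets an error instead of a quietly broken domain.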
Upgrades are a two-day process. That in itself is an achievement, said Brad Gray, a member of Cornett’s AD team, who provides tech support for the migration. “It’s a streamlined process. It takes a day and a half to do the migration and a half day to do the knowledge transfer. Switching from NT 4.0 to Windows 2003—with everything that has to happen—in two days is amazing.”
Gray, who speaks in a slow, easy-going Southern drawl, works with a lot of the district technology coordinators (DTCs) during the process. The DTCs, who serve as the local Administrators, along with their users, have been thrilled with the upgrade, he said. Part of it is because the biggest change is for the OET, not the DTCs. “Ninety-nine percent of the things they used to be able to do, they can still do. They just have a different process for doing things.
“Overall, after a migration, everybody seems to be happy,” Gray added. “We hear a lot of comments on the decrease in the amount of time it takes to log on.”
For his part, Gray hasn’t seen major issues arise. “I probably spend most of my time on troubleshooting little nit-picky things.”
That’s the way Austin, head of the migration project, likes it. “The amazing thing to me is that any issue we’ve had, any time we’ve stumbled our toe, it hasn’t been because of [Windows 2003]. The technology works. For the past three months, we’ve been waiting for the ceiling to fall, and it hasn’t.”
There are still numerous challenges ahead, though. “Beyond the next six months is when I think it will be hairy,” Austin said. “That’s when we start instituting group policy. For instance, some school districts may not have the bandwidth to handle group policy.”
Also in the pipeline is an e-mail infrastructure overhaul. By the end
of the year, Exchange 5.5 will be replaced by Exchange 2003, as Windows
2003 won’t run Exchange 2000. At the same time, OET will consolidate its
Exchange server farm. “We have 320 Exchange servers [now],” Cornett said.
“We want to get to under 50 and hopefully, under 20.”
But that’s months down the road. Right now, on this Thursday, June 26,
Cornett is worried about Pike County, a district made up of 23 schools.
It’s the biggest migration yet, in terms of the number of schools. Cornett
and his AD team don’t normally go into the field; they man the office
and take questions and solve problems from there. There are a number of
points during the migration when engineers need feedback from OET before
proceeding. For instance, step 23 of The Doc requires the field teams
to contact the AD team to authorize and activate DHCP scopes on the new
DC and GC.
Adding to the pressure is that one member of Cornett’s team, Gray, is on vacation, leaving only Cornett and Mac Nelson, the third member of the team, to handle a busy migration day. Perhaps owing to their short-handedness, or simply because it’s his nature, Cornett is a whirlwind as the Pike engineering team migrates the domain. He’s working a server problem with the help of MOM. He also has multiple MSN Instant Messenger windows open, as he communicates with multiple parties. (It’s clear that IM has become a critical business application for OET. During today’s migration, at least eight IM windows are open on Cornett’s monitor, messages constantly flying back and forth between him and the field teams. Nelson’s screen is similarly cluttered.)
Cornett’s also troubleshooting slow network connections in the Murray district, and at the same time, dispensing phone advice to a DTC about setting up OUs. “Don’t go any deeper than three levels on OUs, because it really overworks the domain controllers. They say you can go as deep as five levels, but they don’t have 200 domains in the forest,” Cornett explains.
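Cornett's three-level rule of thumb is easy to check mechanically. The following is a minimal sketch, assuming OU locations are available as distinguished-name strings with no escaped commas inside component values; the helper names and the sample name under the reserved `.example` domain are hypothetical.

```python
def ou_depth(distinguished_name):
    """Count the OU= components in a distinguished name.

    Assumes component values contain no escaped commas.
    """
    parts = [p.strip() for p in distinguished_name.split(",")]
    return sum(1 for p in parts if p.upper().startswith("OU="))


def too_deep(distinguished_name, limit=3):
    """True when the OU nesting exceeds the given limit."""
    return ou_depth(distinguished_name) > limit


dn = "OU=Grade5,OU=Students,OU=Elementary,OU=Schools,DC=district,DC=example"
print(ou_depth(dn), too_deep(dn))
```

The default limit of three mirrors Cornett's advice to the DTC; it is his guideline for this forest, not a platform-enforced maximum.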
In the meantime, Nelson’s getting an error message from a DTC that they can’t remove users from the Domain Administrators group. It sounds like an outer-space movie in his cubicle; he’s got various beeps and boops coming from his computer, which he’s set up to mimic the cacophony from the bridge of the Starship Enterprise.
All this frantic yet effective troubleshooting would be impossible without OET’s best friend: MOM (specifically, MOM 2000, SP1). Its predictive power, combined with that of Dell OpenManage for Servers on the hardware side, is stunning. Cornett, Nelson and Gray keep MOM running constantly, and Cornett keeps the MOM administrator console open on his monitor, searching every few minutes through voluminous MOM events. “The whole push here is about proactive management, rather than reactive firefighting,” Cornett said. “We’re hoping to call a district and say, ‘You’re going to have a problem with a switch,’” rather than having the district call OET and inform it of a problem.
For example, MOM’s been configured to send an alert if DHCP requests take longer than 150ms to fulfill. “If it takes longer than that, I want to know about it,” Cornett said, as it could be a first indicator of network congestion.
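The rule itself is a plain threshold test. This sketch is not MOM configuration; it only illustrates the logic, assuming response times have been collected as a list of millisecond values. The function name is hypothetical.

```python
DHCP_THRESHOLD_MS = 150  # threshold quoted by Cornett


def slow_dhcp_samples(samples_ms, threshold_ms=DHCP_THRESHOLD_MS):
    """Return the response times that exceed the threshold."""
    return [s for s in samples_ms if s > threshold_ms]


# Hypothetical measurements from one district's DHCP server.
print(slow_dhcp_samples([90, 150, 151, 400]))
```

A reading of exactly 150ms passes, since the alert is for requests that take longer than that.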
Not only is MOM effective in this environment, it’s financially practical. The department gets huge educational discounts on the software. That’s important, since OET has a budget of $3 million for the entire migration. Current projections are that they’ll come in at, or very slightly under, budget.
That’s also a reason that no third-party migration tools were used. Given the reality of education budgets in the current economic climate, there simply weren’t the funds available to seriously consider other tools. But that doesn’t seem to have hurt them too much. That is, again, where The Doc comes in. Doing some tasks manually that a third-party tool could do might have slowed some processes. On the other hand, the tool would have cost significantly more, and there would have been a learning curve with another product. As long as The Doc is followed, the need for other tools is eliminated.
Of course, with an undertaking this size, the human element will continue
to cause problems. A serious issue, for example, comes up later in the
day with a script written by their MCS consultant. The script automatically
sets up DNS information on the new servers. The problem is that he has
added an “s” to the end of the DNS server domain name, so the script is
handing out the wrong DNS suffix to all the DCs and GCs. The newly built
servers, therefore, have the wrong name for the DNS server to which they’re
supposed to connect and register their own names. The problem’s affected
DCs in six school districts, which have lost all connectivity, adding
to the day’s pressure. But the bug is eventually found, and the MCS consultant
e-mails a new script that corrects the issue. Just another day in Migrationville.
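A one-character typo like this is the kind of thing a quick comparison catches before it spreads. The sketch below is not the consultant's script; it assumes each server's configured DNS suffix has been gathered into a dictionary, and the server names and the `.example` domain are hypothetical.

```python
def mismatched_suffixes(configured, expected):
    """Return {server: suffix} for servers whose suffix differs from expected.

    Trailing dots and letter case are ignored in the comparison.
    """
    want = expected.rstrip(".").lower()
    return {
        server: suffix
        for server, suffix in configured.items()
        if suffix.rstrip(".").lower() != want
    }


expected = "district.example"
configured = {
    "DC1": "district.example",
    "GC1": "district.examples",  # stray trailing "s"
}
print(mismatched_suffixes(configured, expected))
```

Run against every newly built DC and GC, a check along these lines would flag the stray "s" on the first server rather than after six districts had lost connectivity.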
All About the Kids
Migrations are scheduled through the end of the year, when OET hopes to
have the last of the school districts moved to Windows 2003. The biggest
benefit of the smooth upgrade may be for the main customer
base: the 700,000-plus students and teachers in the Kentucky school system.
As Gray, who has a five-year-old son and three-year-old daughter, says,
“The most rewarding thing [about the project] is knowing that the kids
will have technology available that’s reliable and robust. They are the
final end-users. The kids gain in education through technology.”