From the Trenches: The Confusing Case of the Two-PDC Domain
Getting a new ERP application up and running was only the first challenge for this MCP.
- By Chris Gerrib
I work as a networking consultant, providing services
to a number of small and mid-sized clients in the Chicago
area. In my work I’ve seen various disasters, “weird”
quandaries, and just plain old problems. But the worst
thing I saw was when my single NT domain decided to have
two Primary Domain Controllers (PDCs) at once!
I was on-site at a mid-sized steel manufacturer, helping
it implement J.D. Edwards’ OneWorld enterprise resource
planning (ERP) product. This Windows-based application
requires an IBM AS/400, an NT server running a SQL database,
and fixed TCP/IP addressing. The client software runs
on Windows 95 or Windows NT 4.0 Workstation, but any development
work (and with any ERP project, there is a ton of development
work) has to be done on NT Workstation.
The Scene of the Crime
The client had completed the project’s first phase, and
so a number of users were “live” on OneWorld, running
on the PDC. The developers now needed a test machine for
the next phase. I built a standard NT server on a Dell
server-class chassis, and added it to the domain as a
Backup Domain Controller (BDC) during the install process.
So, I now had a fairly simple network, consisting of a
PDC (let’s call it “OLD”), a BDC (“NEW”), and an AS/400, all
in a single NT domain. All devices had the then-latest
service packs installed.
About two weeks later OLD crashed due to a hardware failure.
It was a fairly quick fix; but to allow users some network
services, I promoted NEW to PDC while I fixed OLD. Within
a few hours, I got OLD back up. It came up as a PDC in
Server Manager. I demoted NEW to BDC in Server Manager
without errors—or so I thought. After a brief refresh
both Server Manager displays agreed; OLD was the PDC,
NEW was the BDC. I had taken all client PCs down during
this swap and now asked them to log back in.
The majority of people, including all production users
(with Windows 95 PCs), were able to proceed with their
work. However, none of the developers with NT workstations
could get past the domain login screen. Thanks to another
consultant’s oversight, we didn’t have the local user
account name or password. The developers’ PCs were now
effectively unusable.
While attempting to troubleshoot this problem, I realized
that the domain was having more serious trouble. Changes
made to user account information on the PDC weren’t being
communicated to the BDC, despite repeated use of the “Force
Synchronization” command on both machines’ Server Manager.
Also, NT workstations were unable to join the domain even
on a fresh install.
I checked name resolution, and could PING by name. I
installed NetBIOS on both servers, rebooted after hours,
and again found that the two servers weren’t talking. I tried
to promote NEW to PDC, and was able to do so without errors.
(Of course, that should have failed.) Frustrated and with
an unhappy client, I called Microsoft Technical Support.
After retracing my steps with IP name resolution, tech
support had me try a command line utility called Nltest
(available in the Windows NT 4.0 Resource Kit). Nltest has
several options, including “force synchronization” and
“query” options. (For more information, see TechNet article
Q158148, “Domain Secure Channel Utility: Nltest.exe,” on
Microsoft’s Web site.) The end result was failure. The
two servers, OLD and NEW, weren’t talking.
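For readers who want to repeat these checks from a script, here is a minimal Python sketch that only builds the Nltest command lines and does not run them. The domain name STEEL is invented for illustration, and the function name is hypothetical. The switches shown (/server, /dclist, /bdc_query, /sc_query) are the spellings I recall from the Resource Kit documentation, so confirm them against nltest /? on your own system before relying on them.

```python
from typing import List


def nltest_checks(domain: str, pdc: str, bdc: str) -> List[List[str]]:
    """Build Nltest command lines (as argv lists) for a basic domain check.

    Nothing is executed here; the caller decides how and where to run them.
    Switch spellings are assumptions to verify with `nltest /?`.
    """
    return [
        # Ask the PDC which domain controllers it knows about.
        ["nltest", f"/server:{pdc}", f"/dclist:{domain}"],
        # Ask the PDC for the synchronization state of its BDCs.
        ["nltest", f"/server:{pdc}", f"/bdc_query:{domain}"],
        # Ask the BDC about its secure channel to the domain.
        ["nltest", f"/server:{bdc}", f"/sc_query:{domain}"],
    ]


if __name__ == "__main__":
    # "STEEL" is a made-up domain name; OLD and NEW are the article's servers.
    for argv in nltest_checks("STEEL", "OLD", "NEW"):
        print(" ".join(argv))
```

Returning the commands as lists rather than running them keeps the sketch safe to try anywhere. On a machine with Nltest installed, each list could be handed to a process launcher and the output read by eye.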
The diagnosis was that the “secure channel” between the
two servers had failed. NT servers use this “secure channel”
to pass RPC calls between controllers in a domain or between
domains in trust relationships. Specifically, the failure
was on OLD—the production server! This was why I couldn’t
get my NT workstations to connect, even if I did a clean
install. The only reason my Windows 95 production machines
were working was that Windows 95 doesn’t integrate into
the domain as tightly as NT 4.0 Workstation.
The Solution Revealed
At this point I had one choice: Format OLD’s hard drive,
reinstall everything, and restore from backup. This was
on Tuesday night. Not wanting to lose the weekend as well
as three days of the developers’ work, I pressed for another
option. Tech support offered a potential way out: Rename
the PDC! I’d been taught that doing this would be equivalent
to putting a gun to my head, but I had nothing to lose.
After hours that night I stopped all the services (SQL,
backup, and the like), ran a backup, and set all but the
minimum services to “manual.” Then I renamed OLD (to GIHTW
for “God, I Hope This Works”) and rebooted. Much to my
surprise, GIHTW came up clean and declared itself PDC
of the domain. More important, changes made on OLD/GIHTW’s
User Manager immediately appeared on NEW. Time for step
two: Change GIHTW back to OLD and reboot. Again, everything
worked fine. The two machines, OLD and NEW, were talking
to each other and propagating changes. Plus, I was able
to restart all the services—including the critical SQL
databases—without incident. Even better, the developers
could log into their NT workstations without a hitch.
The results left me happy (and my weekend plans intact).
And since I’d prepared the client for the worst (re-install
and recover), I looked like a hero to him. Also, I learned
two valuable lessons from this situation.
First, you should be very careful about promoting and
demoting domain controllers. Although it should work fine,
it may not. Likewise, you need to verify that your domain
is working—looking at Server Manager isn’t enough.
Second, don’t give up. In my odyssey several people suggested
I “just format and re-install.” By being persistent, I
got the client up without risk of data loss or excessive
downtime.
Chris Gerrib, MCP, CNE, has been in high-tech for five years, the last four with Hinsdale, Illinois-based consulting firm Information Technologies International. He started out as a “screwdriver holder” for the senior technicians and worked his way up to his current position as VP of Operations. He holds degrees from Southern Illinois University and the University of Illinois.