Posey's Tips & Tricks
The Case of the Spontaneous Hyper-V Modifications
When mysterious changes to the Hyper-V virtual switch caused all of his virtual machines to lose connectivity, Brien was left puzzling over the source of the problem and the workaround.
As someone who works with Hyper-V almost every day, I thought that I had pretty much seen it all. Last week, however, I ran into a really strange problem. Even now, I am at a loss to explain how or why it happened, but I wanted to pass along the fix just in case anyone else experiences the issue.
To prepare for recording some videos about Azure AD Connect, I had created a lab environment consisting of eight or nine Hyper-V virtual machines (VMs). Each of these VMs was running a fully patched copy of Windows Server 2016 Datacenter Edition. The Hyper-V host was also fully patched.
About three days into the recording process (with the VMs having been powered on and heavily used the entire time), all of my VMs suddenly lost network connectivity. Well, sort of. Here is where things got strange.
All of the VMs could still ping one another. The VMs could resolve Internet domain names and could even ping some Internet Web servers. Even so, all of the VMs reported that Internet connectivity was not available. Attempts at accessing Web sites through a browser failed, even though I could ping those Web sites without issue.
Ultimately, I traced the problem to what I can only describe as spontaneous modifications to the Hyper-V virtual switch. Initially, I thought that perhaps a hacker or malware was responsible for the change, but after spending a considerable amount of time digging through the event logs, I couldn't find any evidence to suggest that the system had been compromised.
As crazy as it sounds, the modification really does seem to have been spontaneous. As I said in the beginning, I can't explain it.
So what was changed? Actually, there were two changes. In order to understand why those changes were significant, you need to know a little bit about how the Hyper-V virtual switch works.
Hyper-V supports three different types of virtual switches: external, internal and private. The most commonly used type is the external virtual switch. When you create an external virtual switch, Hyper-V binds the virtual switch to a physical network adapter. Whenever a VM connects to an external virtual switch, all communications that are destined for the outside world are sent through the physical adapter that is assigned to the virtual switch.
Because the physical network adapter is dedicated to Hyper-V's use, the host operating system no longer uses the adapter directly. Instead, the host operating system is provisioned with a virtual network adapter that maps to the Hyper-V virtual switch, which in turn maps to the physical network adapter.
So with that said, let's talk about my findings. First, my Hyper-V virtual switch (which had been working fine for days) had been reconfigured as an internal virtual switch. An internal virtual switch allows communications between the VMs that are connected to it (the host OS can also communicate with the VMs through an internal virtual switch), but it does not allow for any sort of external communications. Remember, though, that I could still ping the outside world. That should have been impossible given that my VMs were connected to an internal virtual switch.
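To make that expectation concrete, here is a minimal Python sketch of what each switch type is designed to allow. It is only a model of the documented behavior, not something that queries Hyper-V. The names `SwitchType`, `REACHABLE`, `vm_can_reach` and `unexpected_successes` are my own illustrative choices, and the private switch type (VM-to-VM traffic only) is included for completeness even though it plays no part in this story.

```python
from enum import Enum


class SwitchType(Enum):
    EXTERNAL = "External"
    INTERNAL = "Internal"
    PRIVATE = "Private"


# Endpoints a VM is expected to reach through each switch type.
# The target labels are hypothetical names used only in this sketch.
REACHABLE = {
    SwitchType.EXTERNAL: {"vms", "host", "physical_network"},
    SwitchType.INTERNAL: {"vms", "host"},
    SwitchType.PRIVATE: {"vms"},
}


def vm_can_reach(switch_type, target):
    """Return True if a VM on this switch type is expected to reach the target."""
    return target in REACHABLE[switch_type]


def unexpected_successes(switch_type, observed):
    """List targets that responded even though the switch type should block them.

    `observed` maps a target label to True (reachable) or False (not reachable).
    """
    return sorted(
        target
        for target, ok in observed.items()
        if ok and not vm_can_reach(switch_type, target)
    )


if __name__ == "__main__":
    # The situation described above: an internal switch, yet outside pings succeed.
    observed = {"vms": True, "host": True, "physical_network": True}
    print(unexpected_successes(SwitchType.INTERNAL, observed))
```

Anything the second function returns is a sign that the switch is not behaving the way its reported type says it should, which is exactly the contradiction I was seeing.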
When I tried to fix the problem, I found that the option to change the virtual switch type was grayed out. That was also when I found the second modification. For whatever reason, the server's Control Panel revealed that the host's virtual network adapter had been replaced by a virtual network adapter that was tied to a network port that wasn't even plugged into anything. Remember, these changes happened spontaneously, while I was actively using the machine.
The only way that I was able to fix the problem was to delete the virtual switch and create a new one with the same name as the one that I deleted. I also deleted the host's virtual network adapter that pointed to an unplugged port.
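For anyone who would rather script the delete-and-recreate step, the following is a rough Python sketch that only builds the PowerShell commands as strings so that they can be reviewed before being run on the host. It assumes the standard Hyper-V cmdlets `Remove-VMSwitch` and `New-VMSwitch`. The helper names `ps_quote` and `rebuild_switch_commands`, the switch name and the adapter name are all hypothetical, and the sketch does not cover removing the stray host virtual network adapter.

```python
from typing import List


def ps_quote(value: str) -> str:
    """Wrap a value in single quotes for PowerShell, doubling any embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def rebuild_switch_commands(switch_name: str, adapter_name: str) -> List[str]:
    """Build the commands to delete a virtual switch and recreate it as external.

    Assumption: the new switch should bind to the physical adapter named
    `adapter_name` and remain shared with the host operating system.
    """
    name = ps_quote(switch_name)
    adapter = ps_quote(adapter_name)
    return [
        f"Remove-VMSwitch -Name {name} -Force",
        f"New-VMSwitch -Name {name} -NetAdapterName {adapter} -AllowManagementOS $true",
    ]


if __name__ == "__main__":
    # "Lab Switch" and "Ethernet" are placeholder names, not values from my lab.
    for command in rebuild_switch_commands("Lab Switch", "Ethernet"):
        print(command)
```

Printing the commands rather than executing them is deliberate. Deleting a virtual switch disrupts every VM attached to it, so it is worth reading each line before pasting it into an elevated PowerShell session.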
The only explanation that I can come up with is that corruption must have occurred within Hyper-V. I am not totally satisfied with that explanation, but I can't really explain what happened in any other way. If anyone else has ever experienced this particular problem, I would love to hear about it.
About the Author
Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.