Posey's Tips & Tricks

Why Restarting a Failed SCVMM Job Might Be a Bad Idea

Occasionally, restarting a failed System Center Virtual Machine Manager job can leave your virtualization infrastructure in an unknown state. Here's how to avoid that.

If you have spent much time working with System Center Virtual Machine Manager (SCVMM), then you know that most of the administrative actions that you take against virtual machines (VMs) result in the creation of a job. These jobs, which are visible through the Jobs workspace, occasionally fail.

There are any number of reasons why these failures might occur. A dependency might be missing. A communication problem might exist. Even a clock skew issue could potentially cause a job to fail.

One of the nice things about the way the SCVMM management console works is that you have the ability to restart a failed job. Just right-click on the failed job and choose the Restart option. You can see what this looks like in Figure 1.

Figure 1: You can right-click on a failed job and run it again.

I have to admit that I absolutely love having the ability to restart a failed job. I have occasionally found myself in situations where I have spent a considerable amount of time preparing an administrative operation only to have the corresponding job fail over something silly. It's nice to be able to fix the problem and restart the job without having to start from scratch.

At the same time, though, restarting a failed SCVMM job might not always be the best idea.

I can think of at least a couple of instances in which I have been in a hurry and restarted the wrong job. That, of course, resulted in a completely unplanned administrative action with totally unpredictable results. Even if you are more careful than I am, there can still be consequences to restarting a failed SCVMM job.

The reason restarting a failed SCVMM job can sometimes have consequences is subtly revealed in Figure 1. Notice that there is one menu option to restart the job and another that restarts the job while skipping the last failed step. Herein lies the problem.

SCVMM jobs are simply a collection of steps that are performed in an automated fashion to accomplish a specific task. You can think of an SCVMM job as being a lot like a PowerShell script. In fact, a job can be a reflection of a PowerShell script that is running behind the scenes.

Consider the act of using SCVMM to create a new VM. This action results in a job being run, but it also causes a PowerShell script to be created. If you look at Figure 2, you can see that the Create Virtual Machine Wizard's final screen includes a View Script button. Clicking this button reveals the PowerShell script that is being used to build the VM. Notice in the script how all of the various steps are being added to a single job group (in other words, they are all a part of the same job).

Figure 2: Many SCVMM tasks leverage underlying PowerShell scripts.

The reason I bring this up is to illustrate the point that even though some jobs are simple and consist only of a single step, other jobs are more complex and require many different steps. When a job fails, it means that SCVMM was unable to perform one of the steps associated with the job. It also means that there is at least a chance that some of the steps were successful.

If you blindly restart an SCVMM job, you may end up repeating steps that have already completed successfully, and repeating them without first taking corrective action can cause additional problems. Similarly, the option to skip the failed step can cause problems of its own because the job's remaining steps may depend on that step having completed successfully.
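To make both risks concrete, here is a minimal sketch in Python. It is a toy model, not SCVMM code: the Host class, the three step functions and the run_job helper are all hypothetical names invented for illustration, and the model simply assumes that a restart runs the job's steps again from the top. How SCVMM actually resumes any particular job is not something this sketch claims to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for the infrastructure that a job modifies."""
    disks: list = field(default_factory=list)
    network_ready: bool = False
    vms: list = field(default_factory=list)


def create_disk(host):
    # Not idempotent: every call adds another disk.
    host.disks.append(f"disk{len(host.disks)}.vhdx")


def configure_network(host, healthy):
    # Fails while the (simulated) communication problem exists.
    if not healthy:
        raise RuntimeError("communication problem")
    host.network_ready = True


def register_vm(host):
    # Depends on the network step having succeeded earlier.
    host.vms.append("ok" if host.network_ready else "no network")


def run_job(host, healthy, skip=None):
    """Run each step in order. Return the failed step's name, or None."""
    steps = [
        ("create_disk", lambda: create_disk(host)),
        ("configure_network", lambda: configure_network(host, healthy)),
        ("register_vm", lambda: register_vm(host)),
    ]
    for name, action in steps:
        if name == skip:
            continue  # models "restart while skipping the last failed step"
        try:
            action()
        except RuntimeError:
            return name
    return None


# Scenario A: the job fails, the problem is fixed, the job is restarted.
host_a = Host()
first_failure_a = run_job(host_a, healthy=False)
run_job(host_a, healthy=True)
print(first_failure_a, host_a.disks, host_a.vms)

# Scenario B: the job fails and is restarted with the failed step skipped.
host_b = Host()
first_failure_b = run_job(host_b, healthy=False)
result_b = run_job(host_b, healthy=False, skip=first_failure_b)
print(result_b, host_b.disks, host_b.vms)
```

In scenario A the restarted job finishes, but the disk step ran twice, so the model host ends up with a leftover disk from the failed attempt. In scenario B the restarted job reports no failure at all, yet the VM was registered without the network configuration it depends on. Both outcomes are the kind of unknown state that a quick look at the job's steps is meant to catch.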

If you are thinking about restarting a job, my advice is to take a look at the job's Details tab. This tab will show you all of the individual steps that make up the job. That way, you can see which (if any) of the steps were successful. It could be that you need to undo some of the successful steps prior to rerunning the job.

Even if you don't have to take any corrective action, though, reviewing the Details tab can sometimes help you to figure out why the job is failing.

Figure 3: The Details tab shows the steps that make up a job.
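The review described above can also be written down as a small checklist. The Python sketch below is only an illustration of that thought process: the Step type, the step names, the status strings and the repeatable flag are all assumptions made for the example rather than anything SCVMM exposes, and deciding whether a step is safe to run twice remains the administrator's judgment call.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    """One row copied by hand from a failed job's list of steps."""
    name: str
    status: str       # assumed values: "Completed", "Failed", "Not started"
    repeatable: bool  # administrator's judgment: is it safe to run twice?


def first_failure(steps) -> Optional[str]:
    """Return the name of the first failed step, or None if none failed."""
    return next((s.name for s in steps if s.status == "Failed"), None)


def steps_to_undo(steps):
    """Return completed steps that should be reversed before a rerun."""
    return [s.name for s in steps
            if s.status == "Completed" and not s.repeatable]


# Hypothetical steps for a failed "create VM" job.
job = [
    Step("Create virtual hard disk", "Completed", repeatable=False),
    Step("Apply hardware profile", "Completed", repeatable=True),
    Step("Connect virtual network adapter", "Failed", repeatable=True),
    Step("Start virtual machine", "Not started", repeatable=True),
]

print("Investigate first:", first_failure(job))
print("Undo before restarting:", steps_to_undo(job))
```

Nothing here talks to SCVMM. The point is simply that a restart decision comes down to two questions: which step failed, and which of the steps that succeeded cannot safely be repeated.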

Ultimately, there is no rule that says that you can't restart a job; Microsoft wouldn't give you the option to restart a job if doing so were truly a violation of best practices. Even so, it's important to be careful about restarting failed jobs so that you don't accidentally leave your virtualization infrastructure in an unknown state.

About the Author

Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.

