Special Reports

Avoid Disaster Downtime

Find information on disaster recovery and backup strategies.

A surprising number of businesses today lack good backup and recovery planning. Worse, many struck by a disaster simply don't survive. What would happen if a hurricane, flood, tornado, or an earthquake destroyed your business' data center? Or what if a simple hardware or software failure caused one of your vital databases to become corrupt? If you don't have a backup and recovery plan, you need to start implementing one today. If you do have a plan, it's a good time to review it and make sure you're doing everything possible to protect your company's valuable data.

In this article, I'll discuss some best practices for creating a backup and recovery plan to ensure your business can get back up and running quickly. In the new .NET world, the traditional guidelines still apply; however, there are a few things that you should pay particular attention to and I'll address those as well.

Creating and implementing a backup and recovery plan can be complex and time-consuming. As a first step, consider putting together a team to tackle this task. The team should include management, the IT staff, and, possibly, third-party vendors. Management can provide the business-analysis and decision-making expertise, while the IT staff can offer the technical expertise and can inform management about what is and what isn't possible to implement. Third-party vendors have both technical and business-case information about their products and services. They might also offer a valuable external point of view.

It's often helpful to create a plan for each individual application when outlining backup and recovery strategies. When they're complete, you can combine them into an overall plan for your business. At this time, it's wise to consider any application interdependencies that might affect backup and recovery procedures.

First, identify every application, and its associated data, that you need to back up. Make a list, and collaborate with your team to make sure nothing is overlooked. Include any services your application relies on, such as the Domain Name System (DNS), directory, and certificate services (see the sidebar, "Create a Backup Checklist"). Being able to restore an application is of little use if the application can't function because a service it depends on is unavailable. If you plan to back up any databases, be sure to include their transaction logs as well.
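One way to check such a list for gaps is to record what each application depends on and flag anything that isn't on the list. The Python sketch below is illustrative only: the application and service names and the missing_dependencies helper are hypothetical, not taken from any product.

```python
def missing_dependencies(backup_list, depends_on):
    """Return, per application, the required items absent from the backup list."""
    covered = set(backup_list)
    gaps = {}
    for app in backup_list:
        missing = sorted(set(depends_on.get(app, ())) - covered)
        if missing:
            gaps[app] = missing
    return gaps


# Hypothetical inventory: what the team has listed so far.
backup_list = ["storefront", "orders-db", "dns"]

# Hypothetical dependency map: what each application needs in order to run.
depends_on = {
    "storefront": ["orders-db", "dns", "certificate-services"],
    "orders-db": ["orders-db-transaction-logs"],
}

gaps = missing_dependencies(backup_list, depends_on)
for app, missing in gaps.items():
    print(f"{app} is missing: {', '.join(missing)}")
```

Here the check would flag the certificate services and the database's transaction logs, the kinds of items the paragraph above warns are easy to overlook.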

Set Your Schedule
Once you know what to back up, you need to determine how often to do it. To create a backup schedule, consider these questions: How much data can my business afford to lose? (A week, a day, a single transaction?) And how much can my business afford to spend on backups?

A typical scenario is to perform full backups once a week, and incremental backups daily (see Table 1). Full backups might impact application performance, so doing them daily often isn't feasible. Additionally, because they back up everything, full backups also require more storage media, which can increase costs. Incremental backups, however, back up only what has changed since the last full or incremental backup, saving backup time and media storage space.

The downside to incremental backups is that they take longer to restore. For example, suppose you perform a full backup on Saturday night and incremental backups daily Sunday through Friday night. If your application fails on the following Saturday morning, you'll have to restore your last full backup and then all six incremental backups, in order. Differential backups offer an alternative to incremental backups. A differential backup captures everything that has changed since the last full backup, regardless of any differential backups taken in between. So, if you replace the incremental backups in the previous scenario with differential backups, at most, you'll have to restore the last full backup and the last differential backup. The trade-off is that each differential backup grows larger as the week goes on.
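The Saturday-to-Friday scenario can be sketched in a few lines of Python. This is a simplified model, not a description of any backup product: it assumes one full backup on Saturday night, one partial backup on each following night, and that a differential backup captures all changes since the last full backup. The restore_chain function and its labels are hypothetical.

```python
NIGHTS = ["Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]  # Sat = weekly full backup


def restore_chain(partials_completed, strategy):
    """List the backups to restore, in order.

    partials_completed: number of nightly partial backups taken since
    Saturday's full backup (0 through 6).
    strategy: "incremental" or "differential".
    """
    if not 0 <= partials_completed <= 6:
        raise ValueError("partials_completed must be between 0 and 6")
    nights = NIGHTS[1:partials_completed + 1]
    if strategy == "incremental":
        # Every incremental since the full backup is needed, oldest first.
        return ["Sat full"] + [f"{night} incremental" for night in nights]
    if strategy == "differential":
        # Only the most recent differential is needed (none if no night has passed).
        return ["Sat full"] + [f"{night} differential" for night in nights[-1:]]
    raise ValueError(f"unknown strategy: {strategy}")


# Failure on the following Saturday morning: six partial backups exist.
print(restore_chain(6, "incremental"))   # seven restores in total
print(restore_chain(6, "differential"))  # two restores in total
```

With incremental backups the restore involves seven steps; with differential backups it involves two, at the price of larger nightly backups later in the week.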

Storing backups offsite is another critical aspect of any backup and recovery plan. If you store your backups in the same location as your production environment, you run the risk of losing both in one fell swoop. So, you might want to store the weekly full backup offsite. Be sure to select a secure offsite location: It's probably a bad idea to have one of your IT employees storing them at home. If the employee becomes disgruntled, or if you have to let the employee go, his or her home is no longer ideal. You should also consider how easy it would be for someone to break into the employee's home and steal your sensitive corporate data.

The ideal offsite location is distant geographically, so it remains unaffected by large-scale disasters, such as earthquakes and hurricanes. You might also want to store multiple copies of the same backups at different locations. For example, you might keep one copy on site for quick access, and another at a distant location in case the local copy is destroyed or unavailable. If you decide to store backups on site, consider purchasing a safe that can withstand fire and water damage to store them in.

Next, you need to decide what hardware and software will be required to perform the backups. When determining this, consider these criteria: capacity, speed, cost, reliability (of backup hardware and media), the lifetime of the media, and monitoring capabilities.

There are many off-the-shelf software products for performing and monitoring backups for you to select from (see Table 2). If you have large amounts of data to back up, you should consider special hardware configurations to improve performance (see the sidebar, "Improve Backup Performance on a Budget").

For mission-critical applications, such as e-commerce applications, you might not be able to afford the downtime involved in restoring a server. In that case, consider using replicated servers or clustering as a supplement to backups. This adds redundancy, so if one of your servers goes down, users will still be able to access the same application and data on the remaining servers. It also allows you to restore the downed server without interrupting service.

Be sure to use time synchronization on your servers. Network Time Protocol (NTP) is a well-established standard for providing time synchronization services, with implementations on most platforms, including Windows, Unix, and many network appliances, such as routers and switches. A complete explanation of NTP is beyond the scope of this article, but in general, you set up an NTP server on each of your LANs (see Resources). Each of these servers receives its time information from another parent NTP server on your WAN, or from a public server on the Internet. The NTP clients, in this case your application servers, request time information updates periodically from the local NTP server. Without time synchronization, it's difficult to tell which data is the most current. Also, accurate time information in log files is critical when you attempt to determine the sequence of events leading to a failure. This, in turn, helps you prevent the failure from recurring.
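To give a feel for what a client learns from one of those periodic requests, the sketch below applies the standard NTP offset and delay calculation to the four timestamps of a single request and reply. It is arithmetic only and does not talk to a real server; the function name and the sample timestamps are made up for illustration.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client clock when the request was sent
    t2: server clock when the request arrived
    t3: server clock when the reply was sent
    t4: client clock when the reply arrived
    All values are in seconds. A positive offset means the client clock
    is behind the server clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example: the client is 5 seconds behind, each network leg takes
# 0.125 s, and the server spends 0.125 s handling the request.
offset, delay = ntp_offset_and_delay(100.0, 105.125, 105.25, 100.375)
print(f"offset={offset} s, round-trip delay={delay} s")
```

The calculation assumes the outbound and return network legs take about the same time, which is why a nearby NTP server on the local LAN tends to give better results than a distant one.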

Review the recommended backup and recovery procedures for each of your software components. Most commercially available server applications include detailed white papers or sections in their manuals on backup and recovery options and procedures. Be sure to examine all these resources when planning and implementing your own strategy.

Create a Recovery Strategy
Once you have a backup strategy, you need a plan should you ever have to use it. When you create your recovery strategy, start with a plan that includes everything required to restore the application completely, from the ground up, including the network, hardware, and software. This can be daunting for applications that have been in place for some time. For this reason, you might want to consider consulting external parties that specialize in backup and recovery to assist you. Several companies provide a range of support, from consulting to complete hardware and software solutions and services for disaster recovery.

Determine which spare hardware components can be kept on site so they're readily available. Ideally you should have a replacement for every component, but this is rarely affordable. Evaluate each spare part based on cost, the likelihood of failure, how long it will take to obtain a replacement, and how important that part is to running your application.
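One possible way to weigh those four factors against each other is to estimate the yearly downtime cost each spare would prevent and compare it with the price of the spare. The sketch below is only one such scoring scheme, chosen for illustration; the SparePart class, the formula, and every number in it are assumptions rather than an established method.

```python
from dataclasses import dataclass


@dataclass
class SparePart:
    name: str
    cost: float                    # price of keeping one spare on site
    annual_failure_prob: float     # estimated chance of failure per year (0 to 1)
    lead_time_hours: float         # wait for a replacement if no spare is on hand
    downtime_cost_per_hour: float  # business impact while the application is down

    def expected_annual_loss(self):
        """Expected yearly downtime cost if no spare is kept on site."""
        return (self.annual_failure_prob
                * self.lead_time_hours
                * self.downtime_cost_per_hour)

    def value_per_dollar(self):
        """Expected loss avoided per dollar spent on the spare."""
        return self.expected_annual_loss() / self.cost


def rank_spares(parts):
    """Order parts from most to least worthwhile to stock."""
    return sorted(parts, key=lambda part: part.value_per_dollar(), reverse=True)


# Made-up figures for three hypothetical components.
parts = [
    SparePart("core switch", 12000.0, 0.01, 120.0, 1000.0),
    SparePart("disk", 400.0, 0.05, 48.0, 1000.0),
    SparePart("power supply", 250.0, 0.02, 72.0, 1000.0),
]
ranking = [part.name for part in rank_spares(parts)]
print(ranking)
```

With these made-up numbers, inexpensive parts that fail relatively often rank ahead of the costly switch. Real estimates would need to come from your own failure history and vendor lead times.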

Recovery involves more than just restoring from backups. Hold "what if" sessions to determine possible types of failures. Then, create plans with all the necessary steps to recover from each particular failure. Obviously, you won't be able to think of every possible failure scenario, but consider a full range of failures, from a hard drive failure to a flood or a fire. Plan how you would replace all the existing physical facilities, networks, hardware, and software if a major disaster were to strike. You should also consider building relationships, or striking agreements, with hardware and software vendors and network service providers so you can quickly replace everything that might be lost.

Many of the choices that go into a backup and recovery plan are complex business decisions. For this reason, most companies can benefit from performing detailed risk and cost-benefit analyses. As part of this process, you might want to consider outsourcing all, or some, of your operations to meet your particular backup and recovery needs more cost effectively. Service providers typically have redundancy and detailed backup and recovery procedures already in place, and can often provide a higher level of service for less than it would cost you to do it on your own. Also, as part of your risk analysis, prioritize your business processes and applications. This helps determine which processes and applications will be recovered first.

If you haven't done so already, you'll need to get buy-in from upper management and obtain funding. This can be difficult, because creating and implementing such a plan is often expensive. However, if you make your case using a few "what if" disaster scenarios, the value of such a plan usually becomes apparent quickly. For mission-critical applications, consider using third parties to advise you on your backup and recovery plan, to provide a complete solution for you in case of a disaster, or both.

Once your backup and recovery plans are complete, print them out and store duplicate copies off site. If your backup and recovery plans exist only in an electronic format, you might not be able to get to them if your IT infrastructure is down.

Even though you've completed your plan, you're not finished yet. You must test and refine your plan to ensure it will work when you need it. The testing will probably take several iterations to work out all the issues you discover during this time.

Start by creating a test environment to simulate the production environment. The expense involved with creating this environment can be a hard sell with the bean counters. However, you can justify the cost by using the test environment for other purposes, such as quality assurance testing, testing updates and patches, and possibly for a pilot if you're deploying a new application.

When you create your test environment, make it match your production environment as closely as you can. Be sure to use backup copies of existing data, so you can discover as many problems as possible during your testing. The test environment should be isolated, enabling you to simulate various failures without impacting your production network.

When you test your new backup and recovery plans, you're sure to discover several issues you can use to refine your plan. Implement any changes, then re-test your plans. Keep in mind that whenever you make changes, patches, updates, or add new features to your application, you'll need to update your plan, then test it again.

In addition to testing your backups during the initial setup of your backup and recovery plan, you should test your backups from the production environment periodically. The best way is to attempt to restore the production backups to your test environment. This will give you confidence that your backups are complete, and that they can be used to restore your data if needed. There's nothing worse than finding out your backups are incomplete or won't restore your system when you need them most.
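One way to confirm that a test restore is complete is to record a checksum for every production file at backup time and compare the restored files against that record. The sketch below assumes file-level backups and uses two temporary directories to stand in for the production and test environments; the function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path to the SHA-256 digest of its contents.

    Files are read whole, which is fine for a sketch; very large files
    would need to be hashed in chunks.
    """
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def verify_restore(manifest, restored_root):
    """Return (missing, corrupt) lists of files in the restored copy."""
    restored = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(restored))
    corrupt = sorted(
        name for name, digest in manifest.items()
        if name in restored and restored[name] != digest
    )
    return missing, corrupt


# Demo: two temporary directories stand in for production and test.
with tempfile.TemporaryDirectory() as prod, tempfile.TemporaryDirectory() as test_env:
    (Path(prod) / "orders.db").write_bytes(b"orders")
    (Path(prod) / "orders.log").write_bytes(b"transaction log")
    manifest = build_manifest(prod)

    # Simulate a restore that brought back the database but not its log.
    (Path(test_env) / "orders.db").write_bytes(b"orders")
    missing, corrupt = verify_restore(manifest, test_env)

print("missing:", missing)
print("corrupt:", corrupt)
```

A check like this catches missing or altered files, but it doesn't prove the application actually starts and runs on the restored data, so it complements a full test restore rather than replacing it.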

You might also want to institute scheduled or surprise recovery drills. This ensures your staff is familiar with all the steps and procedures in your recovery plan, and further validates it. Typically, you won't want to simulate failures on your production network, so these drills are another way to use your test environment.

It's important that you have some sort of monitoring in place to ensure automated backups are being performed successfully. There are many software backup solutions that do this, or you can implement your own if the off-the-shelf software does not meet your needs (see Table 2).
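If you do implement your own monitoring, the core check can be small: for each backup job, compare the time of its last successful run with the current time and raise an alert when the gap is too large. The following sketch assumes you can already obtain those times from your backup software; the job names, the 26-hour threshold, and the stale_jobs function are hypothetical.

```python
from datetime import datetime, timedelta


def stale_jobs(last_success, now, max_age=timedelta(hours=26)):
    """Return the names of jobs with no successful backup within max_age.

    last_success maps a job name to the datetime of its last successful
    run, or to None if the job has never succeeded.
    """
    return sorted(
        job for job, finished in last_success.items()
        if finished is None or now - finished > max_age
    )


# Made-up status for three nightly jobs, checked at 8:00 a.m.
now = datetime(2024, 1, 10, 8, 0)
last_success = {
    "orders-db": datetime(2024, 1, 10, 1, 30),   # ran last night
    "file-server": datetime(2024, 1, 8, 1, 30),  # last succeeded two nights ago
    "mail": None,                                # has never succeeded
}
alerts = stale_jobs(last_success, now)
print("needs attention:", alerts)
```

The threshold is slightly longer than a day so that a nightly job that runs a little late doesn't trigger a false alarm.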

In the unfortunate event you have to put your backup and recovery plan into action, make the most of it. Perform a complete review of how the staff and the plan performed, so that you can make adjustments and further refine the details. This way, if it happens again, you'll be better prepared.

Factor in .NET
Backup and recovery plans don't change significantly in the .NET world, but the .NET Framework has a few facets you should pay particular attention to. .NET applications are often distributed—such as when you use XML Web services—and this can be both an advantage and a cause for concern.

On one hand, it's often easier to duplicate a distributed application's components, so they can serve as a backup in case the primary component fails. Also, the different components can reside in different physical locations, offering some protection against a single point of failure. For example, suppose you have an e-commerce site that uses a Web service to provide current product availability. If you have two separate servers, your application can fail-over from one to the other automatically.
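The fail-over idea can be sketched on the calling side as a loop that tries each server in turn. In the Python sketch below, plain functions stand in for the two product availability servers; no real Web service is called, and all names and the sample quantity are hypothetical.

```python
def call_with_failover(endpoints, product_id):
    """Try each (name, lookup) pair in order and return the first answer."""
    failures = []
    for name, lookup in endpoints:
        try:
            return name, lookup(product_id)
        except (ConnectionError, TimeoutError) as exc:
            # Remember why this server failed, then move on to the next one.
            failures.append(f"{name}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(failures))


# Stand-ins for the two servers: the primary is down, the secondary answers.
def primary(product_id):
    raise ConnectionError("primary unreachable")


def secondary(product_id):
    return {"sku-1": 12}.get(product_id, 0)


served_by, quantity = call_with_failover(
    [("primary", primary), ("secondary", secondary)], "sku-1"
)
print(f"{quantity} units in stock (answered by {served_by})")
```

In production, fail-over is often handled by a load balancer or cluster service rather than by the calling code; the sketch only shows the principle.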

On the other hand, with a distributed application, you might have less control over how backups and recoveries are performed. If, in the previous example, the supplier provides the product availability Web service, you're subject to that third party's backup and recovery procedures. Be sure to consider reliance on third parties in your own plan, along with what their plans entail. This way you can add responsive measures to your own plan to compensate for any shortcomings. For example, you might choose to copy the supplier data locally on a regular basis. Then, if there were an outage, you wouldn't have the most recent supplier data available, but it would be better than nothing. Also, consider establishing Service Level Agreements (SLAs) with the third parties. This gives you a more concrete sense of what the third party is able to live up to, and might give you some recourse if it fails to meet the agreement. .NET applications rely heavily on the network infrastructure, so pay special attention to your internal network's redundancy and recoverability, and to your Internet connectivity if your application uses it.
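The suggestion to keep a local copy of the supplier data might look something like the following sketch. As a simplification, it refreshes the local copy on every successful lookup rather than on a schedule, and it returns the age of the data so the caller can decide whether stale figures are acceptable. The supplier functions are stand-ins, and every name here is hypothetical.

```python
from datetime import datetime


def availability(product_id, fetch_live, local_copy, now):
    """Return (quantity, as_of, is_stale) for a product.

    Uses the supplier's live figure when possible; otherwise falls back
    to the last figure saved in local_copy.
    """
    try:
        quantity = fetch_live(product_id)
    except (ConnectionError, TimeoutError):
        if product_id not in local_copy:
            raise  # nothing saved locally, so the outage can't be hidden
        quantity, as_of = local_copy[product_id]
        return quantity, as_of, True
    local_copy[product_id] = (quantity, now)
    return quantity, now, False


# Stand-ins for the supplier's service before and during an outage.
def supplier_up(product_id):
    return 7


def supplier_down(product_id):
    raise ConnectionError("supplier outage")


local_copy = {}
morning = datetime(2024, 1, 10, 9, 0)
afternoon = datetime(2024, 1, 10, 15, 0)

fresh = availability("sku-1", supplier_up, local_copy, morning)
fallback = availability("sku-1", supplier_down, local_copy, afternoon)
print(fresh)
print(fallback)
```

During the simulated outage the caller still gets the morning's figure, flagged as stale. That is the "better than nothing" result described above.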

Finally, you might want to get creative and use Web services to assist in performing backups across the network for smaller data sets. For example, you could create a Web service for your company that allows your employees to back up files from their laptops easily, whether they're connected to the local LAN or through the Internet. You could also implement Web services to help you monitor your backup processes. By writing Web services that have a common interface for two or more different backup applications, you'll be able to aggregate the logs from several different backup applications into a single integrated view. Or, using Web services and the ubiquity of the Internet, you could store data on a server that in the past might have had to be stored on a client machine. For example, suppose you have a sales force in the field that currently gives its sales managers feedback on potential clients by e-mailing the managers a spreadsheet. You could write a Web service to capture this information. The service stores the information centrally, and, thus, makes it much easier to back up.
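The common-interface idea for backup logs can be sketched without any network code. Below, two adapter classes translate the log formats of two imaginary backup products into one shared record type, and a single function merges them into one view. The products, their log formats, and all class names are invented for illustration; in practice each adapter would sit behind a Web service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Protocol


@dataclass(frozen=True)
class BackupEvent:
    finished: datetime
    source: str
    job: str
    succeeded: bool


class BackupLog(Protocol):
    """The common interface every adapter is expected to provide."""

    def events(self) -> Iterable[BackupEvent]: ...


class VendorALog:
    """Adapter for an imaginary product that logs (timestamp, job, status) rows."""

    def __init__(self, rows):
        self._rows = rows

    def events(self):
        for stamp, job, status in self._rows:
            yield BackupEvent(datetime.fromisoformat(stamp), "vendor-a", job, status == "OK")


class VendorBLog:
    """Adapter for an imaginary product that logs dictionaries with exit codes."""

    def __init__(self, records):
        self._records = records

    def events(self):
        for record in self._records:
            yield BackupEvent(record["end"], "vendor-b", record["job"], record["exit_code"] == 0)


def integrated_view(logs):
    """Merge events from every log into one list, oldest first."""
    return sorted((event for log in logs for event in log.events()),
                  key=lambda event: event.finished)


vendor_a = VendorALog([("2024-01-10T02:00:00", "orders-db", "OK")])
vendor_b = VendorBLog([{"job": "laptops", "end": datetime(2024, 1, 10, 1, 0), "exit_code": 1}])
view = integrated_view([vendor_a, vendor_b])
for event in view:
    print(event.finished, event.source, event.job, "ok" if event.succeeded else "FAILED")
```

Adding a third backup product then means writing one more adapter; the integrated view itself doesn't change.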

These examples should provide you with a starting place for implementing or refining your own backup and recovery plan. The amount of effort and money your business invests in a backup and recovery plan is a decision that must be based on your particular circumstances. But remember: In business, and especially in IT, it always pays to be prepared for the worst.

