FCI Can Set You Free
The Microsoft File Classification Infrastructure can help you simplify data identification.
- By Alan Maddison
In recent years the growing amount of legislation affecting data retention and records management has placed an enormous burden on IT. As part of a broader compliance framework, there's a tremendously complex set of requirements that must be dealt with; companies must address the needs of State and Federal legislation and, in some cases, international laws and corporate governance mandates. Some estimates calculate that there are 35,000 rules or pieces of legislation relating to records management worldwide. To add to this burden, IT is also seeing significant year-over-year increases in the amount of data that's being stored on the infrastructure for which it's responsible.
Many organizations have addressed these challenges by implementing archiving solutions, particularly for e-mail. In addition, if an organization has a content-management system or collaboration framework -- such as Microsoft Office SharePoint Server 2007 -- the records management and discovery needs of the company are often handled by built-in workflow processes and indexing engines.
However, the data residing on file servers is often overlooked, which can significantly affect the performance of data management and retrieval processes. As part of your company's information lifecycle-management process, this creates an inefficient situation in which old or stale data is maintained in expensive Tier 1 storage. Additionally, there's often no way to identify the type of data or its value, which means there's no way to efficiently move it to Tier 2 or 3 storage, where it would cost less. This lack of visibility can also impose a tremendous burden in terms of time and cost if you're required to produce data in response to a discovery request -- whether it's for internal purposes or in response to litigation. With the release of Windows Server 2008 R2, Microsoft has introduced a native toolset that may help overcome the challenges associated with data residing on your file servers.
File Classification Infrastructure (FCI) is a policy-based tool that automates the process of applying a pre-defined classification to a file based on its location and content. Microsoft has also provided an extensible infrastructure for third-party integration. The two primary extensibility points are classifier plug-ins, which allow a plug-in to use FCI classification rules, and property storage module plug-ins, which allow third parties to read and write classification properties. The out-of-the-box classification capabilities include automatic classification that's based on the folder; manual classification that's based on the file properties of Microsoft Office documents and is set by the user; and custom classification that uses the FCI distributed component object model (DCOM) API to set the classification properties of the file.
Because the API is DCOM-based, custom classification can also be manipulated through scripts, including Windows PowerShell and VBScript -- though not through the use of cmdlets. This allows administrators to quickly and easily extend the out-of-the-box capabilities and assign properties to files based on their company's needs. In terms of data management, the out-of-the-box capabilities include storage expiration, which allows for files to be expired based on age, location or some other classification property. These properties can also be used as the basis for custom tasks such as moving data to lower-tier storage. Finally, reporting is included so you can track files based on classification. As a new feature of Windows Server 2008 R2, FCI is exposed through the File Server Resource Manager (FSRM), which is part of the File Services role.
Install and Configure
The first step to start using FCI is to install the File Services role and ensure that the FSRM role feature is also selected. After installation you can launch the FSRM Management Console from Administrative Tools. Once FSRM has opened you'll see a Classification Management node in the left-hand pane. Once you expand this node, you'll see two sub-nodes -- Classification Properties and Classification Rules -- that are the core of file classification.
Next, you must define the properties you need. Microsoft's current recommendation is that you attempt to minimize the number of classification properties in order to minimize administrative overhead -- as well as the associated disk overhead. The properties that you create are stored in NTFS alternate data streams, as this allows FCI to support any file type.
In addition, if you create properties and target Microsoft Office documents, the properties are also stored in the file itself. This feature allows for FCI to integrate with Microsoft Office SharePoint and, in effect, share metadata. While Microsoft recommends that you minimize the number of properties, it's important to remember that the properties should always align with your business needs. This means that while there may be common elements across industries, each organization's requirements will be different. For example, law firms may create a property of "Attorney Client privilege-Confidential" and companies in the health industry might create a "HIPAA-Confidential" property.
To ensure you're creating the right properties, you should involve the various business units and executive management in the decision-making process. Once you've defined the properties, it's easy to implement them.
In order to add a new property, select Classification Properties in the FSRM management console and then click Create Property in the action pane on the right-hand side of the console. The first items to enter are the property name and a description. While the description isn't mandatory, it's always best practice to create one, as it will make administration easier in the long run if you know exactly what each property is and how it's used.
The next step is to choose a type of property with the possible options being YES/NO (Boolean), Date-time, Number (integer), Multiple Choice List, Ordered List, String and Multi-String. Once the properties have been created they become available to the Content Classifier, an integral component of FCI. The Content Classifier is the component that assigns properties to files by examining their contents and matching the rules that you create to that content. In order to do this, the classifier uses IFilters to extract text. IFilters have been part of Microsoft indexing since Windows 2000 and are widely available.
If the default file types aren't sufficient for your needs, it's possible to add additional file types by adding additional IFilters. However, it's important to remember that in order for an IFilter to work it must support text extraction.
Once the text has been extracted the file will be classified based on the rules that you've applied. You'll almost certainly want to look into what other IFilters are available, as the default list is quite sparse -- and fortunately there are many other IFilters available. For example, Adobe Systems Inc. provides an IFilter for PDFs, and you can also download the Microsoft Office 2007 System Converter Filter Pack, which will add a number of file types from the Office suite of products including Word, Excel, PowerPoint and Visio. The Office filter pack can be downloaded at http://tinyurl.com/36yrlj.
Once you've created your properties, they're available for use via the API. This means that you can begin to manually classify files through the use of a script, such as PowerShell, or a third-party application that can leverage the FCI API. However, if you wish to use Automatic Classification, you must create Classification Rules.
To create a classification rule, select the Classification Rules node in the FSRM console and click Create a New Rule in the Action pane on the right-hand side of the console. There are two sections to complete in a classification rule, and these are exposed via the tabs Rule Settings and Classification.
In Rule Settings you need to enter a name. As with classification properties, it's good practice to also enter a description, though it's not required. In addition, you have to enter a scope, which supports multiple entries, each of which must be a volume or folder.
Once you've entered a scope, select the Classification tab, and then select a classification mechanism, which can be either Folder Classifier or Content Classifier. If you select Folder Classifier then all documents in that folder will be classified in the same way. If you select Content classifier then the documents' contents will be searched. As was discussed earlier, this mechanism relies on the IFilter text extraction process.
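The Folder Classifier behavior described above can be pictured with a short Python sketch. This is only an illustration of the idea, not the FSRM implementation or its API: the function names, the sample paths and the "Privileged" property are all hypothetical.

```python
from pathlib import PureWindowsPath

def in_scope(file_path, scope_folders):
    """True if the file sits anywhere beneath one of the scope folders."""
    file_path = PureWindowsPath(file_path)
    # PureWindowsPath compares case-insensitively, as Windows paths do.
    return any(PureWindowsPath(folder) in file_path.parents
               for folder in scope_folders)

def folder_classify(file_paths, scope_folders, prop, value):
    """Give every in-scope file the same property value (illustrative only)."""
    return {p: {prop: value} for p in file_paths if in_scope(p, scope_folders)}

# Hypothetical file shares and property, chosen only for the example.
paths = [r"D:\Shares\Legal\contract.doc", r"D:\Shares\HR\policy.doc"]
result = folder_classify(paths, [r"D:\Shares\Legal"], "Privileged", "Yes")
print(result)
```

Only the file under the Legal share receives the property; the HR file falls outside the rule's scope and is left unclassified.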
Before specifying search parameters, complete this tab by selecting a property name -- these are the properties that you created earlier -- and then a property value. Once this is complete, and if you chose Content Classifier, click the Advanced button. The Advanced dialog box will open, and you'll be able to enter the content-search parameters under the Additional Classification Parameters tab.
Parameters can consist of multiple entries; however, a content match will only occur if the content matches all of the entered criteria. Each parameter must be one of three types: a string, a (.NET) regular expression or a string that's case-sensitive. When you enter criteria you must enter a name that matches the type, for example StringCaseSensitive or RegularExpression. You can also choose to check the Re-Evaluate Existing Property Values check box on the Evaluation Type tab. Checking this box will mean that files are re-evaluated in the event that the file or rule has changed.
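The all-criteria-must-match behavior can be sketched in a few lines of Python. This is a hedged illustration rather than the Content Classifier itself. It assumes that a plain String parameter is matched without regard to case (the article implies this by listing a separate case-sensitive type), and it uses Python's re module as a stand-in for .NET regular expressions, whose syntax differs in places. The function names and sample text are hypothetical.

```python
import re

def parameter_matches(name, value, text):
    """Check one (name, value) parameter against already-extracted text."""
    if name == "String":
        # Assumption: plain strings are matched case-insensitively.
        return value.lower() in text.lower()
    if name == "StringCaseSensitive":
        return value in text
    if name == "RegularExpression":
        # Python regex used here; real rules take .NET regular expressions.
        return re.search(value, text) is not None
    raise ValueError(f"unknown parameter type: {name}")

def content_matches(parameters, text):
    """Return True only if every parameter matches the text."""
    return all(parameter_matches(n, v, text) for n, v in parameters)

# Hypothetical rule: the word "confidential" plus an SSN-like number.
rule_parameters = [
    ("String", "confidential"),
    ("RegularExpression", r"\b\d{3}-\d{2}-\d{4}\b"),
]
sample = "CONFIDENTIAL: employee 123-45-6789"
print(content_matches(rule_parameters, sample))
print(content_matches(rule_parameters, "Confidential memo, no identifiers"))
```

The first document satisfies both parameters and would be classified; the second contains the keyword but no matching number, so the rule as a whole does not match.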
Once you've defined the content-search parameters and decided on the evaluation type, the next step is to create the file-management tasks that will apply these rules.
File Management Tasks
File Management Tasks are another feature of Windows Server 2008 R2 that's accessed through FSRM. These tasks let you manage your files by running scripts, batch files and other processes on a pre-defined schedule. To create a new task, select the File Management Tasks node on the left-hand side of the FSRM Management console. Next, click on Create File Management Task and then enter the task name, a description and a scope.
Next, select the Action tab and an action type of either File Expiration or Custom. The File Expiration task type name is slightly misleading because it actually only moves data and doesn't delete data as you might expect. If you select this type of task you must also select a target directory. If you select the Custom task type you have the flexibility of using your own scripts or batch files to process the file data.
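As a rough idea of what a Custom task script might do, here is a minimal Python sketch that moves a single file to a lower-tier folder. It rests on several assumptions that are not from the article: that a Python interpreter is installed on the file server, that the task is configured to pass the file path and a target folder as the two arguments, and that keeping only the file name at the destination is acceptable. The script name and function name are hypothetical.

```python
import shutil
import sys
from pathlib import Path

def move_to_lower_tier(source, tier_root):
    """Move one file into tier_root, keeping its name; return the new path."""
    source = Path(source)
    tier_root = Path(tier_root)
    tier_root.mkdir(parents=True, exist_ok=True)
    destination = tier_root / source.name
    if destination.exists():
        # Refuse to silently overwrite a different file with the same name.
        raise FileExistsError(f"{destination} already exists")
    shutil.move(str(source), str(destination))
    return destination

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: move_to_tier2.py <file path> <target folder>
    print(move_to_lower_tier(sys.argv[1], sys.argv[2]))
```

A production script would also need to decide how to handle name collisions, permissions and logging, all of which this sketch leaves out.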
After creating the script you complete the Action tab with information such as the path to the script and any arguments you wish to pass to the script. After completing the Action tab you can then move on to configuring Notifications, which can be in the form of an e-mail, an event log entry or even a call to another script. If you're using the Custom action, don't forget to change the default text in the Notification tab, as it specifically refers to file expiration.
Additionally, you have the ability to generate the notification before the action takes place. This can be particularly useful if data is being moved, expired or deleted. The next tab to complete is the Report tab, which allows you to send reports via e-mail. After completing the Report section you can move on to the Condition section. This section allows you to select files based on such things as file property -- for example, those set by FCI; the value of that property; days since creation or modification; or file names matching a wild card pattern you create.
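The kinds of conditions listed above can be illustrated with a small Python sketch. It is not FSRM's implementation: it assumes that all configured conditions must hold at once, and the FileInfo type, the "Confidential" property and the sample files are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from fnmatch import fnmatch

@dataclass
class FileInfo:
    name: str
    modified: datetime
    properties: dict = field(default_factory=dict)

def meets_condition(f, now, prop=None, value=None,
                    min_days_since_modified=0, name_pattern="*"):
    """True if the file satisfies every condition (assumed to be ANDed)."""
    if prop is not None and f.properties.get(prop) != value:
        return False
    if now - f.modified < timedelta(days=min_days_since_modified):
        return False
    return fnmatch(f.name, name_pattern)

now = datetime(2010, 1, 1)
files = [
    FileInfo("budget.xls", datetime(2008, 6, 1), {"Confidential": "Yes"}),
    FileInfo("notes.txt", datetime(2009, 12, 20), {"Confidential": "Yes"}),
    FileInfo("plan.xls", datetime(2008, 1, 1), {"Confidential": "No"}),
]
selected = [f.name for f in files
            if meets_condition(f, now, "Confidential", "Yes", 365, "*.xls")]
print(selected)
```

Only budget.xls is selected: notes.txt was modified too recently and does not match the wildcard, and plan.xls carries a different property value.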
The final tab is Schedule, which lets you control when the task runs. For most organizations the classification process will generate measurable I/O on the storage subsystem. The amount of I/O is determined by the complexity and quantity of your content-classification rules, as well as how many files they affect. Because these factors vary so widely, there are at present no guidelines as to how to size storage to accommodate the additional I/O. As such, a best practice is to run the task in off hours or during periods of low utilization until you understand the impact on your file servers.
After you've created properties and classification rules and configured the file management tasks, you're finished. Of course you should thoroughly test all of your properties and rules prior to deploying across the enterprise.
In the current regulatory environment, there's a tremendous incentive for companies to manage the risk associated with records management. Microsoft has begun to address these needs with records management and compliance capabilities built into products such as Exchange Server 2007 and Microsoft Office SharePoint Server 2007. However, there has been a glaring omission in terms of a native toolset to manage data on file servers.
With the release of Windows Server 2008 R2, Microsoft has provided a flexible and cost-effective toolset that begins to address these problems. The flexibility and extensibility of a policy-based toolset such as FCI will help many organizations tackle the burdens associated with an ever-increasing amount of data sitting on file servers. Moreover, given Microsoft's strong partner ecosystem, you can expect that many third parties will strengthen and extend FCI by leveraging the APIs that Microsoft has provided.
File Classification Infrastructure
Part of Windows Server 2008 R2