Lookup Before You Cache
The Cache Transformation allows you to build up your lookup cache before you need it, saving you a step or two in the process.
- By Eric Johnson
The Lookup transformation in SSIS has changed a lot in SQL Server 2008. One of the best new features is the ability to pre-build your cache, which gives you a lot of control over what is cached and how the cached data is managed. The basic lookup offers additional cache features such as Full, Partial, or even No Cache, but the real power comes in the new Cache Transformation.
The Cache Transformation uses the new Cache Connection manager to allow you to build your cache before it is needed. The connection manager and the Cache Transformation are pretty simple. For the Cache Manager Connection, you just need to specify the columns that will be stored and which columns will be indexed (see Fig. 1). The indexed columns will be the only columns you can use as a lookup column in a Lookup Transformation.
Once you've configured the connection manager, you just use a Cache Transformation to pump data into the cache. The Cache Transformation is very simple; you specify a Cache Connection manager and set up the mapping of the incoming data to the columns in the cache. Last but not least, you pull data from your source and send it into the Cache Transformation. Fig. 2 shows a very simple example data flow chart showing how to populate a lookup cache; this would be the example T-SQL code:
SELECT ContactID, EmailAddress
WHERE EmailAddress IS NOT NULL
AND EmailPromotion <> 0
This code allows me to populate my cache with all e-mail addresses that are not blank, where the person has not opted out of the e-mail promotion. Remember, the data flow to populate your cache can be as complex as you want. This can really allow you to pare down large sets of data to a smaller, easier to use data set that is appropriate for caching.
Once you have populated your cache, all you have left to do is use it in a Lookup Transformation. On the first page of your Lookup Transformation properties, specify a Connection Type of Cache Manager and on the connection page, specify your Cache Connection Manager object. Everything else is just like using the Lookup with a regular data source. You will need to set up your mappings and the column (or columns) that you want to return.
|Figure 1. Specify which columns will be cached and indexed via the Cache Transformation dialog.
|Figure 2. How to populate a lookup with the cached data.
One other note: When using a Cache Connection Manager with a lookup, you have to select Full Cache as the cache mode. The other two are disabled and for good reason. You have taken control of your cache at this point and don't want SSIS doing anything further.
Managing your own lookup cache should allow you to make your SSIS packages a little more streamlined and allow them to run a bit faster. Not to mention, you should be able to minimize the memory-hogging packages that occurred when older lookups got a little out of hand.
Eric Johnson, SQL Server MVP, is the owner of Consortio Services in Colorado Springs providing IT systems management and technology
consulting. He is also the President of the Colorado Springs SQL Server
User Group. He can be contacted at www.consortioservices.com.