The Trouble with SSIS Sorting
Sort it all out, but sort it the right or there'll be trouble, of a sort.
- By Eric Johnson
Many SSIS transformations, such as the Pivot and the Data Profiling Tasks, require that the data being fed into them be sorted. Without first sorting, some transformations will throw an error and not run, while others will run but the results will not be as expected.
The answer is to sort the data before it is fed into the transformation; the problem is that this isn't as simple as you may think.
The obvious and simple way to get data that's sorted correctly is to use the Sort Transformation in SSIS. The odd thing is that the Sort Transformation will complete the objective of sorting the data, but the performance might not be exactly what you would expect. The Sort Transformation does all its work in memory; SSIS packages do things in individual sets. This means your data may need to be sorted and re-sorted again. If you try to work with large data sets, this sorting will take a long time and use a lot of memory.
The solution is to sort the data from the source using an ORDER BY clause. That alone, however, won't get it done. You also have to tell SSIS that the data is sorted and how it is sorted. This is done in the Advanced Editor of your data source.
First, set the IsSorted property of the source's output to true. Next, set the SortKeyPosition property of each column you specified in your ORDER BY clause; 1 for first, 2 for second, and so on. A SortKeyPosition value of 0 is the default and means that the column is not used in the sort.
That's all there is to it! Using this method, you will be able to sort your data so the SSIS transformations are happy, and can work without hours of waiting and memory hogging.
Eric Johnson, SQL Server MVP, is the owner of Consortio Services in Colorado Springs providing IT systems management and technology
consulting. He is also the President of the Colorado Springs SQL Server
User Group. He can be contacted at www.consortioservices.com.