The Trouble with SSIS Sorting

Sort it all out, but sort it the right or there'll be trouble, of a sort.

Many SSIS transformations, such as the Pivot and the Data Profiling Tasks, require that the data being fed into them be sorted. Without first sorting, some transformations will throw an error and not run, while others will run but the results will not be as expected.

The answer is to sort the data before it is fed into the transformation; the problem is that this isn't as simple as you may think.

The obvious and simple way to get data that's sorted correctly is to use the Sort Transformation in SSIS. The odd thing is that the Sort Transformation will complete the objective of sorting the data, but the performance might not be exactly what you would expect. The Sort Transformation does all its work in memory; SSIS packages do things in individual sets. This means your data may need to be sorted and re-sorted again. If you try to work with large data sets, this sorting will take a long time and use a lot of memory.

The solution is to sort the data from the source using an ORDER BY clause. That alone, however, won't get it done. You also have to tell SSIS that the data is sorted and how it is sorted. This is done in the Advanced Editor of your data source.

First, set the IsSorted property of the source's output to true. Next, set the SortKeyPosition property of each column you specified in your ORDER BY clause; 1 for first, 2 for second, and so on. A SortKeyPosition value of 0 is the default and means that the column is not used in the sort.

That's all there is to it! Using this method, you will be able to sort your data so the SSIS transformations are happy, and can work without hours of waiting and memory hogging.

About the Author

Eric Johnson, SQL Server MVP, is the owner of Consortio Services in Colorado Springs providing IT systems management and technology consulting. He is also the President of the Colorado Springs SQL Server User Group. He can be contacted at www.consortioservices.com.

Featured

  • Microsoft Starting To Roll Out New Excel Connected Data Types

    Microsoft on Thursday announced some Excel and Power BI enhancements that add "connected data types" on top of the standard strings and numbers options.

  • Windows 10 Users Getting New Process for Finding Optional Driver Updates

    Accessing Windows 10 drivers classified as "optional updates" will be more of a manual seek-and-install type of experience, starting on Nov. 5, 2020, Microsoft explained in a Wednesday announcement.

  • Microsoft Changes Privacy Platform Name to SmartNoise

    Microsoft Research has changed the name of its "differential privacy" platform from "WhiteNoise" to "SmartNoise," according to a Wednesday announcement.

  • Why Restarting a Failed SCVMM Job Might Be a Bad Idea

    Occasionally, restarting a failed System Center Virtual Machine Manager job can leave your virtualization infrastructure in an unknown state. Here's how to avoid that.

comments powered by Disqus