Wednesday, September 15, 2010

Optimizing SSIS packages

When I was working for a client, I faced a serious problem with SSIS package. The SSIS package is simple, extract data from csv file and two databases and load the data into a single table. I used a flat file source, two OLEDB Sources, a data conversion and one OLEDB destination. Once the package was created in the dev, I executed it and it took whooping 5 hours to execute. I was scared of the time it took. We were supposed to schedule a job with this package that executes every 30 mins. So, I started digging into the package. There were several fields from the source which were not used. Instead of using a table in the source, I wrote a SQL command that selects only the desired fields. In the destination mappings were updated. I re ran the package and it did not make a significant impact on the execution time. I was confused of what to do. Went into the destination table and checked the schema. Wow, its the index that caused all the problems. There is one clustered and three non clustered indexes on the table. I disabled the indexes and re ran the package. Its amazing the package executed in 15 minutes. So, I added few steps in the package where before the dataflow begins, the indexes are disabled and once the dataflow executes, the indexes are rebuilt. This way the performance was amazing and the job is scheduled to run the package every 30 mins. These are the few steps which would be helpful to optimize the SSIS package.

1. While extracting the data from source use query to pull only the required fields
2. If there are indexes on the destination table, disable them and enable the indexes once the data load is complete.
3. If data is extracted and loaded in the same server, use SQL server Destination instead of OLEDB Destination.

There are some other optimization techniques but, they depend on the environment and the kind of package thats being developed. Since SSIS is mostly about data, pull and push be optimized first.

No comments: