Thursday, June 9, 2011

Steps for table replication:

Steps to set up the Publication (Source):
1. Open SSMS and connect to the instance.
2. In Object Explorer, go to Replication.
3. Right-click Replication and select New -> Publication.
4. The Publication Wizard opens. Click Next, select the database that contains the table(s) to be replicated, and click Next.
5. Select the type of publication that satisfies your requirement and click Next.
6. You will be presented with the tables in the database. Select the required table(s) and click Next.
7. You will be presented with filter options. Define filters if you don't want all the data replicated; skip this step to replicate everything.
8. In the next step, specify how often to replicate. You can schedule the replication to run daily at certain times, or replicate as soon as new data enters the table.
9. Next, specify the security settings for the SQL Server Agent. Follow this step carefully.
10. At the end you are given options to create the publication and to generate a script for it.
11. Select both so you can reuse the script for later deployment to other environments.
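
The wizard steps above can also be expressed with the replication system stored procedures. The following is only a minimal sketch of a transactional publication; the database, publication, and table names are hypothetical, and a wizard-generated script will contain many more parameters.

```sql
-- Run in the publication database (names are hypothetical).
USE SalesDB;
GO
-- Enable the database for publishing.
EXEC sp_replicationdboption
    @dbname  = N'SalesDB',
    @optname = N'publish',
    @value   = N'true';

-- Create a continuously replicating transactional publication.
EXEC sp_addpublication
    @publication = N'SalesDB_Orders_Pub',
    @repl_freq   = N'continuous',
    @status      = N'active';

-- Add the table to replicate as an article.
EXEC sp_addarticle
    @publication   = N'SalesDB_Orders_Pub',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders';
```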
Steps to set up the Subscription (Target):
1. Once the publication is created, the selected tables are ready for replication.
2. Right-click Replication and select New -> Subscription. When the wizard opens, click Next.
3. You will be presented with the publication (Publisher) that contains the source tables. Select it and click Next.
4. You will be presented with two options (push or pull). Select the one that fits your needs and click Next.
5. Select the target instance and the target database and click Next.
6. Select the security options for the SQL Server Agent and click Next.
7. Specify how often the job should run and click Next.
8. Specify how the subscription should be initialized and click Next.
9. Check both options and click Next. You will be prompted to save the script file; specify the save location and click Finish.
10. The table is automatically created in the target database.
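
As with the publication, the subscription can be scripted. A minimal sketch of a push subscription, with hypothetical server, database, and publication names:

```sql
-- Run at the Publisher, in the publication database (names hypothetical).
USE SalesDB;
GO
-- Register the subscriber database for the publication.
EXEC sp_addsubscription
    @publication       = N'SalesDB_Orders_Pub',
    @subscriber        = N'DEVSERVER',
    @destination_db    = N'SalesDB_Dev',
    @subscription_type = N'Push';

-- Create the Distribution Agent job that pushes changes to the subscriber.
EXEC sp_addpushsubscription_agent
    @publication   = N'SalesDB_Orders_Pub',
    @subscriber    = N'DEVSERVER',
    @subscriber_db = N'SalesDB_Dev';
```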

You have successfully replicated the table. To test it, insert new data into the publisher table and run a SELECT against the subscriber table. Depending on the replication type and how often the SQL Server Agent job runs, the data will appear at the subscriber.

Table replication in SQL Server 2008/R2

Replication comes in handy when you need other environments to stay in sync with your production environment. Generally, whole-database replication is used. But what if only some tables in a particular database need to be synced from production to your dev environment? The answer is to replicate just the required tables. You can replicate tables from one database to another on the same server/instance, or from one server to another.

Once a table is replicated, a new field, [msrepl_tran_version], is added at the end of the table. You can hide this field from external users by creating a view.

In replication terminology we speak of a Publisher and a Subscriber. The Publisher is the main database that owns the tables; the Subscriber is the target. Once you subscribe to a publisher for particular table(s), you will see them in the subscriber (the target database). You can also choose the replication strategy, i.e. whether the replication is transactional, snapshot, etc.
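
One way to hide the [msrepl_tran_version] column from external users is a simple view that lists every column except it. A sketch, assuming a hypothetical Orders table:

```sql
-- Hypothetical table and columns: expose everything except the
-- replication bookkeeping column through a view.
CREATE VIEW dbo.vOrders
AS
SELECT OrderID, CustomerID, OrderDate, Amount  -- list columns explicitly,
FROM dbo.Orders;                               -- omitting msrepl_tran_version
GO
```

Granting users SELECT on the view rather than on the base table keeps the extra column out of sight.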

Thursday, September 16, 2010

Stored Procedure best practices

There is a nice presentation by Kevin Kline that I found on the web. The actual topic starts at slide 9 / page 10. It is very helpful indeed: following these practices can improve stored procedure performance significantly.

http://www.quest.com/sql_server/pdfs/Stored-Procedure-BP.pdf

Wednesday, September 15, 2010

Optimizing SSIS packages

When I was working for a client, I faced a serious problem with an SSIS package. The package was simple: extract data from a CSV file and two databases, and load it into a single table. I used a Flat File source, two OLE DB sources, a Data Conversion transform, and one OLE DB destination. Once the package was created in dev, I executed it and it took a whopping 5 hours to run. That scared me, because we were supposed to schedule a job with this package that executes every 30 minutes. So I started digging into the package.

Several fields from the source were not being used. Instead of selecting a table in the source, I wrote a SQL command that selects only the desired fields, and updated the mappings in the destination. I re-ran the package, but it made no significant impact on the execution time. Unsure what to try next, I checked the schema of the destination table, and there it was: the indexes were causing all the problems. The table had one clustered and three nonclustered indexes. I disabled the nonclustered indexes and re-ran the package; amazingly, it finished in 15 minutes. So I added a few steps to the package: before the data flow begins, the indexes are disabled, and once the data flow completes, the indexes are rebuilt. With that change the performance was excellent, and the job is now scheduled to run the package every 30 minutes. These are a few steps that help optimize an SSIS package:

1. While extracting data from the source, use a query that pulls only the required fields.
2. If there are indexes on the destination table, disable them before the load and rebuild them once the data load is complete.
3. If data is extracted and loaded on the same server, use the SQL Server destination instead of the OLE DB destination.

There are other optimization techniques, but they depend on the environment and the kind of package being developed. Since SSIS is mostly about moving data, the pull and the push should be optimized first.
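
The disable/rebuild trick from the story above can be wired up as two Execute SQL Tasks around the Data Flow Task. A sketch with hypothetical index and table names; note that only the nonclustered indexes are disabled, because disabling a clustered index makes the table inaccessible.

```sql
-- Execute SQL Task BEFORE the Data Flow: disable nonclustered indexes only.
ALTER INDEX IX_Target_CustomerID ON dbo.TargetTable DISABLE;
ALTER INDEX IX_Target_OrderDate  ON dbo.TargetTable DISABLE;
ALTER INDEX IX_Target_Status     ON dbo.TargetTable DISABLE;
```

```sql
-- Execute SQL Task AFTER the Data Flow: rebuild everything.
-- REBUILD also re-enables the disabled indexes.
ALTER INDEX ALL ON dbo.TargetTable REBUILD;
```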

The magic of number 9

The number 9 plays a significant role in many of our lives, and is considered lucky by many people who believe in numerology. But 9 also has a magic of its own: the digits of any multiple of 9 sum, eventually, to 9. For instance, consider 99: add 9 and 9 to get 18, then add 1 and 8, and the result is 9. No matter how many 9s you take, the repeated digit sum comes back to 9. In base 10, 9 is the only number with this property.

Tuesday, September 14, 2010

Performance hit with Indexes

One main reason for creating an index on field(s) of a table is to fetch results quickly. But an index also carries a performance cost, which shows up while loading (inserting) data: sometimes an insert takes quite a long time if indexes exist on the table. To overcome this, disable the indexes on the table while inserting the data. Once the insert is done, re-enable the indexes.

The following is the syntax to disable the index

ALTER INDEX [IndexName] ON Database.Schema.tablename DISABLE

The following is the syntax to enable the index

ALTER INDEX [IndexName] ON Database.Schema.tablename REBUILD

The main reason for the performance hit while inserting data into an indexed table:

1. If the table has a clustered index, each inserted record must be placed in its sorted position within the index, and every nonclustered index must be updated as well. If the table has few records, this is not an issue. But on a table with huge amounts of data, maintaining all the indexes for each and every insert takes a significant amount of time and resources.

Optimizing Complex T-SQL queries / stored procedures

Generally, when asked about optimizing T-SQL queries, people say to do this and that. But after working extensively with SQL Server, I found there is no fixed set of rules that works in all scenarios. An optimization technique that helps one query may hurt another, resulting in a longer execution time. There is, however, a way in SQL Server to identify which optimization to apply: the "EXECUTION PLAN", an option available in the query analyzer.

The following steps will help in identifying a better optimization approach.

1. Take the query / stored procedure that is taking too long to execute.
2. In the query analyzer, click Display Estimated Execution Plan.
3. A graphical representation shows how each section of the code will be executed, along with an estimated cost for each operator. A general rule of thumb is to keep any single operator's cost below 20%.
4. If there are table scans, consider creating nonclustered indexes so the reads become faster seeks.
5. Make sure the fields used in joins have indexes on them.
6. Avoid sub-queries; use joins instead (this may not be ideal in all situations, and sometimes the reverse works better).
7. If a stored procedure needs optimizing, look at its temp tables. Keep in mind that temp tables consume memory and disk storage in tempdb. If a temp table is used only in the very next piece of code and nowhere later, try replacing it with a Common Table Expression (CTE).
8. A CTE's scope is limited to the single statement that follows it. One advantage is that a CTE is not materialized: it is expanded inline into the query plan, so it avoids the tempdb writes a temp table incurs.
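
As a sketch of point 7, here is a single-use temp table rewritten as a CTE. The schema (Orders, Customers) is hypothetical.

```sql
-- Instead of SELECT ... INTO #RecentOrders and joining to the temp table,
-- define the intermediate result inline with a CTE.
WITH RecentOrders AS
(
    SELECT CustomerID, SUM(Amount) AS Total
    FROM dbo.Orders
    WHERE OrderDate >= '20100101'
    GROUP BY CustomerID
)
SELECT c.CustomerName, r.Total
FROM dbo.Customers AS c
JOIN RecentOrders AS r
    ON r.CustomerID = c.CustomerID;  -- CTE is visible only to this statement
```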

These are some of the steps that can be followed, but again, optimization depends entirely on the kind of query and the environment. There are also some additional points to look at while optimizing a stored procedure:

1. Use SET NOCOUNT ON so that the number of records affected is not returned for each statement.
2. System stored procedures in the master database start with the sp_ prefix. Avoid the sp_ prefix when creating stored procedures in databases other than master, because SQL Server first scans the master database for procedures starting with sp_. If a procedure named sp_proc is created in a user database and a procedure with the same name exists in master, the user's procedure will not execute.
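
Both points can be illustrated in one small procedure; the names here are hypothetical, and usp_ is just one common alternative to the sp_ prefix.

```sql
-- Hypothetical procedure: usp_ prefix instead of sp_, and NOCOUNT on.
CREATE PROCEDURE dbo.usp_GetOrderCount
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the "n rows affected" messages

    SELECT COUNT(*) AS OrderCount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END
GO
```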