Friday, August 19, 2011

Implementing INTERSECT ALL

We know that SQL SERVER has a built in keywords for SET operations.

1.UNION
2.UNION ALL
3.INTERSECT
4.EXCEPT

Consider table A and B. If we use A UNION B, the result set is the records from both tables A and B with out repetition. i.e., duplicates are eliminated. If we want to include duplicates, we have UNION ALL. if we use A UNION ALL B, then the result set is all the records from both A and B.

In the same way, we have INTERSECT. This will pull out records that exist in both the tables A and B with out repetition(duplicates). What if? if we want to include duplicates also? There is no INTERSECT ALL operation. Still we can achieve this programmatically. However there is a rule for this. For example consider a record appears x times in table A and y times in table B. The INTERSECT ALL will display the record minimum(x,y) times. For example if a record exists 3 times table A and 4 times in table B, INTERSECT ALL will display the record 3 times because it is for sure a record INTERSECTS 3 times in both the tables. Here is how we can achieve this

;WITH INTERSECTALL
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY [col1,col2,..coln] ORDER BY (SELECT 0)) AS RowNum
,[col1,col2,...coln]
FROM db.schema.tablename

INTERSECT

SELECT ROW_NUMBER() OVER(PARTITION BY [col1,col2,..coln] ORDER BY (SELECT 0)) AS RowNum
,[col1,col2,...coln]
FROM db.schema.tablename
)
SELECT [col1,col2,...coln]
FROM INTERSECTALL

In the above CTE, col1,col2,...coln are the columns that you want from table A and B. Again remember same number of columns should be selected in both the tables and the datatypes should be type compatible.
db is the name of the database
schema is the schema to which the table is associated
tablename is the name of the table.
SELECT(0) in the ORDER BY clause in ROW_NUMBER() tells SQL that Ordering is not important. You may use any constant.

Here I am calculating the ROW_NUMBER() to find the number of occurrences of each row in a table. Since ROW_NUMBER() shouldn't be returned in the final result, I have included the code in a CTE and I am pulling only the required fields. Here is an example. Consider you have two tables Employees and Customers and you want to show Country,Region and City fields that exist in both the tables with duplication (INTERSECT ALL. Remember the number repetitions is MIN(x,y)). The code would look like this

WITH INTERSECTALL
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY country,region,city ORDER BY (SELECT 0)) AS RowNum
,country
,city
,region
FROM HR.Employees

INTERSECT

SELECT ROW_NUMBER() OVER(PARTITION BY country,region,city ORDER BY (SELECT 0)) AS RowNum
,country
,city
,region
FROM Sales.Customers
)
SELECT country
,city
,region
FROM INTERSECTALL


No comments: