Friday, September 19, 2014

Sunday, May 4, 2014

Copy master-detail data when destination master table has an 'identity' column in SQL Server

This is the one of good old tricks, which become possible in SQL Server 2008 with new MERGE operator.
I want to insert the set of records into 2 tables: master and detail, and master table has IDENTITY column as a primary key and detail table references this PK.
Define target tables as follows and put some data in them:
CREATE TABLE #destination_order (
	OrderId	INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	OrderName VARCHAR(100) NOT NULL
)
CREATE TABLE #destination_orderitem (
	ItemId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	OrderId	INT NOT NULL,
	ItemName VARCHAR(100) NOT NULL
)
INSERT #destination_order (OrderName)
VALUES
	('Existing 1')
,	('Existing 2')
And here is the data we want to insert into the destination master and detail tables:
CREATE TABLE #source_order (
	OrderId	INT NOT NULL PRIMARY KEY CLUSTERED,
	OrderName VARCHAR(100) NOT NULL
)
CREATE TABLE #source_orderitem (
	ItemId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	OrderId	INT NOT NULL,
	ItemName VARCHAR(100) NOT NULL
)

INSERT #source_order (OrderId, OrderName)
VALUES
	(1, 'Source 1')
,	(2, 'Source 2')
,	(3, 'Source 3')
INSERT #source_orderitem (OrderId, ItemName)
VALUES
	(1, 'Source 1 Item 1')
,	(1, 'Source 1 Item 2')
,	(1, 'Source 1 Item 3')
,	(2, 'Source 2 Item 1')
,	(2, 'Source 2 Item 2')
,	(3, 'Source 3 Item 1')
So we want to insert data from #source_order and #source_orderitem tables into #destination_order and #destination_orderitem. The straightforward approach is to use cursor, insert master records one by one, get SCOPE_IDENTITY() and insert detail records for each master. But this can be done in more elegant and faster way.
Firstly, suppose we can insert all master records in one pass. The problem is that we cannot find out which new primary key #destination_order.Orderid corresponds to the #source_order.OrderId. The following construct is invalid, as OUTPUT statement can have only INSERTED pseudo-table fields when applied to INSERT operator:
INSERT #destination_order (OrderName)
OUTPUT #source_order.OrderId, INSERTED.OrderId 
SELECT OrderName
FROM #source_order
But, we can use MERGE which allow to specify source fields in the OUTPUT list, and we can simply emulate INSERT with MERGE:
DECLARE @ids TABLE (
	sourceOrderId INT NOT NULL,
	destinationOrderId INT NOT NULL)

;WITH source AS (
	SELECT OrderId, OrderName 
	FROM #source_order
)
MERGE #destination_order target
USING source ON 1=0
WHEN NOT MATCHED THEN
	INSERT (OrderName)
	VALUES (source.Ordername)
OUTPUT source.OrderId, INSERTED.OrderId 
INTO @ids (sourceOrderId, destinationOrderId)
;

INSERT #destination_orderitem (OrderId, ItemName)
SELECT ids.destinationOrderId, source.ItemName
FROM #source_orderitem source
INNER JOIN @ids ids ON ids.sourceOrderId = source.OrderId
This is simple and effective way and no cursor is required.

Saturday, April 26, 2014

SQL Server analytic functions: trap in LAST_VALUE

Analytic functions introduced in SQL Server 2012 appeared to be really awesome. LAG, LEAD, FIRST_VALUE and LAST_VALUE can simplify queries in particular when dealing with time series. However, there is a potential problem which I trapped into recently.

Here is the syntax of FIRST_VALUE and LAST_VALUE functions as by MSDN:

FIRST_VALUE ( [scalar_expression ] ) 
    OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )

LAST_VALUE ( [scalar_expression ) 
    OVER ( [ partition_by_clause ] order_by_clause rows_range_clause )

Note that rows_range_clause is mandatory in LAST_VALUE and optional in FIRST_VALUE. However, rows_range_clause can be omitted in LAST_VALUE and SQL server will not report any syntax error and the result will be somewhat unexpected.
Example:

CREATE TABLE #transaction (
 [id] INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
 [date] DATE NOT NULL,
 [type] VARCHAR(10) NOT NULL,
 [amount] MONEY NOT NULL,
 [description] VARCHAR(100) NOT NULL
)

INSERT #transaction ([date], [type], [amount], [description])
VALUES
 ('2012-01-01', 'DEBIT', 100.0, 'First debit in Jan')
, ('2012-01-01', 'CREDIT', 120.0, 'First credit in Jan')
, ('2012-01-05', 'DEBIT', 77.10, 'Last debit in Jan')
, ('2012-01-08', 'CREDIT', 55.60, '')
, ('2012-01-20', 'CREDIT', 80.00, 'Last credit in Jan')
, ('2012-02-08', 'CREDIT', 155.60, 'First credit in Feb')
, ('2012-02-09', 'CREDIT', 200.80, '')
, ('2012-02-09', 'CREDIT', 14.12, '')
, ('2012-02-09', 'DEBIT', 148.00, 'First debit in Feb')
, ('2012-02-13', 'CREDIT', 66.00, 'Last credit in Feb')
, ('2012-02-14', 'DEBIT', 20.00, 'Last debit in Feb')
Selecting all transactions along with [description] of the first transaction of each type in each month:
SELECT
 [id], [date], [type], [amount],
 FIRST_VALUE([description]) OVER (
   PARTITION BY YEAR([date]), MONTH([date]), [type] 
   ORDER BY [date], [id]
   ) [description]
FROM #transaction
The result is as expected:
id date type amount description
1 2012-01-01 DEBIT 100,00 First debit in Jan
2 2012-01-01 CREDIT 120,00 First credit in Jan
3 2012-01-05 DEBIT 77,10 First debit in Jan
4 2012-01-08 CREDIT 55,60 First credit in Jan
5 2012-01-20 CREDIT 80,00 First credit in Jan
6 2012-02-08 CREDIT 155,60 First credit in Feb
7 2012-02-09 CREDIT 200,80 First credit in Feb
8 2012-02-09 CREDIT 14,12 First credit in Feb
9 2012-02-09 DEBIT 148,00 First debit in Feb
10 2012-02-13 CREDIT 66,00 First credit in Feb
11 2012-02-14 DEBIT 20,00 First debit in Feb
Now selecting all transactions along with [description] of the last transaction of each type in each month using FIRST_VALUE function by simly reversing the order in OVER clause:
SELECT DISTINCT
 [id], [date], [type], [amount],
 FIRST_VALUE([description]) OVER (
   PARTITION BY YEAR([date]), MONTH([date]), [type] 
   ORDER BY [date] DESC, [id] DESC
   ) [description]
FROM #transaction
and the result is as expected:
id date type amount description
1 2012-01-01 DEBIT 100,00 Last debit in Jan
2 2012-01-01 CREDIT 120,00 Last credit in Jan
3 2012-01-05 DEBIT 77,10 Last debit in Jan
4 2012-01-08 CREDIT 55,60 Last credit in Jan
5 2012-01-20 CREDIT 80,00 Last credit in Jan
6 2012-02-08 CREDIT 155,60 Last credit in Feb
7 2012-02-09 CREDIT 200,80 Last credit in Feb
8 2012-02-09 CREDIT 14,12 Last credit in Feb
9 2012-02-09 DEBIT 148,00 Last debit in Feb
10 2012-02-13 CREDIT 66,00 Last credit in Feb
11 2012-02-14 DEBIT 20,00 Last debit in Feb
Now selecting all transactions along with [description] of the last transaction of each type in each month using LAST_VALUE function, but we forget to set rows_range_clause as required by documentation:
SELECT
 [id], [date], [type], [amount],
 LAST_VALUE([description]) OVER (
   PARTITION BY YEAR([date]), MONTH([date]), [type] 
   ORDER BY [date], [id]
   ) [description]
FROM #transaction
And here is the trap. We are not getting the last value in the group:
id date type amount description
1 2012-01-01 DEBIT 100,00 First debit in Jan
2 2012-01-01 CREDIT 120,00 First credit in Jan
3 2012-01-05 DEBIT 77,10 Last debit in Jan
4 2012-01-08 CREDIT 55,60 
5 2012-01-20 CREDIT 80,00 Last credit in Jan
6 2012-02-08 CREDIT 155,60 First credit in Feb
7 2012-02-09 CREDIT 200,80 
8 2012-02-09 CREDIT 14,12 
9 2012-02-09 DEBIT 148,00 First debit in Feb
10 2012-02-13 CREDIT 66,00 Last credit in Feb
11 2012-02-14 DEBIT 20,00 Last debit in Feb
The reason for this is that SQL Server uses default rows_range_clause for LAST_VALUE function and it is:
"ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", which gives the last value in the group up to current row, effectively returning description from the current row as rows are selected in the same order as specified in OVER clause. To fix this a rows_range_clause must be set to the full group via:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
SELECT
 [id], [date], [type], [amount],
 LAST_VALUE([description]) OVER (
   PARTITION BY YEAR([date]), MONTH([date]), [type] 
   ORDER BY [date], [id] 
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) [description]
FROM #transaction
And this is exactly what we want:
id date type amount description
1 2012-01-01 DEBIT 100,00 Last debit in Jan
2 2012-01-01 CREDIT 120,00 Last credit in Jan
3 2012-01-05 DEBIT 77,10 Last debit in Jan
4 2012-01-08 CREDIT 55,60 Last credit in Jan
5 2012-01-20 CREDIT 80,00 Last credit in Jan
6 2012-02-08 CREDIT 155,60 Last credit in Feb
7 2012-02-09 CREDIT 200,80 Last credit in Feb
8 2012-02-09 CREDIT 14,12 Last credit in Feb
9 2012-02-09 DEBIT 148,00 Last debit in Feb
10 2012-02-13 CREDIT 66,00 Last credit in Feb
11 2012-02-14 DEBIT 20,00 Last debit in Feb
In my view Microsoft had to enforce rows_range_clause in LAST_VALUE on a syntax level.