Spittin' Bits: September 2011

It's midnight on a Wednesday and I can't seem to get to sleep. So what do I want to do? Code.

So let us engage in a little SQL exercise. Suppose that we need to copy data from one table to another. A task like this, on its own, can be tackled with a single INSERT...SELECT query. For example:

insert into Employee(firstName, lastName,...,postalCode)
select firstName, lastName,...,postalCode
from EmployeeImport

For the sake of discussion, we'll refer to EmployeesImport as the table we need to copy rows from and Employees as their destination. Imagine that, instead of just copying the rows from EmployeeImport into Employees, we also need to generate a special hash value generated from the content of each row that's inserted into the Employees table. Top that off with the fact that this special index value can only be generated via a stored procedure and, voilà, you've now got an interesting situation on your hands. We can't use the simple INSERT...SELECT query like we did previously because. as we all well know, you can't execute stored procedures inline with SELECT queries. That means that we are going to need to iterate through each row of the EmployeesImport table and insert them one at a time into the Employees table. Here's one way I've seen this task accomplished:

declare @firstName varchar(50),
	@lastName varchar(50),
	@birthDt datetime,
	@addressLine1 varchar(100),
	@city varchar(80),
	@province varchar(80),
	@country varchar(80),
	@postalCode varchar(10),
	@hashKey char(10)


declare @err bit
set @err=0

begin transaction importEmployees
	-- Setup the cursor
	declare cur_employee cursor for
		select * from EmployeesImport
	open cur_employee
	fetch next from cur_employee
		into @firstName,@lastName,@birthDt,@addressLine1,@city,@province,@country,@postalCode
	-- Main cursor loop
	while(@@FETCH_STATUS=0) begin
		-- Generate the hash
		exec GetEmployeeHash
			@hashKey output
		,	@lastName
		,	@firstName
		,	@postalCode
		,	@birthDt
		-- Insert record into Employees table
		insert into Employees(EmployeeID,FirstName,LastName,BirthDt,AddressLine1,City,Province,Country,PostalCode)
			select @hashKey,@firstName,@lastName,@birthDt,@addressLine1,@city,@province,@country,@postalCode
		if(@@ROWCOUNT=0) begin set @err=1 goto RollBackTransaction end
		-- Retrieve the next record using the cursor
		fetch next from cur_employee
			into @firstName,@lastName,@birthDt,@addressLine1,@city,@province,@country,@postalCode	
	end
	-- Cleanup
	close cur_employee
	deallocate cur_employee
	
if(@err=0) commit transaction importEmployees
RollBackTransaction:
if(@err>1) rollback transaction importEmployees

Now there are a number of issues that I have with this code. For one, lines 21-22 show that the cur_employee cursor is fetching rows directly from the EmployeesImport table and not from a temporary table. This means that it will create locks against the table that can cause blocking issues for other queries that might be working against the same table. This can lead to performance degradation in high-volume environments. Not only that, but developers using cursors must always be mindful to de-allocate and close any they've declare to avoid memory leaks. Personally, I prefer to avoid cursors at all costs as I've found them to be slow, tedious, and high on overhead.

So how would I solve the problem? Like this:

declare @err bit
set @err=0

declare @EmployeesTemp as table (
	RowID int identity(1,1),
	FirstName varchar(50),
	LastName varchar(50),
	BirthDt datetime,
	AddressLine1 varchar(100),
	City varchar(80),
	province varchar(80),
	country varchar(80),
	postalCode varchar(10)
)

insert into @EmployeesTemp(FirstName,LastName,BirthDt,AddressLine1,City,Province,Country,PostalCode)
select * from EmployeesImport

declare @firstName varchar(50),
	@lastName varchar(50),
	@birthDt datetime,
	@addressLine1 varchar(100),
	@city varchar(80),
	@province varchar(80),
	@country varchar(80),
	@postalCode varchar(10),
	@hashKey char(10)
	
declare @total int
declare @rowID int
select @total=count(*) from @EmployeesTemp
set @rowID=1 

begin transaction importEmployees

	while (@rowID<=@total) begin
		select
			@firstName=FirstName
		,	@lastName=LastName
		,	@birthDt=birthDt
		,	@addressLine1=AddressLine1
		,	@city=City
		,	@province=Province
		,	@country=Country
		,	@postalCode=PostalCode
		from @EmployeesTemp
		where RowID=@rowID
		-- Generate the hash
		exec GetEmployeeHash
			@hashKey output
		,	@lastName
		,	@firstName
		,	@birthDt
		,	@postalCode
		-- Insert record into Employee table
		insert into Employees(EmployeeID,FirstName,LastName,BirthDt,AddressLine1,City,Province,Country,PostalCode)
			select @hashKey,@firstName,@lastName,@birthDt,@addressLine1,@city,@province,@country,@postalCode
		if(@@ROWCOUNT=0) begin set @err=1 goto RollBackTransaction end
		-- Increment the loop counter
		set @rowID=@rowID+1
	end

if(@err=0) commit transaction importEmployees
RollBackTransaction:
if(@err>1) rollback transaction importEmployees

My solution presents a number of improvements over the previous. Instead of repeatedly querying against the source table, the needed data is cached in a table variable. I love table variables because, unlike temp tables, they are scoped to the execution of the stored procedure and are disposed automatically once execution is complete. While I'm using the table variable, I also don't have to worry about locks against the source table. Temp tables also work for the same purpose (for more on the difference between temp tables and table variables, see this excellent Stack Overflow post)

Another cool feature of my solution is that I don't need to maintain cursors in order to move sequentially though my data set. Because I used an identity column in the temp table, I can just increment my counter so that I know the next rowID I need to query. Overall, I find it much simpler to use table variables effectively than to remember the complexities of managing cursors correctly.

On the whole, 95% of the work that you'll ever need to do in SQL can and should be done using set-based queries. That's the whole point of using relational databases in the first place. But for the small minority of tasks that require row-by-row sequential processing, table variables are definitely the better alternative to cursors.

For a great article on cursor performance, check out the article "Performance Tuning SQL Server Cursors" by Brad McGee.

Spittin' Bits

Thursday, September 29, 2011

Gettin' Loopy Wtih Table Variables

Blog Archive

Labels