TOP 1 WITH TIES: August 2004

Tuesday, August 24, 2004

Difference between Table Variable and Temporary Table

Table variables were introduced in SQL Server 2000. Many times they are presented as a replacement for temporary tables. There is a lot of arguement about the performance advantages of one over the other. It is generally believed that table variables give better performance than temporary tables. But there are so many cases reported where the usage of table variable slow down the performance, especially when the number of rows involved are high. So now the general recommendation is that use a table variable only if there is a performance gain over the temp table.

Lets see what are the basic difference between them.

1. Contrary to the popular belief that table variables exists in memory, KB Article 305977 states that "A table variable is not a memory-only structure. Because a table variable might hold more data than can fit in memory, it has to have a place on disk to store data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are created and processed while in memory (data cache)."

The implementation of table variables is from a logging and tempdb usage very similar to temporary tables.

2. You can create indexes on a temp table. Table variables does not support explicit index creation. You can create indexes on table variables using the PRIMARY KEY and Unique constraints.

3. User defined functions are not permitted to do DDL operations. But UDFs can CREATE, INSERT, UPDATE, DELETE and RETURN table variables. (But you cannot use user-defined data types in the table variable you declare for the return value.)

4. Table variables are always local to its scope while Temporary tables can be global or local. So table variables cannot be accessed from calling procedures in case of nested sps.

5. Stored Procedure Recompilations. Using table variables can reduce the number of stored procedure recompilations. thats one place where table variables excel over temp tables.

6. Table variables do not maintain statistics like temporary tables. This might lead to bad execution plan and slow performance when dealing with higher number of rows.

7. Temp tables can be ALTERed. But you cannot change the definition of a table variable after the initial DECLARE statement.

8. Parallelism is not possible when you insert into a table variable.

9. You cannot do

INSERT INTO @tablevar
EXEC yourspname

But

INSERT INTO #temptable
EXEC yourspname

is possible.

10. You cannot do

SELECT * INTO @tableVar FROM source

But

SELECT * INTO #temptable FROM source

is possible.

11. You cannot truncate a table variable.

DECLARE @x table(a int)
INSERT INTO @x VALUES(1)
TRUNCATE TABLE @x

produces a syntax Error.

But the same is possible with temp table.

DECLARE #x table(a int)
INSERT INTO #x VALUES(1)
TRUNCATE TABLE #x

12. Transactions involving table variables last only for the duration of an update on the table variable. Thus, table variables require less locking and logging resources. That is table variables are automatically committed, where as temporary tables can be included in a transaction. This can make a performance difference.

13. Join Syntax. You MUST use an alias when referring a table variable in a JOIN. The following query will throw an error.

USE PUBS
GO
DECLARE @x table(someid varchar(15))
INSERT INTO @x
SELECT Top 10 title_id FROM Titles
SELECT * FROM Titles
INNER JOIN @x
ON Titles.title_id = @x.someid

It should be written as

DECLARE @x table(someid varchar(15))
INSERT INTO @x
SELECT Top 10 title_id FROM Titles
SELECT * FROM Titles
INNER JOIN @x a
ON Titles.title_id = a.someid

Temporary tables doesnt have such limitations.

14. Temporary tables can be explicitly destroyed using DROP statement.

15. You cannot assign a table variable to another as in the case of other variables.

16. CHECK constraints, DEFAULT values, and computed columns in the table type declaration cannot call user-defined functions.

17. "Display Estimated Execution Plan" option in query analyser does'nt work with T-SQL code that has a reference to Temporary tables.

18. More, anyone...?

# posted by Roji. P. Thomas @ 8/24/2004 03:06:00 AM 11 comments

Wednesday, August 18, 2004

Let Us Count them!

Lets have a quick look on the differences between using COUNT(*) and COUNT(colname). Also discuss other ways to get the count of rows.

1. COUNT(*) returns the number of total rows. Count(colname) returns the total number of rows with Non NULL value for colname. If the specified column doesnt allow NULLs the result will be same.

eg. (USE PUBS)

SELECT count(*) FROM Titles
-- Returns 18

SELECT count(title_id) FROM Titles
-- Returns 18

SELECT count(royalty) FROM Titles
-- Returns 16

When you are doing Count on a column with NULLs SQL server will give you the warning. "Warning: Null value is eliminated by an aggregate or other SET operation."

As Count(colname) excludes columns with NULL values, the following query

SELECT count(royalty) FROM Titles where royalty IS NULL

will return zero. If you want to get the count of rows with NULL in a specific column, you should use,

SELECT count(*) FROM Titles where royalty IS NULL

which returns two in this case.

2. COUNT(colname) supports the use of ALL/DISTINCT clause (ALL is the default). COUNT(*) doesn't. So to get the UNIQUE Royalty values, you can use the query

SELECT COUNT(DISTINCT royalty) FROM Titles

You can combine these to see the difference.

SELECT COUNT(*), COUNT(royalty), COUNT(DISTINCT royalty) FROM Titles

3. As COUNT(*) returns the total number of records (in each group when GROUP BY is present in the query), like SELECT @@ROWCOUNT, SELECT COUNT(*) is also a valid statement in its own, which always returns 1.

SELECT COUNT(*)
GO

4. Performance. (with due credits to Itzik Ben Gan) If the specified column name does not allow NULLS, there isn't any performance difference between using COUNT(*) or COUNT(colname). The Optimizer will choose the smallest index available to get the number of records.

SELECT COUNT(*) FROM Titles
---------------------------------------------------------------------
--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003])))
--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
--Index Scan(OBJECT:([pubs].[dbo].[titles].[titleind]))

SELECT Count(type) FROM Titles
---------------------------------------------------------------------
--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003])))
--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
--Index Scan(OBJECT:([pubs].[dbo].[titles].[titleind]))

As the type column doesnt allow NULLS, both the counts return the same number and SQL Server uses the smallest index available.

If you issue the COUNT(colname) on a column that allows NULL, the execution plan might be different, as in

SELECT count(notes) FROM Titles
---------------------------------------------------------------------------------------
--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003])))
--Stream Aggregate(DEFINE:([Expr1003]=COUNT_BIG([titles].[notes])))
--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind]))

In this case, SQL Server will use a nonclustered index, if one is defined on the specified column. Otherwise a clustered index scan(as in the above case) or a table scan will be performed.

Here you can also notice that SQL Server internally replace the COUNT() function with COUNT_BIG() function. COUNT_BIG works like the COUNT function. The only difference between them is their return values: COUNT_BIG always returns a bigint data type value. COUNT always returns an int data type value.

5. Conditional Count
Using the CASE Expression, you can get count of items conditionally. See the following example.

Use NothWind
Go
SELECT COUNT(*) as ProductCount,
COUNT(CASE WHEN DisContinued = 0 Then 1 END) As Available,
COUNT(CASE WHEN DisContinued = 1 Then 1 END) As Discontinued
FROM Products

6. Getting Count from multiple tables
Here is two methods to get the count of rows from more than one table.

SELECT SUM(cnt) as EmployeesandCustomers FROM
(SELECT COUNT(*) cnt FROM Employees
UNION
SELECT COUNT(*) cnt FROM Customers) T

SELECT (SELECT COUNT(*) cnt FROM Employees)+
(SELECT COUNT(*) cnt FROM Customers) as EmployeesandCustomers

7. Count('AnythingGoesHere')
The parameter for the count function can be a column name, number, literal or an expression. The BOL says that "expression can be of any type except uniqueidentifier, text, image, or ntext. Aggregate functions and subqueries are not permitted."

SELECT COUNT(1) or SELECT COUNT('abc') is equivalent to SELECT COUNT(*)

8. Alternatives.
If you don't require an exact answer, it isn't necessary use a SELECT count(*) query on the rows in a table to get the row count, because it might be a time consuming process to get the count of rows from a table with millions of rows. SQL Server keeps the row count in sysindexes and it can be retrieved there. But it may not always reflect the exact number of records.

See below.

SELECT rows as linecount
FROM sysindexes
WHERE id = OBJECT_ID('Titles') AND indid <> 2

If you have a tip on count, plz post a comment.

# posted by Roji. P. Thomas @ 8/18/2004 09:26:00 PM 25 comments

Monday, August 02, 2004

Difference Between SET and SELECT

T-SQL Programmers use both SET and SELECT Statements for variable assignments. Lets see what are the differences.

1. SET is the ANSI Standard for variable assignment. Variable assignment using SELECT is T-SQL Proprietory.

2. SELECT can be used for multiple variable assignment. Using SET you can only assign one variable at a time.

DECLARE @myval1 int
DECLARE @myval2 int

SET @myVal1 = 5
SET @myVal2 = 6

SELECT @myVal1 = 5, @myVal2 = 6

3. Different Behavior when No rows are Returned.
If variables are assigned in the SELECT then the value of the variables will remain unchanged, but thats not the case with SET. See the following example.

DECLARE @id INTEGER
SET @id = 0
SELECT @id = id FROM sysobjects WHERE name = 'does not exist'
SELECT @id -- Returns Zero

SET @id= (SELECT id FROM sysobjects WHERE name = 'does not exist')
SELECT @id -- Returns NULL

But if the SELECT includes an aggregate function it will NOT retain the initial value.

DECLARE @id INTEGER
SET @id = 0
SELECT @id = Max(id) FROM sysobjects WHERE name = 'does not exist'
SELECT @id -- Returns NULL

4. Different Behavior when Multiple rows are returned by the expression in the right side.

When using SET, it returns the 'Subquery Returned more than one value' Error.

DECLARE @id INTEGER
SET @id = 0

SET @id= (SELECT id FROM sysobjects)
-- Returns Error

But with SELECT it assigns the 'last' returned value to the variable, whic can be anything, if an ORDER BY clause is not specified, as in -

DECLARE @id INTEGER
SET @id = 0
SELECT @id = id FROM sysobjects
SELECT @id

So you have to be careful, to avoid disatsters. Lets take an example.
Suppose you want to assign the orderid of the Last order placed by a given customer, to a variable.
(There might be better way to do that with an aggregate function, but this is just an example)

USE Northwind
GO

DECLARE @OrderId int
SELECT @OrderId = OrderId
FROM Orders
WHERE CustomerId = 'VINET'
ORDER BY OrderId ASC

SELECT @OrderId

Here we have used ORDER BY OrderId ASC. So the last OrderID of the customer will be assigned to the variable @OrderId. But suppose you add TOP 1 to the above query, thinking that you only want one OrderId anyway, then you will be getting only the first order of the customer.

USE Northwind
GO

DECLARE @OrderId int
SELECT TOP 1 @OrderId = OrderId
FROM Orders
WHERE CustomerId = 'VINET'
ORDER BY OrderId ASC

SELECT @OrderId

To get the expected result you have to change the ORDER BY clause to DESC.

USE Northwind
GO

DECLARE @OrderId int
SELECT TOP 1 @OrderId = OrderId
FROM Orders
WHERE CustomerId = 'VINET'
ORDER BY OrderId DESC

SELECT @OrderId

5. Looping Behavior of SELECT
Further exploring the above point, we can introduce some intuitive looping behaviour using SELECT. Consider the following example.

CREATE TABLE #tmp (a int)
INSERT INTO #tmp VALUES(1)
INSERT INTO #tmp VALUES(2)
INSERT INTO #tmp VALUES(3)
INSERT INTO #tmp VALUES(4)
INSERT INTO #tmp VALUES(5)

DECLARE @p int
SELECT @p = isnull(@p, 1) * a FROM #tmp
SELECT @p

Here @p become 120, the product of all values of a. We can observe that the assignment is carried out once for each row of in the table #tmp.

The same logic is used in the varchar concatenation technique shown below.

USE PUBS
GO

DECLARE @vchAuthors VARCHAR(400)
SET @vchAuthors = ''
SELECT @vchAuthors = @vchAuthors +
CASE WHEN @vchAuthors = '' THEN '' ELSE ', ' END + au_lname
FROM authors

SELECT @vchAuthors

Thats all I have about it. SQL Server MVP Narayana Vyas has an article abt the differences between SET and SELECT, which also includes a performance comparison between the two.

# posted by Roji. P. Thomas @ 8/02/2004 02:40:00 PM 3 comments

Sunday, August 01, 2004

Differences between COALESCE and ISNULL

I am trying to document the difference between some of the Keywords/functions, that are used for the same purpose and has some differences.

Let me start with COALESCE and ISNULL today. What comes first to my mind is COALESCE is hard to spell ;)

Lets see what are the relevant differences.

1. COALESCE is ANSI standard and ISNULL is T-SQL proprietary.

2. You can work with only one value at a time with ISNULL, but COALESCE can deal with multiple values. as in

SELECT ISNULL(NULL, 'SomeValue')
GO
SELECT COALESCE(NULL, NULL, NULL, 'SomeValue')
GO

3. With ISNULL, the Alternate value you specify is limited to the length of the first parameter. In case of COALESCE, you dont have such restrictions.

See the example below.

DECLARE @somestring CHAR(4)
SET @somestring = NULL

SELECT ISNULL(@somestring, 'Roji Thomas')
--Returns 'Roji'

SELECT COALESCE(@somestring, 'Roji Thomas')
--Returns 'Roji Thomas'

4. When using COALESCE All expressions must be of the same type or must be implicitly convertible to the same type. If you are not careful, this can give you incorrect results. See the examples below.

SELECT COALESCE(NULL, GetDate())
--Returns : 2004-07-26 13:47:15.937
--Correct

SELECT COALESCE(1, GetDate())
--Returns 1900-01-02 00:00:00.000
--Incorrect Value

SELECT COALESCE(1,'abcd')
--Returns 1
SELECT COALESCE('abcd',1)
--Returns Error
-- Server: Msg 245, Level 16, State 1, Line 1
-- Syntax error converting the varchar value 'abcd' to a column of data type int.

SELECT COALESCE(NULL,435,'abcd', CURRENT_TIMESTAMP,'xyzzz',435)
--Returns 1901-03-12 00:00:00.000
--Incorrect

SELECT COALESCE(NULL,GetDate(),9999999999)
--Returns : 2004-07-26 13:47:15.937
--Correct

SELECT COALESCE(NULL,9999999999, GetDate())
--Returs Error
-- Server: Msg 8115, Level 16, State 2, Line 1
-- Arithmetic overflow error converting expression to data type datetime.

5. Eventhough COALESCE is generally the preferred way, there are some performance consideration especially when you have a select statement as an arguument to COALESCE. So always analyse the execution plan. You can read further abt it here.

Here is two interesting Performance comparison between COALESCE and ISNULL.

http://blogs.x2line.com/al/archive/2004/03/01/189.aspx

http://weblogs.sqlteam.com/mladenp/articles/2937.aspx

Feel Free to postback your comments.

# posted by Roji. P. Thomas @ 8/01/2004 01:22:00 PM 8 comments

TOP 1 WITH TIES