Tuesday, May 28, 2013

Investigating Plan Cache Bloat

SQL Server includes a DMV, sys.dm_exec_query_stats, that returns performance statistics for each query plan cached in memory.  However, it can also help give you insight into how consistent your developers are with writing code.

For this topic, we'll just concentrate on a few columns returned by the DMV: sql_handle and plan_handle.  Per Books Online, sql_handle is a value that refers to the batch or stored procedure that the query, and plan_handle is a value that refers to the compiled plan of that query.  For each query that is processed, SQL Server can generate one or more compiled plans for that query.  This one-to-many relationship can be caused by a number factors, but one simple reason can be coding inconsistency.

One simple coding difference that I often see is within the SET statements preceding a query.  If a developer executes the exact same query using different SET statements, then SQL Server will compile a separate plan for each one.

First, we need to clear the cache.

DBCC FREEPROCCACHE;
GO

Next run these two queries.

SET QUOTED_IDENTIFIER OFF;
GO
SELECT p.FirstName,p.LastName FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.Gender = 'M';
GO

SET QUOTED_IDENTIFIER ON;
GO
SELECT p.FirstName,p.LastName FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.Gender = 'M';
GO

As you can see, the only difference between the two queries is the value for SET QUOTED_IDENTIFIER. Now let's query the DMV.

SELECT s.text, q.sql_handle, q.plan_handle
FROM sys.dm_exec_query_stats q CROSS APPLY sys.dm_exec_sql_text(sql_handle) s;
GO



We can see that we have 2 rows returned, one for each query.  As you'll notice, the sql_handle is the same for each, but the plan_handle is different.  Next let's look at the graphical query plan of each.

SELECT * FROM sys.dm_exec_query_plan(0x0600050049DA7633D08998220000000001000000000000000000000000000000000000000000000000000000);
GO


SELECT * FROM sys.dm_exec_query_plan(0x0600050049DA7633908298220000000001000000000000000000000000000000000000000000000000000000);
GO


You will see the query plan is the same; however, SQL Server treats each one as if it were a completely distinct query.  If this were just a typo by the developer, then SQL Server just doubled the amount of plan cache needed for this query and wasted valuable resources.

Let's look at the same queries from another angle.  This time we'll remove the SET statements, but change the formatting of the queries.

First clear the plan cache.

DBCC FREEPROCCACHE;
GO

Next run these two queries.

SELECT p.FirstName,p.LastName FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.Gender = 'M';
GO

SELECT
        p.FirstName
       ,p.LastName
FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID
WHERE e.Gender = 'M';
GO

Finally, look at the DMV.

SELECT s.text, q.sql_handle, q.plan_handle
FROM sys.dm_exec_query_stats q CROSS APPLY sys.dm_exec_sql_text(sql_handle) s;
GO


What you'll notice is SQL Server will still treat each query as if it were two completely different statements; however, the only difference is the formatting.

In these examples, we've covered how important it is for the developers to be consistent with all the code passed to SQL Server.  Just minor changes in the code will cause SQL Server to generate different query plans and lead to plan cache bloat and wasted resources.  As a DBA, these are some simple examples of feedback you should be providing to your development teams.  Be proactive and don't let them down!

Tuesday, May 7, 2013

An Alternative to SELECT COUNT(*) for Better Performance

Sometimes rapid code development doesn't always produce the most efficient code.  Take the age old line of code SELECT COUNT(*) FROM MyTable.  Obviously this will give you the row count for a table, but at what cost? Doing any SELECT * from a table will ultimately result in a table or clustered index scan.

USE AdventureWorksDW2012;
SELECT COUNT(*) FROM dbo.FactProductInventory;
GO


Turning on STATISTICS IO on reveals 5753 logical reads just to return the row count of 776286.

Table 'FactProductInventory'. Scan count 1, logical reads 5753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Starting with SQL Server 2005, Microsoft introduced a DMV, sys.dm_db_partition_stats, that provides you with the same information at a fraction of the cost.  It requires a little more coding, but once you turn on STATISTICS IO, you will see the performance benefit.

USE AdventureWorksDW2012;
SELECT
        s.name AS 'SchemaName'
       ,o.name AS 'TableName'
       ,SUM(p.row_count) AS 'RowCount'
FROM sys.dm_db_partition_stats p
       JOIN sys.objects o ON o.object_id = p.object_id
       JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE p.index_id < 2 AND o.type = 'U'
       AND s.name = 'dbo'
       AND o.name = 'FactProductInventory'
GROUP BY s.name,o.name
ORDER BY s.name,o.name;
GO


Since we're querying a DMV, we never touch the base table.  We can see here we only need 16 logical reads to return the same row count of 776286, and the FactProductInventory table is nowhere in our execution plan.

Table 'sysidxstats'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysclsobjs'. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

By using the DMV, we have improved the query performance and reduced the total I/O count by nearly 100%. Another added benefit of using the DMV, is we won't need locks on the base table and therefore will avoid the possibility of blocking other queries hitting that table.

This is just one simple example of how you can easily improve the performance of an application.