Friday, June 12, 2009

Using Hash Values to Find Similar Queries


There were a lot of capabilities added in SQL Server 2005 with the introduction of Dynamic Management Views and Functions, but there were a couple of columns added to the sys.dm_exec_query_stats and sys.dm_exec_requests DMV's that are going to make life a lot easier when trying to indentify queries for tuning purposes.

You can now use the query_plan_hash and the query_hash columns to group queries based on similar logic and execution plans. In other words, you can find those small queries that run several times that wouldn't normally be on your radar. For example, let's say you run the following queries. Be carful not to free your procedure cache on a production machine. I am doing this for demo purposes only.

USE AdventureWorks
GO
DBCC FREEPROCCACHE
GO
SELECT * FROM Person.Address WHERE Postalcode = '98011'
GO
SELECT * FROM Person.Address WHERE Postalcode ='98011'
GO
SELECT * FROM Person.Address WHERE Postalcode = '98011'
GO
SELECT * FROM Person.Address WHERE Postalcode = '98011';
GO
SELECT * FROM Person.Address WHERE Postalcode = 'K4B 1T7'
GO
SELECT *  FROM Person.Address WHERE Postalcode = 'K4B 1T7'
GO

Now if you take the query from Books Online for 2005 for Obtaining information about the top five queries by average CPU time and modify it a little to add a GROUP BY so you can also see the number times a query has executed and get the Total CPU Time, you can see that only two of the queries were similar enough to be grouped together.

USE AdventureWorks
GO

SELECT   TOP 5 
Sum(total_worker_time / execution_count) AS [Total CPU Time],
               Sum(execution_count)AS executioncount,
               Substring(st.TEXT,(qs.statement_start_offset / 2) + 1,
                         ((CASE qs.statement_end_offset
                             WHEN -1
                             THEN Datalength(st.TEXT)
                             ELSE qs.statement_end_offset
                           END - qs.statement_start_offset) / 2) + 1) AS statementtext
FROM     sys.dm_exec_query_stats AS qs
         CROSS APPLY sys.Dm_exec_sql_text(qs.sql_handle) AS st
GROUP BY Substring(st.TEXT,(qs.statement_start_offset / 2) + 1,
                   ((CASE qs.statement_end_offset
                       WHEN -1
                       THEN Datalength(st.TEXT)
                       ELSE qs.statement_end_offset
                     END - qs.statement_start_offset) / 2) + 1)
ORDER BY Sum(total_worker_time / execution_count) DESC;

image

Now here's the cool part. In SQL Server 2008, you can use the query_hash to aggregate all of the queries and get the true execution counts and CPU measures.

SELECT   TOP 5 
Sum(query_stats.total_worker_time/query_stats.execution_count) AS [Total CPU Time], 
              Sum(execution_count)AS executioncount, 
              Min(query_stats.statement_text) AS statementtext
FROM     (SELECT qs.*,
                 Substring(st.TEXT,(qs.statement_start_offset / 2) + 1,
                           ((CASE statement_end_offset
                               WHEN -1
                               THEN Datalength(st.TEXT)
                               ELSE qs.statement_end_offset
                             END - qs.statement_start_offset) / 2) + 1) AS statement_text
          FROM   sys.dm_exec_query_stats AS qs
                 CROSS APPLY sys.Dm_exec_sql_text(qs.sql_handle) AS st) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY [Total CPU Time] DESC;

image 

Awesome. Yet another reason to upgrade to 2008. The query_plan_hash works the same except the hash value is based off the execution plan, so you can find queries that may not produce the same query_hash but will produce the same query_plan_hash.

0 comments: