Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you would naturally want to know the total number of rows satisfying, so you can display some nice, useful information to your end-users.

You would think, well, it’s just a count, and a simple query like this would be enough:

SELECT COUNT(Id) FROM MySecretTable

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in previous post – we have to count the total number of orders in that big table.

SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I did expect it to be use that index and be pretty fast. But does it? To my surprises, no.

So it do an index scan on a non clustered index – in this case `IDX_PurchaseOrder_TrackingNumber`. May be that was the reason it was slow. So let force it to use the clustered index – well, because we can

SELECT Count(ObjectId) FROM OrderGroup_PurchaseOrder WITH (INDEX (PK_OrderGroup_PurchaseOrder))

Well, SQL Server obeys my command (I’m its master, after all), but it performs poorly. Previous query ran for around 10s with a cold cache, and this query takes more or less 8 minutes, even though it has the execution plan I wanted:

Something must be wrong – somewhere. I tried again and the query behaved the same, consistently.

It only became clear to me when I turned on the IO statistics. The slow query uses a lot of IO operation

Table 'OrderGroup_PurchaseOrder'. Scan count 9, logical reads 397053, physical reads 1407, read-ahead reads 42411, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

when the fast one uses much less

Table 'OrderGroup_PurchaseOrder'. Scan count 9, logical reads 22552, physical reads 0, read-ahead reads 10980, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

It explains why SQL Server uses the non clustered index in the first place. I should have trusted it :).

If we look into how indexes are stored internally we can understand these number. The non clustered index is actually stored with the column(s) of the clustered index (so it can quickly locate the rows if needed). In this case, the ObjectId  column would be indexed with IX_OrderGroup_PurchaseOrder_TrackingNumber  ,  while the clustered index PK_OrderGroup_PurchaseOrder would be the entire table (minus the non clustered indexes), which is why it’s much bigger and much slower to read from. You can’t actually read only ObjectId from the clustered index – you were reading the entire row. This is also why should we have as narrow clustered index as possible, because it will increase the size of all non clustered indexes in that table.

Let’s do another experiment to confirm our theory. If we have another index with smaller size, would SQL Server use it for COUNT? Just create an index on Modified, which is only 8 bytes in size. This clustered index should have the smallest size of all indexes in that table:

SELECT i.[name] AS IndexName
    ,SUM (s.[used_page_count]) AS PageCount
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
    AND s.[index_id] = i.[index_id]
	WHERE i.[name] LIKE '%OrderGroup_PurchaseOrder%'
GROUP BY i.[name]
ORDER BY PageCount DESC

and we have a new, smaller index. Let’s run the original query again to see:

And yes, our theory is confirmed. SQL Server prefers the execution plan which requires the least IO operations.

So we can’t really make COUNT faster without adding another index (which has downsides on its own). To perform a COUNT, SQL Server would have to do an index scan. One optimization is to count on TrackingNumber

Morals of the story

  • COUNT can be expensive if your table is big enough, and it’s very hard to optimize.
  • COUNT on a big data set should only happen once. Sure, that would mean it can be out-of-date after a while (new records being added, or old records being deleted), so ask yourself if you really need the real time accurate.
  • SQL Server tries to figure out the execution plan based on the least IO required path. In most of the cases, it’s smarter than you and me, so let it does the job, unless you are absolutely sure you know better.

Leave a Reply

Your email address will not be published. Required fields are marked *