Fixing a stored procedure

On the Episerver development team, we understand the importance of good performance. Who wouldn't like a lightning-fast website? We work hard to ensure the framework is fast, and we seize (almost) every opportunity to make it faster.

In Commerce 10.2 we introduced a new cart mode, the serializable cart, and it has proven to bring great performance compared to the “old/traditional” approach. Our own tests showed an improvement of 3-5x. But can it be even faster? Probably yes.

And we actually did make some improvements in later versions. In the scope of this blog post, we will focus on one specific aspect, and learn a little more about SQL Server performance optimization along the way.

Let’s look at a stored procedure in the new cart system:

CREATE PROCEDURE [dbo].[ecf_SerializableCart_FindCarts]
	@CartId INT = NULL,
	@CustomerId UNIQUEIDENTIFIER = NULL,
	@Name NVARCHAR (128) = NULL,
	@MarketId NVARCHAR (16) = NULL,
	@CreatedFrom DateTime = NULL,
	@CreatedTo DateTime = NULL,
	@StartingRecord INT = NULL,
	@RecordsToRetrieve INT = NULL
AS
BEGIN
      SELECT [CartId]
      ,[CustomerId]
      ,[Name]
      ,[Data]
      FROM SerializableCart
      WHERE (@CartId IS NULL OR CartId = @CartId)
      AND (@CustomerId IS NULL OR CustomerId = @CustomerId)
      AND (@Name IS NULL OR Name = @Name)
      AND (@MarketId IS NULL OR MarketId = @MarketId)
      AND (@CreatedFrom IS NULL OR Created >= @CreatedFrom)
      AND (@CreatedTo IS NULL OR Created <= @CreatedTo)
END

If you still have an old version of this stored procedure, this is not exactly how it looks: I removed the paging part to keep it simple and prove the point (that is something we can come back to in another post).

This SP looks … normal. At least to those of us who usually work with C#. The logic is easy to understand and straightforward: it filters by whichever parameters are passed to the SP, and if a parameter is not provided, no filter is applied for it.

But if it is easy to understand, is it good? SQL is vastly different from C#, not just in syntax but also in mindset. With our C# thinking, we would expect the stored procedure to simply short-circuit the conditions and only evaluate the filters that matter. But can it?

The easy way to check is, of course, to run it. We already know we have a clustered index on the `CartId` column and a non-clustered index on (CustomerId, Name), so if we query by CartId, SQL Server should use a clustered index seek, and if we query by CustomerId and Name, it should use a non-clustered index seek with a key lookup (because we are looking for data which is not in the index itself). That seems logical, but is it what actually happens?
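For reference, the indexes described above look roughly like this. The definitions are my own sketch, not copied from the product schema, but the non-clustered index name matches the one that will show up in the execution plan below:

-- Assumed index definitions (a sketch, not the actual Commerce schema)
CREATE UNIQUE CLUSTERED INDEX PK_SerializableCart
	ON SerializableCart (CartId);

CREATE NONCLUSTERED INDEX IDX_SerializableCart_Indexed_CustomerId_Name
	ON SerializableCart (CustomerId, Name);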

The thing I like the most about SQL Server is that we can easily validate our assumptions, with the help of SQL Server Management Studio and its superb execution plan feature. Let's try running the SP:

exec ecf_SerializableCart_FindCarts @CustomerId = '55538A39-CFDA-4C89-861F-0DD7C9C313C2', @Name = 'Wishlist'

I have a fairly big database with about 330k rows in SerializableCart. As mentioned earlier, this query should use an index seek + key lookup. But to our surprise, it does not.

It uses an index scan, and not just any index scan: it scans the entire IDX_SerializableCart_Indexed_CustomerId_Name index, as we can see in the Number of Rows Read. If we turn on IO statistics with SET STATISTICS IO ON, we can see it was doing a lot of reads:
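For reference, the numbers below come from turning IO statistics on and running the same call again:

SET STATISTICS IO ON
exec ecf_SerializableCart_FindCarts @CustomerId = '55538A39-CFDA-4C89-861F-0DD7C9C313C2', @Name = 'Wishlist'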

(3 row(s) affected)
Table 'SerializableCart'. Scan count 1, logical reads 1875, physical reads 6, read-ahead reads 1903, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So this is not good. This suboptimal execution plan is not only slow, it also wastes a lot of resources. Not to mention that the index will be locked during the scan, which increases the chance of deadlocks when other threads try to update this table.

So it's clear: SQL Server is not very good at short-circuiting these logical conditions. One could argue this is the classic SQL Server parameter sniffing problem, but it is not that simple. Technically, parameter sniffing means SQL Server caches the execution plan from the first run and reuses it on subsequent runs, even when that plan is highly ineffective for other parameter values. That is not quite what is happening here: even after I run `DBCC FREEPROCCACHE`, the result is still the same.
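If you want to try that yourself, this is roughly what I ran. Note that DBCC FREEPROCCACHE clears every cached plan on the instance, so don't run it against a production server:

-- Clear the plan cache, then call the SP again and compare the plans
DBCC FREEPROCCACHE
exec ecf_SerializableCart_FindCarts @CustomerId = '55538A39-CFDA-4C89-861F-0DD7C9C313C2', @Name = 'Wishlist'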

Now that we know the problem, we have to fix it. The first attempt looks like this:

ALTER PROCEDURE [dbo].[ecf_SerializableCart_FindCarts]
	@CartId INT = NULL,
	@CustomerId UNIQUEIDENTIFIER = NULL,
	@Name NVARCHAR (128) = NULL,
	@MarketId NVARCHAR (16) = NULL,
	@CreatedFrom DateTime = NULL,
	@CreatedTo DateTime = NULL,
	@StartingRecord INT = NULL,
	@RecordsToRetrieve INT = NULL
AS
BEGIN
      SELECT [CartId]
      ,[CustomerId]
      ,[Name]
      ,[Data]
      FROM SerializableCart
      WHERE (@CartId IS NULL OR CartId = @CartId)
      AND (@CustomerId IS NULL OR CustomerId = @CustomerId)
      AND (@Name IS NULL OR Name = @Name)
      AND (@MarketId IS NULL OR MarketId = @MarketId)
      AND (@CreatedFrom IS NULL OR Created >= @CreatedFrom)
      AND (@CreatedTo IS NULL OR Created <= @CreatedTo)
	  OPTION (RECOMPILE)
END

We are adding the OPTION (RECOMPILE) hint to the query. This tells SQL Server not to reuse a cached execution plan for this statement, but to re-evaluate the conditions every time it is called. Performance-wise, this looks much better.
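Here I simply re-ran the same call as before, with IO statistics still on:

exec ecf_SerializableCart_FindCarts @CustomerId = '55538A39-CFDA-4C89-861F-0DD7C9C313C2', @Name = 'Wishlist'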

And the IO reads have been dramatically reduced:

(3 row(s) affected)
Table 'SerializableCart'. Scan count 1, logical reads 12, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This is how it should perform. Let's try again, this time with @CartId.
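For example, with an arbitrary cart id from my test database:

exec ecf_SerializableCart_FindCarts @CartId = 100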

And a clustered index seek was used, as we expected. That's all good. In other words, the full query in the SP (with the paging and Modified filters back in) could have looked like this:

		SELECT CartId, Created, Modified, [Data]
		FROM SerializableCart
		WHERE (@CartId IS NULL OR CartId = @CartId)
			AND (@CustomerId IS NULL OR CustomerId = @CustomerId)
			AND (@Name IS NULL OR Name = @Name)
			AND (@MarketId IS NULL OR MarketId = @MarketId)
			AND (@CreatedFrom IS NULL OR Created >= @CreatedFrom)
			AND (@CreatedTo IS NULL OR Created <= @CreatedTo)
			AND (@ModifiedFrom IS NULL OR Modified >= @ModifiedFrom)
			AND (@ModifiedTo IS NULL OR Modified <= @ModifiedTo)
		ORDER BY CartId DESC
		OFFSET @StartingRecord ROWS
		FETCH NEXT @RecordsToRetrieve ROWS ONLY
		OPTION (RECOMPILE)

The interesting thing is that RECOMPILE only works at the statement level; if I add RECOMPILE at the SP level, it does not help. In other words, this does not work:

ALTER PROCEDURE [dbo].[ecf_SerializableCart_FindCarts]
	@CartId INT = NULL,
	@CustomerId UNIQUEIDENTIFIER = NULL,
	@Name NVARCHAR (128) = NULL,
	@MarketId NVARCHAR (16) = NULL,
	@CreatedFrom DateTime = NULL,
	@CreatedTo DateTime = NULL,
	@StartingRecord INT = NULL,
	@RecordsToRetrieve INT = NULL
WITH RECOMPILE
AS
BEGIN
      SELECT [CartId]
      ,[CustomerId]
      ,[Name]
      ,[Data]
      FROM SerializableCart
      WHERE (@CartId IS NULL OR CartId = @CartId)
      AND (@CustomerId IS NULL OR CustomerId = @CustomerId)
      AND (@Name IS NULL OR Name = @Name)
      AND (@MarketId IS NULL OR MarketId = @MarketId)
      AND (@CreatedFrom IS NULL OR Created >= @CreatedFrom)
      AND (@CreatedTo IS NULL OR Created <= @CreatedTo)
END

In this case, it behaves pretty much the same as without WITH RECOMPILE.

So did we go with that? We could have. But the execution plan is not everything. To get there, SQL Server needs to parse and compile the SP, and ideally the resulting execution plan should be cached and reused. Let's check whether that is the case by using this query:

SELECT cp.objtype AS ObjectType,
OBJECT_NAME(st.objectid,st.dbid) AS ObjectName,
cp.usecounts AS ExecutionCount,
st.TEXT AS QueryText,
qp.query_plan AS QueryPlan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE OBJECT_NAME(st.objectid,st.dbid) = 'ecf_SerializableCart_FindCarts'

Apparently the SP is recompiled every time it is executed. Keep in mind that recompilation is not free: every recompilation costs CPU cycles, and you don't want a frequently called SP to be recompiled on every call.

Is there another option? Yes: dynamically constructing the query based on the parameters. Parameters that are null never make it into the WHERE clause, allowing SQL Server to optimize the execution plan as it should.

This is how it would look:

CREATE PROCEDURE [dbo].[ecf_SerializableCart_FindCarts]
	@CartId INT = NULL,
	@CustomerId UNIQUEIDENTIFIER = NULL,
	@Name NVARCHAR (128) = NULL,
    @MarketId NVARCHAR (16) = NULL,
	@CreatedFrom DateTime = NULL,
	@CreatedTo DateTime = NULL,
	@ModifiedFrom DateTime = NULL,
	@ModifiedTo DateTime = NULL,
	@StartingRecord INT = NULL,
	@RecordsToRetrieve INT = NULL
AS
BEGIN
	DECLARE @query nvarchar(4000);
	SET @query = 'SELECT CartId, Created, Modified, [Data] FROM SerializableCart WHERE 1 = 1 '

	IF (@CartId IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND CartId = @CartId '
	END
	IF (@CustomerId IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND CustomerId = @CustomerId '
	END
	IF (@Name IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND Name = @Name '
	END
	IF (@MarketId IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND MarketId = @MarketId '
	END
	IF (@CreatedFrom IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND Created >= @CreatedFrom '
	END
	IF (@CreatedTo IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND Created <= @CreatedTo '
	END
	IF (@ModifiedFrom IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND Modified >= @ModifiedFrom '
	END
	IF (@ModifiedTo IS NOT NULL)
	BEGIN
		SET @query = @query + ' AND Modified <= @ModifiedTo '
	END

	SET @query = @query +
	' ORDER BY  CartId DESC
        OFFSET '  + CAST(@StartingRecord AS NVARCHAR(50)) + '  ROWS 
        FETCH NEXT ' + CAST(@RecordsToRetrieve AS NVARCHAR(50)) + ' ROWS ONLY'

	exec sp_executesql @query, 
	N'@CartId INT,
	@CustomerId UNIQUEIDENTIFIER,
	@Name nvarchar(128),
    @MarketId nvarchar(16),
	@CreatedFrom DateTime,
	@CreatedTo DateTime,
	@ModifiedFrom DateTime,
	@ModifiedTo DateTime,
	@StartingRecord INT,
	@RecordsToRetrieve INT',
	@CartId = @CartId, @CustomerId= @CustomerId, @Name=@Name, @MarketId = @MarketId,
	@CreatedFrom = @CreatedFrom, @CreatedTo=@CreatedTo, @ModifiedFrom=@ModifiedFrom, @ModifiedTo=@ModifiedTo, 
	@StartingRecord = @StartingRecord, @RecordsToRetrieve =@RecordsToRetrieve
END

So if we query by CustomerId and Name, the actual query executed by SQL Server will look like this (I removed the paging part for simplicity):

SELECT CartId, Created, Modified, [Data] FROM SerializableCart WHERE 1 = 1 AND CustomerId = @CustomerId AND Name = @Name
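And if we had queried by @CartId only, the constructed statement would (again, ignoring the paging) reduce to:

SELECT CartId, Created, Modified, [Data] FROM SerializableCart WHERE 1 = 1 AND CartId = @CartId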

You might have two concerns here. First, doesn't the string concatenation cost something? Second, will this execution plan be cached?

First, yes, it costs something, but it is still much cheaper than compiling an execution plan. And second, yes, it is cached. exec sp_executesql is the important part here, because it allows SQL Server to parameterize the query. Just run the stored procedure a few times, then run this query to see which execution plans are cached:

SELECT cp.objtype AS ObjectType,
OBJECT_NAME(st.objectid,st.dbid) AS ObjectName,
cp.usecounts AS ExecutionCount,
st.TEXT AS QueryText,
qp.query_plan AS QueryPlan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Prepared'

Among the results you will see the cached prepared plan for our dynamically built query, with the query text:

(@CartId INT,   @CustomerId UNIQUEIDENTIFIER,   @Name nvarchar(128),      @MarketId nvarchar(16),   @CreatedFrom DateTime,   @CreatedTo DateTime,   @ModifiedFrom DateTime,   @ModifiedTo DateTime,   @StartingRecord INT,   @RecordsToRetrieve INT)SELECT CartId, Created, Modified, [Data] FROM SerializableCart WHERE 1 = 1  AND CustomerId = @CustomerId  AND Name = @Name  ORDER BY  CartId DESC          OFFSET 0  ROWS           FETCH NEXT 10 ROWS ONLY

So this was what we wanted to see.

In the end, both versions, the dynamically built query and OPTION (RECOMPILE), perform quite similarly: they produce the same execution plan for the same set of parameters, and when the data set is big enough, the differences in performance become negligible. It really comes down to your personal preference and/or your team's choice.

So, the morals of the story:

  • Conditional (catch-all) WHERE clauses are bad, and should be avoided unless absolutely necessary.
  • Make sure to check whether you are getting adequate performance from your SPs.
  • When applicable, try OPTION (RECOMPILE) or a dynamically built query to make sure your query does not suffer from a poor execution plan.

 

 
