This time, we will talk about ecfVersion_ListFiltered
, again.
This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. When I thought it is perfect (in term of performance), I learned something more.
Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” starts to throw timeout exceptions.
This scheduled job calls to a ecfVersion_ListFiltered
to load the content versions which are in status DelayedPublish
, it looks like this when it reaches SQL Server:
declare @s [udttIdTable] insert into @s values(6) exec dbo.ecfVersion_ListFiltered @Statuses = @s, @StartIndex = 0, @MaxRows = 2147483646
This query is known to be slow. The reason is quite obvious – Status
contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion
is big enough, it’s inevitably slow.
But being slow does not mean it’s allowed to not work. Under any reasonable circumstances, the query should complete in under 30 seconds (the default timeout value), and return to the caller.
So it needs to be fixed.
[quads id=2]
I was in the direct contact with the partner developerE, and of course, as always, I suggested him to do some tests. Rebuilding index is the first thing to think about – as the query relies heavily on a Clustered Index Scan, a highly fragmented Clustered Index can severely affect performance, and rebuilding it would help.
It didn’t.
Knowing we improved this query in more reason versions, I suggested him to try to upgrade to later one, and eventually sent the newer version of the stored procedure. It was 2x times faster on another customer database.
It didn’t help, neither.
I was really puzzled. Even after getting their database to do the tests myself, I couldn’t find anything suspicious from the execution plan, the time and IO statistics. Everything looks normal.
Time to blame the hardware?
Not really, the developer said the old version was quite fast on same hardware. So a change in the database made the SP slower between these two versions. I finally scoped it down to the change in the SP from
SET NOCOUNT ON DECLARE @StatusCount INT SELECT @StatusCount = COUNT(*) FROM @Statuses DECLARE @LanguageCount INT SELECT @LanguageCount = COUNT(*) FROM @Languages ;WITH TempResult as ( SELECT ROW_NUMBER() OVER(ORDER BY vn.Modified DESC) as RowNumber, vn.* FROM dbo.ecfVersion vn WHERE vn.CurrentLanguageRemoved = 0 AND ((@ObjectId IS NULL) OR vn.ObjectId = @ObjectId) AND ((@ObjectTypeId IS NULL) OR vn.ObjectTypeId = @ObjectTypeId) AND ((@ModifiedBy IS NULL) OR vn.ModifiedBy = @ModifiedBy) AND ((@StatusCount = 0) OR (vn.[Status] IN (SELECT ID FROM @Statuses))) AND ((@LanguageCount = 0) OR (vn.LanguageName IN (SELECT LanguageCode FROM @Languages))) ) SELECT WorkId, ObjectId, ObjectTypeId, Name, LanguageName, MasterLanguageName, IsCommonDraft, StartPublish, ModifiedBy, Modified, [Status], (SELECT COUNT(*) FROM TempResult) AS TotalRows FROM TempResult WHERE RowNumber BETWEEN (@StartIndex + 1) AND (@MaxRows + @StartIndex)
to:
DECLARE @StatusCount INT SELECT @StatusCount = COUNT(*) FROM @Statuses DECLARE @LanguageCount INT SELECT @LanguageCount = COUNT(*) FROM @Languages ;WITH TempResult as ( SELECT ROW_NUMBER() OVER(ORDER BY vn.Modified DESC) as RowNumber, vn.* FROM dbo.ecfVersion vn WHERE [dbo].ecf_IsCurrentLanguageRemoved(vn.CatalogId, vn.LanguageName) = 0 AND ((@ObjectId IS NULL) OR vn.ObjectId = @ObjectId) AND ((@ObjectTypeId IS NULL) OR vn.ObjectTypeId = @ObjectTypeId) AND ((@ModifiedBy IS NULL) OR vn.ModifiedBy = @ModifiedBy) AND ((@StatusCount = 0) OR (vn.[Status] IN (SELECT ID FROM @Statuses))) AND ((@LanguageCount = 0) OR (vn.LanguageName IN (SELECT LanguageCode FROM @Languages))) ) SELECT WorkId, ObjectId, ObjectTypeId, Name, LanguageName, MasterLanguageName, IsCommonDraft, StartPublish, ModifiedBy, Modified, [Status], (SELECT COUNT(*) FROM TempResult) AS TotalRows FROM TempResult WHERE RowNumber BETWEEN (@StartIndex + 1) AND (@MaxRows + @StartIndex)
So if you can’t spot the difference, the check for ecfVersion.CurrentLanguageRemoved
was changed to an user defined function – UDF – ecf_IsCurrentLanguageRemoved
. Naturally, that will be our suspicion.
But execution plan tells otherwise. There were virtually no difference between execution plans:
Slow one:
Fast one:
IO Statistics are also very similar:
Slow one:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 3235, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#BEA5F9D9'. Scan count 2, logical reads 435176, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'ecfVersion'. Scan count 2, logical reads 17076, physical reads 1924, read-ahead reads 10419, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Fast one:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 3236, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#BEA5F9D9'. Scan count 2, logical reads 435176, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'ecfVersion'. Scan count 2, logical reads 17076, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
But if I removed the call to the UDF, the SP was fast again. With almost no difference in execution plan and number of IOs read. We can safely assume that the UDF is called for every row, and that makes it slow.
Why?
I pulled my hairs for a while (there is a joke running inside Commerce development team, if you have too many hairs, you are not experienced enough) Things only come clear when we get the execs stats
SELECT text , execution_count , total_logical_reads / execution_count AvgReads , total_worker_time / execution_count AvgWorkersTime , total_elapsed_time / execution_count AvgElapsedTime FROM sys.dm_exec_query_stats CROSS APPLY sys.dm_exec_sql_text(sql_handle) WHERE text LIKE '%ecf_IsCurrentLanguageRemoved%'
It’s now clear that the UDF was called on every row of ecfVersion
, and because we need to count the number of rows satisfied the filter conditions, it’s called twice for each row. Each call is cheap, but the total time and IOs are significant.
One thing to remember when work with SQL Server is that it’s designed to work with sets, not individual rows. That’s explain why SQL Server performs very bad in this case: it is not allowed to do what it does best.
A simple fix would be to throw away the UDF, and to join with CatalogLanguage
directly, something like this:
SET NOCOUNT ON DECLARE @StatusCount INT SELECT @StatusCount = COUNT(*) FROM @Statuses DECLARE @LanguageCount INT SELECT @LanguageCount = COUNT(*) FROM @Languages ;WITH TempResult as ( SELECT ROW_NUMBER() OVER(ORDER BY vn.Modified DESC) as RowNumber, vn.* FROM dbo.ecfVersion vn INNER JOIN CatalogLanguage cl ON vn.CatalogId = cl.CatalogId AND vn.LanguageName = cl.LanguageCode WHERE ((@ObjectId IS NULL) OR vn.ObjectId = @ObjectId) AND ((@ObjectTypeId IS NULL) OR vn.ObjectTypeId = @ObjectTypeId) AND ((@ModifiedBy IS NULL) OR vn.ModifiedBy = @ModifiedBy) AND ((@StatusCount = 0) OR (vn.[Status] IN (SELECT ID FROM @Statuses))) AND ((@LanguageCount = 0) OR (vn.LanguageName IN (SELECT LanguageCode FROM @Languages))) ) SELECT WorkId, ObjectId, ObjectTypeId, Name, LanguageName, MasterLanguageName, IsCommonDraft, StartPublish, ModifiedBy, Modified, [Status], (SELECT COUNT(*) FROM TempResult) AS TotalRows FROM TempResult WHERE RowNumber BETWEEN (@StartIndex + 1) AND (@MaxRows + @StartIndex)
Lesson learned:
- Always check the query you are going to change on some sizable data set. The result can be quite shocking.
One of the most important rules for performance optimizations is to not do worse. I.e. you have to keep checking the performance to ensure that performance is not degrading over time. A 5% degradation seems to be small, but ten of them can has significant impact.
The SP became 15-20 times slower, and nobody raised suspicions about it!
- UDF can be a really sneaky b*stard. Execution plan does not show time consumed by an UDF. IO statistics does not include IO made by an UDF. It’s only be caught by the query statistics and that can be easily missed in 95% of the case.
Can it be even better? Yes, it can. The “updated” version (as we made some other improvements to ecfVersion_ListFiltered
recently), which will be in an upcoming release, will perform like this:
SQL Server Execution Times:
CPU time = 201 ms, elapsed time = 540 ms.
So basically more than 50x times faster the affected one, and 3x times faster the fixed one.
That’s why you should upgrade to latest version, my friends!
Currently working on upgrading from 9.24.1 to latest CMS/Commerce 10.
We’re experiencing this issue, where the “Publish Delayed Content Versions” times out after 30 seconds. This is currently blocking us from deploying into production.
Is it a viable option to modify the existing SP according to the solution you posted here? Or is there another option besides furter upgrading the next major version? (if solved there)
I did – and it worked like a charm. Thank you!