Varchar can be harmful to your performance

As string is the most common data type in an application, nvarchar and its variant varchar are probably the most common column types in your database. (We almost always use nvarchar because nchar is meant for fixed length columns which we don’t have). The difference is that nvarchar has encoding of UTF-16/USC-2 while varchar has UTF-8

Starting with SQL Server 2012 (11.x), when a Supplementary Character (SC) enabled collation is used, these data types store the full range of Unicode character data and use the UTF-16 character encoding. If a non-SC collation is specified, then these data types store only the subset of character data supported by the UCS-2 character encoding.

But varchar can be harmful in a way that you don’t expect it to. Let’s assume we have this simple table with two columns (forgive naming, I can’t come up with better names)

CREATE TABLE [dbo].[Demo](
	[varcharColumn] [varchar](50) NULL,
	[nvarcharColumn] [nvarchar](50) NULL
)

Each will be inserted with same random values, almost unique. We will add a non clustered index on each of these columns, and as we know, the index should be very efficient on querying based on those values.

Let’s try with out varchar column first. It should work pretty well right. Nope!

SELECT *
  FROM dbo.[Demo]
  where varcharColumn = N'0002T9'

Instead of a highly efficient Index seek, it does an Index scan on the entire table. This is of course not what you want to.

But, why? Good question. You might have noticed that I used N’0002T9′ which is a nvarchar type – which is what .NET would pass to your query if your parameter is of type string. If you look closer to the execution plan, you’ll see that SQL Server has to do a CONVERT_IMPLICIT on each row of this column, effectively invalidates the index.

If we pass ‘0002T9’ without the notion though, it works as it should, this can cause the confusion as it works during development, but once deployed it is much slower

To see the difference we can run the queries side by side. Note that this is for a very simple table with 130k rows. If you have a few millions or more rows, the difference will be even bigger.

(1 row affected)
Table 'Demo'. Scan count 1, logical reads 4, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.


(1 row affected)
Table 'Demo'. Scan count 1, logical reads 422, physical reads 0, page server reads 0, read-ahead reads 14, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

What’s about the vice versa? If we have data as nvarchar(100) but the parameter is passed as varchar ? SQL Server can handle it with ease. It simply converts the parameters to nvarchar and does an index seek, as it should

So moral of the story? Unless you have strong reasons to use varchar (or char ), stick with nvarchar (or nchar ) to avoid complications with data type conversion which can, and will hurt your database performance.

Don’t let the execution plan fools you

Don’t get me wrong, execution plan is one of the best tools at your disposal if you want to optimize a SQL query. No, it is the must have tool. It is not the only tool you will need, but if you have to pick only one, pick it.

But it is important to know that execution plan can be misleading. It is very useful to see where is the bottleneck is within a statement. It is not exactly useful when you need to compare two statements.

Let’s compare these two queries that I am working to optimize

SELECT	OG.OrderGroupId
		FROM	OrderGroup OG
		INNER JOIN	OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId WHERE 1 = 1  AND OG.Status IN(SELECT Item FROM ecf_splitlist('Cancelled')) ORDER BY OG.OrderGroupId DESC
        OFFSET 0  ROWS 
        FETCH NEXT 50 ROWS ONLY

versus

SELECT	OG.OrderGroupId
		FROM	OrderGroup OG
		INNER JOIN	OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId  WHERE 1 = 1  AND OG.Status IN('Cancelled') ORDER BY OG.OrderGroupId DESC
        OFFSET 0  ROWS 
        FETCH NEXT 50 ROWS ONLY

These are 99% similar, except for the statement OG.Status IN ..., with and without calling the split function.

If you look at the execution plan only, it seems the former is much faster than the latter. It takes only 14% of the time, while the latter takes 86%, so if based on those figures only, we might think the first one is ~6 times faster than the second one.

Except it is not. If we turn on the IO statistics, it is a very different story

The first query has significantly more IO operations than the second

(50 rows affected)
Table 'OrderGroup_PurchaseOrder'. Scan count 0, logical reads 162, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#BA76F977'. Scan count 1, logical reads 8386, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

versus

(50 rows affected)
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'OrderGroup_PurchaseOrder'. Scan count 1, logical reads 143, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

The first has slightly more logical reads on OrderGroup and OrderGroup_PurchaseOrder, but significantly more in a temp table (which is, inside the ecf_splitlist function).

The moral of the story? Execution plan is helpful, but not to compare query to query. In most cases, IO statistics are much more useful.

Optimizing an interesting query

It’s not a secret, I love optimizing things. In a sense, I am both an Optimizer (literally) and an optimizer. And today we will be back to basic – optimizing a tricky SQL query.

The query in question is this particular stored procedure ecf_CatalogNode_GetAllChildNodes, this is used to get all children nodes of specific nodes. It is used in between to find all entries that are direct, or indirect children of specific nodes. Why, you might ask, because when you change the url segment of the node, you want to make sure that all entries that are under that node, will have their indexed object refreshed.

Let’s take a look at this stored procedure, this is how it looks like

CREATE PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
    @catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
    WITH all_node_relations AS 
    (
        SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
        WHERE ParentNodeId > 0
        UNION
        SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
    ),
    hierarchy AS
    (
        SELECT 
            n.CatalogNodeId,
            '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM @catalogNodeIds n
        UNION ALL
        SELECT
            children.ChildNodeId AS CatalogNodeId,
            parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM hierarchy parent
        JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
        WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
    )
    SELECT CatalogNodeId FROM hierarchy
END

I previously wrote about the relations between entities in Commerce catalog, here Commerce relation(ship), a story – Quan Mai’s blog (vimvq1987.com) , so relations between nodes can be a bit complicated – a node can have one true parent defined in CatalogNode table, and then other “linked” nodes in CatalogNodeRelation . So to find all children – and grand children of a node, you need to get from both.

Getting children of a node from CatalogNode or CatalogNodeRelation is simple, but things become more complicated when you have to get grandchildren, then great-grandchildren, and so on, and so forth. with that, CTE needs to be used in a recursive way. But then there is a problem arises – there is a chance, small, but still, that the data was added in a correct way, so circular reference is possible. i.e. A is a parent of B, which is a parent of C, and itself is a parent of A. To stop the SP from running forever, a check needs to be added to make sure any circular reference is cut short.

This brings back memory as the first ever support case I worked on at Optimizely (then Episerver) was with a circular reference. The site would crash whenever someone visited the catalog management in Commerce Manager. That was around June, 2012 (feeling old now?). My “boss” at that time involuntarily volunteered me for the case. See what you made me do, boss.

Now you can grasp the basic of what the SP does – let’s get back to the original problem. it’s slow to run especially with big catalog and complex node structure. As always, to optimize everything you need to find the bottleneck – time to fire up SQL Server Management Studio and turn on the Actual Execution Plan

I decided to go with 66, the “root” catalog node. this query yield around 18k rows

declare @Nodes udttCatalogNodeList 

insert into @Nodes (CatalogNodeId) select 66

exec ecf_CatalogNode_GetAllChildNodes @Nodes

and also 18s of execution.

Mind you, this is on my machine with pretty powerful CPU (AMD Ryzen 7 5800x, 8 cores 16 threads), and a very fast nvme PCIe SSD (Western Digital Black SN850 2TB). If this was executed on Azure Sql database for example, a timeout is almost certainly guaranteed. So time of execution should only be compared relatively with each other.

If we look at the execution plan, it is quite obvious where the bottleneck is. A scan on CatalogNode table is heavy (it read 79M rows on that operation). As suggest by Anders from Timeout when deleting CatalogNodes from a large catalog (optimizely.com), adding a non clustered index on ParentNodeId column would improve it quite a lot. And indeed it does. The execution time is reduced to 5 second.

And the number of rows read on CatalogNode reduced to just 17k

This is of course a very nice improvement. But the customer reported that it is not enough and the SP is still giving timeout, i.e. further optimization is needed.

Naturally, the next step would be to see if we can skip the circular check. It was added as a safe measure to avoid bad data. It should not be there, as the check should be performed at data modification. But it is there for historical reasons and we can’t just change it, not trivially. So let’s try it for our curiousity.

The modified query looks like this (basically just commented out any code related to the CyclePrevention

ALTER PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
    @catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
    WITH all_node_relations AS 
    (
        SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
        WHERE ParentNodeId > 0
        UNION
        SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
    ),
    hierarchy AS
    (
        SELECT 
            n.CatalogNodeId
			--, '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM @catalogNodeIds n
        UNION ALL
        SELECT
            children.ChildNodeId AS CatalogNodeId
			--, parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM hierarchy parent
        JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
        --WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
    )
    SELECT CatalogNodeId FROM hierarchy
END

And the improvement is quite impressive (more than I expected), the query completes almost instantly (less than 1s). The read on CatalogNodeRelation significantly reduced

A word of warning here, execution plan can’t be simply compared as-is. If I run two versions side by side, it gives quite misleading comparison

Even though the top one (without the circular reference check) is much faster than the original (the bottom one), SQL Server estimates that the first is slower (almost 2x slower than the second). So execution plan should be used to see what has been done and what is likely the bottleneck inside a query, it should not be used as comparison between queries. In most cases, comparing statistics using set statistics io on is the best way to compare.

If not for the fact that we are changing the behavior of the stored procedure, I would be happy with this approach. The chance of running into circular reference is small, but it is not zero. As we said, we can in theory gating the relation during insert/updating, but that would be too big a change to start with. This is one of constraint as we work at framework level – we have to step carefully to not break anything. A breaking change is bad, but a data corruption is simply unacceptable. I spent a few hours (probably more than I should) trying to optimize the circular reference check, but no better solution is found.

The next approach would be – as we can guess, to make sure that we get rid of the Clustered Index Scan happened on the CatalogNodeRelation table. The solution would be quite simple, a non clustered index on the `ParentNodeId should be enough.

Great success. The performance is comparable with the “non circular reference check” approach.

As adding an index is a non breaking change (and albeit in some cases it can cause performance regression, like in A curious case of SQL execution plan – Quan Mai’s blog (vimvq1987.com) , but it is rare, also, in this case the cardinality of the ParentNodeId is most likely quite well distributed).

That is all for today. Hopefully you learn one thing or two about optimizing queries in your daily works.