Don't get me wrong, the execution plan is one of the best tools at your disposal if you want to optimize a SQL query. No, it is the must-have tool. It is not the only tool you will need, but if you have to pick only one, pick it.
But it is important to know that the execution plan can be misleading. It is very useful for seeing where the bottleneck is within a statement. It is not nearly as useful when you need to compare two statements.
Let's compare these two queries that I am working on optimizing:
SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN (SELECT Item FROM ecf_splitlist('Cancelled'))
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY
versus
SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN ('Cancelled')
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY
These two queries are 99% identical; the only difference is the OG.Status IN ... predicate, with and without the call to the split function.
If you look at the execution plan only, it seems the former is much faster than the latter: its relative cost is only 14%, while the latter's is 86%. Based on those figures alone, we might think the first one is ~6 times faster than the second one.
Except it is not. If we turn on IO statistics, it is a very different story.
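For reference, turning on IO statistics is just a matter of running this in the same SSMS session before the queries (a minimal sketch; the per-table read counts then show up in the Messages tab):

SET STATISTICS IO ON;

-- run both queries here, then compare the logical reads per table in the Messages tab

SET STATISTICS IO OFF;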
The first query has significantly more IO operations than the second.
The first has slightly more logical reads on OrderGroup and OrderGroup_PurchaseOrder, but significantly more in a temp table (which comes from inside the ecf_splitlist function).
The moral of the story? The execution plan is helpful, but not for comparing one query to another. In most cases, IO statistics are much more useful.
While we always want to have fast database queries, that is not always achievable. Sometimes we need to run slow queries, and we need to tell the underlying framework that the query can take some time to complete and we are fine with it. Otherwise, it will try to terminate the query after 30 seconds (the default timeout limit).
There is a difference between connection timeout and command timeout. Connection timeout is the time .NET will try to connect to the database before giving up. Command timeout is the time .NET will try to execute a command before giving up.
Things are a bit more complicated when it comes to command timeout with .NET 5 and up. With later versions of Microsoft.Data.SqlClient, it is possible to set the command timeout directly in the connection string. It is indeed a simple way to do it, but with a caveat.
The new setting is not recognized by Entity Framework/Entity Framework Core, which will throw an exception if you try to use a connection string with a command timeout setting. EF has its own way to set the command timeout per DbContext, but it does not accept the setting via the connection string – it will throw "Keyword not supported: 'command timeout'" if such a setting is present.
The workaround is to configure the command timeout on the EcfSqlConnection connection string, and use a separate connection string without command timeout just for Entity Framework.
However, that comes with a caveat of its own: using command timeout in the connection string means the value applies to all database queries. As we discussed in the previous post above, doing so is not without drawbacks – it hides slow queries rather than letting them fail. A failed query might not bring down your website, but an overloaded database likely will.
In Commerce 14.15.6, which should be released shortly, we introduce a new setting, SqlDataProviderOptions.CommandTimeout, which lets you set the command timeout for queries that use SqlDataProvider – most notably the MetaDataPlus system, such as orders.
The important advantage of the new setting is that you can set it on the fly. If you know that some operation will be slow, you can set the command timeout to a higher value just for that operation, then set it back to the default. In most cases, you can leave it at the default value (30 seconds) and optimize at the application level (reduce the batch size, for example) or the database layer (rewrite the query, add/optimize indexes, etc.). But sometimes you know the query will be slow and you accept that – this is the tool for those cases.
It's not a secret: I love optimizing things. In a sense, I am both an Optimizer (literally) and an optimizer. And today we will get back to basics – optimizing a tricky SQL query.
The query in question is this particular stored procedure, ecf_CatalogNode_GetAllChildNodes, which is used to get all child nodes of specific nodes. It is used, in turn, to find all entries that are direct or indirect children of specific nodes. Why, you might ask? Because when you change the URL segment of a node, you want to make sure that all entries under that node have their indexed objects refreshed.
Let's take a look at this stored procedure; this is what it looks like:
CREATE PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId,
'|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId,
parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
I previously wrote about the relations between entities in the Commerce catalog, in Commerce relation(ship), a story – Quan Mai's blog (vimvq1987.com), so relations between nodes can be a bit complicated – a node has one true parent defined in the CatalogNode table, and then other "linked" nodes in CatalogNodeRelation. So to find all children – and grandchildren – of a node, you need to read from both.
Getting the children of a node from CatalogNode or CatalogNodeRelation is simple, but things become more complicated when you have to get grandchildren, then great-grandchildren, and so on and so forth. For that, a CTE needs to be used recursively. But then a problem arises – there is a chance, small, but still there, that the data was added in an incorrect way, so a circular reference is possible, i.e. A is a parent of B, which is a parent of C, which is itself a parent of A. To stop the SP from running forever, a check needs to be added to make sure any circular reference is cut short.
This brings back memories, as the first-ever support case I worked on at Optimizely (then Episerver) involved a circular reference. The site would crash whenever someone visited the catalog management in Commerce Manager. That was around June 2012 (feeling old now?). My "boss" at that time involuntarily volunteered me for the case. See what you made me do, boss.
Now that you can grasp the basics of what the SP does, let's get back to the original problem: it is slow to run, especially with a big catalog and a complex node structure. As always, to optimize anything you need to find the bottleneck – time to fire up SQL Server Management Studio and turn on the Actual Execution Plan.
I decided to go with 66, the "root" catalog node. This query yields around 18k rows.
Mind you, this is on my machine with a pretty powerful CPU (AMD Ryzen 7 5800X, 8 cores/16 threads) and a very fast NVMe PCIe SSD (Western Digital Black SN850 2TB). If this were executed on an Azure SQL database, for example, a timeout would be almost guaranteed. So the execution times should only be compared relative to each other.
If we look at the execution plan, it is quite obvious where the bottleneck is: a scan on the CatalogNode table is heavy (it read 79M rows in that operation). As suggested by Anders in Timeout when deleting CatalogNodes from a large catalog (optimizely.com), adding a non-clustered index on the ParentNodeId column would improve it quite a lot. And indeed it does – the execution time is reduced to 5 seconds.
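The index itself is a one-liner. This is my sketch of that suggestion – the index name is made up, and it assumes CatalogNodeId is the clustered primary key of CatalogNode:

CREATE NONCLUSTERED INDEX IX_CatalogNode_ParentNodeId -- hypothetical name
ON dbo.CatalogNode (ParentNodeId);
-- CatalogNodeId, as the (assumed) clustered key, is carried in the index automatically,
-- which is all the recursive CTE selects from this table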
And the number of rows read on CatalogNode is reduced to just 17k.
This is of course a very nice improvement. But the customer reported that it was not enough and the SP was still timing out, i.e. further optimization was needed.
Naturally, the next step would be to see if we can skip the circular reference check. It was added as a safety measure to guard against bad data. Arguably it should not be there, as the check should be performed at data-modification time. But it is there for historical reasons and we can't just change it – not trivially. So let's try removing it, out of curiosity.
The modified query looks like this (basically, I just commented out any code related to CyclePrevention):
ALTER PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId
--, '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId
--, parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
--WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
And the improvement is quite impressive (more than I expected): the query completes almost instantly (less than 1s). The reads on CatalogNodeRelation are significantly reduced.
A word of warning here: execution plans can't simply be compared as-is. If I run the two versions side by side, the comparison is quite misleading.
Even though the top one (without the circular reference check) is much faster than the original (the bottom one), SQL Server estimates that the first is slower (almost 2x slower than the second). So the execution plan should be used to see what is being done and where the likely bottleneck is inside a query; it should not be used to compare queries against each other. In most cases, comparing statistics with SET STATISTICS IO ON is the better way.
If not for the fact that we would be changing the behavior of the stored procedure, I would be happy with this approach. The chance of running into a circular reference is small, but it is not zero. As we said, we could in theory guard the relations during insert/update, but that would be too big a change to start with. This is one of the constraints of working at the framework level – we have to step carefully so as not to break anything. A breaking change is bad, but data corruption is simply unacceptable. I spent a few hours (probably more than I should have) trying to optimize the circular reference check, but found no better solution.
The next approach would be – as you can guess – to get rid of the Clustered Index Scan on the CatalogNodeRelation table. The solution is quite simple: a non-clustered index on ParentNodeId should be enough.
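Something along these lines – again a sketch, where the index name is mine and ChildNodeId is included so the CTE's join is fully covered (redundant if ChildNodeId already happens to be part of the clustered key):

CREATE NONCLUSTERED INDEX IX_CatalogNodeRelation_ParentNodeId -- hypothetical name
ON dbo.CatalogNodeRelation (ParentNodeId)
INCLUDE (ChildNodeId);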
Great success. The performance is comparable to the "no circular reference check" approach.
Adding an index is a non-breaking change (and although in some cases it can cause a performance regression, as in A curious case of SQL execution plan – Quan Mai's blog (vimvq1987.com), that is rare; also, in this case the values of ParentNodeId are most likely quite well distributed).
That is all for today. Hopefully you learned a thing or two about optimizing queries in your daily work.
But because of its power and flexibility, it can be complicated to get right. People usually stop at making the query work. Performance is usually an afterthought, as it only becomes visible in the production environment, when there are enough requests to bring your database to its knees.
Let me be very clear about it: during my years helping customers with performance issues (and you can guess, that is a lot of customers), order search is one of the most common – if not the most common – causes of database spikes.
As your commerce database is brought to its knees, your entire website's performance suffers. Your response time suffers. Your visitors are unhappy, and that makes your business suffer.
But what is so bad about order search?
Order search allows you to find orders by almost any criteria, and to do that, it often joins different tables in the database. Search for orders with specific line items? Join with the LineItem table on a match of the CatalogEntryId column. Search for orders with a specific shipping method? Join with the Shipment table on a match of ShippingMethodId, and so on. The SqlWhereClause and SqlMetaWhereClause of OrderSearchParameters are extremely flexible, and that is both a cure and a curse.
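To make the first case concrete, such a search boils down to something roughly like this – an illustrative sketch only, not the exact SQL the search API generates, and with a simplified join path:

SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN LineItem LI ON LI.OrderGroupId = OG.OrderGroupId -- join path simplified for illustration
WHERE LI.CatalogEntryId = 'SKU-12345' -- placeholder entry code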
Let's examine that first example in closer detail. The query is easy to write. But did you know that there is no index on the CatalogEntryId column? That means every order-search request ends up in a full table scan of LineItem.
There are two pieces of bad news in that: your LineItem table usually has many rows already, which makes that scan slow and resource-intensive. And as it is an ever-growing table, the situation only gets worse over time.
That is only the start, and a simple one, because it can be resolved by adding an index on CatalogEntryId. But there are more complicated cases where adding an index simply can't solve the problem – because there is no good one. For example, searching for orders by custom fields of type bit: bit is essentially the worst type when it comes to index-ability, so your indexes will be much less effective than you want them to be, and a full table scan will likely be used anyway.
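For the line-item case, where an index does make sense, the fix is as simple as this sketch (the index name is mine; whether to add INCLUDE columns depends on what your searches select):

CREATE NONCLUSTERED INDEX IX_LineItem_CatalogEntryId -- hypothetical name
ON dbo.LineItem (CatalogEntryId);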
In short:
Order search is flexible and powerful. But "with great power comes great responsibility". Think about what you join in your SqlWhereClause and SqlMetaWhereClause statements, whether your query is covered by an index, or whether adding an index would make sense in your case (I have a few guidelines for a good index here: https://vimvq1987.com/index-or-no-index-thats-the-question/). And consider whether you can limit the number of orders you search for.
If you have been developing with Episerver CMS for a while, you probably know about its embedded "ORM", called Dynamic Data Store, or DDS for short. It allows you to define strongly typed types which are mapped to the database directly for you. You don't have to create the table(s), and you don't have to write stored procedures to insert/query/delete data. Sounds very convenient, right? The fact is, DDS is quite frequently used – and, more often than you might think, misused.
As Joel Spolsky once said, every abstraction is leaky. An ORM will likely make you forget about the nature of the RDBMS underneath, and that can cause performance problems – sometimes severe ones.
Let me make it clear to you:
DDS is slow, and it is not suitable for big sets of data.
If you want to store a few settings for your website, DDS should be fine. However, if you are thinking about hundreds of items, it is probably worth looking elsewhere. Thousands of items or more? Then it would be a NO.
I spent some time benchmarking DDS to see how bad it is. A simple test: add 10,000 items to a store, then query each item, then delete each item, and see how long it takes.
The item is defined like this – just another boring POCO:
internal class ShippingArea : IDynamicData
{
public Identity Id { get; set; }
public string PostCode { get; set; }
public string Area { get; set; }
public DateTime Expires { get; set; }
}
Then I have some quick and dirty code in the QuickSilver ProductController.Index to measure the time (you will have to forgive some bad coding practices here ;)). As usual, Stopwatch should be used for demonstration only; it should not be used in production. If you want a good breakdown of your code execution, use a tool like dotTrace. If you want to measure production performance, use a monitoring system like New Relic or Azure Application Insights:
var shippingAreaStore = ServiceLocator.Current.GetInstance<ShippingAreaStore>();
var dictionary = new Dictionary<string, string>();
for (int i = 0; i < 10000; i++)
{
dictionary[RandomString(6)] = RandomString(10);
}
var identities = new List<ShippingArea>();
var sw = new Stopwatch();
sw.Start();
foreach (var pair in dictionary)
{
shippingAreaStore.CreateNew(pair.Key, pair.Value);
}
sw.Stop();
_logger.Error($"Creating 10000 items took {sw.ElapsedMilliseconds}");
sw.Restart();
foreach (var pair in dictionary)
{
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters.Add("PostCode", pair.Key);
parameters.Add("Area", pair.Value);
identities.AddRange(shippingAreaStore.Find(parameters));
}
sw.Stop();
_logger.Error($"Querying 10000 items took {sw.ElapsedMilliseconds}");
sw.Restart();
foreach (var id in identities)
{
shippingAreaStore.Delete(id);
}
sw.Stop();
_logger.Error($"Deleting 10000 items took {sw.ElapsedMilliseconds}");
Everything is ready. A few runs gave us fairly stable results:
2019-12-02 13:33:01,574 Creating 10000 items took 11938
2019-12-02 13:34:59,594 Querying 10000 items took 118009
2019-12-02 13:35:24,728 Deleting 10000 items took 25131
And this is strictly single-threaded; a real site with a lot of traffic, and thus multiple inserts/queries/deletes at the same time, will certainly perform worse.
Can we do better?
There is a feature of DDS that many people don't know about: you can mark a field as indexed by adding the [EPiServerDataIndex] attribute to the property. The new class would look like this:
[EPiServerDataStore]
internal class ShippingArea : IDynamicData
{
public Identity Id { get; set; }
[EPiServerDataIndex]
public string PostCode { get; set; }
[EPiServerDataIndex]
public string Area { get; set; }
public DateTime Expires { get; set; }
}
If you peek into the database during the test, you can see that the data is now being written to the Indexed_String01 and Indexed_String02 columns, instead of String01 and String02 as it is without the attributes. This change gives us a quite drastic improvement:
2019-12-02 15:38:16,376 Creating 10000 items took 7741
2019-12-02 15:38:19,245 Querying 10000 items took 2867
2019-12-02 15:38:44,266 Deleting 10000 items took 25019
Querying benefits greatly from the new indexes: it no longer has to do a Clustered Index Scan; it can now do a non-clustered index seek + key lookup. Deleting is still equally slow, because the delete is done via a Clustered Index Delete on the Id column, which we already had, and an index on a uniqueidentifier column is not the most effective one.
Before you get too happy with such an improvement, keep in mind that two separate indexes are added, one for Indexed_String01 and one for Indexed_String02. Naturally, we would want a combined index – clustered, even – on those columns, but we just can't.
What if we want to go bare metal, create a table ourselves, and write the queries ourselves? Our repository would look like this:
public class ShippingAreaStore2
{
private readonly IDatabaseExecutor _databaseExecutor;
public ShippingAreaStore2(IDatabaseExecutor databaseExecutor)
{
_databaseExecutor = databaseExecutor;
}
/// <summary>
/// Creates and stores a new shipping area.
/// </summary>
/// <param name="postCode">The post code of the shipping area.</param>
/// <param name="area">The name of the area.</param>
/// <returns>The newly created shipping area.</returns>
internal virtual ShippingArea CreateNew(string postCode, string area)
{
var token = new ShippingArea
{
Id = Identity.NewIdentity(),
PostCode = postCode,
Area = area,
Expires = DateTime.UtcNow.AddDays(1)
};
_databaseExecutor.Execute(() =>
{
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Add";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Id", token.Id.ExternalId));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", token.PostCode));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", token.Area));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Expires", token.Expires));
cmd.ExecuteNonQuery();
});
return token;
}
internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
{
return _databaseExecutor.Execute<IEnumerable<ShippingArea>>(() =>
{
var areas = new List<ShippingArea>();
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Find";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", parameters.Values.First()));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", parameters.Values.Last()));
var reader = cmd.ExecuteReader();
while (reader.Read())
{
areas.Add(new ShippingArea
{
Id = (Guid)reader["Id"],
PostCode = (string)reader["PostCode"],
Area = (string)reader["Area"],
Expires = (DateTime)reader["Expires"]
});
}
return areas;
});
}
/// <summary>
/// Deletes a shipping area from the store.
/// </summary>
/// <param name="area">The shipping area to be deleted.</param>
internal virtual void Delete(ShippingArea area)
{
_databaseExecutor.Execute(() =>
{
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Delete";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", area.PostCode));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", area.Area));
cmd.ExecuteNonQuery();
});
}
}
And that gives us these results:
2019-12-02 16:44:14,785 Creating 10000 items took 2977
2019-12-02 16:44:17,114 Querying 10000 items took 2315
2019-12-02 16:44:20,307 Deleting 10000 items took 3190
Moral of the story?
DDS is slow, and you should avoid using it if you are working with a fairly big set of data. If you have to use DDS for whatever reason, at least make sure to index the columns you query on the most.
And at the end of the day, a hand-crafted custom table + query beats everything. Remember that you can use a tool like Dapper to do most of the work for you.
This post is more of a note to self. These are T-SQL statements which can be incredibly useful in development and troubleshooting.
SET STATISTICS IO ON
Turns on IO statistics for the statements run afterwards, until it is explicitly set to OFF. We can then switch to the Messages tab to see how many IO operations were done on each table.
SET STATISTICS TIME ON
Find out about the statements that were executed: which statements, their text, how many (logical) reads they performed, how much time was spent on CPU, and how much time was spent in total.
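SET STATISTICS TIME ON itself only reports CPU and elapsed time for the statements you run in the current session. To get the per-statement picture described above across the whole server, one option – my own sketch, not taken from the original note – is to query the plan cache DMVs:

SELECT TOP (20)
    st.text AS statement_text,
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_worker_time / 1000 AS total_cpu_ms, -- worker time is reported in microseconds
    qs.total_elapsed_time / 1000 AS total_elapsed_ms
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_elapsed_time DESC;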
This time, we will talk about ecfVersion_ListFiltered, again.
This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. Just when I thought it was perfect (in terms of performance), I learned something more.
Recently we received a performance report from a customer about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11): the "Publish Delayed Content Versions" job started throwing timeout exceptions.
This scheduled job calls ecfVersion_ListFiltered to load the content versions which are in status DelayedPublish; it looks like this when it reaches SQL Server:
This query is known to be slow. The reason is quite obvious: Status contains only 5 or 6 distinct values, so it is not indexed. SQL Server has to do a Clustered Index Scan, and if ecfVersion is big enough, that is inevitably slow.
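One way to sketch a fix for a low-selectivity column like this – my illustration only, not necessarily the change that actually shipped – is a filtered index that contains just the handful of delayed-publish rows, so the job can seek instead of scanning:

CREATE NONCLUSTERED INDEX IX_ecfVersion_Status_DelayedPublish -- hypothetical name
ON dbo.ecfVersion (Status)
WHERE Status = 6; -- assumed numeric value for DelayedPublish; verify the actual value in your database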
This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you naturally want to know the total number of rows satisfying the criteria, so you can display some nice, useful information to your end users.
You would think, well, it’s just a count, and a simple query like this would be enough:
SELECT COUNT(Id) FROM MySecretTable
There should be nothing to worry about, right? Actually, there is.
Let's get back to the example in the previous post – we have to count the total number of orders in that big table.
SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder
Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I expected it to use that index and be pretty fast. But does it? To my surprise, no.
No, this is not really "art" – I'm just trying for a more clickbait title. It's more about understanding what you have at your disposal and using it to your benefit – in this case, how a newer SQL statement can drastically improve your performance.
In this blog post we will look into the paging feature of SQL Server. In Commerce we usually work with large sets of data – millions of rows are fairly common – and it's natural to load data by page. There is no point in loading thousands, or even millions, of rows in one go. First, it's not practical to display all of them. Second, you'll likely end up with a timeout exception and/or an out-of-memory exception. Even if you are lucky enough to get through, it can still bring your SQL Server instance to its knees, and transferring that much data over the network will be another bottleneck for your system. So, my friends, the best practice for loading data is to load it in batches, not everything at once.
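The paging syntax this series leans on is OFFSET ... FETCH, available since SQL Server 2012 and already shown in the order queries at the top of this post. As a minimal reminder (the table, page number and page size are just example values):

DECLARE @PageIndex int = 2, @PageSize int = 50;

SELECT ObjectId
FROM OrderGroup_PurchaseOrder
ORDER BY ObjectId DESC -- a deterministic ORDER BY is required for OFFSET/FETCH
OFFSET (@PageIndex * @PageSize) ROWS -- skip the previous pages
FETCH NEXT @PageSize ROWS ONLY; -- return a single page of rows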
On the Episerver development team, we understand the importance of good performance. Who would not like a lightning-fast website? We work hard to ensure the framework is fast, and we seize (almost) every opportunity to make it faster.
You may know that in Commerce 10.2 we introduced a new cart mode – the serializable cart – and it has proven to bring great performance compared to the "old/traditional" approach. Our own tests showed an improvement of 3-5x. But can it be even faster? Probably yes.
And we actually did make some improvements in later versions. Within the scope of this blog post, we will focus on just one specific aspect – and learn a little more about SQL Server performance optimization along the way.