But because of its power and flexibility, it can be complicated to get right. People usually stop at making the query work. Performance is usually an afterthought, as it only becomes visible in production, when there are enough requests to bring your database to its knees.
Let me be very clear about it: during my years helping customers with performance issues (and as you can guess, that is a lot of customers), order search is one of the most common causes of database spikes, if not the most common one.
Trust me, you never want your database to look like this
As your commerce database is brought to its knees, your entire website's performance suffers. Your response times suffer. Your visitors are unhappy, and that makes your business suffer.
But what is so bad about order search?
Order search allows you to find orders by almost any criteria, and to do that, it often joins with different tables in the database. Search for orders with specific line items? Join with the LineItem table on a match of the CatalogEntryId column. Search for orders with a specific shipping method? Join with the Shipment table on a match of the ShippingMethodId column, and so on. SqlWhereClause and SqlMetaWhereClause of OrderSearchParameters are extremely flexible, and that is both a blessing and a curse.
Let's examine the first example in closer detail. The query is easy to write. But did you know that there is no index on the CatalogEntryId column? That means every order search request ends up doing a full table scan of LineItem.
There are two pieces of bad news in that: your LineItem table usually has many rows already, which makes that scan slow and resource intensive. And as it is an ever-growing table, the situation only gets worse over time.
That is only a start, and a simple one, because it can be resolved by adding an index on CatalogEntryId. But there are more complicated cases where adding an index simply can't solve the problem, because there is no good one. For example, if you search for orders by custom fields, but only of type bit: bit is essentially the worst type when it comes to index-ability, so your indexes will be much less effective than you want them to be. A full table scan will likely be used.
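For the simple LineItem case above, the fix really is just an index. A minimal sketch, assuming the default table and column names (verify them against your own database before adding anything):
CREATE NONCLUSTERED INDEX IX_LineItem_CatalogEntryId
ON dbo.LineItem (CatalogEntryId);
-- consider INCLUDE-ing the columns your search actually selects, so the index covers the query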
In short:
Order search is flexible and powerful. But “with great power comes great responsibility”. Think about what you join on in your SqlWhereClause and SqlMetaWhereClause statements, whether your query is covered by an index, or whether adding an index would make sense in your case (I have a few guidelines for a good index here: https://vimvq1987.com/index-or-no-index-thats-the-question/). Or whether you can limit the number of orders you search for.
If you have been developing with Episerver CMS for a while, you probably know about its embedded “ORM”, called Dynamic Data Store, or DDS for short. It allows you to define strongly typed classes which are mapped to the database for you. You don't have to create the table(s), and you don't have to write stored procedures to insert/query/delete data. Sounds very convenient, right? The fact is, DDS is used quite frequently, and, more often than you might think, misused.
As Joel Spolsky once said, every abstraction is leaky. An ORM will likely make you forget about the nature of the RDBMS underneath, and that can cause performance problems, sometimes severe ones.
Let me make it clear to you
DDS is slow, and it is not suitable for big sets of data.
If you want to store a few settings for your website, DDS should be fine. However, if you are thinking about hundreds of items, it is probably worth looking elsewhere. Thousands of items or more? Then it is a definite no.
I spent some time benchmarking DDS to see how bad it is. A simple test: add 10,000 items to a store, then query for each item, then delete each item, and see how long it takes.
The item is defined like this, just another boring POCO:
internal class ShippingArea : IDynamicData
{
public Identity Id { get; set; }
public string PostCode { get; set; }
public string Area { get; set; }
public DateTime Expires { get; set; }
}
Then I have some quick and dirty code in QuickSilver's ProductController.Index to measure the time (you will have to forgive some bad coding practices here ;). As usual, Stopwatch should be used for demonstration only; it should not be used in production. If you want a good breakdown of your code execution, use a tool like dotTrace. If you want to measure production performance, use a monitoring system like New Relic or Azure Application Insights):
var shippingAreaStore = ServiceLocator.Current.GetInstance<ShippingAreaStore>();
var dictionary = new Dictionary<string, string>();
for (int i = 0; i < 10000; i++)
{
dictionary[RandomString(6)] = RandomString(10);
}
var identities = new List<ShippingArea>();
var sw = new Stopwatch();
sw.Start();
foreach (var pair in dictionary)
{
shippingAreaStore.CreateNew(pair.Key, pair.Value);
}
sw.Stop();
_logger.Error($"Creating 10000 items took {sw.ElapsedMilliseconds}");
sw.Restart();
foreach (var pair in dictionary)
{
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters.Add("PostCode", pair.Key);
parameters.Add("Area", pair.Value);
identities.AddRange(shippingAreaStore.Find(parameters));
}
sw.Stop();
_logger.Error($"Querying 10000 items took {sw.ElapsedMilliseconds}");
sw.Restart();
foreach (var id in identities)
{
shippingAreaStore.Delete(id);
}
sw.Stop();
_logger.Error($"Deleting 10000 items took {sw.ElapsedMilliseconds}");
Everything is ready. A few runs gave us fairly stable results:
2019-12-02 13:33:01,574 Creating 10000 items took 11938
2019-12-02 13:34:59,594 Querying 10000 items took 118009
2019-12-02 13:35:24,728 Deleting 10000 items took 25131
And this is strictly single-threaded; a real site with a lot of traffic, and thus multiple inserts, queries and deletes happening at the same time, will certainly perform worse.
Can we do better?
There is a way to make this a little better that many people don't know about: you can mark a field as indexed by adding the [EPiServerDataIndex] attribute to the property. The new class would look like this:
[EPiServerDataStore]
internal class ShippingArea : IDynamicData
{
public Identity Id { get; set; }
[EPiServerDataIndex]
public string PostCode { get; set; }
[EPiServerDataIndex]
public string Area { get; set; }
public DateTime Expires { get; set; }
}
If you peek into the database during the test, you can see that the data is now being written to the Indexed_String01 and Indexed_String02 columns, instead of String01 and String02 as without the attributes. Such a change gives us quite a drastic improvement:
2019-12-02 15:38:16,376 Creating 10000 items took 7741
2019-12-02 15:38:19,245 Querying 10000 items took 2867
2019-12-02 15:38:44,266 Deleting 10000 items took 25019
Querying benefits greatly from the new index: instead of a Clustered Index Scan, it can now do a non-clustered index seek plus a key lookup. Deleting is still equally slow, because the delete is done by a clustered index delete on the Id column, which we already had, and an index on a uniqueidentifier column is not the most effective one.
Before you get too happy with such an improvement, keep in mind that two separate indexes are added, one for Indexed_String01 and one for Indexed_String02. Naturally, we would want a combined index, clustered even, on those columns, but we just can't have one.
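You can check this yourself by peeking at the backing table. A small sketch, assuming the default DDS storage table name tblBigTable (verify the name in your own database):
SELECT TOP 10 String01, String02, Indexed_String01, Indexed_String02
FROM dbo.tblBigTable;
EXEC sp_helpindex 'dbo.tblBigTable'; -- lists the indexes, including the separate ones on the Indexed_* columns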
What if we want to go bare metal: create the table ourselves and write the queries ourselves? Our repository would look like this:
public class ShippingAreaStore2
{
private readonly IDatabaseExecutor _databaseExecutor;
public ShippingAreaStore2(IDatabaseExecutor databaseExecutor)
{
_databaseExecutor = databaseExecutor;
}
/// <summary>
/// Creates and stores a new shipping area.
/// </summary>
/// <param name="postCode">The post code of the shipping area.</param>
/// <param name="area">The name of the area.</param>
/// <returns>The created shipping area.</returns>
internal virtual ShippingArea CreateNew(string postCode, string area)
{
var shippingArea = new ShippingArea
{
Id = Identity.NewIdentity(),
PostCode = postCode,
Area = area,
Expires = DateTime.UtcNow.AddDays(1)
};
_databaseExecutor.Execute(() =>
{
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Add";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Id", shippingArea.Id.ExternalId));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", shippingArea.PostCode));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", shippingArea.Area));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Expires", shippingArea.Expires));
cmd.ExecuteNonQuery();
});
return shippingArea;
}
internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
{
return _databaseExecutor.Execute<IEnumerable<ShippingArea>>(() =>
{
var areas = new List<ShippingArea>();
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Find";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", parameters.Values.First()));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", parameters.Values.Last()));
var reader = cmd.ExecuteReader();
while (reader.Read())
{
areas.Add(new ShippingArea
{
Id = (Guid)reader["Id"],
PostCode = (string)reader["PostCode"],
Area = (string)reader["Area"],
Expires = (DateTime)reader["Expires"]
});
}
return areas;
});
}
/// <summary>
/// Deletes a shipping area from the store.
/// </summary>
/// <param name="area">The shipping area to be deleted.</param>
internal virtual void Delete(ShippingArea area)
{
_databaseExecutor.Execute(() =>
{
var cmd = _databaseExecutor.CreateCommand();
cmd.CommandText = "ShippingArea_Delete";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", area.PostCode));
cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", area.Area));
cmd.ExecuteNonQuery();
});
}
}
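The original post does not show the table and stored procedures behind this repository, so here is a minimal sketch of what they could look like. The procedure names and parameters come from the code above; the table name, data types and index are my assumptions:
CREATE TABLE dbo.ShippingArea
(
Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
PostCode NVARCHAR(50) NOT NULL,
Area NVARCHAR(50) NOT NULL,
Expires DATETIME2 NOT NULL
);
GO
-- the combined index we could not get from DDS
CREATE NONCLUSTERED INDEX IX_ShippingArea_PostCode_Area ON dbo.ShippingArea (PostCode, Area);
GO
CREATE PROCEDURE dbo.ShippingArea_Add
@Id UNIQUEIDENTIFIER, @PostCode NVARCHAR(50), @Area NVARCHAR(50), @Expires DATETIME2
AS
INSERT INTO dbo.ShippingArea (Id, PostCode, Area, Expires)
VALUES (@Id, @PostCode, @Area, @Expires);
GO
CREATE PROCEDURE dbo.ShippingArea_Find
@PostCode NVARCHAR(50), @Area NVARCHAR(50)
AS
SELECT Id, PostCode, Area, Expires
FROM dbo.ShippingArea
WHERE PostCode = @PostCode AND Area = @Area;
GO
CREATE PROCEDURE dbo.ShippingArea_Delete
@PostCode NVARCHAR(50), @Area NVARCHAR(50)
AS
DELETE FROM dbo.ShippingArea
WHERE PostCode = @PostCode AND Area = @Area;
GO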
And those would give us the results:
2019-12-02 16:44:14,785 Creating 10000 items took 2977
2019-12-02 16:44:17,114 Querying 10000 items took 2315
2019-12-02 16:44:20,307 Deleting 10000 items took 3190
Moral of the story?
DDS is slow and you should avoid using it if you are working with a fairly big set of data. If you have to use DDS for whatever reason, make sure to at least index the columns that you query the most.
And at the end of the day, a hand-crafted custom table and query beats everything. Remember that you can use tools like Dapper to do most of the work for you.
This post is more of a note to self: a few T-SQL statements which can be incredibly useful in development and troubleshooting.
SET STATISTICS IO ON
Turns on IO statistics for every statement run after it, until it is explicitly set to OFF. Switch to the Messages tab to see how many IO operations were done on each table.
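For example (any query works; the table here is just an example):
SET STATISTICS IO ON;
SELECT COUNT(*) FROM OrderGroup_PurchaseOrder;
-- the Messages tab then shows one line per table, along the lines of:
-- Table 'OrderGroup_PurchaseOrder'. Scan count 1, logical reads ..., physical reads ...
SET STATISTICS IO OFF;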
SET STATISTICS TIME ON
Turns on timing statistics: the Messages tab will show the CPU time and the elapsed time of each statement.
To find out which statements have been executed on the server, including their text, how many logical reads they caused, how much time was spent on CPU and how much time was spent in total, query the execution statistics DMVs.
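A minimal sketch of such a query (TOP 50 and the sort column are just my choices; order by whichever metric you are chasing):
SELECT TOP 50
st.text,
qs.execution_count,
qs.total_logical_reads,
qs.total_worker_time, -- CPU time, in microseconds
qs.total_elapsed_time -- total time, in microseconds
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;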
This time, we will talk about ecfVersion_ListFiltered, again.
This stored procedure was previously the subject of several blog posts on SQL Server performance optimizations. Just when I thought it was perfect (in terms of performance), I learned something more.
Recently we received a performance report from a customer about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11): the “Publish Delayed Content Versions” job started to throw timeout exceptions.
This scheduled job calls ecfVersion_ListFiltered to load the content versions that are in the DelayedPublish status.
That query is known to be slow. The reason is quite obvious: Status contains only 5 or 6 distinct values, so it is not indexed. SQL Server has to do a Clustered Index Scan, and if ecfVersion is big enough, that is inevitably slow.
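One possible mitigation (my sketch, not necessarily the fix that ended up in the product) is a filtered index covering only the rare status value the job looks for:
-- hypothetical: verify which numeric value DelayedPublish maps to in your database first
CREATE NONCLUSTERED INDEX IX_ecfVersion_Status_DelayedPublish
ON dbo.ecfVersion (Status)
WHERE Status = 6;
-- note: a parameterized query may need OPTION (RECOMPILE) before the optimizer can use a filtered index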
This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you naturally want to know the total number of rows satisfying the query, so you can display some nice, useful information to your end users.
You would think, well, it's just a count, and a simple query like this would be enough:
SELECT COUNT(Id) FROM MySecretTable
There should be nothing to worry about, right? Actually, there is.
Let's get back to the example in the previous post: we have to count the total number of orders in that big table.
SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder
Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I expected SQL Server to use that index and be pretty fast. But does it? To my surprise, no.
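You can see this for yourself by comparing a couple of variants and looking at their plans and IO statistics (a sketch; run it against your own database):
SET STATISTICS IO ON;
SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder;
SELECT COUNT(*) FROM OrderGroup_PurchaseOrder;
-- COUNT only needs to know how many rows exist, so the optimizer is free to answer it
-- from the smallest index that contains every row, which is rarely the wide clustered index.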
No, this is not really “art”; I am just trying for a more clickbaity title. It's more about understanding what you have at your disposal and using it to your benefit, in this case, how a newer SQL statement can drastically improve your performance.
In this blog post we will look into the paging feature of SQL Server. In Commerce we usually work with large sets of data – millions of rows are fairly common – and it's natural to load data by page. There is no point in loading thousands, or even millions, of rows in one go. First, it's not practical to display all of them. Second, you'll likely end up with a timeout exception and/or an out-of-memory exception. Even if you are lucky enough to get through, it can still bring your SQL Server instance to its knees, and transferring that much data over the network will be another bottleneck for your system. So, my friends, the best practice for loading data is to do it in batches, and not to load everything at once.
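The canonical way to page in modern SQL Server (2012 and later) is OFFSET ... FETCH. A minimal sketch, assuming the OrderGroup_PurchaseOrder table from the later examples and ObjectId as a stable sort key:
DECLARE @PageIndex INT = 2, @PageSize INT = 50;
SELECT ObjectId
FROM OrderGroup_PurchaseOrder
ORDER BY ObjectId
OFFSET (@PageIndex * @PageSize) ROWS
FETCH NEXT @PageSize ROWS ONLY;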
On the Episerver development team, we understand the importance of good performance. Who wouldn't like a lightning-fast website? We work hard to ensure the framework is fast, and we seize (almost) every opportunity to make it faster.
You may know that in Commerce 10.2 we introduced a new cart mode, the serializable cart, and it has proven to bring great performance compared to the “old/traditional” approach. Our own tests showed it to be 3-5x faster. But can it be even faster? Probably yes.
We actually did make some improvements in later versions. In the scope of this blog post, we will focus on one specific aspect, and learn a little more about SQL Server performance optimization along the way.
I will make it quick and to the point: if you are expecting a lot of customers visiting your site tomorrow for Black Friday (and you should be), you should rebuild your database indexes, now.
On average, it will help you serve more customers, and they will be happier with a faster, more responsive website. In the best cases, it will help prevent catastrophes.
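If you do not already have an index maintenance solution in place, something as simple as this will get you started (the table name is just an example, and rebuilding is IO heavy, so run it off-peak):
-- rebuild every index on one heavily used table
ALTER INDEX ALL ON dbo.OrderGroup_PurchaseOrder REBUILD;
-- or check fragmentation first and only rebuild what needs it
SELECT OBJECT_NAME(ips.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;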
I have had some more time to revisit the query, and I realized I made a “small” mistake: the “optimized” query was using a Clustered Index Scan.
So it's not as fast as it should be, and it performs quite poorly in the no-cache scenario (when the buffer pool is empty, for example): it takes about 40s to complete. Yes, it's still better than the original one, in both the cached and non-cached cases. But it's not good enough. An index scan, even when cached, is not only slower but also more prone to deadlocks. It's also worse in the best-case scenario, where the original query can use a proper index seek.
I have said this already, and I will say it again: the SQL Server optimizer is smart. I can even go further and say it's smarter than you and me (I have no doubt that you are smart, even very, very smart 🙂). So in most cases, you should leave it to do whatever it thinks is best.
But there are cases when the optimizer gets confused and chooses a sub-optimal plan, because it was given wrong or outdated information. That's when you need to step in.
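Stepping in often starts with giving the optimizer better information, which usually means fresh statistics. A minimal sketch (the table name is just an example):
-- refresh the statistics of one table with a full scan
UPDATE STATISTICS dbo.OrderGroup_PurchaseOrder WITH FULLSCAN;
-- or refresh every statistic in the database that needs it
EXEC sp_updatestats;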