Don’t get me wrong, the execution plan is one of the best tools at your disposal when you want to optimize a SQL query. No, it is the must-have tool. It is not the only tool you will need, but if you have to pick only one, pick it.
But it is important to know that the execution plan can be misleading. It is very useful for seeing where the bottleneck is within a statement. It is not as useful when you need to compare two statements.
Let’s compare these two queries that I am working to optimize
SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN (SELECT Item FROM ecf_splitlist('Cancelled'))
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY
versus
SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN ('Cancelled')
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY
These are 99% similar; the only difference is the OG.Status IN ... predicate, with and without calling the split function.
If you look at the execution plan only, it seems the former is much faster than the latter. It is shown as only 14% of the batch cost, while the latter takes 86%, so based on those figures alone we might think the first one is ~6 times faster than the second.
Except it is not. If we turn on the IO statistics, it is a very different story.
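For reference, turning them on in SSMS is a one-liner; the per-table logical read counts then show up in the Messages tab:
SET STATISTICS IO ON;
-- run both variants of the query above, then check the Messages tab
-- for the logical reads reported per table
SET STATISTICS IO OFF;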
The first query has significantly more IO operations than the second. It has slightly more logical reads on OrderGroup and OrderGroup_PurchaseOrder, but significantly more in a temp table (which is used inside the ecf_splitlist function).
The moral of the story? The execution plan is helpful, but not for comparing one query with another. In most cases, IO statistics are much more useful.
It’s not a secret: I love optimizing things. In a sense, I am both an Optimizer (literally) and an optimizer. And today we will go back to basics – optimizing a tricky SQL query.
The query in question is a particular stored procedure, ecf_CatalogNode_GetAllChildNodes, which is used to get all child nodes of specific nodes. It is used, in turn, to find all entries that are direct or indirect children of specific nodes. Why, you might ask? Because when you change the URL segment of a node, you want to make sure that all entries under that node have their indexed objects refreshed.
Let’s take a look at this stored procedure. This is what it looks like:
CREATE PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId,
'|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId,
parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
I previously wrote about the relations between entities in the Commerce catalog, here: Commerce relation(ship), a story – Quan Mai’s blog (vimvq1987.com). Relations between nodes can be a bit complicated – a node can have one true parent defined in the CatalogNode table, and then other “linked” nodes in CatalogNodeRelation. So to find all children – and grandchildren – of a node, you need to query both.
Getting the children of a node from CatalogNode or CatalogNodeRelation is simple, but things become more complicated when you have to get grandchildren, then great-grandchildren, and so on. For that, a CTE needs to be used in a recursive way. But then a problem arises – there is a chance, small but real, that the data was added in an incorrect way, so a circular reference is possible, i.e. A is a parent of B, which is a parent of C, which in turn is a parent of A. To stop the SP from running forever, a check needs to be added to cut any circular reference short.
This brings back memories, as the first ever support case I worked on at Optimizely (then Episerver) was about a circular reference. The site would crash whenever someone visited the catalog management in Commerce Manager. That was around June 2012 (feeling old now?). My “boss” at that time involuntarily volunteered me for the case. See what you made me do, boss.
Now that you can grasp the basics of what the SP does, let’s get back to the original problem: it’s slow to run, especially with a big catalog and a complex node structure. As always, to optimize anything you need to find the bottleneck – time to fire up SQL Server Management Studio and turn on the Actual Execution Plan.
I decided to go with 66, the “root” catalog node. This query yields around 18k rows.
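For reference, this is roughly how I invoke it in SSMS – a sketch that assumes the udttCatalogNodeList table type exposes a CatalogNodeId column, which is how the procedure reads it:
DECLARE @catalogNodeIds dbo.udttCatalogNodeList;
INSERT INTO @catalogNodeIds (CatalogNodeId) VALUES (66); -- the "root" node used in this test
EXEC dbo.ecf_CatalogNode_GetAllChildNodes @catalogNodeIds = @catalogNodeIds;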
Mind you, this is on my machine with a pretty powerful CPU (AMD Ryzen 7 5800X, 8 cores 16 threads) and a very fast NVMe PCIe SSD (Western Digital Black SN850 2TB). If this were executed on an Azure SQL database, for example, a timeout is almost certainly guaranteed. So the execution times should only be compared relative to each other.
If we look at the execution plan, it is quite obvious where the bottleneck is. A scan on the CatalogNode table is heavy (it read 79M rows in that operation). As suggested by Anders in Timeout when deleting CatalogNodes from a large catalog (optimizely.com), adding a non-clustered index on the ParentNodeId column would improve it quite a lot. And indeed it does. The execution time is reduced to 5 seconds.
And the number of rows read on CatalogNode is reduced to just 17k.
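For completeness, the index is a one-liner along these lines (the index name is mine, pick whatever fits your conventions):
CREATE NONCLUSTERED INDEX IX_CatalogNode_ParentNodeId
    ON dbo.CatalogNode (ParentNodeId);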
This is of course a very nice improvement. But the customer reported that it is not enough and the SP still times out, i.e. further optimization is needed.
Naturally, the next step would be to see if we can skip the circular reference check. It was added as a safety measure against bad data. It should not be necessary, as the check should be performed at data modification time. But it is there for historical reasons and we can’t just change it, not trivially. So let’s try it out of curiosity.
The modified query looks like this (I basically just commented out any code related to CyclePrevention):
ALTER PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
@catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
WITH all_node_relations AS
(
SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
WHERE ParentNodeId > 0
UNION
SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS
(
SELECT
n.CatalogNodeId
--, '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM @catalogNodeIds n
UNION ALL
SELECT
children.ChildNodeId AS CatalogNodeId
--, parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
FROM hierarchy parent
JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
--WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
)
SELECT CatalogNodeId FROM hierarchy
END
And the improvement is quite impressive (more than I expected): the query completes almost instantly (less than 1s). The reads on CatalogNodeRelation are significantly reduced.
A word of warning here: execution plans can’t simply be compared as-is. If I run the two versions side by side, it gives a quite misleading comparison.
Even though the top one (without the circular reference check) is much faster than the original (the bottom one), SQL Server estimates that the first is slower (almost 2x slower than the second). So the execution plan should be used to see what has been done and what is likely the bottleneck inside a query; it should not be used as a comparison between queries. In most cases, comparing statistics using SET STATISTICS IO ON is the best way to compare.
If not for the fact that we would be changing the behavior of the stored procedure, I would be happy with this approach. The chance of running into a circular reference is small, but it is not zero. As said, we could in theory guard the relations during insert/update, but that would be too big a change to start with. This is one of the constraints of working at the framework level – we have to step carefully so as not to break anything. A breaking change is bad, but data corruption is simply unacceptable. I spent a few hours (probably more than I should have) trying to optimize the circular reference check, but no better solution was found.
The next approach would be – as we can guess – to get rid of the Clustered Index Scan on the CatalogNodeRelation table. The solution is quite simple: a non-clustered index on ParentNodeId should be enough.
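Something along these lines (again, the index name is just a placeholder):
CREATE NONCLUSTERED INDEX IX_CatalogNodeRelation_ParentNodeId
    ON dbo.CatalogNodeRelation (ParentNodeId);
-- optionally INCLUDE (ChildNodeId) to make it covering for the recursive CTE,
-- if ChildNodeId is not already part of the clustering key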
Great success. The performance is comparable with the “non circular reference check” approach.
Adding an index is a non-breaking change (and although in some cases it can cause a performance regression, like in A curious case of SQL execution plan – Quan Mai’s blog (vimvq1987.com), that is rare; also, in this case the values of ParentNodeId are most likely quite well distributed), so it is a safe change to make here.
That is all for today. Hopefully you learned a thing or two about optimizing queries in your daily work.
Or any of its equivalents – FindByEmail/FindById, etc.
Why?
Reason? It’s slow. Slow enough to effectively kill your database, and therefore, your website.
If you want to dig into the default implementation (which uses EntityFramework), this is what you end up with, whether you are using FindByEmailAsync or its synchronous equivalent FindByEmail:
public virtual Task<TUser> FindByEmailAsync(string email)
{
this.ThrowIfDisposed();
return this.GetUserAggregateAsync((Expression<Func<TUser, bool>>) (u => u.Email.ToUpper() == email.ToUpper()));
}
It finds the user by matching the email, but not before it calls ToUpper on both sides of the comparison. This is to ensure correctness, because a user can register with their email in one casing but then try to log in with a different casing; if the database uses a CS (case sensitive) collation, that would not be a match.
That is fine for C#/.NET, but it is bad for SQL Server. When it reaches the database, this query is generated:
(@p__linq__0 nvarchar(4000))SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[NewsLetter] AS [NewsLetter],
[Extent1].[IsApproved] AS [IsApproved],
[Extent1].[IsLockedOut] AS [IsLockedOut],
[Extent1].[Comment] AS [Comment],
[Extent1].[CreationDate] AS [CreationDate],
[Extent1].[LastLoginDate] AS [LastLoginDate],
[Extent1].[LastLockoutDate] AS [LastLockoutDate],
[Extent1].[Email] AS [Email],
[Extent1].[EmailConfirmed] AS [EmailConfirmed],
[Extent1].[PasswordHash] AS [PasswordHash],
[Extent1].[SecurityStamp] AS [SecurityStamp],
[Extent1].[PhoneNumber] AS [PhoneNumber],
[Extent1].[PhoneNumberConfirmed] AS [PhoneNumberConfirmed],
[Extent1].[TwoFactorEnabled] AS [TwoFactorEnabled],
[Extent1].[LockoutEndDateUtc] AS [LockoutEndDateUtc],
[Extent1].[LockoutEnabled] AS [LockoutEnabled],
[Extent1].[AccessFailedCount] AS [AccessFailedCount],
[Extent1].[UserName] AS [UserName]
FROM [dbo].[AspNetUsers] AS [Extent1]
WHERE ((UPPER([Extent1].[Email])) = (UPPER(@p__linq__0))) OR ((UPPER([Extent1].[Email]) IS NULL) AND (UPPER(@p__linq__0) IS NULL))
If you can’t spot the problem – don’t worry, because I have seen experienced developers make the same mistake. By using the UPPER function on the column, you effectively remove any performance benefit of an index that might be on the Email column. That means this query will do an index scan every time it is called. We have the TOP (1), which somewhat reduces the impact (it can stop as soon as it finds a match), but if there is no match – e.g. no registered email – it will be a full index scan.
If you have a lot of registered customers, frequent calls to that query can effectively kill your database.
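To see the difference in isolation, here is a minimal sketch against the same table – the parameter and the sample address are just for illustration, and the second predicate is what we want to end up with:
DECLARE @email nvarchar(256) = N'someone@example.com';

-- non-sargable: UPPER() on the column forces an index scan
SELECT TOP (1) UserName FROM dbo.AspNetUsers WHERE UPPER(Email) = UPPER(@email);

-- sargable: can seek on an index on Email (relies on a case-insensitive collation)
SELECT TOP (1) UserName FROM dbo.AspNetUsers WHERE Email = @email;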
And how to fix it
Fixing this issue is a bit cumbersome, because the code is well hidden inside the AspNetIdentity EntityFramework implementation. But it’s not impossible. First we need a UserStore which does not use ToUpper for the comparison:
public class FoundationUserStore<TUser> : UserStore<TUser> where TUser : IdentityUser, IUIUser, new()
{
public FoundationUserStore(DbContext context)
: base(context)
{ }
public override Task<TUser> FindByEmailAsync(string email)
{
return GetUserAggregateAsync(x => x.Email == email);
}
public override Task<TUser> FindByNameAsync(string name)
{
return GetUserAggregateAsync(x => x.UserName == name);
}
}
And then a new UserManager to use that new UserStore
public class CustomApplicationUserManager<TUser> : ApplicationUserManager<TUser> where TUser : IdentityUser, IUIUser, new()
{
public CustomApplicationUserManager(IUserStore<TUser> store)
: base(store)
{
}
public static new ApplicationUserManager<TUser> Create(IdentityFactoryOptions<ApplicationUserManager<TUser>> options, IOwinContext context)
{
var manager = new ApplicationUserManager<TUser>(new FoundationUserStore<TUser>(context.Get<ApplicationDbContext<TUser>>()));
// Configure validation logic for usernames
manager.UserValidator = new UserValidator<TUser>(manager)
{
AllowOnlyAlphanumericUserNames = false,
RequireUniqueEmail = true
};
// Configure validation logic for passwords
manager.PasswordValidator = new PasswordValidator
{
#if DEBUG
RequiredLength = 2,
RequireNonLetterOrDigit = false,
RequireDigit = false,
RequireLowercase = false,
RequireUppercase = false
#else
RequiredLength = 6,
RequireNonLetterOrDigit = true,
RequireDigit = true,
RequireLowercase = true,
RequireUppercase = true
#endif
};
// Configure user lockout defaults
manager.UserLockoutEnabledByDefault = true;
manager.DefaultAccountLockoutTimeSpan = TimeSpan.FromMinutes(5);
manager.MaxFailedAccessAttemptsBeforeLockout = 5;
var provider = context.Get<ApplicationOptions>().DataProtectionProvider.Create("EPiServerAspNetIdentity");
manager.UserTokenProvider = new DataProtectorTokenProvider<TUser>(provider);
return manager;
}
}
And then a way to register our UserManager
public static IAppBuilder AddCustomAspNetIdentity<TUser>(this IAppBuilder app, ApplicationOptions applicationOptions) where TUser : IdentityUser, IUIUser, new()
{
applicationOptions.DataProtectionProvider = app.GetDataProtectionProvider();
// Configure the db context, user manager and signin manager to use a single instance per request
app.CreatePerOwinContext<ApplicationOptions>(() => applicationOptions);
app.CreatePerOwinContext<ApplicationDbContext<TUser>>(ApplicationDbContext<TUser>.Create);
app.CreatePerOwinContext<ApplicationRoleManager<TUser>>(ApplicationRoleManager<TUser>.Create);
app.CreatePerOwinContext<ApplicationUserManager<TUser>>(CustomApplicationUserManager<TUser>.Create);
app.CreatePerOwinContext<ApplicationSignInManager<TUser>>(ApplicationSignInManager<TUser>.Create);
// Configure the application
app.CreatePerOwinContext<UIUserProvider>(ApplicationUserProvider<TUser>.Create);
app.CreatePerOwinContext<UIRoleProvider>(ApplicationRoleProvider<TUser>.Create);
app.CreatePerOwinContext<UIUserManager>(ApplicationUIUserManager<TUser>.Create);
app.CreatePerOwinContext<UISignInManager>(ApplicationUISignInManager<TUser>.Create);
// Saving the connection string in the case dbcontext be requested from none web context
ConnectionStringNameResolver.ConnectionStringNameFromOptions = applicationOptions.ConnectionStringName;
return app;
}
Finally, replace the normal app.AddAspNetIdentity call with a call to our new AddCustomAspNetIdentity extension method.
As I mentioned, this is cumbersome to do. If you know a better way to do it, I’m all ears ;).
We are also skipping the case sensitivity part. In most cases it’ll be fine, as you are most likely using a CI collation. But it’s better to be sure than to leave it to chance. We will address that in the second part of this blog post.
Even though it is not the best identity management system in the .NET world, the ASP.NET Membership provider is still fairly widely used, especially for systems that have been running for quite a long time with a significant number of users: migrating to a better system like AspNetIdentity does not come cheap. However, being built in the early days of ASP.NET means the Membership provider has numerous significant limitations: besides the “architecture” problems, it also has limited performance. Depending on who you ask, the ultimate “maximum” number of customers that the ASP.NET Membership provider can handle ranges from 30,000 to 750,000. That does not sound great. If you start a new project today, you are probably better off with AspNetIdentity or some other solution, but if your website is using the ASP.NET Membership provider and there is currently no plan to migrate, then read on.
The one I will be using for this blog post has around 950,000 registered users, and the site is doing great – but that was achieved by some very fine-grained performance tuning and a very high-end Azure subscription.
A performance overview
I have been using the ASP.NET Membership provider for years, but I had never looked into it from a performance perspective (even though I have done some very nasty digging into its table structure). Now that I have the chance, I realize how bad it is.
It is fairly common in the aspnet_* tables for the indexes to have ApplicationId as the first column. It does not take a database master to know this is a very ineffective way to create an index – in most cases you only have one ApplicationId in your website, making those indexes useless when you want to, for example, query by UserId. This is a rookie mistake – a newbie tends to order the columns in an index the same way they appear in the table, thinking that SQL Server will just do magic and reorder them for the best performance. That’s not how SQL Server – or RDBMS systems in general – work.
It is OK to be a newbie or to misunderstand some concepts. I had the very same misconception once, and learned my lesson. However, it should not be OK for a framework to make that mistake, and never correct it.
That is the beginning of much bigger problems. Because of the ineffective column order, the built-in indexes are almost useless. That makes queries which should be very fast become unnecessarily slow, wasting resources and increasing your site’s average response time. This is of course bad news. The good news is that it’s at the database level, so we can change it for the better. If it were at the application level, our chances of doing that would be close to none.
Missing indexes
If you use Membership.GetUserNameByEmail on your website a lot, you might notice that it is … slow. It leads to this query:
SELECT u.UserName
FROM dbo.aspnet_Applications a, dbo.aspnet_Users u, dbo.aspnet_Membership m
WHERE LOWER(@ApplicationName) = a.LoweredApplicationName AND
u.ApplicationId = a.ApplicationId AND
u.UserId = m.UserId AND
LOWER(@Email) = m.LoweredEmail
Let’s just ignore the style for now (INNER JOIN would be a much more popular choice) and look into what is actually done here. It joins 3 tables by their keys. The join with aspnet_Applications is fairly simple, because you usually have just one application. The join between aspnet_Users and aspnet_Membership is also simple, because both of them have an index on UserId – clustered on aspnet_Users and non-clustered on aspnet_Membership.
The last one is actually problematic. The clustered index on aspnet_Membership looks like this:
CREATE CLUSTERED INDEX [aspnet_Membership_index]
ON [dbo].[aspnet_Membership]([ApplicationId] ASC, [LoweredEmail] ASC);
Uh oh. Even though this contains LoweredEmail, it’s the worst possible kind of index. By putting the least selective column first, it defeats the purpose of the index completely. Every request to get a user name by email address will need to perform a full table scan (oops!).
This is the last thing you want to see in an execution plan, especially with a fairly big table.
It should have been just
CREATE CLUSTERED INDEX [aspnet_Membership_index]
ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);
which helps SQL Server to use the optimal execution plan
If you look at the Azure SQL Database recommendations, they suggest creating a non-clustered index on LoweredEmail. That is not technically incorrect, and it still helps. However, keep in mind that each non-clustered index has to “duplicate” the clustered index keys in order to identify the rows, so keeping the useless clustered index actually increases waste and slows things down (even if just a little, because you have to perform more reads to get the same data). However, if your database is currently performing badly, adding a non-clustered index is a much quicker and safer option. The change to the clustered index should be done with caution, at a low-traffic time.
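If you go the safer route first, the non-clustered index is a sketch like this (the index name is mine; INCLUDE (UserId) makes it cover the join back to aspnet_Users in the query above):
CREATE NONCLUSTERED INDEX IX_aspnet_Membership_LoweredEmail
    ON dbo.aspnet_Membership (LoweredEmail)
    INCLUDE (UserId);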
Testing the stored procedure on the database above, without any additional index:
This post is more of a note-to-self. These are T-SQL statements that can be incredibly useful in development and troubleshooting.
SET STATISTICS IO ON
Turns on IO statistics for the statements run after it, until it is explicitly set to OFF. You can then switch to the Messages tab to see how many IO operations were done on each table.
SET STATISTICS TIME ON
Find out which statements were executed: their text, how many (logical) reads they did, how much time was spent on CPU and how much time was spent in total.
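That description is essentially what the plan-cache DMVs give you. A minimal sketch, assuming you just want the most expensive recent statements:
SELECT TOP (20)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_worker_time  AS total_cpu_time_us,     -- microseconds
    qs.total_elapsed_time AS total_elapsed_time_us, -- microseconds
    st.text               AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_elapsed_time DESC;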
This time, we will talk about ecfVersion_ListFiltered, again.
This stored procedure was previously the subject of several blog posts on SQL Server performance optimization. Just when I thought it was perfect (in terms of performance), I learned something more.
Recently we received a performance report from a customer about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The “Publish Delayed Content Versions” job started to throw timeout exceptions.
This scheduled job calls ecfVersion_ListFiltered to load the content versions which are in the DelayedPublish status; it looks like this when it reaches SQL Server:
This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server has to do a Clustered Index Scan, and if ecfVersion is big enough, that is inevitably slow.
This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you naturally want to know the total number of rows satisfying the query, so you can display some nice, useful information to your end users.
You would think, well, it’s just a count, and a simple query like this would be enough:
SELECT COUNT(Id) FROM MySecretTable
There should be nothing to worry about, right? Actually, there is.
Let’s get back to the example in the previous post – we have to count the total number of orders in that big table.
SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder
Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I expected it to use that index and be pretty fast. But does it? To my surprise, no.
No, this is not really “art” – I’m just trying to have a more clickbait title. It’s more about understanding what you have at your disposal and using it to your benefit – in this case, how a new SQL statement can drastically improve your performance.
In this blog post we will look into the paging feature of SQL Server. In Commerce we usually work with large sets of data – millions of rows are fairly common – and it’s natural to load data by page. There is no point loading thousands, or even millions, of rows in one go. First, it’s not practical to display all of them. Second, you’ll likely end up with a timeout exception and/or an out of memory exception. Even if you are lucky enough to get through, it can still bring your SQL Server instance to its knees, and transferring that much data over the network will be another bottleneck for your system. So, my friends, the best practice for loading data is to do it in batches, and not load everything at once.
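A minimal sketch of the OFFSET/FETCH paging pattern, using the OrderGroup table and hypothetical @PageIndex/@PageSize parameters:
DECLARE @PageIndex int = 2, @PageSize int = 50;

SELECT OG.OrderGroupId
FROM OrderGroup OG
ORDER BY OG.OrderGroupId DESC            -- OFFSET/FETCH requires an ORDER BY
OFFSET @PageIndex * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;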
I will make it quick and to the point: if you are expecting a lot of customers to visit your site tomorrow for Black Friday (and you should be), you should rebuild your database indexes, now.
On average, it will help you serve more customers, and they will be happier with a more responsive, faster website. In the best cases, it will help prevent catastrophes.
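If you want a rough starting point, this sketch lists the most fragmented indexes and then rebuilds all indexes on one table – dbo.OrderGroup is just an example, and the 30% threshold is a common rule of thumb rather than a hard rule:
-- find fragmented indexes in the current database (sampled, LIMITED mode)
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- then rebuild, table by table, e.g.:
ALTER INDEX ALL ON dbo.OrderGroup REBUILD;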
I said this already, and I will say it again: the SQL Server optimizer is smart. I can even go further and say it’s smarter than you and me (and I have no doubt that you are smart, even very, very smart). So in most cases, you should leave it to do whatever it thinks is best.
But there are cases where the SQL Server optimizer is fooled – it gets confused and chooses a sub-optimal plan, because it was given wrong or outdated information. That’s when you need to step in.
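One common way of stepping in, when the problem is simply stale information, is to refresh the statistics yourself – a minimal sketch, using OrderGroup as an example table:
UPDATE STATISTICS dbo.OrderGroup WITH FULLSCAN; -- one table, full scan
EXEC sp_updatestats;                            -- or the whole database, sampled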