Don’t use AspNetIdentity FindByEmailAsync/FindByIdAsync

Or any of their equivalents – FindByEmail/FindById, etc.

Why?

It’s slow. Slow enough to effectively kill your database, and therefore your website.

If you dig into the default implementation (which uses Entity Framework), this is what you end up with, whether you call FindByEmailAsync or its synchronous equivalent FindByEmail:

    public virtual Task<TUser> FindByEmailAsync(string email)
    {
      this.ThrowIfDisposed();
      return this.GetUserAggregateAsync((Expression<Func<TUser, bool>>) (u => u.Email.ToUpper() == email.ToUpper()));
    }

It finds the user by matching the email, but not before applying ToUpper to both sides of the comparison. This is done for correctness: a user can register with “[email protected]” but then try to log in with “[email protected]”. If the database uses a CS (case-sensitive) collation, that would not be a match.

That is fine for C#/.NET, but it is bad for SQL Server. When it reaches the database, this query is generated:

(@p__linq__0 nvarchar(4000))SELECT TOP (1) 
    [Extent1].[Id] AS [Id], 
    [Extent1].[NewsLetter] AS [NewsLetter], 
    [Extent1].[IsApproved] AS [IsApproved], 
    [Extent1].[IsLockedOut] AS [IsLockedOut], 
    [Extent1].[Comment] AS [Comment], 
    [Extent1].[CreationDate] AS [CreationDate], 
    [Extent1].[LastLoginDate] AS [LastLoginDate], 
    [Extent1].[LastLockoutDate] AS [LastLockoutDate], 
    [Extent1].[Email] AS [Email], 
    [Extent1].[EmailConfirmed] AS [EmailConfirmed], 
    [Extent1].[PasswordHash] AS [PasswordHash], 
    [Extent1].[SecurityStamp] AS [SecurityStamp], 
    [Extent1].[PhoneNumber] AS [PhoneNumber], 
    [Extent1].[PhoneNumberConfirmed] AS [PhoneNumberConfirmed], 
    [Extent1].[TwoFactorEnabled] AS [TwoFactorEnabled], 
    [Extent1].[LockoutEndDateUtc] AS [LockoutEndDateUtc], 
    [Extent1].[LockoutEnabled] AS [LockoutEnabled], 
    [Extent1].[AccessFailedCount] AS [AccessFailedCount], 
    [Extent1].[UserName] AS [UserName]
    FROM [dbo].[AspNetUsers] AS [Extent1]
    WHERE ((UPPER([Extent1].[Email])) = (UPPER(@p__linq__0))) OR ((UPPER([Extent1].[Email]) IS NULL) AND (UPPER(@p__linq__0) IS NULL))

If you can’t spot the problem – don’t worry, because I have seen experienced developers make the same mistake. By using the UPPER function on the column, you effectively remove any performance benefit of the index that might exist on the Email column. That means this query will do an index scan every time it is called. The TOP(1) somewhat reduces the impact (it can stop as soon as it finds a match), but if there is no match – e.g. the email is not registered – it will be a full index scan.

If you have a lot of registered customers, frequent calls to that query can effectively kill your database.
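To see why, compare the shape of the two predicates below – a sketch with a placeholder parameter, assuming a non-clustered index exists on the Email column (the actual fix in this post is done on the C# side, as shown next):

DECLARE @email nvarchar(256) = N'someone@example.com';

-- Non-sargable: UPPER() is applied to the column, so the index on Email cannot be used for a seek.
SELECT TOP (1) [Id], [Email] FROM [dbo].[AspNetUsers] WHERE UPPER([Email]) = UPPER(@email);

-- Sargable: the column is left untouched, so SQL Server can seek directly on the Email index.
SELECT TOP (1) [Id], [Email] FROM [dbo].[AspNetUsers] WHERE [Email] = @email;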

And how to fix it

Fixing this issue is a bit cumbersome, because the code is well hidden inside the AspNetIdentity EntityFramework implementation. But it’s not impossible. First, we need a UserStore which does not use ToUpper for the comparison:

public class FoundationUserStore<TUser> : UserStore<TUser> where TUser : IdentityUser, IUIUser, new()
{
    public FoundationUserStore(DbContext context)
        : base(context)
    { }

    public override Task<TUser> FindByEmailAsync(string email)
    {
        return GetUserAggregateAsync(x => x.Email == email);
    }

    public override Task<TUser> FindByNameAsync(string name)
    {
        return GetUserAggregateAsync(x => x.UserName == name);
    }
}

And then a new UserManager to use that new UserStore:

    public class CustomApplicationUserManager<TUser> : ApplicationUserManager<TUser> where TUser : IdentityUser, IUIUser, new()
    {
        public CustomApplicationUserManager(IUserStore<TUser> store)
            : base(store)
        {
        }

        public static new ApplicationUserManager<TUser> Create(IdentityFactoryOptions<ApplicationUserManager<TUser>> options, IOwinContext context)
        {
            var manager = new ApplicationUserManager<TUser>(new FoundationUserStore<TUser>(context.Get<ApplicationDbContext<TUser>>()));

            // Configure validation logic for usernames
            manager.UserValidator = new UserValidator<TUser>(manager)
            {
                AllowOnlyAlphanumericUserNames = false,
                RequireUniqueEmail = true
            };

            // Configure validation logic for passwords
            manager.PasswordValidator = new PasswordValidator
            {
#if DEBUG
                RequiredLength = 2,
                RequireNonLetterOrDigit = false,
                RequireDigit = false,
                RequireLowercase = false,
                RequireUppercase = false
#else
                RequiredLength = 6,
                RequireNonLetterOrDigit = true,
                RequireDigit = true,
                RequireLowercase = true,
                RequireUppercase = true

#endif
            };

            // Configure user lockout defaults
            manager.UserLockoutEnabledByDefault = true;
            manager.DefaultAccountLockoutTimeSpan = TimeSpan.FromMinutes(5);
            manager.MaxFailedAccessAttemptsBeforeLockout = 5;

            var provider = context.Get<ApplicationOptions>().DataProtectionProvider.Create("EPiServerAspNetIdentity");
            manager.UserTokenProvider = new DataProtectorTokenProvider<TUser>(provider);

            return manager;
        }
    }

And then a way to register our UserManager:

    public static IAppBuilder AddCustomAspNetIdentity<TUser>(this IAppBuilder app, ApplicationOptions applicationOptions) where TUser : IdentityUser, IUIUser, new()
    {
        applicationOptions.DataProtectionProvider = app.GetDataProtectionProvider();

        // Configure the db context, user manager and signin manager to use a single instance per request
        app.CreatePerOwinContext<ApplicationOptions>(() => applicationOptions);
        app.CreatePerOwinContext<ApplicationDbContext<TUser>>(ApplicationDbContext<TUser>.Create);
        app.CreatePerOwinContext<ApplicationRoleManager<TUser>>(ApplicationRoleManager<TUser>.Create);
        app.CreatePerOwinContext<ApplicationUserManager<TUser>>(CustomApplicationUserManager<TUser>.Create);
        app.CreatePerOwinContext<ApplicationSignInManager<TUser>>(ApplicationSignInManager<TUser>.Create);

        // Configure the application
        app.CreatePerOwinContext<UIUserProvider>(ApplicationUserProvider<TUser>.Create);
        app.CreatePerOwinContext<UIRoleProvider>(ApplicationRoleProvider<TUser>.Create);
        app.CreatePerOwinContext<UIUserManager>(ApplicationUIUserManager<TUser>.Create);
        app.CreatePerOwinContext<UISignInManager>(ApplicationUISignInManager<TUser>.Create);

        // Save the connection string in case the DbContext is requested from a non-web context
        ConnectionStringNameResolver.ConnectionStringNameFromOptions = applicationOptions.ConnectionStringName;

        return app;
    }

Finally, replace the normal app.AddAspNetIdentity with this:

        app.AddCustomAspNetIdentity<SiteUser>(new ApplicationOptions
        {
            ConnectionStringName = commerceConectionStringName
        });

As I mentioned, this is cumbersome to do. If you know a better way, I’m all ears ;).

We are also skipping the case sensitivity part. In most cases it’ll be fine, as you are most likely using a CI (case-insensitive) collation. But it’s better to be sure than to leave it to chance. We will address that in the second part of this blog post.
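If you want to check right away (a quick sketch, assuming the default AspNetUsers table name), you can ask SQL Server which collation the database and the Email column actually use:

-- Database-level collation
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation');

-- Column-level collation for the Email column
SELECT COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'AspNetUsers' AND COLUMN_NAME = 'Email';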

Fixing ASP.NET Membership performance – part 1

Even though it is not the best identity management system in the .NET world, the ASP.NET Membership provider is still fairly widely used, especially for systems that have been running for quite a long time with a significant number of users: migrating to a better system like AspNetIdentity does not come cheap. However, being built in the early days of ASP.NET means the Membership provider has numerous significant limitations: besides the “architecture” problems, it also has limited performance. Depending on who you ask, the ultimate “maximum” number of customers the ASP.NET Membership provider can handle ranges from 30,000 to 750,000. That does not sound great. If you start a new project today, you are probably better off with AspNetIdentity or another solution, but if your website is using the ASP.NET Membership provider and there is currently no plan to migrate, then read on.

The one I will be using for this blog post has around 950,000 registered users, and the site is doing great – but that was achieved with some very fine-grained performance tuning and a very high-end Azure subscription.

A performance overview 

I have been using the ASP.NET Membership provider for years, but I had never looked into it from a performance perspective (even though I have done some very nasty digging into its table structure). Now that I have the chance, I realize how bad it is.

It’s a fairly common sight in the aspnet_* tables that the indexes have ApplicationId as the first column. It does not take a database master to know that this is a very ineffective way to create an index – in most cases you only have one ApplicationId in your website, making those indexes useless when you want to, for example, query by UserId. This is a rookie mistake – a newbie tends to put the columns in an index in the same order as they appear in the table, thinking that SQL Server will just do magic to rearrange them for the best performance. That’s not how SQL Server – or RDBMS systems in general – work.

It is OK to be a newbie or to misunderstand some concepts. I had the very same misconception once, and learned my lessons. However, it should not be OK for a framework to make that mistake, and never correct it.

That is the beginning of much bigger problems. Because of the ineffective column order, the built-in indexes are almost useless. Queries that should be very fast become unnecessarily slow, wasting resources and increasing your site’s average response time. This is of course bad news. The good news is that it’s at the database level, so we can change it for the better. If it were at the application level, our chances of doing that would be close to none.

Missing indexes

If you use Membership.GetUserNameByEmail on your website a lot, you might notice that it is … slow. It leads to this query:

        SELECT  u.UserName
        FROM    dbo.aspnet_Applications a, dbo.aspnet_Users u, dbo.aspnet_Membership m
        WHERE   LOWER(@ApplicationName) = a.LoweredApplicationName AND
                u.ApplicationId = a.ApplicationId    AND
                u.UserId = m.UserId AND
                LOWER(@Email) = m.LoweredEmail

Let’s just ignore the style for now (INNER JOIN would be a much more popular choice) and look at what is actually done here. It joins 3 tables by their keys. The join with aspnet_Applications is fairly simple, because you usually have just one application. The join between aspnet_Users and aspnet_Membership is also simple, because both of them have an index on UserId – clustered on aspnet_Users and non-clustered on aspnet_Membership.

The last condition is the problematic one. The clustered index on aspnet_Membership actually looks like this:

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([ApplicationId] ASC, [LoweredEmail] ASC);

Uh oh. Even though this contains LoweredEmail, it’s the worst possible kind of index. By putting the least distinctive column first, it defeats the purpose of the index completely. Every request to get a user name by email address will need to perform a full table scan (oops!).

This is the last thing you want to see in an execution plan, especially with a fairly big table.

It should have been just

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);

which helps SQL Server use the optimal execution plan.

If you look at the Azure SQL Database recommendations, they suggest creating a non-clustered index on LoweredEmail. That is not technically incorrect, and it still helps. However, keep in mind that each non-clustered index has to “duplicate” the clustered index key in order to identify the rows, so keeping the useless clustered index increases waste and slows things down (even if just a little, because you have to perform more reads to get the same data). Still, if your database is currently performing badly, adding a non-clustered index is the much quicker and safer option; the change to the clustered index should be done with caution, at a low-traffic time.
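For reference, the quick-win non-clustered index would look something like this – a sketch: the index name is mine, and the INCLUDE column is an assumption to cover the UserId lookup performed by the query above:

CREATE NONCLUSTERED INDEX [IX_aspnet_Membership_LoweredEmail]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC)
    INCLUDE ([UserId]);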

I tested the stored procedure on the database above, first without any additional index:

Table 'aspnet_Membership'. Scan count 9, logical reads 20101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 237 ms,  elapsed time = 182 ms.

With the new non-clustered index:


(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 89 ms.

With the new clustered index:

(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 89 ms.

Don’t we have a clear winner?

Useful T-SQL snippets for development and troubleshooting

This post is more of a note-to-self. These are T-SQL statements that can be incredibly useful in development and troubleshooting.

SET STATISTICS IO ON

Turns on IO statistics for every statement run after it, until it is explicitly set to OFF. Switch to the Messages tab to see how many IO operations were done on each table.

SET STATISTICS TIME ON

Reports, for each statement executed, how much time was spent on CPU and how much time was spent in total (elapsed time).
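A typical way to use them together – a small sketch, using one of the tables from this post as an example:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the statement you want to measure; the figures show up in the Messages tab.
SELECT COUNT(*) FROM dbo.aspnet_Membership;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;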

Continue reading “Useful T-SQL snippets for development and troubleshooting”

A curious case of SQL Server function

This time, we will talk about ecfVersion_ListFiltered, again.

This stored procedure was previously the subject of several blog posts regarding SQL Server performance optimizations. Just when I thought it was perfect (in terms of performance), I learned something more.

Recently we received a performance report from a customer asking about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The “Publish Delayed Content Versions” job started throwing timeout exceptions.

This scheduled job calls ecfVersion_ListFiltered to load the content versions which are in the DelayedPublish status; it looks like this when it reaches SQL Server:

declare @s [udttIdTable]
insert into @s values(6)
exec dbo.ecfVersion_ListFiltered @Statuses = @s, @StartIndex = 0, @MaxRows = 2147483646

This query is known to be slow. The reason is quite obvious – Status contains only 5 or 6 distinct values, so it’s not indexed. SQL Server will have to do a Clustered Index Scan, and if ecfVersion is big enough, it’s inevitably slow.
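You can see the (lack of) selectivity for yourself with a quick count per status – a sketch, assuming the table and column names described above:

SELECT [Status], COUNT(*) AS Versions
FROM dbo.ecfVersion
GROUP BY [Status]
ORDER BY Versions DESC;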

Continue reading “A curious case of SQL Server function”

Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you naturally want to know the total number of rows satisfying the query, so you can display some nice, useful information to your end users.

You would think, well, it’s just a count, and a simple query like this would be enough:

SELECT COUNT(Id) FROM MySecretTable

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in the previous post – we have to count the total number of orders in that big table.

SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I expected it to use that index and be pretty fast. But does it? To my surprise, no.
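The full explanation is behind the cut, but you can check what the engine actually does using the same technique as in the snippets post above – a sketch:

SET STATISTICS IO ON;
SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder;
-- The Messages tab (and the actual execution plan) reveal which index was read, and how.
SET STATISTICS IO OFF;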

Continue reading “Optimizing T-SQL COUNT”

The art of paging

No, this is not really “art” – I’m just trying to have a more clickbaity title. It’s more about understanding what you have at your disposal and using it to your benefit – in this case, how a newer SQL statement can drastically improve your performance.

In this blog post we will look into the paging features of SQL Server. In Commerce we usually work with large sets of data – millions of rows are fairly common – and it’s natural to load data by page. There is no point loading thousands, or even millions, of rows in one go. First, it’s not practical to display all of them. Second, you’ll likely end up with a timeout exception and/or an out-of-memory exception. Even if you are lucky enough to get through, it can still bring your SQL Server instance to its knees, and transferring that much data over the network will be another bottleneck for your system. So, my friends, the best practice for loading data is to do it in batches, and not load everything at once.
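The newer statement alluded to is presumably OFFSET … FETCH, available since SQL Server 2012 – a minimal sketch, with made-up page parameters and the purchase-order table from the COUNT example above:

DECLARE @PageIndex int = 2, @PageSize int = 50;

SELECT ObjectId
FROM OrderGroup_PurchaseOrder
ORDER BY ObjectId
OFFSET @PageIndex * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;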

Continue reading “The art of paging”

Please, rebuild your database indexes, now

I will make it quick and to the point: if you are expecting a lot of customers to visit your site tomorrow for Black Friday (and you should be), you should rebuild your database indexes, now.

On average, it will help you serve more customers, and they will be happier with a faster, more responsive website. In the best cases, it will help prevent catastrophes.
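The post has the details; the basic command, per table, looks something like this – a sketch, and ONLINE = ON only works on editions that support online index rebuilds:

ALTER INDEX ALL ON dbo.aspnet_Membership
REBUILD WITH (ONLINE = ON);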

Continue reading “Please, rebuild your database indexes, now”

A curious case of SQL execution plan

I have said this already, and I will say it again: the SQL Server optimizer is smart. I can go even further and say it’s smarter than you and me (I have no doubt that you are smart, even very, very smart 🙂). So in most cases, you leave it to do whatever it thinks is best.

But there are cases where the SQL Server optimizer gets fooled – it gets confused and chooses a sub-optimal plan because it was given wrong, outdated, or incorrect information. That’s when you need to step in.

Today I faced one such case, as reported here: http://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2017/10/database-timeout-on-productvariant-update/

Continue reading “A curious case of SQL execution plan”

Import a bacpac to SQL Server

This is more of a note-to-self.

I sometimes have to import a bacpac file from a customer’s database (usually from an Azure SQL database) to my local machine. Most of the time it’s very easy when the databases are in .bak format, but for a .bacpac file it can be pretty complicated.

Sqlpackage.exe is the standard tool to import the .bacpac file. It ships with Visual Studio (for example `C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\130`) or with SQL Server (`C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin`). The latest version should be used, because newer versions can read older formats (.bacpac files exported from older SQL Server versions), but not the other way around (older versions might not support .bacpac files exported from newer SQL Server versions).
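A typical invocation looks something like this – a sketch, with placeholder file, server and database names:

SqlPackage.exe /Action:Import /SourceFile:"C:\Temp\customer.bacpac" /TargetServerName:"." /TargetDatabaseName:"CustomerDb"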

Continue reading “Import a bacpac to SQL Server”

Watch your indexes closely

Recently we were tasked with helping a customer who had a problem with a query. This specific query ate a lot of CPU (30-40%) and caused performance problems for other queries, as it slowed the entire SQL Server instance down.

Upon investigation, we discovered that the query was accessing a table with an outdated index. The index was supposedly updated in Episerver Commerce 7.10.3, which was released almost 3 years ago.

For some reason, the index was not updated in the customer’s table. Instead of just doing an index seek, SQL Server was forced to do a full table scan, which is much slower – hence the problem.

If you want the details: it’s the mdpsp_getchildrenbysegment stored procedure, which looks up the UriSegment column of the CatalogItemSeo table. Previously, the index was like this:

CREATE NONCLUSTERED INDEX [IX_CatalogItemSeo_UniqueSegment_CatalogEntry] ON [dbo].[CatalogItemSeo]
(
    [ApplicationId] ASC,
    [UriSegment] ASC,
    [CatalogEntryId] ASC
)

You can see the problem: the column order of the index was bad. Because ApplicationId is not distinctive (in fact, in most cases it is the same for every row), and because UriSegment is not the first column in the index, this index will not be used if a query filters on UriSegment only – see the sketch of the corrected shape below.

Continue reading “Watch your indexes closely”
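The corrected index presumably puts UriSegment first – a sketch of the shape it would take, using DROP_EXISTING to replace the old definition (the exact definition shipped in Commerce 7.10.3 may differ):

CREATE NONCLUSTERED INDEX [IX_CatalogItemSeo_UniqueSegment_CatalogEntry] ON [dbo].[CatalogItemSeo]
(
    [UriSegment] ASC,
    [CatalogEntryId] ASC
)
WITH (DROP_EXISTING = ON);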