Don’t use AspNetIdentity FindByEmailAsync

Or any of its equivalent – FindByEmail etc.

Why?

Reason? It’s slow. Slow enough to effectively kill your database, and therefore, your website.

If you want dig into the default implementation (which is using EntityFramework), this is what you end up with, either if you are using FindByEmailAsync, or its synchronous equivalent FindByEmail

    public virtual Task<TUser> FindByEmailAsync(string email)
    {
      this.ThrowIfDisposed();
      return this.GetUserAggregateAsync((Expression<Func<TUser, bool>>) (u => u.Email.ToUpper() == email.ToUpper()));
    }

It finds user by matching the email, but not before it uses ToUpper in both sides of the equation. This is to ensure correctness because an user can register with “[email protected]” but then try to login with “[email protected]”. If the database is set to be CS – case sensitive collation, that is not a match.

That is fine for C#/.NET, but it is bad for SQL Server. When it reaches database, this query is generated

(@p__linq__0 nvarchar(4000))SELECT TOP (1) 
    [Extent1].[Id] AS [Id], 
    [Extent1].[NewsLetter] AS [NewsLetter], 
    [Extent1].[IsApproved] AS [IsApproved], 
    [Extent1].[IsLockedOut] AS [IsLockedOut], 
    [Extent1].[Comment] AS [Comment], 
    [Extent1].[CreationDate] AS [CreationDate], 
    [Extent1].[LastLoginDate] AS [LastLoginDate], 
    [Extent1].[LastLockoutDate] AS [LastLockoutDate], 
    [Extent1].[Email] AS [Email], 
    [Extent1].[EmailConfirmed] AS [EmailConfirmed], 
    [Extent1].[PasswordHash] AS [PasswordHash], 
    [Extent1].[SecurityStamp] AS [SecurityStamp], 
    [Extent1].[PhoneNumber] AS [PhoneNumber], 
    [Extent1].[PhoneNumberConfirmed] AS [PhoneNumberConfirmed], 
    [Extent1].[TwoFactorEnabled] AS [TwoFactorEnabled], 
    [Extent1].[LockoutEndDateUtc] AS [LockoutEndDateUtc], 
    [Extent1].[LockoutEnabled] AS [LockoutEnabled], 
    [Extent1].[AccessFailedCount] AS [AccessFailedCount], 
    [Extent1].[UserName] AS [UserName]
    FROM [dbo].[AspNetUsers] AS [Extent1]
    WHERE ((UPPER([Extent1].[Email])) = (UPPER(@p__linq__0))) OR ((UPPER([Extent1].[Email]) IS NULL) AND (UPPER(@p__linq__0) IS NULL))

If you can’t spot the problem – don’t worry because I have seen experienced developers made the same mistake. By using the TOUPPER function on the column you are effectively remove any performance benefit of the index that might be on Email column. That means this query will do an index scan every time it is called. We have the TOP(1) statement which somewhat reduces the impact (it can stop as soon as it finds a match), but if there is no match – e.g. no registered email, it will be a full index scan.

If you have a lot of registered customers, frequent calls to that query can effectively kill your database.

And how to fix it

Fixing this issue will be a bit cumbersome, because the code is well hidden inside the implementation of AspNetIdentity EntityFramework. But it’s not impossible. First we need an UserStore which does not use the Upper for comparison:

public class FoundationUserStore<TUser> : UserStore<TUser> where TUser : IdentityUser, IUIUser, new()
{
    public FoundationUserStore(DbContext context)
        : base(context)
    { }

    public override Task<TUser> FindByEmailAsync(string email)
    {
        return GetUserAggregateAsync(x => x.Email == email);
    }

    public override Task<TUser> FindByNameAsync(string name)
    {
        return GetUserAggregateAsync(x => x.UserName == name);
    }
}

And then a new UserManager to use that new UserStore

    public class CustomApplicationUserManager<TUser> : ApplicationUserManager<TUser> where TUser : IdentityUser, IUIUser, new()
    {
        public CustomApplicationUserManager(IUserStore<TUser> store)
            : base(store)
        {
        }

        public static new ApplicationUserManager<TUser> Create(IdentityFactoryOptions<ApplicationUserManager<TUser>> options, IOwinContext context)
        {
            var manager = new ApplicationUserManager<TUser>(new FoundationUserStore<TUser>(context.Get<ApplicationDbContext<TUser>>()));

            // Configure validation logic for usernames
            manager.UserValidator = new UserValidator<TUser>(manager)
            {
                AllowOnlyAlphanumericUserNames = false,
                RequireUniqueEmail = true
            };

            // Configure validation logic for passwords
            manager.PasswordValidator = new PasswordValidator
            {
#if DEBUG
                RequiredLength = 2,
                RequireNonLetterOrDigit = false,
                RequireDigit = false,
                RequireLowercase = false,
                RequireUppercase = false
#else
                RequiredLength = 6,
                RequireNonLetterOrDigit = true,
                RequireDigit = true,
                RequireLowercase = true,
                RequireUppercase = true

#endif
            };

            // Configure user lockout defaults
            manager.UserLockoutEnabledByDefault = true;
            manager.DefaultAccountLockoutTimeSpan = TimeSpan.FromMinutes(5);
            manager.MaxFailedAccessAttemptsBeforeLockout = 5;

            var provider = context.Get<ApplicationOptions>().DataProtectionProvider.Create("EPiServerAspNetIdentity");
            manager.UserTokenProvider = new DataProtectorTokenProvider<TUser>(provider);

            return manager;
        }
    }

And then a way to register our UserManager

    public static IAppBuilder AddCustomAspNetIdentity<TUser>(this IAppBuilder app, ApplicationOptions applicationOptions) where TUser : IdentityUser, IUIUser, new()
    {
        applicationOptions.DataProtectionProvider = app.GetDataProtectionProvider();

        // Configure the db context, user manager and signin manager to use a single instance per request
        app.CreatePerOwinContext<ApplicationOptions>(() => applicationOptions);
        app.CreatePerOwinContext<ApplicationDbContext<TUser>>(ApplicationDbContext<TUser>.Create);
        app.CreatePerOwinContext<ApplicationRoleManager<TUser>>(ApplicationRoleManager<TUser>.Create);
        app.CreatePerOwinContext<ApplicationUserManager<TUser>>(CustomApplicationUserManager<TUser>.Create);
        app.CreatePerOwinContext<ApplicationSignInManager<TUser>>(ApplicationSignInManager<TUser>.Create);

        // Configure the application
        app.CreatePerOwinContext<UIUserProvider>(ApplicationUserProvider<TUser>.Create);
        app.CreatePerOwinContext<UIRoleProvider>(ApplicationRoleProvider<TUser>.Create);
        app.CreatePerOwinContext<UIUserManager>(ApplicationUIUserManager<TUser>.Create);
        app.CreatePerOwinContext<UISignInManager>(ApplicationUISignInManager<TUser>.Create);

        // Saving the connection string in the case dbcontext be requested from none web context
        ConnectionStringNameResolver.ConnectionStringNameFromOptions = applicationOptions.ConnectionStringName;

        return app;
    }

Finally, replace the normal app.AddAspNetIdentity with this:

        app.AddCustomAspNetIdentity<SiteUser>(new ApplicationOptions
        {
            ConnectionStringName = commerceConectionStringName
        });

As I mentioned, this is cumbersome to do. If you know a better way to do, I’m all ear ;).

We are also skipping the case sensitivity part. In most of the cases, it’ll be fine as you are most likely using CI collation instead. But it’s better to be sure than leave it to chance. We will address that in the second part of this blog post.

Fixing ASP.NET Membership performance – part 1

Even though it is not the best identity management system in the .NET world, ASP.NET Membership provider is still fairly widely used, especially for systems that have been running for quite long time with a significant amount of users: migrating to a better system like AspNetIdentity does not comes cheap. However, built from early days of ASP.NET mean Membership provider has numerous significant limitations: beside the “architecture” problems, it also has limited performance. Depends on who you ask, the ultimate “maximum” number of customers that ASP.NET membership provider can handle ranges from 30.000 to 750.000. That does not sound great. Today if you start a new project, you should be probably better off with AspNetIdentity or some other solutions, but if your website is using ASP.NET membership provider and there is currently no plan to migrate, then read on.

The one I will be used for this blog post has around 950.000 registered users, and the site is doing great – but that was achieved by some very fine grained performance tuning, and a very high end Azure subscription.

A performance overview 

I have been using ASP.NET membership provider for years, but I have never looked into it from performance aspects. (Even though I have done some very nasty digging to their table structure). And now I have the chance, I realize how bad it is.

It’s a fairly common seen in the aspnet_* tables that the indexes have ApplicationId as the first column. It does not take a database master to know it is a very ineffective way to create an index – in most of the cases, you only have on ApplicationId in your website, making those indexes useless when you want to, for example, query by UserId. This is a rookie mistake – a newbie tends to make order of columns in the index as same as they appear in the table, thinking, that that SQL Server will just do magic to exchange the order for the best performance. It’s not how SQL Server – or in general – RDBMS systems work.

It is OK to be a newbie or to misunderstand some concepts. I had the very same misconception once, and learned my lessons. However, it should not be OK for a framework to make that mistake, and never correct it.

That is the beginning of much bigger problems. Because of the ineffective order of columns, the builtin indexes are as almost useless. That makes the queries, which should be very fast, become unnecessarily slow, wasting resources and increasing your site average response time. This is of course bad news. But good news is it’s in database level, so we can change it for the better. It if were in the application level then our chance of doing that is close to none.

Missing indexes

If you use Membership.GetUserNameByEmail on your website a lot, you might notice that it is … slow. It leads to this query:

        SELECT  u.UserName
        FROM    dbo.aspnet_Applications a, dbo.aspnet_Users u, dbo.aspnet_Membership m
        WHERE   LOWER(@ApplicationName) = a.LoweredApplicationName AND
                u.ApplicationId = a.ApplicationId    AND
                u.UserId = m.UserId AND
                LOWER(@Email) = m.LoweredEmail

Let’s just ignore the style for now (INNER JOIN would be a much more popular choice), and look into the what is actually done here. So it joins 3 tables by their keys. The join with aspnet_Applications would be fairly simple, because you usually have just one application. The join between aspnet_Users and aspnet_Membership is also simple, because both of them have index on UserId – clustered on aspnet_Users and non-clustered on aspnet_Membership

The last one is actually problematic. The clustered index on aspnet_Membership actually looks like this

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([ApplicationId] ASC, [LoweredEmail] ASC);

Uh oh. Even if this contains LoweredEmail, it’s the worst possible kind of index. By using the least distinctive column in the first, it defeats the purpose of the index completely. Every request to get user name by email address will need to perform a full table scan (oops!)

This is a the last thing you want to see in a execution plan, especially with a fairly big table. 

It should have been just

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);

which helps SQL Server to use the optimal execution plan

If you look into Azure SQL Database recommendation, it suggest you to create a non clustered index on LoweredEmail. That is not technically incorrect, and it still helps. However, keep in mind that each non clustered index will have to “duplicate” the clustered index, for the purpose of identify the rows, so keeping the useless clustered index actually increases wastes and slows down performance (even just a little, because you have to perform more reads to get the same data). However, if your database is currently performing badly, adding a non clustered index is a much quicker and safer option. The change to clustered index should be done with caution at low traffic time.

Tested the stored procedure on database above, without any additional index

Table 'aspnet_Membership'. Scan count 9, logical reads 20101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 237 ms,  elapsed time = 182 ms.

With new non clustered index


(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 89 ms.

With new clustered index:

(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 89 ms.

Don’t we have a clear winner?

Index or no index, that’s the question

If you do (and you should) care about your Episerver Commerce site performance, you probably know that database access is usually the bottleneck. Allowing SQL Server works smoothly and effectively is a very important key to the great performance.

We are of course, very well aware of this fact, and we have spent a considerable amount of time making sure Commerce database works as fast as we could. Better table schema, better stored procedures, better indexes, … we have done all of that and will continue doing so when we have the chances. (And if you find anything that can be improved, you are very welcome to share your finding with us)

But there are places where the database performance improvement is in your hand.

Continue reading “Index or no index, that’s the question”

Optimizing T-SQL COUNT

This is a continuation of my previous post about paging in SQL Server. When it comes to paging, you would naturally want to know the total number of rows satisfying, so you can display some nice, useful information to your end-users.

You would think, well, it’s just a count, and a simple query like this would be enough:

SELECT COUNT(Id) FROM MySecretTable

There should be nothing to worry about, right? Actually, there is.

Let’s get back to the example in previous post – we have to count the total number of orders in that big table.

SELECT COUNT(ObjectId) FROM OrderGroup_PurchaseOrder

Because ObjectId is the clustered index of OrderGroup_PurchaseOrder, I did expect it to be use that index and be pretty fast. But does it? To my surprises, no.

Continue reading “Optimizing T-SQL COUNT”

Maintaining your indexes

Indexes are crucial to SQL Server performance. Having the right indexes might make the difference of day and night with your application performance – as I once talked here.

However, even having the right indexes is not everything. You have to keep them healthy. Indexes, as any other kinds of storage, is subjected to fragmentation. SQL Server works best if the index structure is compact and continuous, but with all of the inserts/updates/deletes, it’s inevitable to get fragmented. When the fragmentation grows, it starts affecting the performance of SQL Server: Instead of having to read just one page, it now have to read two, which increases both time and resource needed, and so on and so forth.

Continue reading “Maintaining your indexes”

Watch your indexes closely

Recently we were tasked to help a customer having a problem with a query. This specific query ate a lot of CPU resources (30-40%) and causing performance problem for other queries – as it slows the entire SQL Server instance down.

Upon investigation, we discovered that the query was accessing a table with an outdated index. The index was supposedly updated in Episerver Commerce 7.10.3, which was released almost 3 years ago.

For some reasons, the index was not updated in customer’s table. Instead of just having to do a index seek, SQL Server was forced to do a full table scan, which is much slower, causing the problem.

If you want to go into details, it’s mdpsp_getchildrenbysegment stored procedure, which looks into UriSegment column of CatalogItemSeo table, previously, the index was like this:

CREATE NONCLUSTERED INDEX [IX_CatalogItemSeo_UniqueSegment_CatalogEntry] ON [dbo].[CatalogItemSeo]
(
    [ApplicationId] ASC
    [UriSegment] ASC,
    [CatalogEntryId] ASC
)

You can see the problem: The order of the index was bad – because ApplicationId was not distinctive (in fact, in most of the cases it’s the same for every row), and because UriSegment was not the first column in the index, this index will not be used if a query uses UriSegment only. Continue reading “Watch your indexes closely”