Export catalog, with linked assets

If you are already using a PIM system, you can stop reading!

If you have been using Commerce for a while, you probably have seen this screen – yes, in Commerce Manager

This allow you to export a catalog, but without a caveat: the exported catalog, most likely, does not contains any linked assets. The reason for that was the asset content types need to be present at the context of the site. In Commerce Manager, the general advice is to not deploy the content types there for simpler management.

Import/Export are also missing features in Catalog UI compared to Commerce Manager. I wish I could have added it, but given my Dojo skill, it’s better to write something UI-less, and here you go: a controller to let you download a catalog with everything attached. Well, here is the entire code that you can drop into your project and build:

using EPiServer.Data;
using EPiServer.Framework.Blobs;
using EPiServer.Logging;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Web.Http;
using EPiServer.Commerce.Catalog.ContentTypes;
using Mediachase.Commerce.Catalog;
using Mediachase.Commerce.Catalog.ImportExport;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

namespace EPiServer.Personalization.Commerce.CatalogFeed.Internal
{
    /// <summary>
    /// Download a catalog.
    /// </summary>
    public class CatalogExportController : ApiController
    {
        private readonly CatalogImportExport _importExport;
        private readonly IBlobFactory _blobFactory;
        private readonly IContentLoader _contentLoader;
        private readonly ReferenceConverter _referenceConverter;
        internal const string DownloadRoute = "episerverapi/catalogs/";
        private static readonly Guid _blobContainerIdentifier = Guid.Parse("119AD01E-ECD1-4781-898B-6DEC356FC8D8");

        private static readonly ILogger _logger = LogManager.GetLogger(typeof(CatalogExportController));

        /// <summary>
        /// Initializes a new instance of the <see cref="CatalogExportController"/> class.
        /// </summary>
        /// <param name="importExport">Catalog import export</param>
        /// <param name="blobFactory">The blob factory.</param>
        /// <param name="contentLoader">The content loader.</param>
        /// <param name="referenceConverter"></param>
        public CatalogExportController(CatalogImportExport importExport,
            IBlobFactory blobFactory,
            IContentLoader contentLoader,
            ReferenceConverter referenceConverter)
        {
            _importExport = importExport;
            _blobFactory = blobFactory;
            _contentLoader = contentLoader;
            _referenceConverter = referenceConverter;

            _importExport.IsModelsAvailable = true;
        }

        /// <summary>
        /// Direct download catalog export for admins.
        /// </summary>
        /// <param name="catalogName">Name of catalog to be exported.</param>
        /// <returns>
        /// Catalog.zip if successful else HttpResponseMessage containing error.
        /// </returns>
        [HttpGet]
        [Authorize(Roles = "CommerceAdmins")]
        [Route(DownloadRoute)]
        public HttpResponseMessage Index(string catalogName)
        {
            var catalogs = _contentLoader.GetChildren<CatalogContent>(_referenceConverter.GetRootLink());
            var catalog = catalogs.First(x => x.Name.Equals(catalogName, StringComparison.OrdinalIgnoreCase));
            if (catalog != null)
            {
                return GetFile(catalog.Name);
            }

            return new HttpResponseMessage
            { Content = new StringContent($"There is no catalog with name {catalogName}.") };
        }

        private HttpResponseMessage GetFile(string catalogName)
        {
            var container = Blob.GetContainerIdentifier(_blobContainerIdentifier);
            var blob = _blobFactory.CreateBlob(container, ".zip");
            using (var stream = blob.OpenWrite())
            {
                using (var zipArchive = new ZipArchive(stream, ZipArchiveMode.Create, false))
                {
                    var entry = zipArchive.CreateEntry("catalog.xml");

                    using (var entryStream = entry.Open())
                    {
                        _importExport.Export(catalogName, entryStream, Path.GetTempPath());
                    }
                }
            }

            var response = new HttpResponseMessage
            {
                Content = new PushStreamContent(async (outputStream, content, context) =>
                {
                    var fileStream = blob.OpenRead();

                    await fileStream.CopyToAsync(outputStream)
                        .ContinueWith(task =>
                        {
                            fileStream.Close();
                            outputStream.Close();

                            if (task.IsFaulted)
                            {
                                _logger.Error($"Catalog download failed", task.Exception);
                                return;
                            }

                            _logger.Information($"Feed download completed.");

                        });
                }, new MediaTypeHeaderValue("application/zip"))
            };
            return response;
        }
    }
}

And now you can access to this path http://yoursite.com/episerverapi/catalogs?catalogName=fashion to download the catalog named “Fashion”.

A few notes:

  • This requires Admin access, for obvious reasons. You will need to log in to your website first before accessing the path above
  • It can take some time for big catalogs, so be patient if that’s the case ;). Yes another approach is to have this as a scheduled job when you can export the catalog in background, but that make the selection of catalog to export much more complicated. If you have only one catalog, then go ahead!

Disabling Catalog Dto cache: maybe, don’t?

Recently (as recent as this morning) I was asked to look into a case when the Find indexing performance was subpar. Upon investigation – looking at a properly captured trace from dotTrace – it was clear that at least 30% percent of time was spending in loading the CatalogDto

This is something that should not happen, as the CatalogDto should have been cached. Also, a normal site should have very few catalogs, so the cache should be very effective. However, data does not lie – it has been hitting database a lot, and a quick check on the site settings revealed that the entire DTO cache has been indeed, disabled

 <Cache enabled="false" collectionTimeout="0:0:0" entryTimeout="0:0:0" nodeTimeout="0:0:0" schemaTimeout="0:0:0" /> 

By setting these timeout settings to 0, the cache is immediately invalidated, rendering them useless. The CatalogDto, therefore, is loaded everytime from database, causing the bottleneck.

The reason for setting those timeout to 0 was probably – I guess – to reduce the memory footprint of the site. However, Catalog DTOs are fairly small in size, and since Commerce 11, it has been smart enough to skip caching the DTOs if there is cache on a higher (content ) level, thanks to my colleague Magnus Rahl. So DTOs should not be of any concerns, if you are not actively using them (and in most of the case, you should not). By re-enabling the cache, the indexing time can be cut, at least 30%, according to the aforementioned trace.

As you might wonder, Catalog content provider still uses the DTOs internally, therefore it would load those for data.

Moral of the story:

  • The cache settings are there, but because you can, does not mean you should. I personally think cache settings should be as hidden as possible from accidental changes. Disabling cache, and in a lesser extend, changing default cache timeout, can have unforeseeable consequences. Only do so if you have strong reasons to do so. Or better, let us know why you need to do that, and we can figure out a compromise.

Dynamic data store is slow, (but) you can do better.

If you have been developing with Episerver CMS for a while, you probably know about its embedded “ORM”, called Dynamic Data Store, or DDS for short. It allows you to define strongly typed types which are mapped to database directly to you. You don’t have to create the table(s), don’t have to write stored procedures to insert/query/delete data. Sounds very convenient, right? The fact is, DDS is quite frequently used, and more often than you might think, mis-used.

As Joel Spolsky once said Every abstraction is leaky, an ORM will likely make you forget about the nature of the RDBMS under neath, and that can cause performance problems, sometime severe problems.

Let me make it clear to you

DDS is slow, and it is not suitable for big sets of data.

If you want to store a few settings for your website, DDS should be fine. However, if you are thinking about hundreds of items, it is probably worth looking else. Thousands and more items, then it would be a NO.

I did spend some time trying to bench mark the DDS to see how bad it is. A simple test is to add 10.000 items to a store, then query by each item, then deleted by each item, to see how long does it take

The item is defined like this, this is just another boring POCO:

internal class ShippingArea : IDynamicData
{
    public Identity Id { get; set; }

    public string PostCode { get; set; }

    public string Area { get; set; }

    public DateTime Expires { get; set; }
}

The store is defined like this

    public class ShippingAreaStore
    {
        private const string TokenStoreName = "ShippingArea";

        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            GetStore().Save(token);
            return token;
        }

        internal virtual IEnumerable<ShippingArea> LoadAll()
        {
            return GetStore().LoadAll<ShippingArea>();
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return GetStore().Find<ShippingArea>(parameters);
        }

        internal virtual void Delete(ShippingArea shippingArea)
        {
            GetStore().Delete(shippingArea);
        }

        internal virtual ShippingArea Get(Identity tokenId)
        {
            return GetStore().Load<ShippingArea>(tokenId);
        }

        private static DynamicDataStore GetStore()
        {
            return DynamicDataStoreFactory.Instance.CreateStore(TokenStoreName, typeof(ShippingArea));
        }
    }

Then I have some quick and dirty code in QuickSilver ProductController.Index to measure the time (You will have to forgive some bad coding practices here ;). As usual StopWatch should be used on demonstration only, it should not be used in production. If you want a good break down of your code execution, use tools like dotTrace. If you want to measure production performance, use some monitoring system like NewRelic or Azure Application Insights ):

        var shippingAreaStore = ServiceLocator.Current.GetInstance<ShippingAreaStore>();
        var dictionary = new Dictionary<string, string>();
        for (int i = 0; i < 10000; i++)
        {
            dictionary[RandomString(6)] = RandomString(10);
        }
        var identities = new List<ShippingArea>();
        var sw = new Stopwatch();
        sw.Start();
        foreach (var pair in dictionary)
        {
            shippingAreaStore.CreateNew(pair.Key, pair.Value);
        }
        sw.Stop();
        _logger.Error($"Creating 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();
        foreach (var pair in dictionary)
        {
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("PostCode", pair.Key);
            parameters.Add("Area", pair.Value);
            identities.AddRange(shippingAreaStore.Find(parameters));
        }

        sw.Stop();
        _logger.Error($"Querying 10000 items took {sw.ElapsedMilliseconds}");
        sw.Restart();

        foreach (var id in identities)
        {
            shippingAreaStore.Delete(id);
        }
        sw.Stop();
        _logger.Error($"Deleting 10000 items took {sw.ElapsedMilliseconds}");

Everything is ready. So a few tries gave us a fairly stable result:

2019-12-02 13:33:01,574 Creating 10000 items took 11938

2019-12-02 13:34:59,594 Querying 10000 items took 118009

2019-12-02 13:35:24,728 Deleting 10000 items took 25131

And this is strictly single-threaded, the site will certainly perform worse when it comes to real site with a lot of traffic, and thus multiple insert-query-delete at the same time.

Can we do better?

There is a little better attribute that many people don’t know about DDS: you can mark a field as indexed, by adding [EPiServerDataIndex] attribute to the properties. The new class would look like this.

    [EPiServerDataStore]
    internal class ShippingArea : IDynamicData
    {
        public Identity Id { get; set; }

        [EPiServerDataIndex]
        public string PostCode { get; set; }

        [EPiServerDataIndex]
        public string Area { get; set; }

        public DateTime Expires { get; set; }
    }

If you peek into the database during the test, you can see that the data is now being written to Indexed_String01 and Indexed_String02 columns, instead of String01 and String02 as without the attributes. Such changes give us quite drastic improvement:

2019-12-02 15:38:16,376 Creating 10000 items took 7741

2019-12-02 15:38:19,245 Querying 10000 items took 2867

2019-12-02 15:38:44,266 Deleting 10000 items took 25019

The querying benefits greatly from the new index, as it no longer has to do a Clustered Index Scan, it can now do a non clustered index seek + Key look up. Deleting is still equally slow, because the delete is done by a Clustered Index delete on the Id column, which we already have, and the index on an Uniqueidentifier column is not the most effective one.

Before you are happy which such improvement, keep in mind that there are two indexes added for Indexed_String01 and Indexed_String02 separately. Naturally, we would want a combination, clustered even, on those columns, but we just can’t.

What if we want to go bare metal and create a table ourselves, write the query ourselves? Our repository would look like this

public class ShippingAreaStore2
    {
        private readonly IDatabaseExecutor _databaseExecutor;

        public ShippingAreaStore2(IDatabaseExecutor databaseExecutor)
        {
            _databaseExecutor = databaseExecutor;
        }

        /// <summary>
        /// Creates and stores a new token.
        /// </summary>
        /// <param name="blobId">The id of the blob for which the token is valid.</param>
        /// <returns>The id of the new token.</returns>
        internal virtual ShippingArea CreateNew(string postCode, string area)
        {
            var token = new ShippingArea
            {
                Id = Identity.NewIdentity(),
                PostCode = postCode,
                Area = area,
                Expires = DateTime.UtcNow.AddDays(1)
            };
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Add";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Id", token.Id.ExternalId));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", token.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", token.Area));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Expires", token.Expires));
                cmd.ExecuteNonQuery();
            });

            return token;
        }

        internal virtual IEnumerable<ShippingArea> Find(IDictionary<string, object> parameters)
        {
            return _databaseExecutor.Execute<IEnumerable<ShippingArea>>(() =>
            {
                var areas = new List<ShippingArea>();
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Find";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", parameters.Values.First()));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", parameters.Values.Last()));
                var reader = cmd.ExecuteReader();
                while (reader.Read())
                {
                    areas.Add(new ShippingArea
                    {
                        Id = (Guid)reader["Id"],
                        PostCode = (string)reader["PostCode"],
                        Area = (string)reader["Area"],
                        Expires = (DateTime)reader["Expires"]
                    });
                }
                return areas;
            });
        }

        /// <summary>
        /// Deletes a token from the store.
        /// </summary>
        /// <param name="token">The token to be deleted.</param>
        internal virtual void Delete(ShippingArea area)
        {
            _databaseExecutor.Execute(() =>
            {
                var cmd = _databaseExecutor.CreateCommand();
                cmd.CommandText = "ShippingArea_Delete";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("PostCode", area.PostCode));
                cmd.Parameters.Add(_databaseExecutor.CreateParameter("Area", area.Area));
                cmd.ExecuteNonQuery();
            });
        }
    }

And those would give us the results:

2019-12-02 16:44:14,785 Creating 10000 items took 2977

2019-12-02 16:44:17,114 Querying 10000 items took 2315

2019-12-02 16:44:20,307 Deleting 10000 items took 3190

Moral of the story?

DDS is slow and you should be avoid using it if you are working with fairly big set of data. If you have to use DDS for whatever reason, make sure to at least try to index the columns that you query the most.

And in the end of the days, hand-crafted custom table + query beats everything. Remember that you can use some tools like Dapper to do most of the works for you.

Hide certain tabs in Catalog UI

It has been a while since I write something in my blog – have been “fairly” busy making Commerce even faster for a while. But I should take a break from time to time and share things that will benefit community as a whole – and this is one of that break.

Today I come across this question on World https://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2019/10/remove-item-from-tab-in-content-editor/ . Basically, how to hide a specific tab in the Catalog UI when you open All Properties view of a catalog content.

The original poster has found a solution from https://world.episerver.com/forum/legacy-forums/Episerver-7-CMS/Thread-Container/2013/10/Is-there-any-way-to-hide-the-settings-tab/ . While it works, I think it is not the easiest or simple way to do it. Is there a simpler way? Yes.

The Related Entries tab is generated for content with implements IAssociating interface. Bad news is EntryContentBase implements that interface, so each and every entry type you have, has that tab. But good news is we can override the implementation – by just override the property defined by IAssociating.

How?

Simple as this

        /// <inheritdoc />
        [IgnoreMetaDataPlusSynchronization]
        [Display(AutoGenerateField = false)]
        public override Associations Associations { get; set; }

We are overriding the Associations property, and the change the Display attribute to have AutoGenerateField = false. Just try to build it and see

No Related Views! But is it the end of the story. Not yet, Related Views can still be accessed by the menu

A complete solution is to also disable that view. How? By using the same technique here https://world.episerver.com/blogs/Quan-Mai/Dates/2019/8/enable-sticky-mode-for-catalog-content/ i.e. using `UIDescriptor`. You can disable certain views by adding this to your constructor

AddDisabledView(CommerceViewConfiguration.RelatedEntriesEditViewName);

A few notes:

  • This only affects the type you add the property, so for example you can hide the tab for Products, but still show it for Variants.
  • Related Entries is not the only tab you can hide. By applying the same technique you can have a lot of control over what you can hide, and what you show. I will leave that to you for exploration!

IContentLoader.Get(contentLink) is considered harmful for catalog content.

A while ago I wrote about how you should be aware of IContentLoader.GetChildren<T>(contentLink) here. However, that is only half of story.

IContentLoader.Get<T>(contentLink) is also considered harmful. Not in terms of it causes damage to your site (we would never, ever let that happen), nor it is slow (not unless you abuse it), but because it can behave very unexpectedly.

As you might already know, catalog content fully supports language versions, which means a catalog might have multiple languages enabled, and each and every catalog item in that catalog (node/category, and entry) will be available in those languages. However, those languages are not equal, (only) one is master language. What’s the difference then?

One of very important characteristics of that is how it affects the properties. Properties with [CultureSpecific] attribute decorated will be different in each language, and therefore, can be edited in each language. Properties without [CultureSpecific] attribute decorated will be the same in all languages, and can only be edited in master language. In Catalog UI, if you switch to non master languages, those properties will be grayed out, indicating they can’t be edited.

Now, why IContentLoader.Get<T>(contentLink) is considered harmful? Because you don’t supply a CultureInfo to let it know which version you want, it relies on the current preferred language to load the content. And if you have a catalog which has master language that is different from the current preferred language, you are loading a non-master language version. And then if you try to edit a non [CultureSpecific] property, then save it, the changes will not be saved, without error or warning.

It then will be very confusing because it sometimes works (someone changes the current preferred language that matches the catalog master language, and sometimes it doesn’t.

Which can cost you hours, if not days, to figure out what is wrong with your code.

Same thing applies to IContentLoader.TryGet<T>(contentLink)

Solution? Always use the overload that takes a CultureInfo or a LoaderOptions parameter, even if you just want to read the content. That creates a “good” habit and you can quickly spot code that might be problematic.

Use this to load master language version, if you wish to update some non CultureSpecific property.

 new LoaderOptions() { LanguageLoaderOption.MasterLanguage() }

Later versions of Commerce will log a warning if you are trying to save a non master language version with one or more changed non [CultureSpecific]properties.

Control the thousand separator for Money in Episerver Commerce

If you are selling goods in multiple markets which same currency but with different languages, such as EuroZone, you might notice that while everything looks quite good, except that the thousand separator might be off from time to time: it is always the same and does not change to match with the language, so sometimes it’s correct, sometimes it’s not.

Let’s take a step back to see how to properly show the thousand delimiter 

In the United States, this character is a comma (,). In Germany, it is a period (.). Thus one thousand and twenty-five is displayed as 1,025 in the United States and 1.025 in Germany. In Sweden, the thousands separator is a space.

https://docs.microsoft.com/en-us/globalization/locale/number-formatting

You might ask why the problem happens with Episerver Commerce. In Commerce, each currency has an attached NumberFormatInfo which let the framework knows how to format the currency. During startup, the system will loop through the available CultureInfo and assign its .NumberFormat to the currency.

The problem is there might be multiple CultureInfo that can handle same currency, for example, EUR which is used across Eurozone, can be handled by multiple (20? ) cultures. However, the first matching CultureInfo to handle the format of the currency will be used. In most of the cases, it will be br-FR (because the CultureInfo(s) are sorted by name, and this CultureInfo is the first in the list to handle EUR)

br-FR does not have a thousand separator, but a whitespace. That’s why even if your language is de-DE, the amount in EUR will not be properly formatted as 1.234,45 but 1 234,45

How to fix that problem?

Luckily, we can set the NumberFormatInfo attached for each currency. If you are only selling in Germany, you can make sure that EUR is always formatted in German style, by adding this to one of your initialization modules:

var culture = CultureInfo.GetCultureInfo("de-DE");
Currency.SetFormat("EUR", culture.NumberFormat);

But if you have multiple languages for one currency, this will simply not work (because it’s static, so it will affect all customer). Your only option is to avoid using Money.ToString(), but to use Money.ToString(IFormatProvider), for example

money.ToString(CultureInfo.CurrentUICulture);

Assuming CultureInfo.CurrentUiCulture is set to correct one.

This, however, does not resolve the problem with merchandisers using Commerce Manager. They might have to work with orders from multiple markets, and for example, if your site is selling good stuffs in Europe, there are chances that merchandisers see the prices without correct thousand separator. Most of places in Commerce Manager uses Money.ToString(), and there is a reason for that: it’s too risky to use Money.ToString(CultureInfo.CurrentUICulture), because if a merchandiser uses English, he or she is likely gonna see money formatted as “$” instead of “€”, and that is a much bigger problem of itself.

Moral of the story: localization is hard, and sometimes a compromise is needed.

Fixing ASP.NET Membership performance – part 1

Even though it is not the best identity management system in the .NET world, ASP.NET Membership provider is still fairly widely used, especially for systems that have been running for quite long time with a significant amount of users: migrating to a better system like AspNetIdentity does not comes cheap. However, built from early days of ASP.NET mean Membership provider has numerous significant limitations: beside the “architecture” problems, it also has limited performance. Depends on who you ask, the ultimate “maximum” number of customers that ASP.NET membership provider can handle ranges from 30.000 to 750.000. That does not sound great. Today if you start a new project, you should be probably better off with AspNetIdentity or some other solutions, but if your website is using ASP.NET membership provider and there is currently no plan to migrate, then read on.

The one I will be used for this blog post has around 950.000 registered users, and the site is doing great – but that was achieved by some very fine grained performance tuning, and a very high end Azure subscription.

A performance overview 

I have been using ASP.NET membership provider for years, but I have never looked into it from performance aspects. (Even though I have done some very nasty digging to their table structure). And now I have the chance, I realize how bad it is.

It’s a fairly common seen in the aspnet_* tables that the indexes have ApplicationId as the first column. It does not take a database master to know it is a very ineffective way to create an index – in most of the cases, you only have on ApplicationId in your website, making those indexes useless when you want to, for example, query by UserId. This is a rookie mistake – a newbie tends to make order of columns in the index as same as they appear in the table, thinking, that that SQL Server will just do magic to exchange the order for the best performance. It’s not how SQL Server – or in general – RDBMS systems work.

It is OK to be a newbie or to misunderstand some concepts. I had the very same misconception once, and learned my lessons. However, it should not be OK for a framework to make that mistake, and never correct it.

That is the beginning of much bigger problems. Because of the ineffective order of columns, the builtin indexes are as almost useless. That makes the queries, which should be very fast, become unnecessarily slow, wasting resources and increasing your site average response time. This is of course bad news. But good news is it’s in database level, so we can change it for the better. It if were in the application level then our chance of doing that is close to none.

Missing indexes

If you use Membership.GetUserNameByEmail on your website a lot, you might notice that it is … slow. It leads to this query:

        SELECT  u.UserName
        FROM    dbo.aspnet_Applications a, dbo.aspnet_Users u, dbo.aspnet_Membership m
        WHERE   LOWER(@ApplicationName) = a.LoweredApplicationName AND
                u.ApplicationId = a.ApplicationId    AND
                u.UserId = m.UserId AND
                LOWER(@Email) = m.LoweredEmail

Let’s just ignore the style for now (INNER JOIN would be a much more popular choice), and look into the what is actually done here. So it joins 3 tables by their keys. The join with aspnet_Applications would be fairly simple, because you usually have just one application. The join between aspnet_Users and aspnet_Membership is also simple, because both of them have index on UserId – clustered on aspnet_Users and non-clustered on aspnet_Membership

The last one is actually problematic. The clustered index on aspnet_Membership actually looks like this

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([ApplicationId] ASC, [LoweredEmail] ASC);

Uh oh. Even if this contains LoweredEmail, it’s the worst possible kind of index. By using the least distinctive column in the first, it defeats the purpose of the index completely. Every request to get user name by email address will need to perform a full table scan (oops!)

This is a the last thing you want to see in a execution plan, especially with a fairly big table. 

It should have been just

CREATE CLUSTERED INDEX [aspnet_Membership_index]
    ON [dbo].[aspnet_Membership]([LoweredEmail] ASC);

which helps SQL Server to use the optimal execution plan

If you look into Azure SQL Database recommendation, it suggest you to create a non clustered index on LoweredEmail. That is not technically incorrect, and it still helps. However, keep in mind that each non clustered index will have to “duplicate” the clustered index, for the purpose of identify the rows, so keeping the useless clustered index actually increases wastes and slows down performance (even just a little, because you have to perform more reads to get the same data). However, if your database is currently performing badly, adding a non clustered index is a much quicker and safer option. The change to clustered index should be done with caution at low traffic time.

Tested the stored procedure on database above, without any additional index

Table 'aspnet_Membership'. Scan count 9, logical reads 20101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 237 ms,  elapsed time = 182 ms.

With new non clustered index


(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 89 ms.

With new clustered index:

(1 row affected)
Table 'aspnet_Applications'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Users'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'aspnet_Membership'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 89 ms.

Don’t we have a clear winner?

Speed up catalog routing if you have multiple children under catalog

A normal catalog structure is like this: you have a few high level categories under the catalog, then each high level category has a few lower level categories under it, then each lower level category has their children, so on and so forth until you reach the leaves – catalog entries.

However it is not uncommon that you have multiple children (categories and entries) directly under catalog. Even though that is not something you should do, it happens. 

But that is not without drawbacks. You might notice it is slow to route to a product. It might not be visible to naked eyes, but if you use some decent profilers (which I personally recommend dotTrace), it can be fairly obvious that your site is not routing optimally.

Why?

To route to a specific catalog content, for example http://commerceref/en/fashion/mens/mens-shirts/p-39101253/, the default router have to figure out which content is mapped to an url segment. So with default registration where the catalog root is the default routing root, we will start with the catalog which maps to the first part of route (fashion ). How do it figure out which content to route for the next part (mens ) ? 

Until recently, what it does it to call GetChildren on the catalog ContentReference . Now you can see the problem. Even with a cached result, that is still too much – GetChildren with a big number of children is definitely expensive.

We noticed this behavior, thanks to Erik Norberg. An improvement have been made in Commerce 12.10 to make sure even with a number of children directly under Catalog, the router should perform adequately efficient.

If you can’t upgrade to 12.10 or later (you should!), then you might have a workaround that improve the performance. By adding your own implementation of HierarchicalCatalogPartialRouter, you can override how you would get the children content – by using a more lightweight method (GetBySegment)

    public class CustomHierarchicalCatalogPartialRouter : HierarchicalCatalogPartialRouter
    {
        private readonly IContentLoader _contentLoader;

        public CustomHierarchicalCatalogPartialRouter(Func<ContentReference> routeStartingPoint, CatalogContentBase commerceRoot, bool enableOutgoingSeoUri) : base(routeStartingPoint, commerceRoot, enableOutgoingSeoUri)
        {
        }

        public CustomHierarchicalCatalogPartialRouter(Func<ContentReference> routeStartingPoint, CatalogContentBase commerceRoot, bool supportSeoUri, IContentLoader contentLoader, IRoutingSegmentLoader routingSegmentLoader, IContentVersionRepository contentVersionRepository, IUrlSegmentRouter urlSegmentRouter, IContentLanguageSettingsHandler contentLanguageSettingsHandler, ServiceAccessor<HttpContextBase> httpContextAccessor) : base(routeStartingPoint, commerceRoot, supportSeoUri, contentLoader, routingSegmentLoader, contentVersionRepository, urlSegmentRouter, contentLanguageSettingsHandler, httpContextAccessor)
        {
            _contentLoader = contentLoader;
        }

        protected override CatalogContentBase FindNextContentInSegmentPair(CatalogContentBase catalogContent, SegmentPair segmentPair, SegmentContext segmentContext, CultureInfo cultureInfo)
        {
            return _contentLoader.GetBySegment(catalogContent.ContentLink, segmentPair.Next, cultureInfo) as CatalogContentBase;
        }
    }

And then instead of using CatalogRouteHelper.MapDefaultHierarchialRouter , you register your router directly

 var referenceConverter = ServiceLocator.Current.GetInstance<ReferenceConverter>();
            var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
            var commerceRootContent = contentLoader.Get<CatalogContentBase>(referenceConverter.GetRootLink());
            routes.RegisterPartialRouter(new HierarchicalCatalogPartialRouter(startingPoint, commerceRootContent, enableOutgoingSeoUri));

(ServiceLocator is just to make it easier to understand the code. You should do this in an IInitializationModule, so use context.Locate.Advanced instead.

This is applicable from 9.2.0 and newer versions. 

Moral of the story:

  • Catalog structure can play a big role when it comes to performance.
  • You should do profiling whenever you can
  • We do that too, and we make sure to include improvements in later versions, so keeping your website up to date is a good way to tune performance.

Refactoring Commerce catalog code, a story

It is not a secret that I am a fan of refactoring. Clean. shorter, simpler code is always better. It’s always a pleasure to delete some code while keeping all functionalities: less code means less possible bugs, and less places to change when you have to change.

However, while refactoring can bring a lot of enjoying to the one who actually does it, it’s very hard to share the experience: most of the cases it’s very specific and the problem itself is not that interesting to the outside world. This story is an exception because it might be helpful/useful for other Commerce developer.

Continue reading “Refactoring Commerce catalog code, a story”

Commerce batching performance – part 2: Loading prices and inventories

UPDATE: When looked into it, I realize that I have a lazy loading collection of entry codes, so each test had to spent time to resolve the entry code(s) from the content links. That actually costs quite a lot of time, and therefore causing the performance tests to return incorrect results. That was corrected and the results are now updated.

In previous post we talked about how loading orders in batch can actually improve your website performance, and we came to a conclusion that 1000-3000 orders per batch probably yields the best performance result.

But orders are not the only thing you would need to load on your website. A more common scenario is to load prices and inventories for product. So If you are displaying a product listing page, it’s quite common to load prices and inventories for all products in that page. How should it be loaded?

Continue reading “Commerce batching performance – part 2: Loading prices and inventories”