Episerver CMS performance optimization – part 1

Update: In Episerver CMS 11, released today (November 21st 2017), the simpleaddress router has been moved to the last of the route table.

Original post:

This is an unusual post – it is not about Commerce – my area of expertise, but about CMS. Recently I’ve been working on some support cases where SQL Server instance is on high utilization, and in some scenarios it eventually slows down the site. After investigation, it’s likely to come from a small, simple and helpful feature: Simple address.

CMS content can have a property named “Simple address”, which allows you to create a “shortcut” url for that content. So if you have one page with “name in url” as “contact-us” under a page name “about-us” under Home page, you can access it via https://mysupercoolsite.com/en/about-us/contact-us. Or you can set the Simple address for that page as “contact-us”, and then you can access it directly via https://mysupercoolsite.com/contact-us.

Changing Simple Address in Edit UI

Now it’s an interesting part: To make the simple address concept work, Episerver has to call a special stored procedure (“netquicksearchbyexternalurl” if you wonder) to check if any published content has that simple address (“ExternalUrl” is the field internally). If a content is not found, a 404 error will be shown.

Sounds reasonable, right? But that small and handy feature can come with a drawback: performance!
[quads id=2]

What? Why? How?

The simple address is the first router to be registered by Episerver – so it will also be the first one to process all URLs (except for static files for example, of course). To see which URLs are being processed by Simple Address – let’s play a bit. Note that this is highly experimental so you should not try it on production (You’ll likely not to, but I feel obliged to say). Let’s add a table to track all requests to  netquicksearchbyexternalurl

CREATE TABLE [dbo].[SimpleAddressRequest](
	[RequestTime] [datetime2](7) NULL,
	[RequestUrl] [nvarchar](260) NULL
) ON [PRIMARY]

GO

And modify the stored procedure itself to insert to that table whenever it try to resolve an url:

ALTER PROCEDURE [dbo].[netQuickSearchByExternalUrl]
(
	@Url	NVARCHAR(255),
	@CurrentTime	DATETIME
)
AS
BEGIN
	SET NOCOUNT ON
	
	DECLARE @LoweredUrl NVARCHAR(255)
	
	SET @LoweredUrl = Lower(@Url)

	INSERT INTO dbo.SimpleAddressRequest VALUES(GETDATE(), @LoweredUrl)

	/*
		Performance notes: The subquery "Pages" must not have any more predicates or return the values used in the outer WHERE-clause, otherwise
		SQL Server falls back to a costly index scan. The performance hints LOOP on the joins are also required for the same reason, the resultset
		from "Pages" is so small that a loop join is superior in performance to index scan/hash match, a factor 1000x.
	*/
	
	SELECT 
		tblPageLanguage.fkPageID,
		tblLanguageBranch.LanguageID as LanguageBranch
	FROM 
		(
			SELECT fkPageID,fkLanguageBranchID
			FROM tblPageLanguage
			WHERE tblPageLanguage.ExternalURL=@LoweredUrl
		) AS Pages
	INNER LOOP JOIN 
		tblPage ON tblPage.pkID = Pages.fkPageID
	INNER LOOP JOIN
		tblPageLanguage ON tblPageLanguage.fkPageID=Pages.fkPageID AND tblPageLanguage.fkLanguageBranchID=Pages.fkLanguageBranchID
	INNER LOOP JOIN
		tblLanguageBranch ON tblLanguageBranch.pkID = Pages.fkLanguageBranchID
	WHERE 
		tblPage.Deleted=0 AND 
		tblPageLanguage.[Status]=4 AND
		tblPageLanguage.StartPublish <= @CurrentTime AND
		(tblPageLanguage.StopPublish IS NULL OR tblPageLanguage.StopPublish >= @CurrentTime)
	ORDER BY
		tblPageLanguage.Changed DESC
END

That’s all. Now just try to visit the website. There are the requests SimpleAddress router handled when I just browse around Quicksilver:

You can see many of them are clearly not simple addresses – they are assets or catalog content. However, all of them will be queried against tblPageLanguage. The router for Simple Address has cache internally, but if you have enough requests, it’ll soon fill in the cache and trigger the cache trimming, therefore the view will be queried again. Even worse, if you have enough rows in tblPageLanguage, each query will take more time and resource – tblPageLanguage (and its underlying tblContentLanguage table) contains information for all CMS Content (Pages, assets, etc.) so it’s easy to have quite big size (The one I investigated has 3.7M rows). When you have a lot of queries which are not super fast to run, it’s only matter of time before it bring your SQL Server instance to its knees.

netquicksearchbyexternalurl under spotlight. You won’t want to see this, of course

Is there a solution for this problem? Yes. If you don’t use Simple Address at all, just disable the router, and it will save you from queries after queries. This simple code in an InitializableModule will do that:

public void Initialize(InitializationEngine context)
{
Global.RoutesRegistered += Global_RoutesRegistered;
}

        private void Global_RoutesRegistered(object sender, RouteRegistrationEventArgs e)
        {
            var simpleAddressRouter =
                e.Routes.OfType<IContentRoute>().FirstOrDefault(r => r.Name.Equals("simpleaddress"));
            if (simpleAddressRouter != null)
            {
                e.Routes.Remove((RouteBase) simpleAddressRouter);
            }
        }

This will remove SimpleAddress from the registered routes, and it will trouble you no more.

12 thoughts on “Episerver CMS performance optimization – part 1

  1. Hi Quan

    Great job. Very nice article.
    Exciting and nice to know the limits and bottlenecks of the system/solution.

    Sadly though, I have a marketing department above me, calling the shots. They are determined that SimpleAddress is the way to go, due to visibility and branding through deeplinks.

    Not that is matter in this specific case, but could you include the EPiServer version in your future posts, for reference?
    We’re on the verge to start updating, and it would be neat for me to know whether the information could be relevant as we move forward.

    Have a nice day.

    Best regards,
    Henrik C. Jensen

    1. Hi Henrik,
      When I think a post only applies for a specific version, I will mention it. If I don’t then that means it applies for a good range of version – include the most recent one. (I’ll update if future version changes that)
      It should be able to move the SimpleAddress to the last in the router list. I filed a bug for CMS Core team so they can look into it.

  2. Good post. Most marketing departments I’ve worked with make heavy use of the simple address feature (its main use case being shorter urls to put on printed materials).

    If disabling isn’t an option, it looks from your investigations that there is a fair bit of optimisation that Episerver could add to the core implementation that may help large sites where the cache is trimmed.

    Why go to the database at all if the path is one of the known / reserved keyword routes (such as ~/globalassets or ~/episerver). I’m presuming that this also tries to match any static routes as well, so why not provide a initialisation API where I can tell the simpleaddress router to ignore any requests for directory that I know are not simple addresses.

    Mark

  3. I moved my simple address route to be last. Works great! 🙂
    Makes sense to have the most common routes resolved first and then in descending order. There are usually very few simple urls on a site and seldom much traffic to them so makes sense to keep it among the last ones. If nothing else to save some memory since it doesn’t have to store every possible url in cache for the episerver site…

  4. Could we change the SimpleAddress router order to last routing due to we need to use SimpleAddress function 🙂

    1. Technically you can – RouteCollection is inherited from Collection so you can remove and re-add it (to the end)

  5. This is a great post indeed, but I have to agree with some other guys here.
    Rather than just removing simpleaddress you should absolutely move it last in your routecollection.

    If you are 100% sure you’ll never use it then it’s fine to remove it, but more often than not marketing people will require the feature.

    Moving it last in my routecollection made a huge difference on our performance, that should be your number one go to solution.

    Great article though, and it’s very well written 🙂

  6. Hi Quan,

    Thanks so much for posting this and sharing that stored procedure name as I was looking for any kind of detailed insight into how Episerver handles “simple address” lookup.

    I also have a marketing department that makes extreme use of the simple address concept. Here in lies my problem…I’m doing a port from Umbraco to Episerver. Umbraco actually allows multiple “simple addresses” by storing them as a comma separated string in their “umbracoUrlAlias” field. Our site has 100k+ pages and I’ve never had reason to be concerned about performance from it handling the mapping.

    Unfortunately my c# coding skills could be better and my SQL is even more suspect. Can you help me understand what is the reason behind the complexity of the Episerver mapping? Do you see any possible hope of Episerver allowing for multiple “simple addresses”? Any other ideas or thoughts would be super appreciated.

    Thanks for your time.

    1. Hi Matt,

      Welcome to my blog and glad to hear that you moved to Episerver. The thing with one simple address is that it allows quick look up for the simple address. If you have something like “first-url,second-url,third-url”, the index in database is not efficient – you will need to do a full table scan every time.
      That can be solved by moving simple addresses to a separated table. However that’s a major change. I’m not quite sure why CMS did not support multiple simple addresses at first place, perhaps because the product manager(s) did not see that as necessary, or perhaps a simple oversight. I can bring up that discussion with them, but I doubt it will be a strong case (i.e. it might not be prioritized, if at all).

      1. Hey Quan,

        Thanks so much for the response and for your thoughts on multiple simple addresses. It makes sense that a multiple simple address lookup would be very intense processing-wise without its own table and conversion to a unique set of keys. I totally get that might be a gigantic overhaul for Episerver based on the current system, but thanks for even considering bringing it up as a new feature discussion.

        To get around the limitation, I have a proof of concept using the Redirect Manager plugin to 301 redirect from multiple branded urls to the single simple address. To me it feels like a hack to use a “page moved” redirect to create branded urls, but it seems this is common practice without appearing to impact SEO. The benefit of Episerver having at least one simple address is we can redirect to that simple address and not worry if the page has been moved in the tree structure. Only downside i see with this setup is the browser bar will have to update the url to the simple address. Well that, and the added load time for the redirect on the branded page requests.

        Can you think of anything else I might not be considering before we end up way too deep into using this process?

        Thanks again,
        Matt

Leave a Reply

Your email address will not be published. Required fields are marked *