Speed up your catalog entries indexing
Even with the raising popularity of FindCommerce, I suspect that many Commerce customers are still using the search provider system, as it comes with an undeniable benefit: It can be used within Commerce Manager. And while I suspect a majority of you have the eventual indexing turned on (aka the entry will be indexed as soon as it’s changed), many still index the entries on demand. Aka manually, or periodically via scheduled job.
We received a report from a customer recently as the indexing timeout, as he was indexing a large amount of entries (about 250.000 entries on 8 catalogs). When we looked into the problem, we discover a possible improvement which is almost free for you.
Unlike the eventual indexing, or the rebuild index (which the entire index will be rebuilt), the manual indexing must care about the changed entries. It of course should not index the unchanged entries because that’d be a waste of resources. But it also must not miss any changed entries – otherwise the search results might not be correct. The manual indexing is done through 3 steps:
- Identify the changed entries (usually from the last time of rebuilding the index until now)
- Get the changed entries
- Send them to the search provider to index.
The one we are going to speed up to day is the first step – identity the changed entries. How are they identified? Well, an entry is considered “changed” and needed to re-index if it has one of these changed:
- If its metadata is changed
- If one of its node-relations is changed
- If its CatagEntryDto information is changed.
The first two are quite easy to determine – if an entry’s metadata is changed, the Modified field of CatalogContentEx will be updated, and if it’s node relation is updated, the Modified field of NodeEntryRelation will be updated. Some simple SQL queries will get you that.
But the last one is … tricky. There is no Modified column in CatalogEntry. So how Commerce pulls that information?
No, it’s from ApplicationLog. Every time an entry is changed, either added, updated and deleted, a log entry will be added to ApplicationLog, and that will be used for several purposes, such as Auditing, or in this case, to know which entry was edited.
(You might be questioning the design, but I honestly don’t know. It’s like that since long long ago, even before I joined Episerver).
This is when we might have performance problem, because ApplicationLog is an ever-growing table, and it will get bigger and bigger over time. If it is big enough, (like the one I tested is 40 millions rows strong), the time to pull out the entries we need might take a long time. And this is the place where we can improve the performance by adding a covered index:
CREATE NONCLUSTERED INDEX [IDX_ApplicationLog_Covered]
ON [dbo].[ApplicationLog] ([Source],[Operation],[ObjectType],[Created])
Beware, this can take a pretty long time to run. On above database, it took 11 minutes to run on my machine. But IMO, it’s worth it, because a query to identity the changed entries which previously ran in more than 2 minutes, now runs in less than 1 second! More than 100x times faster!
Wait, will Episerver integrate this change in upcoming releases? I don’t know – adding the index into a newly created table is easy enough, but it can be troublesome if you are upgrading a very big ApplicationLog table. The query to create the index might need to run separately, possibly in context of SQL Server Management studio to avoid issues such as timeout.
And is it possible to truncate the ApplicationLog table? Well, after you do a full reindex, it’s only good for auditing reasons, so we know who did what and when. You might simply move the rows in to a “copied” table and truncate the original one. Note that, you’ll have to do that at your own risks!