This blog is now ad-free

As with most things in life, if it is not illegal, immoral, or prohibitively expensive, I want to try it. Life is short, and we need to experience more, while we still can.

Advertising is one of them. You have probably noticed that my blog has plenty of ads (in some places, a lot). I wanted to find out two things:

  • How Google Ads works
  • Whether I can make money out of it

The answer to the second question is yes, I can. Just… not a lot. In the last few years I have been making less than 1 SEK per day from ad revenue, barely enough to cover the domain name renewal. I’m lucky enough that my employer sponsors the hosting cost, but ads are not making me rich, at least not fast enough. I will likely not live to see the day I can buy myself a serious roaster from ad revenue. (And if you want to sponsor me, this is the roaster I want to buy.)

As ads can be annoying and sometimes intrusive, I decided to turn them off on this blog. I will add a button so you can donate/sponsor me some money if you find the blog helpful (or you just want to be nice and toss a coin to your witcher, I mean, blogger).

I promise I will spend that money on weed, I mean, green coffee beans. I started roasting coffee recently and, staying true to my philosophy, I want to try as many coffees as I can. Some are reasonably priced, some are pretty expensive.

In any case, thank you for visiting my blog. I hope it is helpful to you, and I wish you a wonderful day!

Don’t let the execution plan fool you

Don’t get me wrong, the execution plan is one of the best tools at your disposal if you want to optimize a SQL query. No, it is the must-have tool. It is not the only tool you will need, but if you can pick only one, pick it.

But it is important to know that the execution plan can be misleading. It is very useful for seeing where the bottleneck is within a statement, but not so useful when you need to compare two statements.

Let’s compare these two queries that I am working on optimizing:

SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN (SELECT Item FROM ecf_splitlist('Cancelled'))
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY

versus

SELECT OG.OrderGroupId
FROM OrderGroup OG
INNER JOIN OrderGroup_PurchaseOrder PO ON OG.OrderGroupId = PO.ObjectId
WHERE 1 = 1 AND OG.Status IN ('Cancelled')
ORDER BY OG.OrderGroupId DESC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY

The two are 99% identical; the only difference is the OG.Status IN ... predicate, with and without the call to the split function.

If you look only at the execution plan, the former seems much faster than the latter: its query cost is only 14% of the batch, while the latter takes 86%. Based on those figures alone, we might think the first query is about 6 times faster than the second.

Except it is not. If we turn on IO statistics (SET STATISTICS IO ON), it is a very different story.

The first query performs significantly more IO than the second:

(50 rows affected)
Table 'OrderGroup_PurchaseOrder'. Scan count 0, logical reads 162, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#BA76F977'. Scan count 1, logical reads 8386, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

versus

(50 rows affected)
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'OrderGroup_PurchaseOrder'. Scan count 1, logical reads 143, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Both queries have the same logical reads on OrderGroup (356), and the first has slightly more on OrderGroup_PurchaseOrder (162 vs 143), but it also has 8386 logical reads on a temp table (#BA76F977, which is used inside the ecf_splitlist function).

The moral of the story? The execution plan is helpful, but not for comparing one query with another. For that, IO statistics are in most cases much more useful.
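Reading those long lines by eye is error-prone, so a small script can total the logical reads per query. This is a minimal sketch that assumes the "Table 'X'. Scan count N, logical reads N" format shown above; the sample strings reuse the same figures, trimmed to the first few fields.

```python
import re
from collections import defaultdict

# Captures the table name and the logical reads count from one STATISTICS IO line.
IO_LINE = re.compile(r"Table '([^']+)'\. Scan count \d+, logical reads (\d+)")


def logical_reads(stats_output: str) -> dict[str, int]:
    """Return logical reads per table from SET STATISTICS IO ON output."""
    reads: dict[str, int] = defaultdict(int)
    for line in stats_output.splitlines():
        match = IO_LINE.search(line)
        if match:
            reads[match.group(1)] += int(match.group(2))
    return dict(reads)


# Figures from the two runs above, trimmed to the first few fields of each line.
with_splitlist = """(50 rows affected)
Table 'OrderGroup_PurchaseOrder'. Scan count 0, logical reads 162, physical reads 0
Table '#BA76F977'. Scan count 1, logical reads 8386, physical reads 0
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0"""

without_splitlist = """(50 rows affected)
Table 'OrderGroup'. Scan count 1, logical reads 356, physical reads 0
Table 'OrderGroup_PurchaseOrder'. Scan count 1, logical reads 143, physical reads 0"""

total_with = sum(logical_reads(with_splitlist).values())
total_without = sum(logical_reads(without_splitlist).values())
print(total_with, total_without)  # 8904 499
```

That is roughly 18 times more logical reads for the query that calls ecf_splitlist, the opposite of what the 14%/86% split in the plan suggests.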

Migrate Catalog content properties

A colleague asked me yesterday how to migrate properties of catalog content. Unfortunately, there is no official way to do it. There are, however, several unofficial ways. Today we will explore the one I personally recommend, for its safety and backward compatibility.

Let’s say we have a FashionProduct with an MSRP property of type Money, and we want to change it to Decimal. There are some hacky ways to do this, but all of them require direct database manipulation, which we should avoid if possible.

First we will need this piece of code. It was “stolen” from a colleague of mine and has been used countless times. You probably want to bookmark it, as it will likely be useful in the future (I should do so myself, as I have to look for it every time I need it). It is a snippet that traverses the catalog structure and returns content of the type you want.

using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using EPiServer;
using EPiServer.Commerce.Catalog.ContentTypes;
using EPiServer.Core;

// Members of a class that has an IContentLoader injected as _contentLoader
public virtual IEnumerable<T> GetEntriesRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : EntryContentBase
{
    // First walk down into the child nodes
    foreach (var nodeContent in LoadChildrenBatched<NodeContent>(parentLink, defaultCulture))
    {
        foreach (var entry in GetEntriesRecursive<T>(nodeContent.ContentLink, defaultCulture))
        {
            yield return entry;
        }
    }

    // Then return the entries directly under this node
    foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
    {
        yield return entry;
    }
}

private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
{
    var start = 0;

    while (true)
    {
        // Load one page of children at a time instead of all of them at once
        var batch = _contentLoader.GetChildren<T>(parentLink, defaultCulture, start, 50);
        if (!batch.Any())
        {
            yield break;
        }

        foreach (var content in batch)
        {
            // Don't include linked products to avoid including them multiple times when traversing the catalog
            if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
            {
                continue;
            }

            yield return content;
        }
        start += 50;
    }
}

To make sure we don’t load too much content at once, the batch size is set to 50, but that is of course configurable (it’s up to you)!
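To see how the two methods cooperate, here is a minimal Python sketch of the same traversal against an in-memory catalog. Everything in it is hypothetical: get_children stands in for a paged GetChildren call (filter by type, then page), and Content.parent_id plays the role of ParentLink. It shows the lazy paging and why an entry that is also linked under a second node is only returned once, under its true parent.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Content:
    id: int
    parent_id: int  # the "true" parent, like ParentLink
    is_node: bool


# Hypothetical catalog: parent id -> children listed under it (true and linked).
CATALOG: dict[int, list[Content]] = {
    1: [Content(2, 1, True), Content(3, 1, True), Content(10, 1, False)],
    2: [Content(20, 2, False), Content(21, 2, False)],
    3: [Content(30, 3, False), Content(20, 2, False)],  # entry 20 is also linked under node 3
}


def get_children(parent_id: int, nodes: bool, start: int, count: int) -> list[Content]:
    """Stand-in for a paged GetChildren<T>: filter by type, then page."""
    matching = [c for c in CATALOG.get(parent_id, []) if c.is_node == nodes]
    return matching[start:start + count]


def load_children_batched(parent_id: int, nodes: bool, batch_size: int = 50) -> Iterator[Content]:
    start = 0
    while True:
        batch = get_children(parent_id, nodes, start, batch_size)
        if not batch:
            return
        for content in batch:
            # Skip linked children so each item is only returned under its true parent.
            if content.parent_id == parent_id:
                yield content
        start += batch_size


def get_entries_recursive(parent_id: int, batch_size: int = 50) -> Iterator[Content]:
    # Descend into child nodes first, then yield the entries directly under this node.
    for node in load_children_batched(parent_id, True, batch_size):
        yield from get_entries_recursive(node.id, batch_size)
    yield from load_children_batched(parent_id, False, batch_size)


print([e.id for e in get_entries_recursive(1, batch_size=1)])  # [20, 21, 30, 10]
```

Because everything is a generator, only one page per level is held in memory at a time, whatever the batch size.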

Now the fun part, where the actual work happens. Once we have the content, we need to migrate the data. It can be as simple as this:

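private void MigrateProperty<T>(IEnumerable<T> contents, int batchSize = 50) where T : EntryContentBase
{
    var batch = new List<T>(batchSize);
    foreach (var content in contents)
    {
        var writableClone = content.CreateWritableClone<T>(); // loaded content is read-only
        Transform(writableClone); // your own conversion logic
        batch.Add(writableClone);
        if (batch.Count >= batchSize)
        {
            _contentRepository.Publish(batch, PublishAction.SyncDraft);
            batch = new List<T>(batchSize);
        }
    }
    // Publish the last, partial batch
    if (batch.Count > 0)
    {
        _contentRepository.Publish(batch, PublishAction.SyncDraft);
    }
}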

In the Transform method you can do whatever you want with the property value. If you just want to rename a property, it does nothing more than assign the old value to the new property. In the case we mentioned at the beginning, converting Money to Decimal is an easy task (Money is the lower-precision version of Decimal). Note that if you convert between data types, for example from double to int, there is potential data loss, but you are probably aware of that already.
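As an illustration of the conversion step, here is a minimal sketch in Python rather than C#. The dict stands in for the writable clone, the property names MSRP and MSRPDecimal are hypothetical, and the old value is modeled as a plain float for simplicity. It also shows the kind of silent data loss mentioned above when going from a floating-point value to an integer.

```python
from decimal import Decimal


def transform(content: dict) -> None:
    """Hypothetical Transform: copy the old MSRP value into a new decimal property."""
    old_value = content.get("MSRP")
    if old_value is not None:
        # Going through str() keeps the value as written (19.99)
        # instead of the binary floating-point approximation of the float.
        content["MSRPDecimal"] = Decimal(str(old_value))


product = {"Code": "SKU-1", "MSRP": 19.99}
transform(product)
print(product["MSRPDecimal"])  # 19.99
print(int(19.99))              # 19: converting to int silently drops the fraction
```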

The final step is to publish the changes. For performance reasons, it is probably best to use the Publish extension method of IContentRepository and save multiple content items in one batch, maybe of size 50 or 100. It skips things like creating new versions, for optimal performance. You can read more about it here: New simple batch saving API for Commerce | Optimizely Developer C

The remaining question is where to put it. In a perfect world, I’d say in a migration step (i.e. a class that implements IMigrationStep), so you can be sure your data is properly migrated before anything else runs, for example your new code that accesses the new property, or the indexing of your content after the migration. But if you have a sizeable catalog, this will take time, and it might not be a good idea to make your users wait for it to complete. In that case, it makes sense to do this in a scheduled job and, when it completes, make the switch.

Migrating properties is not a trivial task, but it can be done with relative ease. It also reminds us about modeling: try to get it right from the beginning so we don’t have to migrate. In the end, the fastest code is the code that does not need to run!

Green coffee bean suppliers in the EU

Why these criteria

FOB price: one of the important aspects of specialty coffee is fair trade. Farmers produce high-quality coffee, and they should get a fair price for what they harvest. The FOB (free on board) price is basically what the importer (in many cases, the supplier themselves) pays for the coffee. Publishing it is a great deal of transparency. And while it’s not required, I would be happy to know what the farmers are getting.

Cupping score: coffee that is rated as specialty has to score at least 80 points on the SCA cupping scale (it’s not the only criterion, but an important one). If you are paying 200 kr/20 EUR or more per kg for green coffee beans, you should know what you get. A higher cupping score does not necessarily mean a higher price, as that depends a lot on the supply-demand balance, but I’d be happier to pay more for a higher cupping score.

Year of harvest: green coffee beans have a much longer shelf life than roasted ones. While you should finish a bag of roasted coffee within 2 months (and some purists might say just 1 month), green beans can be kept for 12 months, if not more, in proper storage conditions. But they do not last forever, and the fresher the beans, the better, which is why it’s important for the supplier to disclose the year of harvest (or even the month of harvest).

Suppliers in Sweden

I started with White Elk Coffee and RawCoffee. It’s too early to tell about the beans’ quality, but I must say I really like White Elk Coffee’s packaging. They are a branch/subsidiary of Kafferosteriet Koppar AB, which has some great coffees.

It is said that almost any roaster will sell you raw coffee if you ask nicely, so you could ask your favorite roaster if they have green beans to sell. The commonly accepted “rule” is that green coffee costs half the price of roasted coffee, so you might offer that to the roaster.

I would recommend buying from suppliers that focus on coffee beans (they usually know more about their stuff) rather than suppliers that sell general groceries. But you can always try your luck; maybe there is a hidden gem somewhere.

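For each supplier: link (page title), free shipping threshold, bag sizes, whether FOB price, cupping score and year of harvest are disclosed, and notes.

  • White elk coffee: “Butik: Råkaffe – White Elk Coffee”. Free shipping from 350kr; bag sizes: 1kg, 10kg; FOB price: No; cupping score: No; year of harvest: No. Note: 10% discount on orders over 1000kr.
  • Raw coffee: “Kaffebönor – RawCoffee”. Free shipping from 350kr; bag size: 1kg; FOB price: No; cupping score: Yes; year of harvest: Yes. Note: 10% coupon code for your next order.
  • Drop coffee: “Buy Coffee – scroll down to buy your coffee – Drop Coffee”. Free shipping from 300kr; bag size: 1kg; FOB price: Yes; cupping score: not specific, 86+; year of harvest: Yes. Note: select unroasted coffee from the drop-down list.
  • Muttley & Jack’s: “Ursprungskaffe – Muttley & Jack’s (muttleyandjacks.se)”. No free shipping, 69kr regardless of order value; bag size: 1kg; FOB price: No; cupping score: Yes; year of harvest: No.
  • Baristashoppen: “RÅKAFFE – ROSTA SJÄLV – Baristashopen”. Free shipping from 350kr; bag sizes: 250g, 1kg; FOB price: No; cupping score: No; year of harvest: No.
  • Moccazino: “Huvudkategori 3 (moccazino.se)”. No free shipping, minimum 229kr, shipping cost increases with weight; bag size: 0.75kg; FOB price: No; cupping score: No; year of harvest: Yes.
  • Humle gården: “Green coffee for those who roast their own coffee – Humlegårdens Ekolager”. No free shipping, 99kr shipping, increasing with weight; bag size: 500g; FOB price: No; cupping score: No; year of harvest: Yes. Note: reduced price for 10 bags.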

Suppliers outside Sweden

Of course I would prefer buying from Sweden, for obvious reasons. But I can’t resist the urge to try some exotic beans, so I have been looking around to see where in the EU I can source my green beans.

This list is inspired by EU Green Coffee Sources List 2023 (home-barista.com), but collects more information in one place. Sadly, many of these suppliers do not ship to Sweden, but you can probably arrange shipping if you ask nicely.

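For each supplier: link (page title), country, whether it ships to Sweden, shipping cost, whether FOB price and cupping score are disclosed, bag sizes, and whether the year of harvest is disclosed.

  • fleurdecafe: “Unroasted coffee beans – Green coffee beans | Fleur de café (fleurdecafe.nl)”. Netherlands; ships to Sweden: No; shipping cost: N/A; FOB price: No; cupping score: No.
  • groenekoffiewinkel: “Unroasted Coffee Beans | Up To 15% Discount At Green Coffee Shop (groenekoffiewinkel.nl)”. Netherlands; ships to Sweden: No; shipping cost: N/A.
  • fascino-coffee: “Green Coffee Beans | Coffee roasting company Fascino (fascino-coffee.com)”. Netherlands; ships to Sweden: No; shipping cost: N/A.
  • godincoffee: “Unroasted coffee beans – Green coffee beans | Fleur de café (fleurdecafe.nl)”. Netherlands; ships to Sweden: No.
  • roastrebels: “Buy green coffee in small quantities (1kg / 5kg / 15kg) | Roast Rebels”. Germany/Switzerland; ships to Sweden: Yes; shipping cost: 9.9 EUR, free for orders of 120 EUR or more; FOB price: No; cupping score: No; bag sizes: 1, 5, 15kg; year of harvest: Yes.
  • rohebohnen: “Green Coffee | Rohebohnen.de”. Germany; ships to Sweden: Yes; shipping cost: 13.9 EUR; FOB price: No; cupping score: No; bag sizes: 500g, 1kg; year of harvest: No.
  • hoofdkwartier: “Unroasted Coffee Beans Archives – Headquarters Coffee Roasting (hoofdkwartier-koffiebranderij.nl)”. Netherlands; ships to Sweden: No.
  • Shokunin: “Green coffee – Shokunin”. Netherlands; ships to Sweden: Yes; shipping cost: 22 EUR, free for orders of 200 EUR or more; FOB price: No; cupping score: No.
  • Roestart: “Rohkaffee (roestart.de)”. Germany; ships to Sweden: Yes; shipping cost: 15.9 EUR; FOB price: No; cupping score: No; bag sizes: 500g, 1kg; year of harvest: No.
  • goodkarmacoffee: “Rohkaffee – Good Karma Coffee”. Germany; ships to Sweden: No.
  • docklands: “Rohkaffee kaufen und selber rösten (docklands-coffee.de)”. Germany; ships to Sweden: Yes; shipping cost: 16 EUR; FOB price: No; cupping score: No; bag sizes: 1, 5 and 10kg; year of harvest: Yes.
  • rjavitukan: “Surova kava Archives – Rjavi tukan”. Slovakia; ships to Sweden: Yes; shipping cost: 21.9 EUR; FOB price: No; cupping score: No; bag sizes: 1, 3, 5 and 10kg; year of harvest: No.
  • koffiebranderijdekoepoort: “GROENE ONGEBRANDE KOFFIEBONEN – Koffiebranderij de Koepoort”. Netherlands; ships to Sweden: Yes; shipping cost: flat rate 24.5 EUR; FOB price: No; cupping score: No; bag size: 1kg; year of harvest: No.
  • Falcon-micro: “Shop Europe – Falcon Micro (falcon-micro.com)”. UK (EU warehouse); ships to Sweden: Yes; shipping cost: 246kr for the minimum order of 5kg, increasing with weight; FOB price: No; cupping score: Yes; bag sizes: 5 and 10kg; year of harvest: No.
  • 88 grains: “Shop | 88 Graines”. France; ships to Sweden: Yes; shipping cost: from 6.9 EUR, increasing with weight, free for orders of 300 EUR or more; FOB price: No; cupping score: Yes; bag size: 1kg; year of harvest: Yes (month/year).
  • kaffebox: “Kaffeabonnement med Skandinavias topp mikrobrennerier – KaffeBox.no”. Norway; ships to Sweden: Yes; shipping cost: from 62.5 NOK, increasing with weight; FOB price: No; cupping score: No; bag size: 250g; year of harvest: No.
  • risteriet: “Grønne/Rå kaffebønner – Risteriet.dk Webshop”. Denmark; ships to Sweden: Yes; shipping cost: No, 350 DKK; FOB price: No; cupping score: No; bag sizes: 1kg, 5kg; year of harvest: No.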

Building a better wish list – part 1

If you have been using Optimizely Customized Commerce, you probably know that, by default, a wish list is just a cart with a special name. Can you guess the name? Surprise, surprise, it’s “Wishlist”. It has been there since forever, from the early days of Mediachase, and was then carried over to the new serializable cart. I had been “fine” with it, i.e. I accepted the approach without questioning it. Only very recently did I realize there are several problems with it.

Why is it not a very good idea?

First of all, it shares the same table as normal carts. To search for abandoned carts, you have to skip the carts named “Wishlist”. And as there are only a few cart names and they are not evenly distributed, you will have a hard time filtering carts by name.

But there is more. As most customers now use the serializable cart mode, ever-growing wishlists pose another problem: each operation on a wishlist, such as adding or removing an item, results in a big write to the SerializableCart table, because the whole serialized cart is written back. If you have just a few items, it might be fine, but a simple test on Commerce shows that with only 9 items in a wishlist, the Data column is more than 2700 characters. And wishlists are meant to be kept forever, so they will only grow in size.
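A rough model makes that write cost visible. This is a sketch under simplifying assumptions, not how Commerce actually serializes carts: each wishlist item is a small, hypothetical JSON object, a serialized cart rewrites the whole list on every change, and a hypothetical row-per-item store writes only the item that changed.

```python
import json


def chars_written_serialized(items: list[dict]) -> int:
    """Total characters written when the whole list is re-serialized after each add."""
    total = 0
    saved: list[dict] = []
    for item in items:
        saved.append(item)
        total += len(json.dumps(saved))  # the entire blob is written every time
    return total


def chars_written_per_row(items: list[dict]) -> int:
    """Total characters written when each add only writes the new item."""
    return sum(len(json.dumps(item)) for item in items)


items = [{"code": f"SKU-{i:04d}", "quantity": 1} for i in range(9)]
print(chars_written_serialized(items), chars_written_per_row(items))  # 1665 315
```

The serialized total grows quadratically with the number of adds, while the per-item total grows linearly, and wishlists only ever grow.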

My “Saved for later” list on Amazon, which is the closest thing to a “wish list”. Imagine having that on Optimizely Customized Commerce.

As wishlists are carts, they have to use the same format, even though a lot of the data is redundant/unnecessary.

The biggest benefit of the default wishlist implementation, and I think it trumps all the disadvantages we have listed, is that it’s built in. You can start using it with almost no additional effort: get a cart with the predefined name and you are good to go. Building a different wish list definitely costs time and resources, a luxury not everyone can afford.

That is why I have started building a wish list service in my free time. I plan to make it open source when the time is right, but we’ll see about that.

Moral of the story

  • It is critical to take a step back, from time to time, to think about what you have done. Things might make less sense when you see them from a different perspective.
  • You can almost always do better.

Lelit Bianca v3 vs ECM Synchronika vs Profitec Pro 700

If you are looking for an espresso machine in the $3000 range (or around €2500 if you are in the EU, one of the perks of being in Europe), you will most likely end up choosing between these three. They are probably the most popular options in this price range, and rightly so. The prices are fairly comparable, with the Profitec Pro 700 being the cheapest in the US (by around $200) and the Lelit Bianca v3 the cheapest in the EU (also by around €200). I did quite intensive research on the topic and finally came to a conclusion (spoiler alert: it is at the end of this post).

If you didn’t know already, Profitec is a subsidiary of ECM. The Pro 700 is still made in Milan, Italy, and it shares a lot of its design with the ECM Synchronika. They are basically siblings, apart from some cosmetic differences, so I would expect them to perform very similarly. For easier comparison, I will compare the Bianca and the Synchronika. Let’s go through the pros and cons of each, and hopefully it will help you come to a decision.

By definition, they are very similar espresso machines: both are E61, dual-boiler machines that target home enthusiasts.

Build quality

This is no contest. The Synchronika is the clear winner, the Pro 700 second, and the Bianca comes last. Not only does the Synchronika have better fit and finish, it also has a clean internal layout that is an engineer’s dream. Whole Latte Love has several deep-dive videos on that, and it means that if you ever need to service your Synchronika yourself, you will easily know where to go and what to check/change.

The Bianca’s fit and finish are not as good, and its internals are pretty cramped (more on that below).

To be very clear, the Bianca’s build quality is definitely more than decent, and it will last you a very long time with proper care. The cramped interior has two reasons: its smaller size and its extra features.

Size and look

Of all three, the Lelit Bianca is the smallest, and it is the only one that comes with a wooden (walnut) finish on its knobs and wands by default, while the others come with hard black plastic. It is only 29cm wide and 40cm deep. Both the ECM and the Profitec are noticeably larger: the former is 33.5cm wide and 49cm deep, the latter 34cm and 47cm, respectively.

While looks are definitely subjective, make sure the machine fits into your coffee station, whether that is under a cupboard or elsewhere. One of the biggest selling points of the Bianca is its movable water tank: you can put it at the back, on the left, or on the right. All three can be plumbed in, putting your days of refilling water behind you, but sometimes plumbing is not an option, and being able to move the tank is a huge plus. The water tank was my biggest complaint about the Lelit Elizabeth, and now it can be solved easily.

Start up time

If you have a very stable daily schedule, start-up time might not concern you: you can use a smart power plug to turn on your coffee machine at a fixed time every day. But let’s be very clear here: all these machines take a significantly long time to fully heat up. Not only do they have to heat both boilers to temperature and let them stabilize, they also need to heat up the E61 group head via thermosiphon (Barista Hustle explains it in great detail here: EM 3.04 How the E61 Thermosyphon Works – Barista Hustle; basically, hot water circulates through the group head to heat it up). The E61 group head is very heavy, around 4kg, so it’s important to get it hot, so the water does not lose too much temperature during brewing.

The Synchronika takes significantly longer to heat up. In Kaffemacher’s test, it takes a whopping 35 minutes before it can pull 5 shots without failing (i.e. without missing the target temperature).

That is roughly double what the Bianca v3 needs.

That means you can start pulling shots 16 minutes earlier on the Bianca. That’s impressive. If you want to brew lighter roasts, which need higher temperatures (say, 96°C), unofficial and unscientific tests showed that the Bianca is ready even sooner (12 minutes), based on the PID readout. It’s not breaking any records, but for an E61, that’s nothing to be sniffed at.

Temperature stability

This is an interesting one. Thanks to Kaffemacher, we have measurements from both machines, and it’s a tie:

  • The ECM Synchronika is more stable during a shot, i.e. with a 25s shot, the temperature between 5s and 25s stays closer to a straight line (albeit hotter toward the end). Shot after shot, however, it tends to be under temperature after being idle for some time.
  • The Lelit Bianca is more stable between shots. The temperature within a shot fluctuates a bit, but does not rise as much as with the Synchronika. You can also adjust the PID settings, like the temperature offset, to get even better stability, especially after your machine has been idling for a while.

Features

Bianca, hands down.

The Bianca comes with flow control by default. The ECM Synchronika and Profitec Pro 700 can be retrofitted with the E61 flow control kit, which costs somewhere around $200 more, plus installation. As most people have commented, the Bianca’s flow control feels more natural and nicer to use. That is of course subjective, but it is not too surprising. The main difference is that the Bianca’s flow control has ~200 degrees of travel from fully open to fully closed, while the E61 flow control has ~720 degrees. The latter allows some more fine-tuning, but it is less intuitive to use.

The Bianca can pre-infuse even when using the water tank, while the ECM and Profitec need to be plumbed in to pre-infuse (using line pressure). The Bianca v3 also has low-flow settings, which make pre-infusion even more flexible. You can pre-infuse any way you like.

Lelit is also known for making the settings of its Lelit Control Center (LCC) available to end users, so you can fine-tune your machine even further. Most notably the temperature offset (between the boiler and the target temperature at the group head), so you can set your brew temperature exactly as you like.

Conclusion

When I bought my Lelit Elizabeth, I thought of the Bianca as something I wanted but couldn’t get, and if I ever upgraded, I would pick it. Two years later, when I finally decided to upgrade, for some reason I skipped the Bianca. I almost went with the Synchronika, but slowly and steadily the Bianca won me back. And I will soon be one of its owners.

With that said, you can’t go wrong with any of these options. They are the most popular options in their price range, and there’s a reason for that: they are that good.

Optimizing an interesting query

It’s no secret that I love optimizing things. In a sense, I am both an Optimizer (literally) and an optimizer. And today we are back to basics: optimizing a tricky SQL query.

The query in question is the stored procedure ecf_CatalogNode_GetAllChildNodes, which is used to get all child nodes of specific nodes. It is in turn used to find all entries that are direct or indirect children of specific nodes. Why, you might ask? Because when you change the URL segment of a node, you want to make sure that all entries under that node have their indexed objects refreshed.

Let’s take a look at the stored procedure:

CREATE PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
    @catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
    WITH all_node_relations AS 
    (
        SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
        WHERE ParentNodeId > 0
        UNION
        SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
    ),
    hierarchy AS
    (
        SELECT 
            n.CatalogNodeId,
            '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM @catalogNodeIds n
        UNION ALL
        SELECT
            children.ChildNodeId AS CatalogNodeId,
            parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM hierarchy parent
        JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
        WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
    )
    SELECT CatalogNodeId FROM hierarchy
END

I previously wrote about the relations between entities in the Commerce catalog here: Commerce relation(ship), a story – Quan Mai’s blog (vimvq1987.com). Relations between nodes can be a bit complicated: a node has one true parent, defined in the CatalogNode table, and can then have other “linked” parents in CatalogNodeRelation. So to find all children and grandchildren of a node, you need to read from both.

Getting the children of a node from CatalogNode or CatalogNodeRelation is simple, but things become more complicated when you have to get grandchildren, then great-grandchildren, and so on. For that, a recursive CTE is needed. But then a problem arises: there is a chance, small but real, that the data was added incorrectly, so a circular reference is possible, e.g. A is a parent of B, which is a parent of C, which in turn is a parent of A. To stop the SP from running forever, a check is needed to make sure any circular reference is cut short.
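To make the recursion and the cycle check concrete, here is a Python sketch that mimics what the CTE computes: the same '|id|' path string and the same "is this id already on the path?" test as the CHARINDEX condition. It processes one level per iteration, which gives the same set of rows but not necessarily the row order SQL Server produces. Because the hierarchy CTE uses UNION ALL and the final SELECT has no DISTINCT, a node reachable through two paths is returned twice.

```python
def all_child_nodes(start_ids: list[int], relations: list[tuple[int, int]]) -> list[int]:
    """Mimic ecf_CatalogNode_GetAllChildNodes on in-memory (parent, child) pairs."""
    # all_node_relations: the UNION of both sources removes duplicate pairs
    children_of: dict[int, list[int]] = {}
    for parent, child in sorted(set(relations)):
        children_of.setdefault(parent, []).append(child)

    # Anchor member: each start node with CyclePrevention = '|id|'
    level = [(node_id, f"|{node_id}|") for node_id in start_ids]
    result: list[int] = []
    while level:  # one iteration per recursion step of the CTE
        result.extend(node_id for node_id, _ in level)
        next_level = []
        for node_id, path in level:
            for child in children_of.get(node_id, []):
                # Same test as CHARINDEX('|' + child + '|', CyclePrevention) = 0
                if f"|{child}|" not in path:
                    next_level.append((child, f"{path}{child}|"))
        level = next_level
    return result


# True parents (CatalogNode) plus linked parents (CatalogNodeRelation); 3 -> 1 closes a cycle
relations = [(1, 2), (2, 3), (1, 4)] + [(4, 3), (3, 1)]
print(sorted(all_child_nodes([1], relations)))  # [1, 2, 3, 3, 4]
```

Without the path test, the pair (3, 1) would send the walk around 1, 2, 3, 1 indefinitely. On SQL Server the default MAXRECURSION limit of 100 would turn that into an error rather than a truly endless loop, but either way the procedure fails.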

This brings back memories, as the first-ever support case I worked on at Optimizely (then Episerver) involved a circular reference. The site would crash whenever someone visited catalog management in Commerce Manager. That was around June 2012 (feeling old now?). My “boss” at that time involuntarily volunteered me for the case. See what you made me do, boss.

Now that you grasp the basics of what the SP does, let’s get back to the original problem: it’s slow, especially with a big catalog and a complex node structure. As always, to optimize anything you first need to find the bottleneck, so it’s time to fire up SQL Server Management Studio and turn on the Actual Execution Plan.

I decided to go with node 66, the “root” catalog node. This query yields around 18k rows

declare @Nodes udttCatalogNodeList 

insert into @Nodes (CatalogNodeId) select 66

exec ecf_CatalogNode_GetAllChildNodes @Nodes

and takes 18s to execute.

Mind you, this is on my machine, with a pretty powerful CPU (AMD Ryzen 7 5800X, 8 cores/16 threads) and a very fast NVMe PCIe SSD (Western Digital Black SN850 2TB). If this were executed on Azure SQL Database, for example, a timeout would be almost guaranteed. So execution times should only be compared relative to each other.

If we look at the execution plan, it is quite obvious where the bottleneck is: the scan on the CatalogNode table is heavy (that operation read 79M rows). As suggested by Anders in Timeout when deleting CatalogNodes from a large catalog (optimizely.com), adding a non-clustered index on the ParentNodeId column should improve it quite a lot. And indeed it does: the execution time drops to 5 seconds.
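The effect of such an index can be reproduced on a small scale. This sketch swaps SQL Server for SQLite (through Python’s built-in sqlite3 module), so the plan text differs from what SSMS shows; the table is reduced to the two columns the lookup needs, and the index name is made up. It shows the lookup by ParentNodeId turning from a full scan into an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical CatalogNode with only the two columns the lookup needs
conn.execute(
    "CREATE TABLE CatalogNode (CatalogNodeId INTEGER PRIMARY KEY, ParentNodeId INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO CatalogNode (CatalogNodeId, ParentNodeId) VALUES (?, ?)",
    [(node_id, node_id // 10) for node_id in range(1, 1000)],
)

# The lookup the recursive part repeats for each parent: children of one node
QUERY = "SELECT CatalogNodeId FROM CatalogNode WHERE ParentNodeId = ?"


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN details for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (66,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before = plan(QUERY)
# SQL Server equivalent (index name is made up):
#   CREATE NONCLUSTERED INDEX IX_CatalogNode_ParentNodeId ON dbo.CatalogNode (ParentNodeId);
conn.execute("CREATE INDEX IX_CatalogNode_ParentNodeId ON CatalogNode (ParentNodeId)")
after = plan(QUERY)

print("before:", before)  # a full SCAN of CatalogNode
print("after: ", after)   # a SEARCH using IX_CatalogNode_ParentNodeId
```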

And the number of rows read from CatalogNode drops to just 17k.

This is of course a very nice improvement. But the customer reported that it is not enough and the SP still times out, so further optimization is needed.

Naturally, the next step is to see if we can skip the circular reference check. It was added as a safety measure against bad data. Ideally it should not be there, as the check should be performed when the data is modified. But it is there for historical reasons and we can’t just remove it, at least not trivially. Still, let’s try it out of curiosity.

The modified query looks like this (basically, all code related to CyclePrevention is commented out):

ALTER PROCEDURE [dbo].[ecf_CatalogNode_GetAllChildNodes]
    @catalogNodeIds udttCatalogNodeList readonly
AS
BEGIN
    WITH all_node_relations AS 
    (
        SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
        WHERE ParentNodeId > 0
        UNION
        SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
    ),
    hierarchy AS
    (
        SELECT 
            n.CatalogNodeId
			--, '|' + CAST(n.CatalogNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM @catalogNodeIds n
        UNION ALL
        SELECT
            children.ChildNodeId AS CatalogNodeId
			--, parent.CyclePrevention + CAST(children.ChildNodeId AS nvarchar(4000)) + '|' AS CyclePrevention
        FROM hierarchy parent
        JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
        --WHERE CHARINDEX('|' + CAST(children.ChildNodeId AS nvarchar(4000)) + '|', parent.CyclePrevention) = 0
    )
    SELECT CatalogNodeId FROM hierarchy
END

The improvement is quite impressive (more than I expected): the query completes almost instantly (in less than 1s), and the reads on CatalogNodeRelation are significantly reduced.

A word of warning here: execution plans can’t simply be compared as-is. If I run the two versions side by side, I get quite a misleading comparison.

Even though the top one (without the circular reference check) is much faster than the original (the bottom one), SQL Server estimates that the first is slower (almost 2x slower than the second). So the execution plan should be used to see what has been done and where the bottleneck likely is inside a query, not to compare queries with each other. In most cases, comparing statistics with SET STATISTICS IO ON is the best way to do that.

If not for the fact that it changes the behavior of the stored procedure, I would be happy with this approach. The chance of running into a circular reference is small, but it is not zero. As we said, we could in theory guard against bad relations during insert/update, but that would be too big a change to start with. This is one of the constraints of working at framework level: we have to tread carefully so as not to break anything. A breaking change is bad, but data corruption is simply unacceptable. I spent a few hours (probably more than I should have) trying to optimize the circular reference check, but found no better solution.

The next approach, as you can guess, is to get rid of the Clustered Index Scan on the CatalogNodeRelation table. The solution is quite simple: a non-clustered index on its ParentNodeId column should be enough.
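To check the logic end to end, here is the procedure body translated to SQLite and run from Python, with both extra indexes in place. This is a sketch under assumptions: the tables are reduced to the columns the query reads, CHARINDEX becomes SQLite’s instr, a plain StartNodes table replaces the udttCatalogNodeList parameter, and the index names are made up. It verifies behavior (including the cycle check) on tiny data; it says nothing about SQL Server performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CatalogNode (CatalogNodeId INTEGER PRIMARY KEY, ParentNodeId INTEGER NOT NULL);
    CREATE TABLE CatalogNodeRelation (ParentNodeId INTEGER NOT NULL, ChildNodeId INTEGER NOT NULL);
    -- Stands in for the @catalogNodeIds table-valued parameter
    CREATE TABLE StartNodes (CatalogNodeId INTEGER NOT NULL);

    -- The two indexes discussed in the post (index names are made up)
    CREATE INDEX IX_CatalogNode_ParentNodeId ON CatalogNode (ParentNodeId);
    CREATE INDEX IX_CatalogNodeRelation_ParentNodeId ON CatalogNodeRelation (ParentNodeId);

    -- True parents: 1 -> 2 -> 3 and 1 -> 4 (node 1 is a root, ParentNodeId 0)
    INSERT INTO CatalogNode VALUES (1, 0), (2, 1), (3, 2), (4, 1);
    -- Linked parents: 4 -> 3, and 3 -> 1, which closes a cycle
    INSERT INTO CatalogNodeRelation VALUES (4, 3), (3, 1);
    INSERT INTO StartNodes VALUES (1);
""")

GET_ALL_CHILD_NODES = """
WITH RECURSIVE all_node_relations AS (
    SELECT ParentNodeId, CatalogNodeId AS ChildNodeId FROM CatalogNode
    WHERE ParentNodeId > 0
    UNION
    SELECT ParentNodeId, ChildNodeId FROM CatalogNodeRelation
),
hierarchy AS (
    SELECT n.CatalogNodeId,
           '|' || n.CatalogNodeId || '|' AS CyclePrevention
    FROM StartNodes n
    UNION ALL
    SELECT children.ChildNodeId,
           parent.CyclePrevention || children.ChildNodeId || '|'
    FROM hierarchy parent
    JOIN all_node_relations children ON parent.CatalogNodeId = children.ParentNodeId
    -- instr(haystack, needle) takes its arguments in the opposite order to CHARINDEX
    WHERE instr(parent.CyclePrevention, '|' || children.ChildNodeId || '|') = 0
)
SELECT CatalogNodeId FROM hierarchy
"""

node_ids = sorted(row[0] for row in conn.execute(GET_ALL_CHILD_NODES))
print(node_ids)  # [1, 2, 3, 3, 4]
```

The query terminates despite the 3 -> 1 link, and node 3 appears twice because it is reachable both through node 2 and through node 4.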

Great success. The performance is comparable to the “no circular reference check” approach.

Adding an index is also a non-breaking change, which makes it the safer choice. (Admittedly, in some cases an index can cause a performance regression, like in A curious case of SQL execution plan – Quan Mai’s blog (vimvq1987.com), but that is rare, and in this case the values of ParentNodeId are most likely quite evenly distributed.)

That is all for today. Hopefully you learned a thing or two about optimizing queries in your daily work.

Delete orphaned assets

I was asked this question: “We have about 3TB of assets, is there any way to clean them up?”

These days storage is cheap, but still not free. And big storage means you need space for backups, and with that, bandwidth and time.

Is there a way to clean up things you no longer need?

Yes!

Optimizely Content already has a scheduled job named Remove Abandoned BLOBs, but this job only removes blobs that no longer have any content associated with them, i.e. the content was deleted with IContentRepository.Delete but the blob was left behind. The job uses the log to find out which content items were deleted, then finds their blobs.

What about assets that still have content associated with them but are not used anywhere? Time to get your hands dirty!

Due to the nature of this task, it is best to make it a scheduled job.

All of the assets are descendants of the global assets root. By iterating over them, we can check whether each of them is used by other content. If not, we add it to a list for later deletion. Before deleting the content, we find its blob so we can delete that as well. Easy, right?

To get the assets recursively, we use this little piece of code:

        public virtual IEnumerable<T> GetAssetRecursive<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : MediaData
        {
            foreach (var folder in LoadChildrenBatched<ContentFolder>(parentLink, defaultCulture))
            {
                foreach (var entry in GetAssetRecursive<T>(folder.ContentLink, defaultCulture))
                {
                    yield return entry;
                }
            }

            foreach (var entry in LoadChildrenBatched<T>(parentLink, defaultCulture))
            {
                yield return entry;
            }
        }

        private IEnumerable<T> LoadChildrenBatched<T>(ContentReference parentLink, CultureInfo defaultCulture) where T : IContent
        {
            var start = 0;

            while (!_isStopped)
            {
                var batch = _contentRepository.GetChildren<T>(parentLink, defaultCulture, start, 50);
                if (!batch.Any())
                {
                    yield break;
                }
                foreach (var content in batch)
                {
                    // Skip children whose primary parent is elsewhere (e.g. linked catalog items) so nothing is yielded twice
                    if (!parentLink.CompareToIgnoreWorkID(content.ParentLink))
                    {
                        continue;
                    }

                    yield return content;
                }
                start += 50;
            }
        }

We start from SiteDefinition.Current.GlobalAssetsRoot and use IContentRepository.GetReferencesToContent to see whether each asset is used by any content (both CMS and catalog). If not, we add it to a list. Later, we use IPermanentLinkMapper to get the GUID of each asset, derive its blob container from it, and delete that as well:

            foreach (var asset in GetAssetRecursive<MediaData>(SiteDefinition.Current.GlobalAssetsRoot, CultureInfo.InvariantCulture))
            {
                totalAsset++;
                if (!_contentRepository.GetReferencesToContent(asset.ContentLink, false).Any())
                {
                    toDelete.Add(asset.ContentLink.ToReferenceWithoutVersion());
                }

                if (toDelete.Count >= 50)
                {
                    var maps = _permanentLinkMapper.Find(toDelete);
                    foreach (var map in maps)
                    {
                        deletedAsset++;
                        _contentRepository.Delete(map.ContentReference, true, EPiServer.Security.AccessLevel.NoAccess);
                        var container = Blob.GetContainerIdentifier(map.Guid);
                        // Probably redundant: the container could be deleted directly
                        var blob = _blobFactory.GetBlob(container);
                        if (blob != null)
                        {
                            _blobFactory.Delete(container);
                        }
                        OnStatusChanged($"Deleting asset with id {map.ContentReference}");
                    }
                    toDelete.Clear();
                }
            }

We need another round of deletion after the foreach loop to clean up the leftovers (for example, when there are fewer than 50 abandoned assets in total).
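To avoid writing the delete logic twice (once inside the loop, once after it), the batching and the final flush can be folded into a small helper. This is a minimal, self-contained sketch; BatchBuffer is a hypothetical name, and the delete callback is where the IPermanentLinkMapper, IContentRepository.Delete and blob deletion calls from the job would go.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: buffers items and hands them to a delete callback
// in fixed-size batches. Flush() handles the leftovers after the loop ends.
public sealed class BatchBuffer<T>
{
    private readonly int _batchSize;
    private readonly Action<IReadOnlyList<T>> _deleteBatch;
    private readonly List<T> _pending = new List<T>();

    public BatchBuffer(int batchSize, Action<IReadOnlyList<T>> deleteBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
        _deleteBatch = deleteBatch ?? throw new ArgumentNullException(nameof(deleteBatch));
    }

    public void Add(T item)
    {
        _pending.Add(item);
        // ">=" rather than "Count % size == 0", so an empty buffer never triggers a batch.
        if (_pending.Count >= _batchSize)
        {
            Flush();
        }
    }

    // Sends whatever is buffered, including a final partial batch.
    // The callback gets a copy, so clearing the buffer does not affect it.
    public void Flush()
    {
        if (_pending.Count == 0) return;
        _deleteBatch(_pending.ToArray());
        _pending.Clear();
    }
}
```

One design choice to consider (my assumption, not what the job above does): collect all candidate ContentReferences first, then feed them through the buffer. The job deletes while GetAssetRecursive is still paging through a folder by offset, so deleting earlier children may shift later ones out of the next page, and those assets would then only be picked up on a later run.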

And we’re done!

Testing this job is simple: upload a few assets to your CMS without using them anywhere, then run the job. It should delete those assets.

Things to improve: we might want to make sure that only assets created more than a certain number of days ago are deleted. This allows editors to upload assets for later use without having to use them immediately.
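A minimal sketch of that age check, assuming you can get a UTC creation timestamp for each asset (check which property your Optimizely version exposes for this). AssetAgeFilter, IsOldEnough and minAgeDays are hypothetical names. In the job, the check would sit next to the GetReferencesToContent test, so an asset is only queued for deletion when it is both unused and old enough.

```csharp
using System;

public static class AssetAgeFilter
{
    // Hypothetical rule: an unused asset becomes eligible for deletion only
    // once it has existed for at least minAgeDays, which gives editors a
    // grace period for assets uploaded to be used later.
    // Both timestamps are expected to be in UTC.
    public static bool IsOldEnough(DateTime createdUtc, DateTime nowUtc, int minAgeDays)
    {
        if (minAgeDays < 0) throw new ArgumentOutOfRangeException(nameof(minAgeDays));
        return createdUtc <= nowUtc.AddDays(-minAgeDays);
    }
}
```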

The code has been open sourced at vimvq1987/DeleteUnusedAssets: Delete unused assets from an Optimizely/Episerver site (github.com), and I have uploaded a NuGet package to Packages (optimizely.com) for review.

Index only Catalog content

If you are using Find to index your content, you have likely used the Find Indexing job, which indexes everything in one go. Today I stumbled upon this question, A way to run indexing job for Commerce only | Optimizely Develope, and it is a good one: if you have a lot of content on the CMS side that rarely changes, if at all, you certainly don’t want to waste time and resources reindexing it. Is there a way to index only catalog content?

Yes, there is. It is a somewhat hacky solution, but it certainly works. But first, let’s dive into how the Find indexing job works. It relies on IIndexingJobService, which itself relies on ContentIndexer to do the job. In its turn, ContentIndexer uses a list of IReindexInformation to know which content to index, and in which languages. Here’s what it looks like:

    public interface IReindexInformation
    {
        /// <summary>
        /// Content links to be reindexed.
        /// </summary>
        IEnumerable<ReindexTarget> ReindexTargets { get; }

        /// <summary>
        /// Gets the root to index.
        /// </summary>
        ContentReference Root { get; }
    }

It has one Root and multiple ReindexTargets, each of which contains:

    public class ReindexTarget
    {
        /// <summary>
        /// The content references.
        /// </summary>
        public IEnumerable<ContentReference> ContentLinks { get; set; }

        /// <summary>
        /// The languages the collection of <see cref="ContentReference"/> are enabled on.
        /// </summary>
        public IEnumerable<CultureInfo> Languages { get; set; }

        /// <summary>
        /// The site that the collection of <see cref="ContentReference"/> appears on
        /// or <c>null</c> if unknown.
        /// </summary>
        public SiteDefinition SiteDefinition { get; set; }
    }

As you might have guessed, Commerce has its own IReindexInformation to index catalog content. If only we could use that alone to run our job. This is where our “hack” begins.

The IContentIndexer interface has no way to control the IReindexInformation list, but its default implementation, ContentIndexer, does, through its ReindexInformation property. We set it to the only one we need, run the job, and then restore the original list:

        var contentIndexer = _contentIndexer as ContentIndexer;
        if (contentIndexer != null)
        {
            // Keep the original list so it can be restored afterwards
            List<IReindexInformation> targets = contentIndexer.ReindexInformation.ToList();
            var commerceReIndexInformation = targets.FirstOrDefault(x => x.GetType() == typeof(CommerceReIndexInformation));
            if (commerceReIndexInformation != null)
            {
                try
                {
                    // Index catalog content only
                    contentIndexer.ReindexInformation = new List<IReindexInformation>() { commerceReIndexInformation };
                    _indexingJobService.Start(OnStatusChanged);
                }
                finally
                {
                    // Restore the original targets, even if indexing throws
                    contentIndexer.ReindexInformation = targets;
                }
            }
        }
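If you need this kind of swap in more than one place, one option is to wrap it in an IDisposable, so a using block always restores the original value, even if the indexing job throws. The sketch below is self-contained; FakeIndexer and TemporaryOverride are hypothetical names, with FakeIndexer standing in for ContentIndexer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an object whose list property we need to
// narrow temporarily (ContentIndexer.ReindexInformation in the post).
public class FakeIndexer
{
    public IEnumerable<string> ReindexInformation { get; set; } = new List<string>();
}

// Captures the current value and sets the temporary one on construction,
// then puts the captured value back on Dispose. A using block therefore
// restores it even when an exception is thrown inside the block.
public sealed class TemporaryOverride<T> : IDisposable
{
    private readonly Action<T> _setter;
    private readonly T _original;
    private bool _disposed;

    public TemporaryOverride(Func<T> getter, Action<T> setter, T temporaryValue)
    {
        _setter = setter ?? throw new ArgumentNullException(nameof(setter));
        _original = getter();
        _setter(temporaryValue);
    }

    public void Dispose()
    {
        if (_disposed) return;
        _setter(_original);
        _disposed = true;
    }
}
```

Usage would look like `using (new TemporaryOverride<IEnumerable<string>>(() => indexer.ReindexInformation, v => indexer.ReindexInformation = v, narrowedList)) { /* run the job */ }`. Note that this does not make the swap thread-safe: anything else reading the same object while the using block runs sees the temporary value.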

Note that you will still see the “Indexing Global assets and other data” message, because the IIndexingJobService implementation goes through every SiteDefinition regardless and shows that message. However, the internal ContentIndexer skips the work if the SiteDefinition passed to it does not match the SiteDefinition in the IReindexInformation (for CommerceReIndexInformation, that is SiteDefinition.Empty).

As I mentioned at the beginning, this is a somewhat hacky solution, as you have to cast IContentIndexer to its concrete implementation. The proper solution would be to implement IContentIndexer yourself. Given that’s not a trivial job, I’ll leave it at that.

Loading the contacts/organizations, the right way

If you have been using Business Foundation, you most likely know about one limitation: you can only load the first 1000 objects using the GetXXX methods. For example, CustomerContext.Current.GetOrganizations() loads only the first 1000 organizations. In theory, you can get more objects by changing the value of MaxObjectsList. However, that has consequences: it affects all types of objects, including contacts, organizations, and your custom objects. Also, loading too much in one go is almost never a good idea.

Is there a better way?

Yes, of course, which is why we have this blog post.

There is a “hidden” method in BusinessManager, the base class of Business Foundation, that takes paging parameters:

public static EntityObject[] List(string metaClassName, FilterElement[] filters, SortingElement[] sorting, int? start, int? count)

You will need to convert the results to the type you want. Note that all Business Foundation objects inherit from EntityObject. So if you want to get contacts with paging, it would look like this:

                var contacts = BusinessManager.List(ContactEntity.ClassName, new FilterElement[0], new SortingElement[] { new SortingElement(sortField, sortType) }, startIndex, recordsToRetrieve)
                .OfType<CustomerContact>();

Let’s go through the parameters one by one.

  • The first one is the class name of your objects. For contacts, you can use ContactEntity.ClassName as shown above; for organizations, use OrganizationEntity.ClassName.
  • The next one is the filter. As you are trying to load all objects, you can just pass in an empty (but not null) array: new FilterElement[0].
  • The third one is how you want to sort the results. If you pass an empty array, the default sorting is used. If you want to sort by Name, for example, set sortField to Name and sortType to one of the SortingElementType values (Asc or Desc).
  • The fourth and fifth ones are what we are looking for: they’re simply paging parameters, i.e. the position to start from and how many objects to get. Combine them with a simple while loop, and you can get all of your Business Foundation objects.
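Here is a minimal sketch of that while loop. To keep it self-contained, loadPage is a hypothetical delegate standing in for BusinessManager.List; the loop stops as soon as a page comes back with fewer items than requested.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Calls loadPage(start, count) repeatedly, yielding every item, until a
    // page comes back shorter than requested. loadPage is a hypothetical
    // stand-in for a call such as
    // BusinessManager.List(className, filters, sorting, start, count).
    public static IEnumerable<T> LoadAll<T>(Func<int, int, IReadOnlyList<T>> loadPage, int pageSize = 100)
    {
        if (loadPage == null) throw new ArgumentNullException(nameof(loadPage));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int start = 0;
        while (true)
        {
            IReadOnlyList<T> page = loadPage(start, pageSize);
            foreach (T item in page)
            {
                yield return item;
            }
            // A short (or empty) page means there is nothing left to load.
            if (page.Count < pageSize)
            {
                yield break;
            }
            start += pageSize;
        }
    }
}
```

Wired to the real API, it could look like `Paging.LoadAll((start, count) => BusinessManager.List(ContactEntity.ClassName, new FilterElement[0], sorting, start, count)).OfType<CustomerContact>()`, where sorting is your SortingElement[] array; treat that line as an untested sketch. Since offset paging relies on a stable order, it is probably safer to sort on a field whose values are unique.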

And that’s about it, my friends.

What about caching?

Caching lists is always tricky, as you have to keep track of each item in the list to make sure you invalidate the list cache if any of the items changes (is updated or removed). For the purpose of just loading all contacts/organizations, it is probably better to skip caching, for simplicity.