Performance Comparison - OFFSET/LAG/PREVIOUS/LOOKUP
Was looking to get some input on a calculation scenario as community pages were a bit vague (or I'm using the wrong keywords).
Now, I'm sure most of us understand how Anaplan thrives on being multi-threaded where possible, but when it comes to performing time-based calculations, there's a variety of ways to do the same thing.
For example, when attempting to retrieve a value from a previous period, you can use,
OFFSET(Source Line, -1)
LAG(Source Line, 1, 0)
Source Line[LOOKUP:Previous Month Lookup]
I'm conscious that in some cases, particularly when the line item is a circular reference (i.e. referencing a previous period of itself), the LOOKUP option isn't possible.
My understanding is that with exception of the LOOKUP method, all other calculations were serial-type calculations which would infer significantly slower performance when used repeatedly (versus LOOKUP), but I was curious whether there's any actual bench-marking into how efficient (or rather, inefficient) each version is.
Reason for my question is that our team is currently undertaking a review of our models/model performance and are attempting to optimise where possible.
I've attached the results, and it backs up my hypothesis
I ran the test on an example model with the following metrics and formulae:
Module = 1000 Accounts * 100 Countries (with Top Levels) by week for 1, 2, 3, 4 and 5 years
Previous = PREVIOUS(Data)
Offset = OFFSET(Data, -1,0)
Lag = LAG(Data,1,0,STRICT) - Note the use of the Strict Parameter - This is useful to prevent "false" dependencies from negative parameters (very helpful to remove circular references)
Lookup = Data[LOOKUP:Time Setting.Prior Period]
As expected the volume of calculation for Previous is the smallest and the efficiency increases as the timescale (and number of time periods) increases. Also, as expected OFFSET and LAG are pretty much the same, with LOOKUP performing the worse of the four.
This is because of the way Anaplan is architected. The Directed Acyclic Graph (DAG) is at the heart of the Hyperblock and provides the indexing for the calculations. Each block is indexed so the engine knows that the calculate on each cell change.
Each time member is it's own block, so the notion on Previous is an inbuilt dependency. This is more efficient than deriving a dependency through OFFSET or LAG. LOOKUP is worse because the Engine needs to "look" for the LOOKUP value and then retrieve the value. This is more work for the engine than a direct "get 1 period" used in the OFFSET and LAG functions. I suspect if I made the OFFSET and LAG parameters an input line item rather than a fixed parameter that they would perform worse than a hard coded parameter
I will caveat by saying that the analysis above is on model opening and only shows the "volume" of calculation and there are no interrelated dependencies; but that said, it is a good guide. If you need some detailed performance analysis on specific calculations or user entries, we are able to provide this service. If this is of interest, please speak to your Business Partner, or Partner Success Manager
I hope this gives you some more insight into how the engine works
Good look with the optimisation
I have heard PREVIOUS is better in performance compared to other time functions. I think you can try to stick to time functions if it is on seeded TIME dimension and ignore LOOKUP.
It's a good question and from a performance perspective, I will find out.
Before I do that though, I will share some thoughts
The PLANS standard has some core principles, Simple is better than complex being one of them. Also although the P stands for Performance, the L and A also stand for Logical and Auditable
So, for me, what is the most logical answer to the requirement "I need the previous period". Answer PREVIOUS()
It is much easier to understand what the formula is doing and it says so!!
OFFSET and LAG are very flexible and can deliver the same result, as you say, but why over complicate things.
Also, As @ArunManickam mentioned, I suspect LOOKUP will be the worst performing; again why make the engine work hard. Using a lookup for each period is likely to mean a lot more work as potentially the value could be any time period, whereas PREVIOUS = the Prior period.
Anyway, I will run some tests and post the results here
Thanks for the responses,
Definitely an advocate of making things simpler, but in our current situation, we're erring more on the side of performance as our current model needs a bit of tuning.
I'm also surprised that LOOKUP would be the worst performer.
My understanding is that using the PREVIOUS function (or the alternatives) effectively predicates a dependency on the previous cell, meaning it wouldn't be able to process a calculation in parallel, whereas the LOOKUP would.
Also, I imagine Anaplan can interpret/detect which cells are dirty resulting only in the impacted cells being updated when using a LOOKUP, versus the entire PREVIOUS string being re-looped (if that makes sense).
@DavidSmith, if you're able to shed some light once you've had a chance to test, it'd be greatly appreciated.
That is some impressive turn around time.
In the context of your comparison, it's actually quite alarming the difference in calculation time between PREVIOUS and LOOKUP by the time we're hitting 5 years, particularly as our model has a 15 year timescale.
From our perspective, I think our next logical step would be to export our Line Items and review the number of instances where we've applied the LOOKUP variant, amend, compare two models (pre & post change) and go from there.
Depending on how we go, I'll see if I can provide some before/after benchmarks to determine (at least anecdotally) what sort of improvement others could expect by making similar changes.
Thank you again @DavidSmith; was exactly the analysis I was after.
No problem, Part of the evolution of PLANS and the Planual is to give definitive guidance where we can!!
Once, we start hitting long timescales, the performance implications do start to kick in!!. As @ArunManickam alluded to, this is another example of where PREVIOUS is efficient compared to CUMULATE for timescales when you start going beyond 1-2 years; but also can depend on the inter-relationships, so often not quite "Black and White"
As mentioned, previously, there are many ways we can model in Anaplan and my favour saying is "Just because you can, doesn't mean you should" - It is a core principle of PLANS.
So my advice to all modellers is look at the problem, ask yourself "what am I trying to achieve" and then look for the the most logical solution
LOOKUP, OFFSET and LAG all have functionality designed for different purposes and are very flexible. They can also provide the same result in different ways. So the important thing is to find the simplest and logical solution; it is often the most performant too!!