Performance Comparison - YEARVALUE(x) vs x[LOOKUP: FY]
Was hoping to get some insight into the performance implications of obtaining the year value of a line using either YEARVALUE(x) or x[LOOKUP: FY].
Both will retrieve the same result assuming summaries/timescales are setup correctly, but they are definitely not calculated the same (based on anecdotal tests). From what I can tell, YEARVALUE/MONTHVALUE (et al) appear to run significantly faster than their LOOKUP equivalent and the difference grows based on the granularity of the timescale.
I presume that using YEARVALUE triggers a calculation to occur within the same block vs LOOKUP requiring additional steps to find the value, but this doesn't appear to be documented anywhere (that I can find).
Was hoping someone had some insight to yay/nay my thoughts.
@luke_e I would say that your analysis is spot on.
I recently had an older model reviewed by the Hypercare team at Anaplan and one of the biggest takeaways is that the line items were using a lookup on time vs using Yearvalue() etc were some of the biggest negative impacts on the model open time analysis, and luckily are some of the easiest ones to fix!
I believe that your take on 'why' is accurate as well, instead of needing to go elsewhere to find what data it needs to pull it is able to use a direct reference with the Yearvalue, etc all within the Time Block.
@luke_e Thanks for the raised topic!
@jasonblinn @luke_e Do you think it is generally true that specific formulas are faster than "general" formulas?
For example, that YEARTODATE will be faster than CUMULATE and even option PREVIOUS(LI)+LI (plus IF to reset the year).
I know that Anaplan uses separate blocks for time calculations. This certainly speeds up the CUMULATE formula, but in the choice between YEARTODATE and PREVIOUS(LI)+LI, the answer is not so obvious.1
Yes! Formulas like YEARTODATE are designed to work in a very specific way, in essence they know what dimensions they're going to operate on and how they're going to calculate on that dimension.
Taking CUMULATE vs PREVIOUS as you've raised, I have this useful diagram to help explain how they work.
The arrows represent how the function is looking at other cells to work out what it needs to calculate the value. CUMULATE will look at all cells, which means the arrows in red are redundant, it decides it does not need those values at this position, but it has to check.
PREVIOUS only looks at the previous value. (I know, didn't really need to explain that one!)
The formula to mimic CUMULATE just pulls from the source cell and the previous cell, so that's 2 "bits of work" compared to the 6 that CUMULATE does.
LOOKUPS have to... look up the data 😂 but to do that the function needs to understand the relationship of the data, what is this mapping, what does it relate to, what's the difference between my source and target cells.
So when considering the use or a certain function you have to try to think about how that may work, the more "generic" the function the more it may have to do in working out the calculation (like LOOKUP can be used with so many variations of dimensions for example).
The fairly recent update to LEAD and LAG to have a STRICT parameter, highlights this too. This was added to improve performance, so when using it with STRICT, LAG will only apply to the past and will have less dependencies to other cells.
@MarkWarren Thank you very much for your reply!
Does this mean that function YEARTODATE will work the same way as "previous" construction (let's say for a time of 1 year, without going through a year), because to calculate they need to "look" the same amount of cells?1
Yes, very similar, it knows to work along the time dimension in a specific order.0
Appreciate the responses @jasonblinn @MarkWarren .
Glad to hear that my peformance metrics weren't lying, hah.
But agreed, fortunately the majority of our x[LOOKUP: FY] usage can be quickly updated using the line item extract except where there's additional parameters involved (which in our scenario is minimal).