Hi Team,
I'm hoping to find further information about calculation waves as they relate to model performance and best practice.
I have some models whose performance is not optimal, and I think this may be partly due to excess calculation waves. Is there any information on what effect waves have on performance?
As an example, when you are staging data across a number of modules, what impact does adding another module to stage the data have?
Thanks
Mark
You ask some of the best questions. By calculation waves I assume you're referring to the D.A.G., or the Directed Acyclic Graph (aka the Hyperblock). @DavidSmith gives us a glimpse into this engine in his truth about sparsity article; start with Myth #2. This, to me, has been a bit of a controversy because I feel calculation optimization conflicts with the PLANS methodology at times. You have the ability to use D.A.G. multi-processing, but you have to write your formulas so they aren't dependent on each other.
For example:
In this case, Formula 2 is serialized. "C" must calculate before "D" can calculate. Formula 3, however, can calculate at the same time as formula 1 because it's not dependent.
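To make the dependency idea concrete, here is a minimal Python sketch. It is not how the Anaplan engine is implemented; it only groups items by how many dependencies they have to wait for. The three formulas and the line item names A to F are hypothetical stand-ins for the ones described above.

```python
def calculation_waves(dependencies):
    """Group calculated items into waves.

    `dependencies` maps each calculated item to the items it reads.
    Anything that is not a key is treated as an input (wave 0).
    Assumes the graph is acyclic, as a D.A.G. must be.
    """
    wave_of = {}

    def resolve(item):
        if item not in dependencies:
            return 0  # plain input, nothing to wait for
        if item not in wave_of:
            # An item lands one wave after its latest dependency.
            wave_of[item] = 1 + max(
                (resolve(parent) for parent in dependencies[item]), default=0
            )
        return wave_of[item]

    waves = {}
    for item in dependencies:
        waves.setdefault(resolve(item), []).append(item)
    return {wave: sorted(items) for wave, items in sorted(waves.items())}


# Hypothetical formulas (assumptions, not taken from a real model):
#   Formula 1: C = A + B
#   Formula 2: D = C * 2   (must wait for C)
#   Formula 3: F = E + 1   (independent of Formula 1)
example = {"C": ["A", "B"], "D": ["C"], "F": ["E"]}
waves = calculation_waves(example)
for wave, items in waves.items():
    print(f"wave {wave}: {', '.join(items)}")
```

In this toy model C and F share the first wave because neither reads the other, while D has to wait for C.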
So by calculation waves, I believe it comes down to the D.A.G. working out the dependencies and then working through the calculations. Try reading @DavidSmith's sparsity article. Hopefully, that will help. Beyond that, we'll have to get David or one of the other pros to jump in.
Thanks @JaredDolich
Yes, it's an interesting one, and I might be thinking too deeply about it, but I remember that during 2020 CPX there was a session on best practices where they spoke briefly about calculation waves and their effect on performance.
I guess I am trying to increase my understanding of the Hyperblock and best practices to determine whether I should be materially focusing on waves when building a model, or whether function/formula composition (avoiding text, using Booleans, etc.) and dimensional consistency (referencing modules with the same dimensionality) are more impactful to performance.
Thanks
Yeah, I think you'll get no disagreement that Booleans are the way to go. Using system modules, where you do your calculations once and refer to them from then on, is also a HUGE performance boost. Nested IF statements should be broken up and, if possible, always put the most likely outcome of the IF statement first so Anaplan can exit the formula as soon as possible.
To that end, for the staging modules you refer to, double check that you don't have anything calculating on multiple dimensions that could be calculated on fewer dimensions.
Lastly, I've only read this, so I don't know if this really helps or not but it sounds like the indexing of the dimensions is based on the order in the "applies to" column. Make sure the modules use the dimensions in the same order in the applies to column. So if you have PRODUCT, LOCATION in one module and LOCATION, PRODUCT in another, try to get them to line up the same. You will have to manually type them in the applies to column to get them to line up though (I learned this the hard way).
Anyway, just some ramblings of things you probably already know. Hopefully, we can get some of the Hyperblock Pros to weigh in.
Agreed with all of those @JaredDolich
I guess I'm trying to determine whether all of those standard best practice items that you mention have the same impact as reducing calculation waves.
I'm rambling too but just putting the thoughts out there 😁
Mark
Glad that you are asking uncomfortable questions😀
This is a very broad question which needs to be answered in multiple steps. Most of the things @JaredDolich has already covered. But let’s understand it with respect to the performance of the model. Here is the article by @Griffink, which I call pure gold. Just go through it and you will learn loads of things.
https://community.anaplan.com/t5/Blog/Lionpoint-Group-Enhanced-Anaplan-Model-Performance/ba-p/63465
Feel free to post any further questions.
Note: Go slow while reading through the article.
I think what you call a calculation wave is what Jared refers to as the algorithm that opens the model.
In which case, yes, the "order" and the references matter a lot, as stated in the Planual rule: don't daisy chain.
Recently I worked on optimizing a model with 12K+ line items, and the impact of small calculations can be important in the end, depending on where they sit in the calculation chain.
12k line items! @nathan_rudman 🤕
I was actually referring to how the model performs in general, e.g. how fast it is to move around, open modules, etc., rather than the initial opening.
Echoing the above, in terms of building models, it is best to model formulae in the most logical way.
Take a simple example:
Revenue = Price * Sales Volume
COGS = Cost Price * Sales Volume
Profit = Revenue - COGS
Margin % = Profit / Revenue * 100
So there are 3 "waves" to the calculation: Revenue and COGS can calculate first, then Profit, then Margin %.
This is logical when looking at the formulae, and it follows best practice to split formulae up into separate line items.
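The three waves can be counted mechanically. The sketch below uses Python's standard `graphlib` module to batch the line items from the example by readiness; it is a simplified model of dependency ordering, not a description of how the Anaplan engine actually schedules work. The `waves_for` helper is a name made up for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each calculated line item maps to the line items its formula reads.
graph = {
    "Revenue": {"Price", "Sales Volume"},
    "COGS": {"Cost Price", "Sales Volume"},
    "Profit": {"Revenue", "COGS"},
    "Margin %": {"Profit", "Revenue"},
}


def waves_for(graph):
    """Return the calculated line items grouped into waves, in order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # everything whose inputs are done
        # Inputs such as Price have no formula, so they are not a wave.
        calculated = sorted(node for node in ready if node in graph)
        if calculated:
            waves.append(calculated)
        sorter.done(*ready)
    return waves


waves = waves_for(graph)
for number, items in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(items)}")
```

Revenue and COGS land in the same wave because neither reads the other, which is why the four line items need only three waves.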
However, there are occasions when one of the line items might cause a blockage and hold up downstream calculations. This often happens when many elements of a formula are combined, especially when formulae are used as parameters in other functions.
We are able to trace these blockages in our lab, so if you think there is a specific problem, contact your Business Partner, who can help arrange this.
So, taking the example above, if we needed to, we could re-engineer the formula as follows to reduce the number of waves:
This might reduce the calculation time and unblock the dependencies. However, the outcome is not always clear, so, as mentioned at the outset, I would not advise trying to pre-empt this when building formulae. @MarkTurkenburg don't overthink it. The engine is complex, and it will split tasks and utilise the processing power in the most efficient way. The Planual is written to work with the engine as much as possible, not against it, so using best practice should, for the most part, lead to good performance.
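One way to read this kind of re-engineering, purely as an assumption about what it might look like, is to fold the intermediate line items into Margin % directly so that it depends only on inputs. The Python sketch below compares the two shapes; the function names and sample numbers are made up for illustration.

```python
def margin_chained(price, cost_price, volume):
    """Best-practice shape: separate line items, three dependent steps."""
    revenue = price * volume       # wave 1
    cogs = cost_price * volume     # wave 1
    profit = revenue - cogs        # wave 2
    return profit / revenue * 100  # wave 3


def margin_inlined(price, cost_price, volume):
    """Hypothetical single formula that reads only the inputs (one step)."""
    return (price * volume - cost_price * volume) / (price * volume) * 100


chained = margin_chained(4, 3, 10)
inlined = margin_inlined(4, 3, 10)
print(chained, inlined)
```

Both shapes give the same answer. The inlined one waits on nothing but the inputs, at the cost of repeating the Price * Sales Volume calculation and being harder to read, which is part of why doing this up front is not advised.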
I hope this helps
David