OEG Best Practice: Formulas and their effects on model performance

AnaplanOEG
edited June 2023 in Best Practices

Details of known issues

Each known issue below is described as a challenge followed by our recommendations.

Performance issues with long nested formulas

Challenge: a long formula is needed on a line item (for example, one dimensioned by time) as a result of nested intermediate calculations.

Recommendation: if the model size does not prevent you from adding extra line items, it is better practice to create multiple intermediate line items and reduce the size of the formula, rather than nesting all the intermediate calculations into one gigantic formula.

This applies to summary formulae (SUM, LOOKUP, SELECT).

Combining SUM and LOOKUP in the same line item formula can cause performance issues in some cases. If you have noticed a drop in performance after adding a combined SUM and LOOKUP to a single line item, then split it into two line items.
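
As an illustration (the line item names here are hypothetical, and 'Region Map' is assumed to be a list-formatted line item in the same module as Revenue and Cost), a single nested formula such as:

Margin % = (Revenue[SUM: Region Map] - Cost[SUM: Region Map]) / Revenue[SUM: Region Map]

could instead be split into intermediate line items:

Total Revenue = Revenue[SUM: Region Map]
Total Cost = Cost[SUM: Region Map]
Margin % = (Total Revenue - Total Cost) / Total Revenue

Each intermediate is then calculated once and reused, rather than the same SUM being evaluated several times inside one long formula.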

RANKCUMULATE causes slowness

Challenge: a current issue with the RANKCUMULATE function can mean that the time to open the model, including rollback, is up to five times slower than it should be.

Recommendation: there is currently no suitable workaround. Stay within the constraints defined in Anapedia.

SUM/LOOKUP with large cell count

Challenge: a known issue with SUM/LOOKUP combinations within a formula can lead to slow model open and calculation times, particularly if the line item has a large cell count.

Recommendation: separate the formulas into different line items to reduce calculation time (fewer cells then need to recalculate the parts of a formula that only affect a subset of the data).

Example:

None of the line items apply to time or versions.

  • Y = X[SUM: R, LOOKUP: R]
  • Y Applies to [A,B]
  • X Applies to [A,B]
  • R Applies To [B], list-formatted [C]

Recommendation:

  • Add a new line item, 'intermediate', whose 'Applies To' is set to the 'Format' of 'R' (the list 'C')
  • intermediate = X[SUM: R]
  • Y = intermediate[LOOKUP: R]
  • This issue is currently being worked on by Development, and a fix will be available in a future release.

Calculations are over non-common dimensions

Anaplan calculates more quickly when calculations are over common dimensions. Again, this is best seen in an example. If you have:

Lists W and X
Y = A + B
Y Applies To W, X
A Applies To W
B Applies To W

this performs slower than:

Y = Intermediate
Intermediate = A + B
Intermediate Applies To W

with all other dimensions the same as above. Similarly, you can substitute A and B above with a formula, for example SUM/LOOKUP calculations.
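
As a rough sketch of that substitution (the names are hypothetical: assume Cost and Overhead sit in a source module dimensioned by a Department list, and 'Dept Map' is a line item in that module formatted as list W):

Y = Cost[SUM: Dept Map] + Overhead[SUM: Dept Map]
Y Applies To W, X

performs slower than:

Intermediate = Cost[SUM: Dept Map] + Overhead[SUM: Dept Map]
Intermediate Applies To W
Y = Intermediate
Y Applies To W, X

because the SUM expressions are then evaluated over W only, rather than for every combination of W and X.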

Cell history truncated

Currently, history generation has a time limit of 60 seconds. The generation is split into three stages, with one third of the time allocated to each.

  • The first stage is to build a list of the columns required for the grid. This involves reading all the history. If this takes more than 20 seconds, the user receives the message "history truncated after x seconds - please modify the date range," where x is the number of seconds it took. No history is generated.
  • If the first stage completes within 20 seconds, it goes on to generate the full history.
  • In the grid, only the first 1,000 rows are displayed; the user must export the history to get the full history. This can take significant time depending on volume.

The same steps are taken for model and cell history. Cell history is generated by loading the entire model history and searching it for the relevant cell information. When the model history gets too large, it is truncated to prevent performance issues.

Make it real time when needed

Do not make it real time unless it needs to be.

By this we mean: do not have line items where users input data referenced by other line items unless they have to be. One way around this is to give users their own data input sections that are not referenced anywhere, or as little as possible, and then, say, at the end of the day when no users are in the model, run an import that updates the cells where the calculations are done. This may not always be possible if the end user needs to see the calculations that result from their inputs, but if you can limit real-time references to just the calculations they need to see, and use imports during quiet times for the rest, this will still help.

We see this often with reporting modules, not all of which need to be recalculated in real time; in many cases it is fine for these modules to be calculated the day after.
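
A minimal sketch of this pattern, using hypothetical module names:

  • 'Sales Input' module: end users enter their data here, and no other line items reference it.
  • 'Sales Staging' module: all downstream calculations reference this module instead.
  • An import action, run at the end of the day when no users are in the model, copies 'Sales Input' into 'Sales Staging', so the heavy recalculation happens during quiet times rather than on every input.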

Reduce dependencies

Don't have line items that are dependent on other line items unnecessarily. This can prevent Anaplan from utilizing the maximum number of calculations it can perform at once, because a line item's formula cannot be calculated while it is waiting on the results of other line items. A basic example of this can be seen with line items A, B, and C having the formulas:
A - no formula
B = A
C = B
Here B would be calculated first, and C would be calculated after it. Whereas if the setup were:
A - no formula
B = A
C = A
Here B and C can be calculated at the same time. This also means that if line item B is not needed, it can be removed, further reducing the number of calculations and the size of the model. This needs to be considered on a case-by-case basis and is a trade-off between duplicating calculations and utilizing as many threads as possible: if line item B were referenced by several other line items, it might indeed be quicker to keep it.

Summary calculation

Summary cells often take processing time even if they are not actually recalculated, because all the lower-level cells must be checked.

Reduce summaries to ‘None’ wherever possible. This not only reduces aggregations, but also the size of the model.

Author David Smith.
Contributing author Guillaume Arnaud.

Comments

  • Informative post! Thank you @DavidSmith.

     

    Question for you regarding "Calculations are over non-common dimensions":

     

    Taken to the logical extreme, if many intermediate line items are used, would this not adversely impact model performance by reducing Anaplan's ability to perform parallel calculations? For instance, if you have many intermediate line items for a formula that could otherwise be done in a single line item, is it possible that performance could be worse?

     

    My goal here is to understand if performance is so adversely impacted by calculating over non-common dimensions that I should *never* do it, or whether it depends. If the answer is "it depends", then it would be helpful to understand the nuance, since it's difficult to measure the amount of time associated with the recalculation of a specific complex formula in a project setting where there's a lot going on in a model.

  • Hi

    Thanks for your question and interest in the post.

    The intermediate steps will calculate in parallel, and after that there will be a smaller calculation pulling the intermediate values into the original line item.

    So we have created a non-parallel sequence, but the reduction in calculation time for the intermediate, by reducing the dimensions the line item calculates over, is much greater than the extra step to reference the intermediate calculation.

    The other benefit of intermediates is that when only one of the values changes, only the intermediate is recalculated, and the overall calculation time is much lower.

    So, following the DISCO principles of model building, try to group calculations together with like structures. It will give the benefits above and, in addition, it provides a single place for all calculations that relate to the common dimensionality.

    I hope this helps

    David

  • Thanks for this informative post @DavidSmith!

     

    I have a question about something that confuses me. It's best practice to keep formulas in modules short rather than long, and this also meets the 'auditable' standard of PLANS.

     

    But if we split the formulas by creating intermediate line items, wouldn't that increase the size of the model by doubling or tripling the number of cells? From my point of view, it might be better to create slightly longer but still auditable formulas than to increase the number of cells so dramatically.

     

    Could you please share your thoughts? I'd appreciate your reply.

     

    Aykut

  • @aykcos - great question. 

    @DavidSmith - do you have a perspective/rationale on increasing the size of the model as a way to improve calculation speed?

    Thanks!

  • @aykcos @adpinto Note that this article was originally published in 2018. Since then, some good content has been published that I think will help with your questions. Please see the link below:

    https://community.anaplan.com/kb/articles/80770-2-02-18-break-up-formulas

    I'll also add that model performance has a lot of factors, and the Planual is a set of general guidelines to improve your models with a number of factors in mind. In the real world, you will often need to make conscious tradeoffs in order to meet other constraints, like business requirements, release schedules, or cost.