Formulas and their effect on model performance

Details of known issues 

Challenge

Recommendations

Performance issues with long nested formulas

Need to have a long formula on time as a result of nested intermediate calculations.

If the model size does not prevent from adding extra line items, it's a better practice to create multiple intermediate line items and reduce the size of the formula, as opposed to nesting all intermediate calculations into one gigantic formula.

This applies to summary formulae (SUM, LOOKUP, SELECT).

Combining SUM and LOOKUP in the same line item formula can cause performance issues in some cases. If you have noticed a drop in performance after adding a combined SUM and LOOKUP to a single line item, then split it into two line items.

RANKCUMULATE causes slowness

A current issue with the RANKCUMULATE formula can mean that the time to open the model, including rollback, can be up to five times slower than they should be.

There is currently no suitable workaround. Our recommendations are to stay within the constraints defined in Anapedia.

SUM/LOOKUP with large cell count

Separate formulas into different line items to reduce calculation time (fewer cells need to recalculate parts of a formula that would only affect a subset of the data).

A known issue with SUM/LOOKUP combinations within a formula can lead to slow model open and calculation times, particularly if the line item has a large cell count.

Example:

All line items do not apply to time or versions.

  • Y = X[SUM: R, LOOKUP: R]
  • Y Applies to [A,B]
  • X Applies to [A,B]
  • R Applies to [B] list formatted [C]

Recommendation:

  • Add a new line item 'intermediate' that must have 'Applies To' set to the 'Format' of 'R'
  • intermediate = X[SUM: R]
  • Y = intermediate[LOOKUP: R]
  •  This issue is currently being worked on by Development and a fix will be available in a future release

Calculations are over non-common dimensions

Anaplan calculates quicker if calculations are over common dimensions. Again, best seen in an example. If you have,
List W, X
Y = A + B
Y Applies To W, X
A Applies To W
B Applies To W
This performs slower than,
Y = Intermediate
Intermediate = A + B
Intermediate Applies To W
All other dimensions are the same as above.
Similarly, you can substitute A & B above for a formula, e.g. SUM/LOOKUP calculations.

Cell history truncated

Currently, history generation has a time limit of 60 seconds set. The history generation is split into three stages with 1/3 of time allocated to each.

  • The first stage is to build a list of columns required for the grid. This involves reading all the history. If this takes more than 20 seconds, then the user receives the message "history truncated after x seconds - please modify the date range," where X is how many seconds it took. No history is generated.
  • If the first stage completes within 20 seconds, it goes on to generate the full list of history. 
  • In the grid only the first 1000 rows are displayed; the user must Export history to get a full history. This can take significant time depending on volume. 

The same steps are taken for model and cell history. The cell history is generated from loading the entire model history and searching through the history for the relevant cell information. When the model history gets too large then it is currently truncated to prevent performance issues. Unfortunately, this can make it impossible to retrieve the cell history that is needed.

Make it real time when needed

Do not make it real time unless it needs to be.

By this we mean, do not have line items where users input data being referenced by other line items unless they have to be. A way around this could be to have users have their data input sections, which is not referenced anywhere, or as little as possible, and, say, at the end of the day when no users are in the model, run an import which would update into cells where calculations are then done. This may not always be possible if the end user needs to see resulting calculations from his inputs, but if you can limit these to just do the calculations that he needs to see and use imports during quiet times then this will still help.

We see this often when not all reporting modules need to be recalculated real time. In many cases, many of these modules are good to be calculated the day after.

Reduce dependencies

Don't have line items that are dependent on other line items unnecessarily.This can cause Anaplan to not utilize the maximum number of calculations it can do at once. This happens where a line items formula cannot be calculated because it is waiting on results of other line items. A basic example of this can be seen with line item's A, B, and C having the formulas:
A - no formula
B= A
C = B
Here B would be calculated, and then C would be calculated after this. Whereas if the setup was:
A - no formula
B = A
C = A
Here B and C can be calculated at the same time. This also helps if line item B is not needed it can then be removed, further reducing the number of calculations and the size of the model. This needs to considered on a case-by-case basis and is a tradeoff between duplicating calculations and utilizing as many threads as possible. If line item B was referenced by a few other line items, it may indeed be quicker to have this line item.

Summary calculation

Summary cells often take processing time even if they are not actually recalculated because they must check all the lower level cells.

Reduce summaries to ‘None’ wherever possible. This not only reduces aggregations, but also the size of the model.

The content in this article has not been evaluated for all Anaplan implementations and may not be recommended for your specific situation.
Please consult your internal administrators prior to applying any of the ideas or steps in this article.
Comments

Informative post! Thank you @DavidSmith.

 

Question for you regarding "Calculations are over non common dimensions":

 

Taken to the logical extreme, if many intermediate line items are used, would this not adversely impact model performance by reducing Anaplan's ability to perform parallel calculations? For instance, if you have many intermediate line items for a formula that could otherwise be done in a single line item, is it possible that performance could be worse?

 

My goal here is to understand if performance is so adversely impacted by calculating over non-common dimensions that I should *never* do it, or whether it depends. If the answer is "it depends", then it would be helpful to understand the nuance, since it's difficult to measure the amount of time associated with the recalculation of a specific complex formula in a project setting where there's a lot going on in a model.

Hi

Thanks for your quesiton and interest in the post

The intermediate steps will calculate in parallel and after that there will be a smaller calculation pulling in the intermediate values to the original line item.

So we have created a non-parallel sequence, but the reduction in calculation time for the intermediate, by reducing the dimensions the line item calculates over, is much greater than the extra step to reference the intermediate calculation.

The other benefit of intermediates is when only one of the values changes only the intermediate is recalculated and the overall calculation is much lower.

So, following the DISCO princples of model building, try and group calculartions together with like structures.  It will give the benefits above and abover it provides a single place for all calculations that relate to the common dimensionality

I hope this helps

David