Decreasing the length of your formulas

rob_marshall · October 2020

All too often, when analyzing a model, we see very long, convoluted, and complex formulas. These are commonly nested IF THEN ELSE statements and can be in excess of 5,000 characters.

Line item subsets (LISS) can be used to shorten the length of the formula while making the solution more dynamic, scalable, maintainable—and will, in turn, increase the health of the model, as well as the overall performance.

Problem statement: I need to summarize line items from three different source modules by customers, products, and scenarios, but when the version changes, I need the line items to render the proper information for the customer and product.

Model design

Lists:

Transactions—list of all transactions
Company Flat—list of all companies
Product Flat—list of all products
Scenarios—list of scenarios, very easily could have used the native versions
Metrics—list of the metrics needing to be reported (units, price, net amount)

Modules:

Transaction01 Actual Data: loaded from the source system

Transaction02 Plan Data: loaded from the source system

Transaction03 Budget Data: loaded from the source system

Let’s take a look at a way that is possible—but is hard to maintain, performs poorly, and contradicts many Planual rules.

Solution #1:

Create a module dimensionalized by Customer Flat, Product Flat, Scenarios, Metrics, and Time. I am using the Metrics list (comprising of Units, Price, and Net Amount) because I only want to have one line item defined for the final solution.

The formula for the line item Solution #1:

IF ITEM(Scenarios) = Scenarios.Actuals THEN IF ITEM(Metrics) = Metrics.Units THEN 'TRANS01 Actual Data'.'Units - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Price THEN 'TRANS01 Actual Data'.'Price - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Net Amount THEN 'TRANS01 Actual Data'.'Net Amount - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE IF ITEM(Scenarios) = Scenarios.Plan THEN IF ITEM(Metrics) = Metrics.Units THEN 'TRANS02 Plan Data'.'Units - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Price THEN 'TRANS02 Plan Data'.'Price - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Net Amount THEN 'TRANS02 Plan Data'.'Net Amount - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE IF ITEM(Scenarios) = Scenarios.Budget THEN IF ITEM(Metrics) = Metrics.Units THEN 'TRANS03 Budget Data'.'Units - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Price THEN 'TRANS03 Budget Data'.'Price - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF ITEM(Metrics) = Metrics.Net Amount THEN 'TRANS03 Budget Data'.'Net Amount - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE 0

The good news is, this works. But it violates so many best practices, yet we see this all the time. This is relatively tame compared to what we normally see in that this is only 1,535 characters (with spaces), while we routinely see upwards of 4,000 to 7,000 character formulas.

Let’s look at what is wrong with the formula:

Hard coding (Planual Rule: 2.02-12 – No Hardcoding to List Members)

The formula contains hardcoded references to different lists (Scenarios and Metrics). What if a new Scenario is created or a new Metric is created? The model builder will have to remember every line item that is similar to the above and update it. Similarly, if one of those list members needs to be removed, currently that is not possible because those members are hardcoded to this line item. Additionally, if these lists (Scenario and Metrics) need to be marked as Production Data lists (for ALM), this would not be possible, and you would get the following: error:

To get around the hardcoding, you could use SYS modules with Booleans.

This does remove the hardcoding and makes it a bit more dynamic, but the formula is still not efficient:

IF SYS Scenarios Properties.Actual? THEN IF SYS Metrics Properties.Units? THEN 'TRANS01 Actual Data'.'Units - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Price? THEN 'TRANS01 Actual Data'.'Price - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Net Amount? THEN 'TRANS01 Actual Data'.'Net Amount - Actual'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE IF SYS Scenarios Properties.Plan? THEN IF SYS Metrics Properties.Units? THEN 'TRANS02 Plan Data'.'Units - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Price? THEN 'TRANS02 Plan Data'.'Price - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Net Amount? THEN 'TRANS02 Plan Data'.'Net Amount - Plan'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE IF SYS Scenarios Properties.Budget? THEN IF SYS Metrics Properties.Units? THEN 'TRANS03 Budget Data'.'Units - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Price? THEN 'TRANS03 Budget Data'.'Price - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE IF SYS Metrics Properties.Net Amount? THEN 'TRANS03 Budget Data'.'Net Amount - Budget'[SUM: SYS Trans Properties.Company, SUM: SYS Trans Properties.Product] ELSE 0 ELSE 0

You have to remember, even though this formula is defined only once for the line item, it is getting calculated on the number of cells that line item contains. So, if this line item contains 1 million cells, this formula is getting processed 1 million times.

Maintenance

This was touched on a bit earlier, but there is no way this can be easily maintained, easily understood by new model builders, nor is it scalable. When you are creating the logic for these line items, you have to remind yourself you may not just be designing this for you to maintain; you may pass on to either the client or to another model builder and want this process to be as seamless as possible.

Poor performance

Planual Rules:

1.07-02 Version Line Item subsets

2.02-01 Nested Ifs

2.02-02 <12 expressions in a formula

2.02-03 No Repeated expressions

2.02-18 Break up formulas

Long nested IF THEN ELSE statements are bad for performance because this formula is now single-threaded. This means it will traverse the formula until it gets a “hit”, in this case where the Scenario and Metric combination match. Remember, it is also better to break up the formulas into multiple line items versus squeezing the logic into one giant line item.

Another aspect of performance is the number of functions in a formula. Look at all of the SUM’s in this formula (nine of them), yet they’re almost identical with the only differences being the line item and source of the data. It would be much more efficient if a calculation module was created to do the sums separately, but how do you do that with different source modules and different line items? This is where a line item subset comes into play and makes life much easier.

The solution:

Let’s take a look at how we can solve this problem in a better way by consolidating our source data, creating a dynamic way to align our metrics, and bringing it all together with a better performing formula. As you can see below, the data flows from left to right without traversing back to previous modules.

Step 1 - Line Item subset

Create a line item subset (LISS) using the three source modules and check all of the metrics.

If you are not familiar with how to create a line item subset see here.

Step 2 - Collect statement

Create a module dimensionalized by the line item subset you just created and the lists of the source modules of the line item subset—in this case, Transactions.
Create a new line item, number formatted, with the formula being COLLECT(). This collect statement essentially gets the data for all of the line items and turns the line items into a “dynamic” list. Also, since this is a “staging” module, you will be able to change Summary option to None.

Step 3 - Calculation module

Create a calculation module to sum the data from the different source modules to the desired dimensions—in this case, Customer Flat, Product Flat, the line item subset (to get the “dynamic” line items), and Time.

Why is this so important? Remember the nine different SUM’s in the original solution? Well, we have reduced those down to just one, yet we are getting the exact same results.

Step 4 - Mapping module

Create a “mapping” module which aligns the metrics (list) to the versions list with the line item being formatted as the line item subset.

What is great with this solution is it is 100% dynamic in that we can map the relevant “metrics” to versions without changing the formula. If we don’t want to calculate these metrics for any combination, then just remove or blank out the values.

Step 5 - The results

In the original solution module, dimensionalized by Customer Flat, Product Flat, and Time, create a new line item named Solution 2, formatted as number.
For the formula, simply type 'CALC01: Sum of Metrics'.Data[LOOKUP: 'Map01: Metrics to Scenarios'.Mapping]

Basically, we are doing a lookup of the values that we summed in step 3 (the single summation formula) based on the mapping values in step 4.

Step 6 - Adding adjustments or a different source

Now, let’s say the use case changed, and now we want to add adjustments to the result, but only for the Plan version. If we had the original solution with the big IF THEN ELSE statement, we would have to reengineer to incorporate this new source for only the Plan version. With the new solution, we can simply create a new line item subset, enter the values into the mapping module for the correct version, and then update the final formula.

Adjustment module: Same dimensionality as the results (Customer Flat, Product Flat, and Time).

Create a new line item subset using the above module as the source.

Create a “collect” module for this data with the summaries turned off.

Create a new line item in the Mapping module, but this time format it to the line item subset you just created for Adjustments.

Notice, we only entered line items for the Plan, but we could do this for the other versions as well.

In our reporting module, you can either create a new line item or continue with Solution 2 and type this in for your formula: 'CALC01: Sum of Metrics'.Data[LOOKUP: 'Map01: Metrics to Scenrios'.Mapping] + 'COLC02: Collection of Adjustments'.Data[LOOKUP: 'Map01: Metrics to Scenarios'.Adjustment Mapping]

As you can see, the data for the Plan version is different, but when we change the version to Actual or to Budget, the numbers will be the same because we are not adding the adjustments in for those versions (in the mapping module, those line items were blank).

Conclusion

By creating an additional module or two, as well as employing a line item subset, you can greatly increase the overall health and performance of your model while also making it much easier to scale and maintain. For more information on Line Item Subsets, please go to this article (Line Item Subsets Demystified).

einas.ibrahim · October 2020

Hey @rob_marshall

I got a headache just reading through the long formulas, I can't imagine you trying to deconstruct it. Very impressive work.
I have seen formulas that span 3 pages on a word document (sorry didn't count the characters 😊) and what I usually notice is that the design of the data modules is not correct or optimum which leads to the need to use nested IFs.
The issue I am usually faced with is...
Do I try to fix the formula as much as I can while still depending on the faulty data modules design?
OR
Solve the problem from the root and redesign the data modules?
I am sure you can imagine what consequences will be there due to redesigning the data modules since they are used by other calculations modules -at least.

rob_marshall · October 2020

@einas.ibrahim

That is always a tough call to make. What I was really hoping to convey in this article is you can do LISS on multiple modules and include multiple sources in your results based off of different mappings. That is the key to this! The mapping module doesn't have to be just on Actuals or Budget, but can be on entirely different source modules all done with the correct mapping.

andrewtye · October 2020

@rob_marshall - wouldn't you get the same result if you were to load the three source modules into one module dimensioned with versions listing? then you wouldn't need to go through the multiple re-mappings. but i get the issue you're trying to convey.

rob_marshall · October 2020

@andrewtye

You should always try to not use actions and instead use formulas as actions can be potentially blocking actions for other users (Planual 5.01-03).

Additionally, using formulas is more dynamic, and if done correctly, performs much better, and the results are immediate instead of having to wait on a process to be run.

DavidSmith · October 2020

To the above point, remember Anaplan (and the Hyperblock) is designed for data entry by users and then the DAG (Directed Acyclic Graph) kicks in. Think pinpoint rifle shot rather than shotgun!

We often see actions used as a crutch for lack of modelling understanding. Much better to model if necessary; way more efficient

David

einas.ibrahim · October 2020

@rob_marshall

You conveyed your point about using LISS perfectly. This solution is brilliant.
Although the main issue to address was the long formula, I appreciate the flexibility the LISS and mapping modules brings even more.

I have used a combination of LISS and mapping modules to solve similar - yet at a smaller scale - issues before. Your methodical thinking and the way you expanded on the use of LISS takes this to a new level.

Thank you, I already feel smarter after reading your article, please keep them coming.

PhilippErkinger · October 2020

One of the areas I still end up with massive IF THEN ELSE statements is in variance reports that need to be in a very specific format. Interestingly, this happens in almost every FP&A use case. Simple mappings / SUM functions often don't work because of different calculation methods for each line or because the end-users want to see values shown as %.

It would be fantastic if a module could handle several line item subsets. Meaning a COLLECT() on top of a COLLECT(). In the early days of Anaplan, this was possible. Let's hope it will come back one day. Until then, I will always recommend to first use the Excel Add-In for variance reports before building such cumbersome logic in Anaplan.

DavidSmith · November 2020

@PhilippErkinger

In my first implementation I "accidentally" created a module with 2 line item subsets

Problem was it ended up crashing the model when I used a formula summary method

Live and learn!

David

CallumW · November 2020

This is a fantastic article, thank you @rob_marshall ! Will certainly be referencing this article when talking to customers on optimisation in the future.

PhilippErkinger · November 2020

I didn't say that it worked well, but it was possible! 😉 Rumour has it that there are still models out there that have two line item subsets.

rob_marshall · November 2020

@CallumW

Thanks, glad you liked it.

Decreasing the length of your formulas

Model design

Maintenance

Poor performance

Comments

Categories