SUM or IF? Performance and Readability
I'm interested to see which is the better performer and also, subjectively, which function is more 'readable': SUM or IF, in the following scenario...
Consider that you are building a model for an international retailer. This retailer is setting up stores in different countries. When setting up a store in the first month, set up costs will be incurred. These are one-off costs that apply to one month only, and vary on the country.
You want to give users the ability to tweak the setup costs and setup month. This could be achieved quite simply using the following:
- A dashboard containing;
- Input assumptions (setup period and cost)
- P&L output
Dashboard showing inputs and P&L
Here is a view of the IF formula*:
And here is a view of the sum formula:
*The IF formula should be reversed as the most common scenario (and therefore earliest exit) is that the current period is not the setup period.
Note that the two formulae are producing the same result. I am mostly interested in which formula is more efficient, and also which formula people think is more understandable.
Off the top of my head some here are some arguments in favour of each:
- Gives the ability to add multiple conditions in the future.
- Technically follows the 'natural language' of the process, being if it's the first month the store attracts setup costs.
- Formula shows the what is pulling through before anything else (Setup Input.Setup Cost).
- No fiddling with early exit on IFs.
I'm keen to see the opinion of others on this topic!
Coming out of retirement for this one since you asked nicely
If think in this case where the source and target is a 1:1 match, the IF statement would be quicker. Remember SUM and LOOKUPs need to perform "transformations" and thus more work than just pointing to the source line item. The check for the IF is fairly marginal.
As an aside and not quite this use case, but we have found that using SUM from a large source list summing into a timescale can be heavy due to the collation of the source data into time blocks. (e.g. a series of expense line with different start dates and you want to sum them all together). If you are using something like that, with a large timescale, we recommend using a fake timescale for the initial aggregation and then a SUM into the real timescale
Back to helping develop the engine!
Great topic. I prefer IF as I think the benefits you've described are more important. I also wanted to humbly mention that I would have used the IF function which helps it to find the most common scenario - IF Setup Input.Setup Period<>Item(Time) Then 0 ELSE Setup Input.Setup Period as per PLANUAL suggestions. For me, an ability to add multiple conditions in the future is a very important reason to choose IF, as business usually approaches with these types of requests quite often. Or maybe it is a good idea to use line item "Setup Period" in a subsidiary view so that it has Boolean format, which takes less space, although I'm not sure how/if it affects the performance.2
Agree with @DmitryP - this is a great topic. And @DmitryP idea of exiting the IF statement early is excellent.
You might also try the ITEM(TIME) boolean in a system module. You don't want to have to calculate ITEM(TIME) every single cell in your report module if you can avoid it.
While I don't know which is more efficient, I do know that you want to make your IF statement exit as early as possible.
So in this case, the IF statement might read: If not set up month? then 0 else set up cost.
Most cells will exit early so you won't have to evaluate the rest of the IF statement. Much more efficient.
If I were to guess, the IF statement would be faster if you exit early (DISCO). However, the SUM is easier to read and audit (PLANS).
So we have a good example of where DISCO and PLANS compete with each other a bit.
Just some thoughts, anyway.
Interesting... wonder if @rob_marshall / @DavidSmith (with his old hat on) might be able to provide some light on this.
If the P&L were at a store level (and so potentially hundreds) instead of country using an If statement would be the better option as you'd be looking up the month and country. Even if you had a store system module looking up against the country and then summing that in the P&L module.
BTW.. did you see the section on AnaplanLive! last week on Sum vs Lookup?2
I want to believe SUM is faster.
would require some real testing to confirm1
In my estimation, the SUM will be faster because the formula is not sequential, it is just doing the straight SUM not matter what. The IF on the other hand, is being run being "rationalized" at every cell to see if it needs to be kicked off or not. If I can get some free time, let me see if I can test this out fully.
I prefer SUM.
1. It also allows to add conditions in future, beauty is that end user can add the conditions without depending upon a model builder to change the formula
2. SUM is more readable for me. IF is readable only for a finite number of conditions, beyond 5 conditions it becomes untidy.
Interesting post indeed. I have come across something similar while working on the retail side where I have to show value and percentage as a picklist.
Here are my thoughts,
1. Sum performs brilliantly. However, in most of the case formula seems to be not readable as it includes multiple sum statements. ( Note: Sum here would act as value holder but not the sum across the block )
2. "If logic" works great too. However, there is a dependency on if the result that would slow down the parallel computation. It does matter when you are working on a large model. This is always a battle between "readable formula" vs "Performance".
I end up picking "If logic" as the user wanted to drill down on the formula.