Rule of thumb when NOT to "exit early"?

Hi,

Trying to optimize some stuff in a model and therefore looking for advice.

We have some logic in which i forecast some things for a product lifetime starting with todays actual.

In which of these following cases would you just go with the formula and skip all the exit early stuff. Since the answer is probably "it depends". Would appreciate if you could answer with a motivation like "If less than 36 periods then this else that". Looking for any good rule of thumbs when models shouldn't be overengineered.

Example A
IF
NOT filtermodule.producttimerange?
THEN
0
ELSE
IF
NOT filtermodule.latestactuals?
THEN
A * (B - Properties.Value) - Othermodule.C
ELSE
actualmodule.actualvalue

Example B
IF
NOT filtermodule.producttimerange?
THEN 0
ELSE
IF
NOT filtermodule.latestactuals?
THEN
Previous(x) * (B - Properties.Value) - Othermodule.C
ELSE
actualmodule.actualvalue


Example C
IF
NOT filtermodule.producttimerange?
THEN
0
ELSE
IF
NOT filtermodule.latestactuals?
THEN
"A value that is most likely 0" * (B - Properties.Value)
ELSE
actualmodule.actualvalue


Best Answer

  • @DavidBackman

    It is all about how many times it is getting kicked off. If I have an Employee list with 100K list members, but only 90K are active, I would want to do my logic on on the inactive ones. Same thing with Time. If I have a timescale of 10 years with 9 of them being Actuals (old data) with one Forecast month, then I want to do my calculations on the Forecast and not IsActualVersion().

Answers

  • @DavidBackman

    The early exit is what you expect to occur or hit more often.

  • @rob_marshall
    Yes, but the question is when is the performance hit of the if statement(s) worse than the extra x+y calculation.

  • @DavidBackman

    The performance hit is reading through the entire IF statement…If it hits the majority of time on the first IF or the first parameter, then that will be faster as it will not have to continue through the IF statement.

  • @rob_marshall
    So from a performance perspective an if statement where the sum of all evaluated conditions is still "easier" than the now skipped calculations is worth doing? Disregarding administration, readability etc.

  • @DavidBackman

    Honestly, I am not understanding your statement. If I have and IF statement like yours, If NOT product_time_range then ….. If Product_Time_Range is FALSE more times than not, then that should be first. If it is true more times than it is false, then you would should remove the NOT.

  • @rob_marshall
    Sorry for the unclear statement. The productime range is a boolean filter that is TRUE for the active periods of the product which in our case is less than the entire time range. So we are hitting the first 0 in most cases.

    But to make things easier lets consider this statement:

    A and B are both numbered cells and there are 1000 cells each.

    IF
    A = 1
    THEN
    0
    ELSE
    A + B


    I understand that "exit early" is a good performance strategy so that if a majority of cell A are 1, then there would be a potential performance boost. However Im trying to understand what overhead the IF-statement would add. From your answers the if statement itself doesn't seem to have any noteable impact. It is instead the condition, in this case A=1, that we need to look at.

    Our possibilities are then to either calculate A+B 1000 times. Or calculate A=1 1000 times and A+B hopefully fewer times.

    What Im trying to get a feel for is how complicated the ELSE part should be for it to be worth the added condition. Assuming that in this example, A+B is calculated so quickly by the engine that the added condition doesnt really help even if A+B gets calculated fewer times because of it.

    However that might not be the case if A+B instead was CUMULATE(A). Which is a lot heavier. Or maybe already if the formula was A+B+C+D*E*F.

    Does that make sense?





  • @DavidBackman

    Yep, that makes sense. As you stated, there are two parts of the performance, the true calculation piece (the A+B or the Cumulate()) as well as the reading of the IF statement. The calculation piece is what it is, whether it is in the first part (the THEN side) or the latter (the ELSE part). What you need to think about is how often it gets kicked off. If it is A+B, then that is not a big deal and will not be noticeable, however if it is a cumulate (which is often single threaded), then you want to do that the least amount of times.

    In your statement, the calculation of the IF is not that big of a deal, but we see folks having 14 levels of these IF statements. What you have to remember is the IF statement is not getting kicked off one time, but once per every cell (if you have 1M cells in the line item, the formula is getting kicked off 1M times).

  • @rob_marshall

    So with that said. If the condition is an "IF ITEM(TIME) >= Start Period AND ITEM(TIME) < = End Period"

    When do you think you would care about exit early? Probably CUMULATE, probably NOT A+B. But what about A + (B + C) * D * (E + F) sourcing data from at least 2 additional sources?

    Im personally leaning more towards if its only math and its not triggering previous, lag, cumulate etc its not worth it. If it requires some sum/lookup then maybe depending on dimensionality.

  • @rob_marshall
    Thanks for your answers!