Creating QUARTILE function within Anaplan

This is a method of building the Excel “Quartile” function within Anaplan – it requires 11 Line items, requires that the Data dimension has a ‘top level’ value, and uses the RANK function which has an upper limit of 50 million cells. To start, create a module (or use an existing calc module) dimensioned by your base data and whatever dimensions are necessary. In my example, I used just a single "Data List" dimension. 

 

In that module, add the following formulas:

 

Data Module

Count = 1

Rank = RANK({Value} <wherever that is in the model>, ASCENDING, SEQUENTIAL)

 

The count summary method must be set to Sum, the Rank summary method can be set to None.

 

Then create a summary module, without the base data dimension but all other dimensions (if any) the same.

 

If you are replicating the QUARTILE.INC then add the following formulas:

Summary Module

Number of Items = Data Module.Count

Q1 item marker = 0.25 * (Number of Items - 1) + 1

Q2 item marker = 0.5 * (Number of Items - 1) + 1

Q3 item marker = 0.75 * (Number of Items - 1) + 1

 

If you are replicating the QUARTILE.EXC then the section after the percentage value should be changed to * (Number of Items + 1) for all three item marker formulas – For example:
Q1 item marker = 0.25 * (Number of Items + 1) 

 

This now tells us which ranked item should selected for the Median (Q2), First Quartile (Q1), and Top Quartile (Q3) values.

 

Then go back to the Data Modules and we will build three line items that pull out the correct items, or interpolate between them if necessary:

Data Module

1st Quartile = IF ABS(Rank - Summary Module.'Q1 item marker') < 1 THEN {Value} * (1 - ABS(Rank - Summary Module.'Q1 item marker')) ELSE 0

2nd Quartile = IF ABS(Rank - Summary Module.'Q2 item marker') < 1 THEN {Value} * (1 - ABS(Rank - Summary Module.'Q2 item marker')) ELSE 0

3rd Quartile = IF ABS(Rank - Summary Module.'Q3 item marker') < 1 THEN {Value} * (1 - ABS(Rank - Summary Module.'Q3 item marker')) ELSE 0

 

These must all have the summary method of Sum. The IF clause in this formula will only pull through values whose rank matches the appropriate median/quartile value, and the THEN clause will take that only the specific value if it lands exactly, or interpolate in situations where the median lies between two values, and the two partial values will be automatically added up by the summary method of “Sum”.

 

Then finally, in the Summary Module, we add 3 more line items to pull the correct summarized value. Since the Data dimension is not present in the Summary Module, the default behavior is to take the top level summarized value:

Summary Module

1st quartile = Data Module.'1st Quartile'

2nd quartile = Data Module.'2nd Quartile'

3rd quartile = Data Module.'3rd Quartile'

 

From there, those summarized line items can be used however is needed downstream.

 

One thing to note, if only the Top Quartile value is needed, then the Q1 and Q2 line items in both the Data and Summary module can be removed.

 

Screenshots of my Data and Summary modules with a very simple data set, recreating the QUARTILE.INC function, and comparing it to the Excel result:

 

Quartile Example and Comparison.png

 

Tagged:

Answers

  • @NoahJ 

     

    Good one. Thanks for sharing it.

  • Great stuff and thanks for sharing.

    Have you reached any performance issue related to the use of the RANK() function?

  • Thanks for the feedback! I have only tested this solution on moderately sized data sets, not any nearing the 50 million limit, so I have not been greatly impacted by the calculation time. This article discuss the reasoning for the limit: https://help.anaplan.com/anapedia/Content/Calculation_Functions/All/RANK.html 

     

    The ranking of the values is a critical step in the process, I don't think there is an easy way to do it without the RANK formula. If a data set larger than 50,000,000 were to require the Quartile function a workaround would be necessary - possibly a Sort and then import into a dummy list? But I believe that would be very time intensive and really only work if no other dimensions were applied to the base data.

     

     

  • @Ramana Reddy 

     

    Thanks @NoahJ  This was helpful

  • @NoahJ Thank you, this was really helpful and it worked for me. I then tried implementing it with a dimension (idepartment) so I can either choose a particular department or all departments. I can get it to work up until summing the partial amounts in the quartile. I have to use a formula summary method so that the partial amounts calculate correctly at the all departments top level but the issue is I cannot sum it to then refer to it in the quartile summary module. Have you ever been successful at achieving this?

  • @kristenr Interesting - yeah it looks like my original design assumed just one dimension in the data module. But with some modifications it should work for two (and potentially more).

    So first, to account for a new Department dimension added to the "Value", certain line items need to be adjusted to accommodate. In the "Data" module, the "Count" line item should stay only on the "Data" dimension, and the "Rank" formula needs to rank in groupings based on the department item. In the "Summary" module, the "Total number of items" and the "Item marker" line items need to stay as no dimensions, while the final "Quartile results" line items should be on the Department dimension. This will show the correct values for all Departments simultaneously and should work in most situations - but it will not correctly determine the quartile for the Top Level ("All Departments" in this case) column of the data.

    But it sounds like you were able to get that far already, but did need the top level?

    First of all, to confirm, your data would look something like this?

    The "Data items" on the left are what you would want to find the quartile for, and you'd want to find it for each department separately as well as the All Departments column?

    I will say that I am having a hard time picturing what the data would be that you would want to find the quartile at both the "Department" level as well as the "All Departments" level at the same time… but there could definitely be a situation that I'm just not thinking of, so I'll just go for it! The "fix" to make it work is not particularly pretty, be warned. If this feature request ever gets granted it would be much easier - we effectively want to use "Formula" summary across the Department dimension, and then Sum up the "Data" dimension. With the tools we have it is still possible, just requires a good number more steps.

    Basically after setting the line items in our data module to "Formula", we then have to manually set up our sums using the [SUM: ] functionality. It's fairly easy at the Department level - we have a line item called "Department Item" that is on both the Department and Data dimensions, and then we can use it to sum into our Department Summary module. But to get the "All Departments" value, we again have to do a SUM:, but this time we are not summing into the Departments dimension, so we have to create a "Single Item List", stick it in a line item on the Data Dimension, then use a [SELECT: Top Level, SUM: Single Item List Item] to get the summarized value. We can then do even more shenanigans with the Ratio summary setting to combine them into a single line:

    I have attached the "model blueprint" (i.e. line items export) of all modules I used to build this (change the extension to .csv first) - I grouped them in modules by dimension, so that I would only have to use subsidiary views where strictly necessary, but I kind of ignored "DISCO" in the process, so take it with a grain of salt.

  • @NoahJ Thank you so much for helping me with this, very appreciated! I think I am getting there but I am having trouble with the 1st quartile - Top Level only coming up as zeroes. Also when I tried to use ratio I got this error:


    Also, when I tried the ratio, I got this error which is probably due to the Top Level issue.



    I have attached my blueprint if you are able to see where I am going wrong. Thank you!

  • @kristenr you need to add a "Top Level Item" to your "Single Item List" in the list settings:

    That should fix the issue! And I can give some context as to why: in Anaplan, when data goes from a source that has a specific dimension ("Single Item List" in this case) to one that does not, the only way it can succeed is if there is something specified in the "Top Level" field - that value gets used as the "default cell" when that dimension is no longer present.

    If there is no value, then the system isn't sure which value from the list to take (even if it is obvious to the human user). In the case of subsidiary view it will just show up as blank, and in the case of line item formulas/summary/DCA things like that an error will get thrown.

    In a normal line item you could get around this with a SELECT statement, but that can't be done within the Ratio summary settings, so it needs to have that Top Level item. Adding that into the list settings will fix both the issues you are seeing.

    Hope that helps!!

  • Thanks @NoahJ I must be doing something else wrong as the Top Level Only line item still coming up as zero! Although the ratio worked once I put in the Top Level.

  • @kristenr maybe I'll DM you and we can figure it out 😊