Sort Lists greater than 1 million items

Order Lists is extremely helpful ( but we have a couple key lists that are over 1 million items, so Order Lists does not work for them. We are looking at around 11 million items for current year and prior year data.


We have transactions that are feeding into HUB from our source system, that come in based upon a transaction number. This module/list is what we use to create our hierarchy levels. It was created with First Occurrence as a boolean to determine what the first instance was to create the various hierarchy levels (L1, L2, L3 and L4). 


The issue is that there are some L3 items we want to exclude from going creating a hierarchy; however that same item may also be the first instance of L1 and L2 - meaning if we exclude it, our hierarchy for L1 and L2 is not set up, so any "valid" hierarchies at those levels will not be created.


What are some other options to get the proper hierarchy created?; while not creating hierarchy levels for items that will ultimately have no activity because they are being excluded by the L3 exclusion.


See attached for visual examples of what we want to accomplish. 

Example 1 shows excess hierarchies being created that will never be used - increasing sparcity and space consumption. Example 2 shows issue if first item is first occurrence of L1 - no hierarchies created. Example 3 shows how L2 that are needed would not be created. Example 4 shows what we want to happen - hierarchies without L3 exclusions would be created and no unnecessary hierarchies would be created.

Best Answer

  • @jtkerlin 


    Correct, data in the source system is not at the aggregated level, but at the detail level so you don't need all of those "parent" codes in your most detailed code.  Take this example, a cost center hierarchy....when I am loading the data, I don't need the code to have the Level 3 member in, I just need to know where the data is at the most detailed member, then my hierarchy can then aggregate it up.


    so, if I have a "flat" list that has all of my cost centers in it, and a module that has their direct parent, then I can create a hierarchy for the data to aggregate.  


    Flat list that has every Cost Center, parents and children.



    Create a SYS module for the cost centers where I import the "parent"



    In the above module, if I don't want to build a certain member in a certain level, I can do that with logic here.


    Then create the "build" of the hierarchy:


    The above is shown more in the 305 video, but what this is really doing is a recursive call to get the top level (Anaplan) at the end (or bottom) of the chain.


    Again, this can be done fairly easily, but if you need to build the above on the transactional data, you can do that as well, just make sure you are building it off the "transactional" list and not the transactional module (the one with Time in the Applies To).




  • @jtkerlin 


    Good questions, but as a consultant, I have a few questions for you:

    • what is the use case you are implementing?
    • does it really need invoice/transactional-based data?
    • If not and your end-users are planning at a higher level (boy, I hope so), then why not bring in the data at the aggregated level to the Data Hub?  If the users want to "validate" the data, then put that in a different model from the true planning model so it doesn't bloat the planning model for everyone when only a few folks will be needing that data.

    As for your lists, and this will reference bullet #2 in the above, if you really don't need transactional level data, bring the data in at an aggregated level and create a custom list.  Also, when you are capturing or creating logic for your list, this should not be done in a transactional view, but in a module that is only dimensionalized by the list (so no time dimension).  This way, you will be able to save space/cells because the logic will not be getting kicked off on every time period.


    Lastly, I wouldn't create your codes the way you did, you don't necessarily need every parent as part of the code, just the base or most detailed.  There is an old course that is available in the OnDemand area (305: Hub Model Hierarchy Management: link) that attempts to show how hierarchies can be created by just knowing the parent/child relationship.  If you can get a pull from the source system of all cost centers (in this example) and what their parent is, you can derive the hierarchy without using transactional data.


    Also, not sure if you have seen this, but this article gives some good suggestions on the loading of data to a Data Hub as well as the spoke model (link).


    Hope this helps,




  • These are great questions and something that we are currently evaluating. I am sure that everything I am noting will sound familiar. 


    I do not recall why exactly we had transactional level data coming in rather than aggregated at some other level. There are a couple other aspects of our tool that need a slightly lower level than what we are planning at (auto tagging of variances and things), but that level will be higher than each transactional level.


    The tool is for planning revenue by client, department and product. We currently have the transactional data to build the hierarchy and ingest the revenues at the L1-L4 levels. The hierarchy and PY and current year revenues are all that are used as the starting point from our billing system, then in Planning we layer on Forecast, Budget and variance explanations (which is only down to the L4 level).


    One of the reasons why we have data coming into Hub that is subsequently not being used/ingested to Planning is due to inflexibility of changes to the incoming data feed. It can take 1-2 months to have the data feed changed/adjusted when things change, so we are filtering out the irrelevant data in Anaplan instead to increase our flexibility. Currently we do not have a need in Anaplan for this data, but that could change and we would end up with timing issues to get the data feed adjusted accordingly.


    To put into different words, I believe you are suggesting to us the L4 code string and extract out the various layers that would be needed to create the hierarchies (sort of going backward from detail to highest level roll up, rather than the opposite).


    Thank you for the links, I will be viewing shortly.

  • This is a little different, for our planning hierarchy we want to only create hierarchy levels or intersections where there is activity. The L4 is the lowest level - Product, a Product can exist in many L3s (Department) and which can exist in may L2s. The string code that would be related to a specific L2, L3 and L4 would be unique, but the L4 itself (outside of the entire string) can exist none, once or many times.


    It seems like similar logic that you showed would apply.


    Sorry for the more basic question and clarification, but I am not naturally a coder or developer so still trying to wrap my head around that type of thinking and within Anaplan.

  • @jtkerlin 


    No worries at all and these types of questions are exactly what this community forum is for.  Are you building out a BOM?  If so, check out this link...


    and look at the solution that David Smith gave.  Very lengthy but very thorough document in the attached area of his post.



  • Not sure - what is a BOM? (the web tells me "bill of materials")


    We are doing sevice revenue planning/budgeting at monthly and quarterly checkpoints.


    Our current tool was created by an outside consultant, we are changing/updating some things as we need based business data/processes we first believed to be true but were not. We have to change what data sources are coming into HUB so that we can have be more relevant and timely information for various countries.

    When the tool was created I had zero knowledge of Anaplan, over the last twelve or so months my understanding has improved, and now it is seeing how things were done and whether they still make sense or not.

  • @jtkerlin 


    so yes, BOM = Bill of Materials.


    And it sounds like you are doing the right thing...Just remember, Anaplan is not an EDW (Enterprise Datahouse) nor should it be used as one.  Only bring in the data that is needed and the correct granularity.  Also, if you don't need certain parts of the hierarchy built, then you can manage that with an SYS Properties module (basically a "flat" module that is only dimensionalized by that list, like Cost Centers).


    Again, sounds like you are on the right track, let us know how we can help out.