Distinguish between List Item and Line Item
I am new to Anaplan, so I would appreciate your help.
Is an Order Number or Transaction Number a line item or a list item?
If the answer is a list item, then consider a retail chain or bank where millions of transactions happen daily.
If we create a numbered list for the transactions, how do we store and update the data once it reaches 1 billion records?
(i.e., how do we remove and archive old transactions or orders so that we can accommodate new records?)
The answer to your first question is: a list item.
The answer to your second question has many steps and really depends on what you're trying to achieve, so I will be broad.
To load in transactional data, you first need to identify the attributes and properties within the data that make each row unique. For example, this could be a sequential code such as 1, 2, 3, 4 for each row of data, or a unique transaction number. It could also be a concatenation of a few properties, such as product, date, and amount. When combined, these must uniquely identify each row.
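As a rough illustration of that uniqueness check, here is a minimal Python sketch run outside Anaplan, for example on the source extract before import. The field names, the sample rows, and the `|` separator are all made up for the example; they are not anything Anaplan prescribes.

```python
from collections import Counter

# Hypothetical sample rows; field names are illustrative only.
rows = [
    {"product": "P100", "date": "2024-01-05", "amount": "19.99"},
    {"product": "P100", "date": "2024-01-05", "amount": "5.00"},
    {"product": "P200", "date": "2024-01-05", "amount": "19.99"},
    {"product": "P100", "date": "2024-01-05", "amount": "19.99"},  # repeats row 1
]

def make_key(row, fields, sep="|"):
    """Concatenate the chosen fields into one candidate key."""
    return sep.join(row[f] for f in fields)

def duplicate_keys(rows, fields):
    """Return every candidate key that appears more than once, with its count."""
    counts = Counter(make_key(r, fields) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

dups = duplicate_keys(rows, ["product", "date", "amount"])
print(dups)
```

If the result is not empty, the chosen combination does not identify each row, and another property (or a transaction number from the source system) is needed.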
If you are changing the data regularly and need to keep the previous load, identify the load frequency: daily, weekly, monthly, quarterly, etc. If you want to store the data by that time frame, set up the import module with a time dimension and specify the period you are loading into with each import.
If you are loading more than 1 billion rows of data, you need to define what the output is and whether you can stage and consolidate the input data. In other words, do you need to be able to see every unique transaction from your highest summary level?
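To make "stage and consolidate" concrete, the sketch below rolls individual transactions up to one row per store and day before anything is loaded. It is plain Python standing in for whatever the source system or ETL layer would do; the tuple layout and the store/day grain are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical transactions: (transaction id, store, ISO date, amount in cents).
transactions = [
    ("T0001", "S001", "2024-01-05", 1999),
    ("T0002", "S001", "2024-01-05", 500),
    ("T0003", "S002", "2024-01-05", 1250),
    ("T0004", "S001", "2024-01-06", 300),
]

def consolidate(transactions):
    """Sum amounts per (store, day), dropping the transaction-level detail."""
    totals = defaultdict(int)
    for _txn_id, store, day, cents in transactions:
        totals[(store, day)] += cents
    return dict(totals)

summary = consolidate(transactions)
print(len(transactions), "rows ->", len(summary), "rows")
```

Whether this grain is acceptable depends on the question above: if nobody needs to see individual transactions from the summary level, only the consolidated rows have to be loaded.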
Much of this depends on what you are calculating and why.
If you can define this, it will be easier to determine the approach. It will require different import processes and actions to control the data, and it also depends on whether your import is from a flat file or uses an API connector.
Some best practices can be referenced here:
I hope this helps,
Definitely a list without a top-level item. The "data" will be line items.
And there are some really important guidelines on staging the data:
1. The data should be held in a Data Hub outside of the main planning model. If users need to see the detail, create a separate view of the data in the Data Hub, or create a separate Analysis Model for them. In most cases, planning does not need to be at the same level of detail, so keep it out of the planning model
2. You need to establish what data is actually needed. Just because the data exists at daily level doesn't mean you need to bring it into Anaplan at that granularity. If you don't need it, get it aggregated at source
3. Get a code (primary key) created at source that uniquely identifies each transaction from its properties, and use delimiters in the code if the length of the component members varies (e.g. store codes that vary between 3 and 5 characters). If the lengths are fixed, the delimiter is not necessary
4. Do not include the date in the code from step 3. This is really important: it is much more efficient, and smaller, to house the data in a two-dimensional module
5. Create a system module dimensioned by the list from step 3 to hold all of the non-time-based attributes.
6. Calculate the attributes from the code whenever possible; there should be no need to store them as text fields (remember, you should eliminate text whenever possible)
7. Create a module dimensioned by the list from step 3 and the timescale to house the time-based values
8. Turn the summaries off in this module
9. Use a Time Range to limit the timescale of the module from step 7 if necessary.
10. If you need to further aggregate the data for export to planning models, create summary modules in the data hub; this is more efficient than aggregating on the downstream import
11. If you need to reference lists as part of validations, use flat lists with no parent levels; you shouldn't need composite list structures in a data hub
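Steps 3 to 7 can be pictured with a small Python sketch. This is only an analogy for the structure, not Anaplan syntax: the `_` delimiter, the store and product members, and the two dictionaries standing in for the system module and the time-based module are all assumptions made for the example.

```python
DELIM = "_"  # assumed delimiter; any character that never occurs in the members works

def build_code(store, product):
    """Join the non-date members into one primary key (step 3, no date per step 4)."""
    return DELIM.join([store, product])

def parse_code(code):
    """Recover the attributes from the code instead of storing them as text (step 6)."""
    store, product = code.split(DELIM)
    return {"store": store, "product": product}

# Without a delimiter, variable-length members can collide.
assert "ABC" + "DE01" == "ABCD" + "E01"
# With a delimiter, the two keys stay distinct.
assert build_code("ABC", "DE01") != build_code("ABCD", "E01")

# Hypothetical source rows: (store, product, ISO date, amount in cents).
rows = [
    ("ABC", "DE01", "2024-01-05", 1999),
    ("ABC", "DE01", "2024-01-06", 2500),
    ("ABCD", "E01", "2024-01-05", 700),
]

attributes = {}  # code -> attributes, like a system module by the code list (step 5)
values = {}      # (code, date) -> value, like a module by code list and time (step 7)
for store, product, day, cents in rows:
    code = build_code(store, product)
    attributes[code] = parse_code(code)
    values[(code, day)] = cents

print(len(attributes), "codes,", len(values), "time-based values")
```

Because the date is kept out of the code, the list only grows when a new store and product combination appears, not with every new day of data.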
As for reaching the 1 billion limit, adopting the techniques in steps 2 and 3 should help restrict the size in the first place. The key point is: do not delete and reload the list each time. Using the code from step 3 makes this unnecessary.
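The difference between delete-and-reload and a keyed update can be sketched in a few lines of Python. This is an illustration of the logic only; in Anaplan the matching on the code is done by the import itself, and the dictionary and sample codes here are hypothetical.

```python
def upsert(existing, incoming):
    """Add new codes and overwrite changed ones in place; never clear the list."""
    added = updated = 0
    for code, value in incoming.items():
        if code not in existing:
            added += 1
        elif existing[code] != value:
            updated += 1
        existing[code] = value
    return added, updated

# Hypothetical hub contents and a new load, keyed by the code from step 3.
hub = {"ABC_DE01": 1999, "ABCD_E01": 700}
todays_load = {"ABCD_E01": 750, "XYZ_DE01": 300}

result = upsert(hub, todays_load)
print(result, len(hub))
```

Existing members that are not in the new load are left untouched, so only the new or changed codes cause any work.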
If you think you are still going to reach the limits even with that, you will have to consider splitting the lists by year, creating a new model per year, archiving models or data, etc. This is where having the data in a Data Hub model makes scaling possible
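As a loose sketch of the "split by year and archive" idea, the Python below partitions records by calendar year and separates the years that stay active from those that could be archived. The record layout, the function names, and the cutoff year are hypothetical; how the archived partitions are actually stored (separate lists, separate models, exported files) is a design decision this example does not make.

```python
from collections import defaultdict

def split_by_year(records):
    """Group (code, ISO date, cents) records by the year part of the date."""
    by_year = defaultdict(list)
    for code, iso_date, cents in records:
        by_year[iso_date[:4]].append((code, iso_date, cents))
    return dict(by_year)

def archive_before(partitions, first_active_year):
    """Split the yearly partitions into active ones and archive candidates."""
    active = {y: r for y, r in partitions.items() if int(y) >= first_active_year}
    archived = {y: r for y, r in partitions.items() if int(y) < first_active_year}
    return active, archived

# Hypothetical records spanning three years.
records = [
    ("ABC_DE01", "2022-11-30", 1200),
    ("ABC_DE01", "2023-03-14", 1999),
    ("ABCD_E01", "2023-07-01", 700),
    ("XYZ_DE01", "2024-01-05", 300),
]

partitions = split_by_year(records)
active, archived = archive_before(partitions, 2023)
print(sorted(active), sorted(archived))
```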
We will be publishing a blog all about Data Hubs shortly. Also see my blogs on sparsity for real-world examples of how effective this technique is, and for clarification of some myths about sparsity
I hope this helps