Quick reference guide on checkpoints to consider before going ahead with Data Hub setup
Any data that goes in and out of Anaplan should use Data Hub - Get alignment with business users, model builders and the IT team that all data flowing IN and OUT of Anaplan will go through the Data Hub.
What Data? What Granularity? - Decide on what data will be stored in Anaplan and at what granularity (Ex: In Capex planning, we can bring data at ASSET CLASS level instead of ASSET level)
How many years of data? - Get an alignment with business users on how many years of data (Actual & Forecast/Budget) to be loaded into Anaplan (Ex: Previous year, Current Year & 2 future years)
Which Currencies? - Get alignment with business users on whether we need data in all transaction currencies or only in Local Currency (LC) and Group Currencies (GC) such as USD and EUR.
Which Source Systems? - We need to decide which source systems will feed data to Anaplan. This has to be analyzed during the requirement gathering phase. During the design phase, we need to decide whether the data flow will be automated, and whether we need to set up connections with the source systems or can load the source data from flat files.
Frequency of Data Refresh - Get an alignment with business users and IT team on how frequently we need to refresh the data from source system (Ex: Daily or Weekly or monthly refresh). Also decide if we need to go with DATA PULL or DATA PUSH mechanism.
Clear & Overwrite (or) Append - We need to decide whether each load will CLEAR & OVERWRITE the existing dataset or APPEND to it.
Data Strategy, Retention & Archival Strategy - Get alignment with business users, model builders and the IT team on how many years of data will be retained in the Anaplan instance, how frequently we need to archive the data, and after how many years we can purge it. We also need to decide how to automate the archival and purging process.
Flat modules to keep the size to a minimum - Since all the data Anaplan needs will be stored in the Data Hub, we need to be cautious about model size. We can keep the data hub modules FLAT and non-dimensionalised.
Modules with subset of data - It is better to store the data in subsets so the modules are flexible for future requirements. Example: Version based Headcount (HC) numbers, i.e., we can create separate modules for ACTUAL HEADCOUNT numbers, FORECAST HEADCOUNT numbers, etc.
Primary Key for data feeds - Since the data hub modules are non-dimensionalised, we need to differentiate the data using a primary key (Ex: Can be a combination of version, time & account - ACTUAL_WAGES_01012018).
Love the post! One thing I'd add to this conversation though is that on your first point, there may be occasions where it makes sense not to pull the data back into the data hub. Primarily, there's a workspace size hit to duplicating the information, so if you're moving hundreds of millions of cells or more out of Anaplan, it may make sense to pull directly from the source.
Thanks Alec. You got a valid point! We came across multiple use cases where we had to push directly from Anaplan source models to Target applications bypassing Anaplan DataHub. We evaluated pros and cons of such options during data design phase (data interface and data flow design). Thanks for raising this topic.
Great article. I would also add that talking to the business not just about what levels of granularity are needed, but also why those levels are needed, is an important conversation. Sometimes the levels of granularity don't provide many benefits and could simply be a matter of not wanting to change old processes. Change management and having these discussions early on help in the long run.
Good point Sreekanth. Understanding the business reason for granularity needs will help in designing an optimal data hub. In addition, we also need to 1) List out Drill Down requirements 2) List out Drill Through requirements 3) Prepare data design and data flow for sensitive data.
Great points here! A few other things we would want to consider:
Security - Should we house our data hub in a separate workspace to differentiate who the administrators are between our operating model(s) and the data hub? We often have sensitive data held in the data hub that we want to have strict control over who can view this data.
Metadata Management - We are able to do a lot with metadata to ensure it is consistent throughout your Anaplan models. For example, if we load GL Accounts from your source ERP system and want them to be displayed as "GL Account ID - GL Account Name", we are able to concatenate these in the data hub and push them out to all target models in order to drive consistency throughout all models. This ensures nothing has to be changed in your source ERP system while allowing for more customization in Anaplan.
Who owns the data hub - the business team or IT? This should be a consideration when setting up an initial data hub. I have seen both at various customers, and ownership should be established as early as possible to ensure roles and responsibilities are clearly laid out.
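To illustrate the metadata point above, here is a minimal Python sketch of the "GL Account ID - GL Account Name" concatenation. In Anaplan this would be done with a formula in the data hub rather than in Python; the function name and the sample accounts below are hypothetical and only there to show the idea of building the display name once, centrally.

```python
def display_name(account_id: str, account_name: str) -> str:
    """Build the 'GL Account ID - GL Account Name' label in one central place."""
    return f"{account_id.strip()} - {account_name.strip()}"


# Hypothetical sample rows, as they might arrive from a source ERP extract.
gl_accounts = [("600100", "Wages"), ("600200", "Travel")]

# One label per account; every target model would receive the same label.
labels = {acc_id: display_name(acc_id, name) for acc_id, name in gl_accounts}

print(labels["600100"])
```

Because the label is derived in one place, a change to the display format only has to be made in the data hub and not in each target model.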
Thanks Trevor. All the three pointers are very useful and should be scoped into Data Flow design and finalized during the early phase of the project. As always, you are best in giving us the perfect solution :-)
I would be very careful with your last point. You do not want to bring in data with a primary key that is tied to either the version or the time period, as you are inflating the overall size of the list for no gain. For example, let's say you are bringing in data for 12 months. If your primary key appends the period to the member (WAGES_01012018), you would have 12 members where the only thing changing is the date. So, if the data (without appending the period) totals 100 members, by appending you have now multiplied the result set by 12. Additionally, if you added Versions to the member, you would need to multiply that number by the number of versions. Very seldom, if at all, do I see Versions in the Data Hub.
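The arithmetic above can be sketched in a few lines of Python. The 100 members and 12 periods come from the example; the version count of 3 is an assumption added purely for illustration, and the function name is hypothetical.

```python
def list_size(base_members: int, periods: int = 1, versions: int = 1) -> int:
    """Number of list members when period and/or version are baked into the key."""
    return base_members * periods * versions


base = 100  # members when the key carries neither period nor version

print(list_size(base))                          # key without period or version
print(list_size(base, periods=12))              # period appended to the key
print(list_size(base, periods=12, versions=3))  # period and (assumed) 3 versions
```

The list grows multiplicatively with every extra element baked into the key, while the underlying data volume stays the same.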
The following should be the process of loading transactional data:
Create a unique list. This could be the Account concatenated with the Department concatenated with the Trading Partner.
Create a Transaction module dimensionalized by the above unique list and Time. Only transactional data should be in this module, no lists/texts/booleans.
Create an "Attribute" or "Properties" module storing any and all metadata about the list (Account #, Cost Center, Trading Partner, etc.).
By following the above process, you will accomplish the following:
a leaner data hub
a cleaner Model Map
Faster data loads (only loads/appends the transactional list and loads transactional data). If the key/code of the list is done correctly, then the "Attributes" or "Properties" module can be populated using formulas from the code of the list.
the source system will not have to transform the data by appending the Period to your unique key and then having to create transformations to understand which period the transactional amount should be in.
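The loading process described above can be sketched roughly as follows. This is a Python illustration of the data shapes only, not Anaplan code: the delimiter, the sample rows, and all names are hypothetical, and it assumes no account, department, or trading partner code contains the delimiter.

```python
from collections import defaultdict

DELIM = "_"  # assumed delimiter between the parts of the list code


def make_code(account: str, department: str, partner: str) -> str:
    """Unique list code: Account + Department + Trading Partner (no period)."""
    return DELIM.join((account, department, partner))


# Hypothetical source rows: account, department, partner, period, amount.
rows = [
    ("6001", "D10", "TP01", "2018-01", 1000.0),
    ("6001", "D10", "TP01", "2018-02", 1100.0),
    ("6002", "D20", "TP02", "2018-01", 250.0),
]

# Step 1: the unique list holds one member per combination, regardless of period.
unique_list = sorted({make_code(a, d, p) for a, d, p, _, _ in rows})

# Step 2: the transaction "module" is keyed by (list member, time) and holds
# amounts only; rows sharing the same key are summed.
transactions = defaultdict(float)
for a, d, p, period, amount in rows:
    transactions[(make_code(a, d, p), period)] += amount


# Step 3: the attributes/properties "module" is derived from the code itself.
def attributes(code: str) -> dict:
    account, department, partner = code.split(DELIM)
    return {"account": account, "department": department, "partner": partner}


properties = {code: attributes(code) for code in unique_list}

print(unique_list)
print(transactions[("6001_D10_TP01", "2018-02")])
print(properties["6002_D20_TP02"])
```

Three source rows produce only two list members here, because the period lives on the time dimension and not in the key.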
Thanks Rob for the suggestion. The solution you proposed will work perfectly and has many benefits. In fact, we have a few such modules that get data via import and are also used for export out of Anaplan. But I think when we create a unique list (say, Account concatenated with the Department concatenated with the Trading Partner), the number of characters should not exceed 50. This is a limitation where your proposed solution may not work. I am not sure if Anaplan has addressed this limitation in recent releases.