Master data hubs
A master data hub houses an organization's data in a single model within the Anaplan platform. The hub imports data from the organization's data warehouse. If no single source, such as a data warehouse, is available, the master data hub collects data from the individual source systems instead. Once all data is consolidated into the master data hub, it can be distributed to multiple models throughout the organization's workspace.
Anaplan Data Architecture
Architecture best practices
One or more Anaplan models may make up the data hub. It is a good practice to separate the master data (hierarchies, lists, and properties) from the transactional data.
The business-facing Anaplan applications are synchronized from these data hub models using Anaplan's native model-to-model internal imports.
As a best practice, users should implement incremental synchronization, which synchronizes only the data that has changed since the last sync from the data hub. Doing this usually makes synchronization very fast.
The graphic below displays best practices for doing this:
Another best practice organizations should follow when building a master data hub is to import a list with properties into a module rather than directly into a list. With this method, line items are created to correspond with the properties and are imported using the text data type. This imports all of the data without errors or warnings, and enables effective dashboards, built with sorting and filtering, that highlight integration issues.
Once imported, the data in the master data hub module can then be imported to a list in the required model.
Data hub best practices
The following list consists of best practices for establishing data architecture:
Rationalize the metadata
Balanced (not ragged) hierarchies ease reporting and security settings
Identify your metrics and KPIs and what drives them
Do not try to reconcile disconnected targets to bottom-up plans entered at the line-item level.
Example: Use cost per trip and number of trips for travel expenses, as opposed to inputting every line of travel expense
Simplify the process
Reduce the number of approval levels (threshold-based)
Implement rolling forecasts
Report within the planning tool; keep immediacy where needed
Think outcome and options, not input
Transform your existing process. Do not re-implement existing Excel®-based processes in Anaplan
Aggregate transactions to the SKU or customer ID level
Plan at a higher level and cascade down
Plan the number of to-be-hired (TBH) employees by role for TBH headcount expenses, as opposed to inputting every TBH employee.
Sales: Plan at the sub-region level and cascade to the rep level
Plan at the profit center level, allocate at the cost center level based on drivers
The Anaplan Way
Always follow the phases of The Anaplan Way when establishing a master data hub, even in a federated approach.
If you’re familiar with Anaplan, you’ve probably heard the buzz about having a data hub and wondered why it’s considered a “best practice” within the Anaplan community. Wonder no more. Below, I will share four reasons why you should spend the time to build a data hub before Anaplan takes your company by storm.
1. Maintain consistent hierarchies
Hierarchies are a common list structure built in Anaplan and come in a variety of options depending on the use case: product hierarchy, cost center hierarchy, and management hierarchy, to name a few. These hierarchies should be consistent across the business, whether you're doing demand planning or financial planning. With a data hub, your organization has a higher likelihood of keeping hierarchies consistent over time, since everyone pulls the same structure from one source of truth: the data hub.
2. Avoid sparsity
As you expand the use of Anaplan across multiple departments, you may find that you only need a segment of a list rather than the entire list. For instance, you may want the full list of employees for workforce planning purposes, but only a portion of the employees for incentive compensation calculations. With a data hub, you can distribute only the pertinent information. You can filter the list of employees to build the employee hierarchy in the incentive compensation model, while having the full list of employees in the workforce planning model. Keep them both in sync using the data hub as your source of truth.
3. Separate duties by roles and responsibilities
An increasing number of customers have asked about roles and responsibilities with Anaplan as they expand internally. In Anaplan, we recommend each model have a separate owner. For example, an IT owner for the data hub, an operations owner for the demand planning model, and a finance owner for the financial planning model. The three owners combined would be your Center of Excellence, but each has their separate roles and responsibilities for development and maintenance in the individual models.
4. Accelerate future builds
One of the main reasons many companies choose Anaplan is for the platform’s flexibility. Its use can easily and quickly expand across an entire organization. Development rarely stops after the first implementation. Model builders are enabled and excited to continue to bring Anaplan into other areas of the business. If you start by building the data hub as your source of truth for data and metadata, you can accelerate the development of future models since you already have defined the foundation of the model, the lists, and dimensions.
As you begin to implement, build, and roll out Anaplan, starting with a data hub is a key consideration. In addition to this, there are many other fundamental Anaplan best practices to consider when rolling out a new technology and driving internal adoption.
The Anaplan platform can be configured and deployed in a variety of ways. Two configurations that should be examined prior to each organization's implementation of Anaplan are the Central Governance-Central Ownership configuration and the Central Governance-Federated Ownership configuration.
Central Governance-Central Ownership configuration
This configuration focuses on using Agile methodology to develop and deploy the Anaplan platform within an organization. Development centers on a central delivery team that is responsible for maintaining a master data hub, as well as all models the organization requires, such as sales forecasting, T&Q planning, etc.
Central delivery team
In this configuration, the central delivery team is also responsible for many other steps and business user requirements, which are carried out in Anaplan and delivered to the rest of the organization. These include:
Building the central model
Communicating release expectations throughout development
Creating and managing hierarchies in data
Data loads (data imports and inputs)
Defect and bug fixes in all models
New use case project development
Agile methodology—The Anaplan Way
As previously mentioned, this configuration focuses on developing and deploying new and improved releases using the Agile methodology. The strategy begins with sprint planning and moves through to final deployment. Once a project reaches deployment, the process begins again for either the next release of the project or the first release of a new project. Following this methodology increases stakeholder engagement in releases, promotes project transparency, and shows project results in shorter timeframes.
Central Governance-Federated Ownership configuration
This configuration depends on a central delivery team to first produce a master data hub and/or master model, and then allow the individual departments within an organization to develop and deploy their own applications in Anaplan. These releases are small subsets of the master model that allow departments to perform “what-if” modeling and control their own models or independent applications needed for specific local business needs.
Central delivery team
In this configuration, the central delivery team is only responsible for the following:
Creating and managing hierarchies in data
Loading data (data imports and input defect fixes)
Capturing and sharing modeling best practices with the rest of the teams
Federated model ownership
In this model, each department and/or region is responsible for their own development. This includes:
Small subsets of the master model for flexible “what if” modeling
Custom or in-depth analysis/metrics
Independent use case models
Loose or no integration with master model
One-way on-demand data integration
Optional data hub integration
Pros and cons
Both of these configurations have significant pros and cons for an organization to weigh:
Central Governance-Central Ownership pros
Modeling practices within an organization become standardized for all new and updated releases.
The request process for new projects becomes standardized. A single priority list of enhancement requests is maintained and openly communicated.
Communication of platform releases, new build releases, downtime, and more comes from one source and is presented in a clear and consistent manner.
Workspace and licenses
This configuration requires the fewest workspaces, which saves on space used in Anaplan, as well as the fewest workspace admin licenses.
Central Governance-Central Ownership cons
All build requests, including new use cases, enhancements, and defect fixes, go into a queue to be prioritized by the central delivery team.
This configuration requires a significant weekly time commitment from the central delivery team to prioritize all platform requirements.
Central Governance-Federated Ownership pros
Business user development
This configuration allows for true business-user development without compromising the integrity of the core solution developed by the central delivery team.
Maximizes return on investment and reduces shadow IT processes by enabling the quick spread of the Anaplan platform across an organization, as multiple parties develop simultaneously.
Reduces or completely eliminates queue wait times for new use cases and/or functionality.
Speed of implementation
Having the central team handle all data integration work via the data hub speeds up application design, because federated teams can take their actuals and master data from an Anaplan data hub model rather than building their own data integration with source systems.
Central Governance-Federated Ownership cons
Workspace and licenses
More workspaces and workspace admin licenses may be necessary in the platform.
In this configuration, it is challenging to ensure that model-building architecture procedures and best practices are followed in each model. The central Center of Excellence team must organize recurring meetings with each application builder to share experience and best practices.
Business users without model building skills may have a difficult time building and maintaining their requirements.
A data hub is a separate model that holds an organization’s data.
Data can be shared with all your models, making expansion easier to implement and ensuring data integrity across models.
The data hub model can be placed in a different workspace, allowing for role segregation. This allows you to assign administrator rights to users to manage the data hub without allowing those users access to the production models.
The method for importing to the data hub (into modules, rather than lists) allows you to reconcile properties using formulas.
One type of data hub can be integrated with an organization’s data warehouse and hold ERP, CRM, HR, and other data as shown in this example.
But this isn't the only type of data hub. Some organizations may require a data hub for transactional data, such as bookings, pipeline, or revenue.
Whether you will be using a single data hub or multiple hubs, it is a good idea to plan your approach for importing from the organization’s systems into the data hub(s) as well as how you will synchronize the imports from the data hub to the appropriate model. The graphic below shows best practices.
High level best practices
When building a data hub, the best practice is to import a list with properties into a module rather than directly into a list. Using this method, you set up line items to correspond with the properties and import them using the text data type. This imports all the data without errors or warnings. The data in the data hub module can be imported to a list in the required model.
The exception to importing into a module is when you are using a numbered list without a unique code (in other words, you are using a combination of properties). In that case, you will need to import the properties into the list.
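The import-as-text pattern above can be modeled outside Anaplan. This Python sketch (illustrative only; in Anaplan you would use text-formatted line items, not Python) shows why staging everything as text never fails on load, while a later validation pass surfaces the values a typed import would have rejected.

```python
# Illustrative sketch of the "load as text, validate afterwards" pattern.

def load_as_text(rows):
    """Stage raw rows as text, as a text-formatted module would."""
    return [{k: str(v) for k, v in row.items()} for row in rows]

def find_bad_numbers(staged, numeric_fields):
    """Flag values that would have failed a typed (number) import."""
    errors = []
    for i, row in enumerate(staged):
        for field in numeric_fields:
            try:
                float(row[field])
            except ValueError:
                errors.append((i, field, row[field]))
    return errors

rows = [{"Employee": "E1", "Salary": "55000"},
        {"Employee": "E2", "Salary": "n/a"}]   # bad value still loads
staged = load_as_text(rows)                    # zero load errors
issues = find_bad_numbers(staged, ["Salary"])  # validation finds the bad row
```

The point of the design is the same as in Anaplan: the load itself is guaranteed to succeed, and data-quality problems are surfaced as reportable items rather than import failures.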
Here are the steps to create the basics of a hub and spoke architecture.
1) Create a model and name it master data hub
You can create the data hub in the same workspace as all the other models, but a better option is to put the data hub in a different workspace. The advantage is role segregation: you can assign administrator rights to users to manage the data hub without providing them access to the actual production models, which are in a different workspace. Large customers may require this segregation of duties.
Note: This functionality became available in release 2016.2.
2) Import your data files into the data hub
Set up your lists. Identify the lists that are required in the data hub. Create these lists using good naming conventions. Set up any needed hierarchies, working from the top level down. Import data into the list from the source files, mapping only the unique name, the parent (if the name rolls up into a hierarchy), and code, if available. Do not import any list properties. These will be imported into a module.
Create corresponding modules for those lists that include properties. For each list, create a module. Name the module [List Name] Properties. In the module, create a line item for each property and use the data type TEXT.
Import the source file into the corresponding module. There should be no errors or warnings.
Automate the process with actions. Each time you import, an action is created. Name your actions using the appropriate naming conventions.
Note: Indicate the name of the source in the name of the import action.
To automate the process, you’ll want to create one process that includes all your imports. For hierarchies, it is important to get the actions in the correct order. Start with the highest level of the hierarchy list import, then the next level list and on down the hierarchy. Then add the module imports. (The order of the module imports is not critical.)
Now, let's look at an example:
You have a four-level hierarchy to load: Employee → State → Region → Country
Create lists with the right naming conventions. For this example, create these lists:
Set the parent hierarchy to create the composite hierarchy.
Import into each list from the source file(s), mapping only name and parent. The exception is the Employees list, which includes a code (employee ID) that should be mapped. Properties will be added to the data hub later.
Properties → Modules
Create one module for each list that includes properties. Name the module [List Name] Properties. For this example, only the Employees list includes properties, so create one module named Employee Properties.
In each module, create as many line items as you have properties. For this example, the line items are Salary and Bonus. Open the Blueprint view of the module and in the Format column, select Text. Pivot the module so that the line items are columns.
Import the properties. In the grid view of the module, click on the property you are going to import into. Set up the source as a fixed line item. Select the appropriate line item from the Line Item tab and on the Mapping tab, select the correct column for the data values. You’ll need to import each property (line item) separately. There should be no errors or warnings.
Each time you run an import, an action is created. You can view these actions by selecting Actions from the Model Settings tab. The previous imports into lists and modules have created one import action per list. You can combine these actions into a process that will run each action in the correct order. Name your actions following the naming conventions. Note, the source is included in the action name.
Create one process that includes the imports. Name your process Load [List Name]. Make sure the order is correct: put the list imports first, starting with the top hierarchy level and working down; then add the module imports in any order.
These imports should run with zero errors because the data is going into text-formatted line items.
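The ordering rule for the process (list imports top-down, then module imports) can be sketched as plain logic. The Python below is purely illustrative, using the Country → Region → State → Employee example from above; the action names it generates are hypothetical.

```python
# Illustrative sketch: ordering import actions for a composite hierarchy.

def order_actions(hierarchy_top_down, module_imports):
    """List imports must run top-down; module imports follow in any order."""
    return ([f"Import {name} list" for name in hierarchy_top_down]
            + [f"Import {name} module" for name in module_imports])

process = order_actions(
    ["Country", "Region", "State", "Employee"],  # top of the hierarchy first
    ["Employee Properties"],
)
# The Country list loads first so that Region items can find their parents,
# and so on down; the properties module loads only after all lists exist.
```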
If some properties should match with items in lists, it's recommended to use FINDITEM formulas to match text to list items:
FINDITEM simply looks at the text-formatted line item and finds the match in the list that you specify. Every time data is uploaded into Anaplan, you just need to make sure all items from the text-formatted line item are loaded into the list. This is useful because you can always compare the "raw data" to the "Anaplan data," and you do not have to load the data more than once if there are concerns about data quality in Anaplan.
If the list of properties is not yet included in your data hub model, first create that list. Let's use the example of Territory. Add a line item to the module, select List as the format type, then select your properties list (in this case, Territory) from the drop-down. Add the formula FINDITEM(x, y), where x is the name of your list (Territory in our example) and y is the text line item. You can then filter this line item to show all of the blank items and correct the data in the source system.
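FINDITEM's matching behavior can be modeled in a few lines. This Python is an analogue, not Anaplan code: in the model you would write FINDITEM(Territory, line item) directly, and the blank results would drive the error filter.

```python
# Python analogue of FINDITEM: return the matching list item, or "" (blank)
# when there is no exact match.

def finditem(lst, text):
    return text if text in lst else ""

territories = {"North", "South", "East", "West"}
raw_values = ["North", "Nort", "West"]        # "Nort" is a source-system typo
matched = [finditem(territories, v) for v in raw_values]
blanks = [v for v, m in zip(raw_values, matched) if m == ""]
# blanks now holds the raw values that need correcting in the source system
```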
If you will be importing frequently, you may want to set up a dashboard to allow users to view the data so they can make corrections in the source system. Set up a saved view for the errors and add conditional formatting to highlight the missing (blank items) data. You can also include a counter to show the number of errors and add that information to the dashboard.
4) Split models: Filter and Set up Saved Views
If the architecture of your model includes spoke models by regions, you need one master hierarchy that covers all regions and a corresponding module that stores the properties. Use that module and create as many saved views as you have spoke region models. For example, filter on Country GI = Canada if you want to import only Canadian accounts into the spoke model.
You will need to create a saved view for each hierarchy and spoke model.
5) Import to the spoke module
Use the cross-workspace imports if you have decided to put your Master data hub in a separate workspace.
Create the lists that correspond to the hierarchy levels in each spoke model. Currently, there is no way to create a list via import.
Create the properties in the list where needed. Keep in mind that the import of properties into the data hub as line items is an exception. List properties generally do not vary, unlike a line item in a module, which is often measured over time. Note: Properties can also be housed in modules, and there are some benefits to this. See Anapedia - Model Building (specifically, the "List Attributes" and "List attributes in a module" topics). If you decide to use a module to hold the properties, you will need to create a line item for each property type and then import the properties into the module.
To simplify the mapping, make sure the property names in each spoke model match the line item names of the data hub model.
In each spoke model, create an import from the filtered module view of the data hub model into the lists you created in step 1.
In the Actions window, name your imports using naming conventions.
Create a process that includes these actions (imports). Begin with the highest level in the hierarchy and work down to the lowest.
Well done! You have imported your hierarchy from a data hub model.
6) Incremental list imports
When you are in the midst of your peak planning cycle and your large lists are changing frequently, you'll want to update the data hub and push the changes to the spoke models. Importing several thousand list members may cause performance issues and block users during the import activity.
In the best-case scenario, your data warehouse provides a date field that shows when each item was added or modified, and can deliver a flat file or table that includes only the changes. Your import into the data hub model will take just a few seconds, and you can filter on this date field to export only the changes to the spoke models.
But in most cases, all you have is the full list from the data warehouse, regardless of what has changed. To mitigate this, use the following technique to export only the list items that have changed (added, edited, or deleted) since the last export, using logic in Anaplan.
Setting up the incremental loads:
In the data hub model:
Create a text-formatted line item in your module. Name it CHECKSUM and enter a formula that concatenates all the properties you want to track changes on. These properties form the base of the incremental import. Example: CHECKSUM = State & Segment & Industry & Parent & Zip Code
Create a second line item, name it CHECKSUM OLD, set the format as Text, and create an import that imports CHECKSUM into CHECKSUM OLD. Ignore any other mappings.
Name this import: 1/2 im DELTA and put it in a process called "RESET DELTA"
Create a line item, name it "DELTA", and set the format as Boolean. Enter this formula: CHECKSUM <> CHECKSUM OLD (equivalent to IF CHECKSUM <> CHECKSUM OLD THEN TRUE ELSE FALSE, without the unnecessary IF).
Update the filtered view that you created to export the hierarchy for a specific region or geography. Add the filter criterion DELTA = TRUE. You will see only the list items that differ from the last time you imported into the data hub. In the example above, we import into a spoke model only the list items that are in US East and that have changed since the last import.
Execute the import from the source into the data hub and then into the spoke models.
In the data hub model, upload the new files and run the process import.
In the spoke models, run the process import that takes the list from the data hub's filtered view. Check the import logs and verify that only the items that have changed are actually imported.
Back in the data hub model, run the RESET DELTA process (the 1/2 im DELTA import). This resets the changes, so you are ready for the next set of changes. Your source, data hub model, and spoke models are all in sync.
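The CHECKSUM / CHECKSUM OLD / DELTA cycle described above can be modeled outside Anaplan. In this Python sketch (illustrative; in Anaplan these are text and Boolean line items plus an import action), a refresh recomputes the checksum and the delta flag, and the reset step copies the checksum forward so the next refresh only flags real changes.

```python
# Illustrative model of the incremental-load pattern.

def refresh(items, new_data):
    """Load new property values; recompute CHECKSUM and DELTA per item."""
    for key, props in new_data.items():
        rec = items.setdefault(key, {"checksum_old": ""})
        rec["checksum"] = "".join(str(props[p]) for p in sorted(props))
        rec["delta"] = rec["checksum"] != rec["checksum_old"]

def reset_delta(items):
    """The RESET DELTA process: copy CHECKSUM into CHECKSUM OLD."""
    for rec in items.values():
        rec["checksum_old"] = rec["checksum"]

items = {}
refresh(items, {"ACME": {"State": "NY", "Segment": "Ent"}})
changed = [k for k, r in items.items() if r["delta"]]  # first load: changed
reset_delta(items)
refresh(items, {"ACME": {"State": "NY", "Segment": "Ent"}})  # same data
changed = [k for k, r in items.items() if r["delta"]]  # nothing to export
```

Only the items with delta set would appear in the filtered export view, which is why the spoke-model imports stay small during the peak cycle.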
7) Import actuals (transaction data) into the data hub and then into the spoke models
Rather than importing actuals or transactions directly into a working model, import them into the data hub so that business users (with workspace admin rights) can easily select the imports they want to add to their spoke models. There is one requirement: the file must include a transaction or primary key (identification code) that uniquely identifies each transaction. If there is no transaction key, your options are as follows:
Option 1: Work with the IT team to determine if it is possible to include a transaction ID in the source. This is the best option, but not always possible.
Option 2: Create the transaction ID in Excel®. Keep in mind there is a limit of 1 million rows in Excel. Also be careful about how you create the transaction ID in Excel, as some methods may delete leading zeros.
Option 3: Create a numbered list in Anaplan.
Creating a numbered list and importing transaction IDs:
Add a Transaction list (follow your naming conventions!) to the data hub model. In the General Lists window, select the Numbered option to change the Transaction list to a numbered list.
In the Transaction list, create a property called "transaction ID", set the format to text. In the General Lists window, select Transaction ID in the Display Name Property field.
Open the Transaction list and add the formula: CODE(ITEM('Transaction')) to the Transaction ID property. It will be used as the display name of the numbered list. When importing into the Transaction list, set it up as indicated below
Map the Transaction ID of the source file to the Code. Remove any selection from the Transactions drop-down list (the first source field). If duplicates are found on the transaction ID, reject the import; otherwise you will introduce corrupted data into the model.
Import the transaction IDs into the Transactions list.
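The duplicate check that the import relies on can be sketched as follows. This Python is illustrative (Anaplan's import itself reports duplicate codes); the field name transaction_id is a hypothetical label for the source column mapped to the Code.

```python
# Illustrative sketch: reject a load when transaction IDs (mapped to the
# list Code) contain duplicates, since duplicate codes would corrupt the
# numbered list.

def validate_transaction_ids(rows, id_field="transaction_id"):
    """Return the set of duplicated IDs; an empty set means the load is safe."""
    seen, dupes = set(), set()
    for row in rows:
        tid = row[id_field]
        (dupes if tid in seen else seen).add(tid)
    return dupes

rows = [{"transaction_id": "T0001"},
        {"transaction_id": "T0002"},
        {"transaction_id": "T0001"}]      # duplicate: reject this import
assert validate_transaction_ids(rows) == {"T0001"}
```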
Create the Actuals module. Include the Transactions list and as many line items as you have fields (columns) in your source file. Set the format of your line items to Text, with the exception of columns that contain numeric values; for those, set the format to Number and include any further definitions needed (for example, decimal places or units).
Add a line item called "Transaction ID" and set the format as text. Enter the formula: CODE(ITEM(Transactions)). This will be used when importing the numbered list into the spoke models.
Run the import of the source file into the Actuals module.
Name your two actions (imports): Import into Transactions (this was the import of the transaction IDs into the Transactions list) and Import into Actuals (this was the import from the source file into the Actuals module). Create a process that includes both imports: first, Import into Transactions, then Import into Actuals.
Why a two-dimensional module? It is important to understand that the Actuals module is a staging module with only two dimensions: transactions and line items. You can load millions of transactions with 50+ line items, corresponding to the properties of each transaction, including version and time. Anaplan will scale without any issues.
Do not create a multidimensional module at this stage. That will be done in the spoke models, where you will carefully pick which properties become dimensions. This significantly impacts spoke model size if you have large lists.
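A rough cell-count comparison shows why the staging module stays flat. The figures below are illustrative assumptions, not from the source document; the point is that a flat module grows linearly with transaction count, while a dense multidimensional module grows with the product of its dimension sizes.

```python
# Why stage transactions in a 2-D module: rough cell counts
# (all numbers are illustrative assumptions).

transactions = 1_000_000
line_items = 50
flat_cells = transactions * line_items        # linear in transaction count

# The same data exploded into a dense multidimensional module:
accounts, products, periods, versions = 10_000, 5_000, 24, 3
dense_cells = accounts * products * periods * versions

# The flat staging module is orders of magnitude smaller, which is why
# dimensionality is only introduced later, in the spoke models.
assert flat_cells < dense_cells
```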
In the Actuals module, create a view that you will use for importing into the spoke model. Create as many saved views as required, based on how you have split the spoke models.
The import into the module will run without errors or warnings. This does not mean the data is clean; we have only loaded text.
Reconciliation in the data hub consists of verifying that every field of the source system matches an existing item in the list of values for that field.
In the module, create a list-formatted line item corresponding to each field, and use the FINDITEM() function to look up the actual item. If the name does not match, the function returns a blank cell. These cells need to be tracked in a reconciliation dashboard. The source file will need to be fixed until every transaction has a corresponding item in a list.
If the list of fields is not yet included in your data hub model, first create that list. Add a line item to the module, select List as the format type, then select your list of fields. Add the formula FINDITEM(x, y), where x is the name of your list and y is the text line item. See the example below:
Transaction 0001 is clean; transaction 0002 has an account code (A4) that does not match.
Set up a dashboard to allow users to view the data so they can make corrections in the source system. Set up a saved view for the errors and add conditional formatting to highlight the missing (blank items) data. You can also include a counter to show the number of errors and add that information to the dashboard.
Import into the spoke models
In the spoke models:
Create the transaction numbered list. Import into this list from the transaction module saved view you created in the data hub, filtered on any property you need to limit the transactions you want to push. Map the Code of the spoke model's numbered list to the calculated Transaction ID of the master data hub model.
Create a flat transaction module. Import into it from the same data hub transaction module saved view, filtered on any property you need to limit the transactions you want to push.
Make sure you select the calculated Transaction ID as your source. Do not use the transaction name, as it will differ between the data hub model and the spoke model for the same transaction.
Create a target multidimensional module, using SUM functions from the transaction module across the line items formatted as list or time.
Example: a simple two-dimensional module dimensioned by Account and Product.
Use SUM functions as much as possible, as this enables users to use the drill-to-transaction feature, which shows the transactions that make up an aggregated number.
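The SUM-based target module can be modeled as a grouped aggregation that also keeps the transaction-level mapping drill-to-transaction relies on. This Python is an analogue of what Anaplan's SUM does internally, with illustrative field names.

```python
# Illustrative analogue of the SUM-based target module: aggregate flat
# transactions by (Account, Product) while retaining the mapping from
# each aggregated cell back to its contributing transactions.
from collections import defaultdict

def aggregate(transactions):
    totals = defaultdict(float)
    drill = defaultdict(list)   # aggregated cell -> contributing txn IDs
    for t in transactions:
        key = (t["account"], t["product"])
        totals[key] += t["amount"]
        drill[key].append(t["id"])
    return totals, drill

txns = [{"id": "T1", "account": "A1", "product": "P1", "amount": 100.0},
        {"id": "T2", "account": "A1", "product": "P1", "amount": 50.0},
        {"id": "T3", "account": "A2", "product": "P1", "amount": 25.0}]
totals, drill = aggregate(txns)
# the (A1, P1) cell totals 150.0 and can be drilled back to T1 and T2
```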
8) Incremental data load
The actuals transaction file might need to be imported several times into the data hub model, and from there into the spoke models, during the peak planning cycle. If the file is large, it can create performance issues for end users. Since not all transactions change as the data is imported several times a day, there is a strong opportunity to optimize this process.
In the data hub model's transaction module, create the same CHECKSUM, CHECKSUM OLD, and DELTA line items. CHECKSUM should concatenate all the fields you want to track the delta on, including the values. The DELTA line item will catch new transactions as well as modified transactions. See 6) Incremental list imports above for more information.
Filter the view using DELTA to only import transaction list items into the list, and the actuals transaction into the module.
Create an import from CHECKSUM to CHECKSUM OLD to reset the delta after the imports have run. Name this import 2/2 im DELTA and add it to the RESET DELTA process created for the list.
In the spoke model, import into the transaction list and into the transaction module, from the transaction filtered view.
Run the DELTA import or process.
You can semi-automate this process and run it frequently once incremental loads have been implemented. This provides immediacy of master data and actuals across all models during a planning cycle.
It's semi-automatic because it requires a review of the reconciliation dashboards before pushing the data to the spoke models.
There are a few ways to automate, all requiring an external tool: Anaplan Connect or the customer's ETL.
The automation script needs to execute in this order:
Connect to the master data hub model.
Load the external files into the master data hub model.
Execute the process that imports the list into the data hub.
Execute the process that imports actuals (transactions) into the data hub.
Manual step: Open your reconciliation dashboards, and check that data and the list are clean. Again, these imports should run with zero errors or warnings.
Connect to the spoke model.
Execute the list import process.
Execute the transaction import process. Repeat steps 5, 6, and 7 for all spoke models.
Connect to the master data hub model.
Run the Clear DELTA process to reset the incremental checks.
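The sequence above can be sketched as a simple orchestrator. This Python is a sketch only: the step strings and function names are hypothetical, and in practice each step would invoke Anaplan Connect or an ETL job rather than append to a log. The manual reconciliation review is modeled as a gate that stops the run.

```python
# Illustrative orchestration of the automation sequence: hub loads first,
# a reconciliation gate, then every spoke model, then the delta reset.

def run_sync(hub, spokes, reconciliation_ok):
    log = []
    log.append(f"load files -> {hub}")
    log.append(f"run list import process in {hub}")
    log.append(f"run transaction import process in {hub}")
    if not reconciliation_ok():          # manual review gate (semi-automatic)
        return log + ["stopped: fix reconciliation errors first"]
    for spoke in spokes:                 # repeat for every spoke model
        log.append(f"run list import process in {spoke}")
        log.append(f"run transaction import process in {spoke}")
    log.append(f"run Clear DELTA process in {hub}")
    return log

steps = run_sync("data hub", ["FP&A", "Sales"], lambda: True)
# the final step resets the incremental checks back in the data hub
```

The gate is what makes the process "semi-automatic": everything up to the reconciliation check can run on a schedule, but the push to the spoke models waits on a human review.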
Other best practices
Create deletes for all your lists
Create a module called Clear Lists, dimensioned by the list whose items and properties you want to clear. In the module, create a Boolean line item, call it "CLEAR ALL", and set its formula to TRUE.
In Actions, create a "delete from list using selection" action and set it as below:
Repeat this for all lists and create one process that executes all these delete actions.
Example of a maintenance/reconcile dashboard
Use a maintenance/reconcile dashboard when manual operations are required to update applications from the hub. One method that works well is to create a module that highlights whether there are errors in each data source. In that module, create a line item that displays a message on the dashboard when there are errors, for example: "There are errors that need correcting." A link from this dashboard to the error status page makes it easy for users to check on errors.
A best practice is to automate the list refresh. Combine this with a modeling solution that only exports what has changed.
There should be two saved views: one for development and one for production. That way, the hub can feed the development models with shortened versions of the lists, while the production models get the full lists.
If the separate saved view option is taken, the development (DEV) model will need imports set up for both DEV and production (PROD).
The additional ALM consideration is that the lists that are imported into the spoke models from the hub need to be marked as production data.
The data hub houses all global data needed to execute the Anaplan use case. The data hub often houses complex calculations and readies data for downstream models.
The development model is built to the 80/20 rule: it is built on a global process, and region-specific functionality is added in the deployment phase. The model is built to receive data from the data hub.
During this stage, Anaplan Connect or a 3rd party tool is used to automate data integration. Data feeds are built from the source system into the data hub and from the data hub to downstream models.
The application is put through rigorous performance testing, including automated and end-user testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform.
The data hub is refreshed with the latest information from the source systems. The data hub readies data for downstream models.
The development model is copied, and the appropriate data is loaded from the data hub. Region-specific functionality is added during this phase.
Additional data feeds from the data hub to downstream models are finalized. The integrations are tested and timed to establish baseline SLA. Automatic feeds are placed on timed schedules to keep the data up to date.
The application is again put through rigorous performance testing.
The need for additional data for new use cases is often handled by splitting the data hub into regional data hubs. This helps the system perform more efficiently.
The models built for new use cases are developed and thoroughly tested. Additional functionality can be added to the original models deployed.
Data integration is updated to reflect the new system architecture. Automatic feeds are tested and scheduled according to business needs.
At each stage, the application is put through rigorous performance testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform.