A data hub is a separate model that holds an organization’s data.
One type of data hub can be integrated with an organization’s data warehouse and hold ERP, CRM, HR, and other data as shown in this example.
But this isn’t the only type of data hub. Some organizations may require a data hub for transactional data, such as bookings, pipeline, or revenue.
Whether you use a single data hub or multiple hubs, plan your approach for importing from the organization's systems into the data hub(s), as well as how you will synchronize those imports from the data hub to the appropriate models. The graphic below shows best practices.
When building a data hub, the best practice is to import a list with properties into a module rather than directly into a list. Using this method, you set up line items to correspond with the properties and import them using the text data type. This imports all the data without errors or warnings. The data in the data hub module can be imported to a list in the required model.
The exception to importing into a module is if you are using a numbered list without a unique code (in other words, you are using a combination of properties to identify items). In that case, you will need to import the properties into the list.
Here are the steps to create the basics of a hub and spoke architecture.
You can create the data hub in the same workspace as all the other models, but a better option is to put the data hub in a different workspace. The advantage is role segregation: you can give users administrator rights to manage the data hub without granting them access to the actual production models, which sit in a different workspace. Large customers may require this segregation of duties.
Note: This functionality became available in release 2016.2.
Automate the process with actions. Each time you run an import, an action is created. Name your actions using the appropriate naming conventions.
Note: Indicate the name of the source in the name of the import action.
To automate the process, you'll want to create one process that includes all your imports. For hierarchies, it is important to get the actions in the correct order. Start with the import into the highest level list of the hierarchy, then the next level down, and so on down the hierarchy. Then add the module imports. (The order of the module imports is not critical.)
Now, let's look at an example:
You have a four-level hierarchy to load, such as 1) Employee → 2) State → 3) Region → 4) Country.
Create lists with the right naming conventions. For this example, create these lists, numbered from the top of the hierarchy down: Country, Region, State, and Employee.
Set the parent hierarchy to create the composite hierarchy.
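For this example, the parent hierarchy settings would look like this (using the list names above):

```
List        Parent hierarchy
Employee    State
State       Region
Region      Country
Country     (none; top level)
```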
Import into each list from the source file(s), and map only name and parent. The exception is the Employee list, which includes a code (employee ID) that should also be mapped. Properties will be added to the data hub later.
Create one module for each list that includes properties. Name the module [List Name] Properties. For this example, only the Employees list includes properties, so create one module named Employee Properties.
In each module, create as many line items as you have properties. For this example, the line items are Salary and Bonus. Open the Blueprint view of the module and in the Format column, select Text. Pivot the module so that the line items are columns.
Import the properties. In the grid view of the module, click on the property you are going to import into. Set up the source as a fixed line item. Select the appropriate line item from the Line Item tab and on the Mapping tab, select the correct column for the data values. You’ll need to import each property (line item) separately. There should be no errors or warnings.
Each time you run an import, an action is created. You can view these actions by selecting Actions from the Model Settings tab. The previous imports into lists and modules each created an import action. You can combine these actions into a process that will run each action in the correct order. Name your actions following the naming conventions. Note that the source is included in the action name.
Create one process that includes the imports. Name your process Load [List Name]. Make sure the order is correct: put the list imports first, starting with the top hierarchy level (numbered as 1) and working down the hierarchy, then add the module imports in any order, as in the sketch below.
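As a sketch, the process for this example might contain the following actions in this order (the action names are illustrative, not prescribed):

```
Process: Load Employee Hierarchy
  1. Import into Country list     (from source file)
  2. Import into Region list      (from source file)
  3. Import into State list       (from source file)
  4. Import into Employee list    (from source file)
  5. Import Salary                (into Employee Properties module)
  6. Import Bonus                 (into Employee Properties module)
```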
These imports should run with zero errors, because the data is going into text-formatted line items.
If some properties need to match items in other lists, use FINDITEM formulas to match the text to list items:
If your data hub model does not already include a list for the property, create that list first. Let's use Territory as the example. Add a line item to the module, set its format to List, and select Territory from the drop-down. Add the formula FINDITEM(x, y), where x is the name of your list (Territory in this example) and y is the text line item holding the property. You can then filter this line item to show only the blank items, which are the values that failed to match, and correct the data in the source system.
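As a minimal sketch, assuming the imported text property is held in a line item named Territory Text (the line item names here are illustrative):

```
Territory Text (Text)             = [loaded by the property import]
Territory Item (List: Territory)  = FINDITEM(Territory, Territory Text)
Missing? (Boolean)                = ISBLANK(Territory Item)
```

Filtering on Missing? shows exactly the items that failed to match and need correcting in the source system.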
If you will be importing frequently, you may want to set up a dashboard that lets users view the data so they can make corrections in the source system. Set up a saved view for the errors and add conditional formatting to highlight the missing (blank) items. You can also include a counter to show the number of errors and add that information to the dashboard.
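One way to build the counter, assuming the Missing? flag from the sketch above:

```
Error Count (Number) = IF Missing? THEN 1 ELSE 0
```

With the summary method for Error Count set to Sum, the top-level total gives the number of errors to publish on the dashboard.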
If the architecture of your model includes spoke models by regions, you need one master hierarchy that covers all regions and a corresponding module that stores the properties. Use that module and create as many saved views as you have spoke region models. For example, filter on Country GI = Canada if you want to import only Canadian accounts into the spoke model.
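For example, the filter for a Canadian spoke model could be a simple Boolean line item, assuming Country GI is a Country-formatted line item in the properties module:

```
Export to Canada? (Boolean) = Country GI = Country.Canada
```

Each spoke model's saved view then filters on its own Boolean.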
You will need to create a saved view for each hierarchy and spoke model.
Use cross-workspace imports if you have decided to put your master data hub in a separate workspace.
Well done! You have imported your hierarchy from a data hub model.
When you are in the midst of your peak planning cycle and your large lists are changing frequently, you'll want to update the data hub and push the changes to the spoke models. Running imports of several thousand list members may cause performance issues and block users during the import activity.
In the best-case scenario, your data warehouse provides a date field that shows when each item was added or modified, and can deliver a flat file or table that includes only the changes. Your import into the data hub model will take just a few seconds, and you can filter on this date field to export only the changes to the spoke models.
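A sketch of that filter, assuming the warehouse date arrives in a Modified Date line item and the time of the last export is stored in a single-cell system module called SYS Last Export (both names are illustrative):

```
Changed Since Last Export? (Boolean) = Modified Date >= SYS Last Export.Last Export Date
```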
But in most cases, all you have is a full list from the data warehouse, regardless of what has changed. To mitigate this, we'll use a technique to export only the list items that have changed (edited, deleted, or updated) since the last export, using logic in Anaplan.
In the data hub model:
First, create a text-formatted line item named CHECKSUM that concatenates all the properties of each item. Then create a second line item, name it CHECKSUM OLD, set the format as Text, and create an import that imports CHECKSUM into CHECKSUM OLD. Ignore any other mappings.
Name this import: 1/2 im DELTA and put it in a process called "RESET DELTA"
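Here is a minimal sketch of the full delta logic, assuming the properties are the Salary and Bonus line items from the earlier example:

```
CHECKSUM (Text)       = Salary & "|" & Bonus
CHECKSUM OLD (Text)   = [no formula; populated by the 1/2 im DELTA import]
DELTA (Boolean)       = CHECKSUM <> CHECKSUM OLD
```

A saved view filtered on DELTA then contains only the items whose concatenated properties have changed since CHECKSUM OLD was last reset.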
Execute the import from the source into the data hub and then into the spoke models.
The actuals transaction file might need to be imported into the data hub model, and from there into the spoke models, several times a day during the peak planning cycle. If the file is large, this can create performance issues for end users. Since not all transactions change each time the data is imported, there is a strong opportunity to optimize this process.
Filter the view using DELTA so that only changed transaction list items are imported into the list, and only changed actuals transactions into the module.
Create an import from CHECKSUM to CHECKSUM OLD so that you can reset the delta after the imports have run. Name this import 2/2 im DELTA and add it to the RESET DELTA process created for the list.
In the spoke model, import into the transaction list and into the transaction module, from the transaction filtered view.
Run the RESET DELTA import or process.
If incremental loads have been implemented, you can semi-automate this process and have it run on a frequent basis. That keeps master data and actuals current across all models during a planning cycle.
It's semi-automatic because it requires a review of the reconciliation dashboards before pushing the data to the spoke models.
There are a few ways to automate, all requiring an external tool: Anaplan Connect or the customer's ETL tool.
The automation script needs to execute in this order (for example):
1. Load the source files into the data hub lists and modules.
2. Run the processes that import from the data hub's filtered views into each spoke model.
3. Run the RESET DELTA process to copy CHECKSUM into CHECKSUM OLD, so the next run picks up only new changes.
Create a module called Clear Lists, dimensioned by the list you want to clear. In the module, create a Boolean line item, call it CLEAR ALL, and set its formula to TRUE.
In Actions, create a "Delete from List using Selection" action, set the target to the list you want to clear, and use the CLEAR ALL line item in the Clear Lists module as the selection.
Use a maintenance/reconciliation dashboard when manual operations are required to update applications from the hub. One method that works well is to create a module that highlights whether there are errors in each data source. In that module, create a message line item that displays on the dashboard if there are errors, for example: "There are errors that need correcting." A link on this dashboard to the error status page will make it easy for users to check on errors.
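A sketch of the message line item, assuming the Error Count line item described earlier:

```
Error Message (Text) = IF Error Count > 0 THEN "There are errors that need correcting." ELSE BLANK
```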
A best practice is to automate the list refresh. Combine this with a modeling solution that only exports what has changed.
There should be two saved views: one for development and one for production. That way, the hub can feed the development models with shortened versions of the lists, while the production models get the full lists.
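One way to shorten the development lists is a pair of view filters; here the DEV view is limited to a single country, reusing the Country GI line item as an assumed basis:

```
DEV View Filter? (Boolean)   = Country GI = Country.Canada
PROD View Filter? (Boolean)  = TRUE
```

The DEV saved view filters on DEV View Filter?, so development models load only a small slice of the data, while the PROD saved view includes the full lists.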
If you take the separate saved views approach, the development (DEV) model will need the imports set up for both DEV and production (PROD).
The additional ALM consideration is that the lists that are imported into the spoke models from the hub need to be marked as production data.
| | Initial implementation | Deployment | Expansion |
|---|---|---|---|
| Data hub | The data hub houses all global data needed to execute the Anaplan use case. The data hub often houses complex calculations and readies data for downstream models. | The data hub is refreshed with the latest information from the source systems. The data hub readies data for downstream models. | The need for additional data for new use cases is often handled by splitting the data hub into regional data hubs. This helps the system perform more efficiently. |
| Model building | The development model is built to the 80/20 rule. It is built upon a global process; regional-specific functionality is added in the deployment phase. The model is built to receive data from the data hub. | The development model is copied and the appropriate data is loaded from the data hub. Regional-specific functionality is added during this phase. | The models built for new use cases are developed and thoroughly tested. Additional functionality can be added to the original models deployed. |
| Data integration | During this stage, Anaplan Connect or a third-party tool is used to automate data integration. Data feeds are built from the source system into the data hub and from the data hub to downstream models. | Additional data feeds from the data hub to downstream models are finalized. The integrations are tested and timed to establish a baseline SLA. Automatic feeds are placed on timed schedules to keep the data up to date. | Data integration is updated to reflect the new system architecture. Automatic feeds are tested and scheduled according to business needs. |
| Performance testing | The application is put through rigorous performance testing, including automated and end-user testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform. | The application is again put through rigorous performance testing. | At each stage, the application is put through rigorous performance testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform. |