Building a Data Hub

Overview

A data hub is a separate model that holds an organization’s data.

  • Data can be shared with all your models, making expansions easier to implement and ensuring data integrity across models.
  • The data hub model can be placed in a different workspace, allowing for role segregation. This allows you to assign administrator rights to users to manage the data hub without allowing those users access to the production models.
  • The method for importing to the data hub (into modules, rather than lists) allows you to reconcile properties using formulas.

One type of data hub can be integrated with an organization’s data warehouse and hold ERP, CRM, HR, and other data as shown in this example.

Data Architecture.png
Anaplan Data Architecture

But this isn’t the only type of data hub. Some organizations may require a data hub for transactional data, such as bookings, pipeline, or revenue.

Whether you will be using a single data hub or multiple hubs, it is a good idea to plan your approach for importing from the organization’s systems into the data hub(s) as well as how you will synchronize the imports from the data hub to the appropriate model. The graphic below shows best practices.

 

High-level best practices

Architecture Best Practices.png

 

When building a data hub, the best practice is to import a list with properties into a module rather than directly into a list. Using this method, you set up line items to correspond with the properties and import them using the text data type. This imports all the data without errors or warnings. The data in the data hub module can be imported to a list in the required model.

The exception to importing into a module is if you are using a numbered list without a unique code (in other words, you are using a combination of properties). In that case, you will need to import the properties into the list.

 

Implementation steps

Here are the steps to create the basics of a hub and spoke architecture.

1) Create a model and name it master data hub

You can create the data hub in the same workspace where all the other models are, but a better option is to put the data hub in a different workspace. The advantage is role segregation: you can assign administrator rights to users to manage the data hub without providing them access to the actual production models, which are in a different workspace. Large customers may require this segregation of duties.

Note: This functionality became available in release 2016.2.

 

2) Import your data files into the data hub

  1. Set up your lists. Identify the lists that are required in the data hub. Create these lists using good naming conventions. Set up any needed hierarchies, working from the top level down. Import data into the list from the source files, mapping only the unique name, the parent (if the name rolls up into a hierarchy), and code, if available.
    Do not import any list properties. These will be imported into a module.
  2. Create corresponding modules for those lists that include properties. For each list, create a module. Name the module [List Name] Properties. In the module, create a line item for each property and use the data type TEXT.
  3. Import the source file into the corresponding module. There should be no errors or warnings.
  4. Automate the process with actions. Each time you import, an action is created. Name your actions using the appropriate naming conventions.

    Note: Indicate the name of the source in the name of the import action.

    To automate the process, you’ll want to create one process that includes all your imports. For hierarchies, it is important to get the actions in the correct order. Start with the highest level of the hierarchy list import, then the next level list and on down the hierarchy. Then add the module imports. (The order of the module imports is not critical.)

Now, let's look at an example:

You have a four-level hierarchy to load: Employee → State → Region → Country, with Country as the top level.

Step 1 Excel.png

 

Lists

Create lists with the right naming conventions. For this example, create these lists:

  • G1 Country
  • G2 Region
  • G3 State
  • Employee G4

Set the parent hierarchy to create the composite hierarchy.
General Lists.png

Import into each list from the source file(s), mapping only name and parent. The exception is the employee list, which includes a code (employee ID) that should also be mapped. Properties will be added to the data hub later.

 

Properties → Modules

Create one module for each list that includes properties. Name the module [List Name] Properties. For this example, only the Employees list includes properties, so create one module named Employee Properties.

In each module, create as many line items as you have properties. For this example, the line items are Salary and Bonus. Open the Blueprint view of the module and in the Format column, select Text. Pivot the module so that the line items are columns.

Import the properties. In the grid view of the module, click on the property you are going to import into. Set up the source as a fixed line item. Select the appropriate line item from the Line Item tab and on the Mapping tab, select the correct column for the data values. You’ll need to import each property (line item) separately. There should be no errors or warnings.

 Employee Properties.png

 

Actions 

Each time you run an import, an action is created. You can view these actions by selecting Actions from the Model Settings tab. The previous imports into lists and modules created one import action for each list and each module line item. You can combine these actions into a process that runs each action in the correct order. Name your actions following the naming conventions; note that the source is included in the action name.

 

Actions.png

Create one process that includes the imports. Name your process Load [List Name]. Make sure the order is correct: put the list imports first, starting with the top hierarchy level and working down; then add the module imports in any order.

 

3) Reconcile

These imports should run with zero errors, because the data is loaded into text-formatted line items.

If some properties should match items in lists, it is recommended to use FINDITEM formulas to match the text to the list items:

  • FINDITEM simply looks at the text-formatted line item and finds the match in the list that you specify. Every time data is uploaded into Anaplan, you just need to make sure all items from the text-formatted line item are loaded into the list. This is useful because you can always compare the "raw data" to the "Anaplan data," and you do not have to load the data more than once if there are concerns about data quality in Anaplan.

If the list for a property is not yet included in your data hub model, first create that list. Let’s use the example of Territory. Add a line item to the module, select List as the format type, and then select the name of your list, in this case Territory, from the drop-down. Add the formula FINDITEM(x,y), where x is the name of your list (Territory in our example) and y is the text-formatted line item. You can then filter this line item to show all of the blank items and correct the data in the source system.
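
As a minimal sketch, assuming the imported text sits in an illustrative line item named Territory Text, the two reconciliation line items could look like this:

    Territory (format: Territory) = FINDITEM(Territory, Territory Text)
    Error? (format: Boolean) = ISBLANK(Territory)

Filtering on Error? = TRUE shows exactly the items that failed to match.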

If you will be importing frequently, you may want to set up a dashboard that allows users to view the data so they can make corrections in the source system. Set up a saved view for the errors and add conditional formatting to highlight the missing (blank) items. You can also include a counter to show the number of errors and add that information to the dashboard.
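
The counter can be a simple sketch like this, assuming the module is dimensioned by the list being reconciled and the line item's summary method is set to Sum (names are illustrative):

    Error Count (format: Number, summary: Sum) = IF ISBLANK(Territory) THEN 1 ELSE 0

The top-level total of Error Count then shows the number of unmatched items and can be published to the dashboard.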

 

4) Split models: Filter and Set up Saved Views

If the architecture of your model includes spoke models split by region, you need one master hierarchy that covers all regions and a corresponding module that stores the properties. Use that module and create as many saved views as you have spoke region models. For example, filter on G1 Country = Canada if you want to import only Canadian accounts into the spoke model.

You will need to create a saved view for each hierarchy and spoke model.
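
One way to build such a filter, sketched against the employee hierarchy above (line item names are illustrative, and this assumes list-item references of the form G1 Country.Canada), is to derive the country in the properties module and compare it to a list item:

    Country (format: G1 Country) = PARENT(PARENT(PARENT(ITEM(Employee G4))))
    Canada? (format: Boolean) = Country = G1 Country.Canada

Filter the saved view on Canada? = TRUE.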

 

5) Import to the spoke module

Use cross-workspace imports if you have decided to put your master data hub in a separate workspace.

  1. Create the lists that correspond to the hierarchy levels in each spoke model. Currently, there is no way to create a list via import.
  2. Create the properties in the list where needed. Keep in mind that importing properties into the data hub as line items is an exception. List properties generally do not vary, unlike line items in a module, which are often measured over time. Note: Properties can also be housed in modules, and there are some benefits to this. See Anapedia - Model Building (specifically, the "List Attributes" and "List attributes in a module" topics). If you decide to use a module to hold the properties, you will need to create a line item for each property type and then import the properties into the module.
  3. To simplify the mapping, make sure the property names in each spoke model match the line item names of the data hub model.
  4. In each spoke model, create an import from the filtered module view of the data hub model into the lists you created in step 1.
  5. In the Actions window, name your imports using naming conventions.
  6. Create a process that includes these actions (imports). Begin with the highest level in the hierarchy and work down to the lowest.

Well done! You have imported your hierarchy from a data hub model.

 

6) Incremental list imports

When you are in the midst of your peak planning cycle and your large lists are changing frequently, you’ll want to update the data hub and push the changes to the spoke models. Running imports of several thousand list members may cause performance issues and block users during the import activity.

In a best-case scenario, your data warehouse provides a date field that shows when an item was added or modified and can deliver a flat file or table that includes only the changes. Your import into the hub model will take just a few seconds, and you can filter on this date field to export only the changes to the spoke models.
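
The export filter for that case can be as simple as the following sketch, assuming the source date is loaded into a date-formatted line item named Modified Date and the administrator maintains a Last Export Date in a hypothetical single-cell Admin module (all names are illustrative):

    Changed Since Last Export? (format: Boolean) = Modified Date > Admin.Last Export Date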

But in most cases, all you have is the full list from the data warehouse, regardless of what has changed. To mitigate this, we'll use a technique, built with logic in Anaplan, to export only the list items that have changed (added, edited, or updated) since the last export.

 

Setting up the incremental loads:

In the data hub model:

  1. Create a text-formatted line item in your module. Name it CHECKSUM, set the format as Text, and enter a formula that concatenates all the properties that you want to track changes for (see the sketch after these steps). These properties form the base of the incremental import.
    Example: CHECKSUM = State & Segment & Industry & Parent & Zip Code
    Account A4 Properties.png
  2. Create a second line item, name it CHECKSUM OLD, set the format as Text, and create an import that imports CHECKSUM into CHECKSUM OLD. Ignore any other mappings.
    Source Items.png

  3. Name this import: 1/2 im DELTA and put it in a process called "RESET DELTA"

  4. Create a line item, name it DELTA, and set the format as Boolean. Enter this formula: CHECKSUM <> CHECKSUM OLD (equivalent to IF CHECKSUM <> CHECKSUM OLD THEN TRUE ELSE FALSE).
  5. Update the filtered view that you created to export only the hierarchy for a specific region or geography. Add the filter criteria DELTA = TRUE. You will only see the list items that differ from the last time you imported into the data hub.
    Filter.png
    In the example above, we'll import into a spoke model only the list items that are in US East and that have changed since the last import.
  6. Execute the import from the source into the data hub and then into the spoke models.

    1. In the data hub model, upload the new files and run the process import.
    2. In the spoke models, run the process import that takes the list from the data hub's filtered view. Check the import logs and verify that only the items that have changed are actually imported.
    3. Back in the data hub model, run the RESET DELTA process (1/2 im DELTA import). The RESET DELTA process resets the changes, so you are ready for the next set of changes. Your source, data hub model and spoke models are all in sync.
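
As a minimal sketch, assuming the properties are text-formatted line items as described in step 2, the three line items work together like this:

    CHECKSUM (format: Text) = State & Segment & Industry & Parent & Zip Code
    CHECKSUM OLD (format: Text) = (no formula; populated by the 1/2 im DELTA import from CHECKSUM)
    DELTA (format: Boolean) = CHECKSUM <> CHECKSUM OLD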

7) Incremental data load

The actuals transaction file might need to be imported several times into the data hub model, and from there into the spoke models, during the peak planning cycle. If the file is large, it can create performance issues for end users. Since not all transactions change as the data is imported several times a day, there is a strong opportunity to optimize this process.

  • In the data hub model transaction module, create the same CHECKSUM, CHECKSUM OLD, and DELTA line items. CHECKSUM should concatenate all the fields you want to track the delta on, including the values. The DELTA line item will catch new transactions as well as modified transactions. See 6) Incremental list imports above for more information, and the sketch after this list.
    Actuals 2.png

     

  • Filter the view using DELTA so that only the changed transaction list items are imported into the list, and only the changed actuals transactions into the module.

  • Create an import from CHECKSUM to CHECKSUM OLD so you can reset the delta after the imports have run. Name this import 2/2 im DELTA and add it to the RESET DELTA process created for the list.

  • In the spoke model, import into the transaction list and into the transaction module, from the transaction filtered view.

  • Run the RESET DELTA process (or the 2/2 im DELTA import).
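
A minimal sketch for the transaction module, with illustrative field names and the numeric values converted to text with TEXT():

    CHECKSUM (format: Text) = Account & Product & TEXT(Units) & TEXT(Revenue)
    CHECKSUM OLD (format: Text) = (no formula; populated by the 2/2 im DELTA import)
    DELTA (format: Boolean) = CHECKSUM <> CHECKSUM OLD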

 

Automation

You can semi-automate this process so that it runs on a frequent basis once incremental loads have been implemented. This keeps master data and actuals current across all models during a planning cycle.

It's semi-automatic because it requires a review of the reconciliation dashboards before pushing the data to the spoke models.

There are a few ways to automate, all requiring an external tool such as Anaplan Connect or the customer's ETL tool.

The automation script needs to execute in this order:

  1. Connect to the master data hub model.
  2. Load the external files into the master data hub model.
  3. Execute the process that imports the list into the data hub.
  4. Execute the process that imports actuals (transactions) into the data hub.
    • Manual step: Open your reconciliation dashboards, and check that data and the list are clean. Again, these imports should run with zero errors or warnings.
  5. Connect to the spoke model.
  6. Execute the list import process.
  7. Execute the transaction import process. Repeat steps 5, 6, and 7 for all spoke models.
  8. Connect to the master data hub model.
  9. Run the Clear DELTA process to reset the incremental checks.

 

Other best practices

  • Create delete actions for all your lists

Create a module called Clear Lists, dimensioned by the list. In the module, create a Boolean line item, call it CLEAR ALL, and set its formula to TRUE.

In Actions, create a "Delete from List using Selection" action and set it as below:
Clear Accounts A4.png

  • Repeat this for all lists and create one process that executes all these delete actions.

 

Example of a maintenance/reconcile dashboard

Use a maintenance/reconcile dashboard when manual operations are required to update applications from the hub. One method that works well is to create a module that highlights whether there are errors in each data source. In that module, create a message line item that displays on the dashboard when there are errors, for example: "There are errors that need correcting." A link on this dashboard to the error status page makes it easy for users to check on errors.
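
A minimal sketch of such a message line item, assuming an Error Count line item like the one described in the reconcile step (names are illustrative):

    Message (format: Text) = IF Error Count > 0 THEN "There are errors that need correcting." ELSE "All data sources are clean."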

A best practice is to automate the list refresh. Combine this with a modeling solution that only exports what has changed.

Administration Dashboard.png

 

Dev-test-prod considerations

There should be two saved views: one for development and one for production. That way, the hub can feed the development models with shortened versions of the lists, while the production models get the full lists.
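
One simple way to build the shortened development view, assuming a hypothetical manually maintained Boolean (the name is illustrative):

    Dev Scope? (format: Boolean) = (no formula; checked manually for a representative subset of list items)

Filter the development saved view on Dev Scope? = TRUE and leave the production view unfiltered.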

ALM considerations:

If the different-saved-views option is taken, the development (DEV) model will need imports set up for both the DEV and production (PROD) views.

The additional ALM consideration is that the lists that are imported into the spoke models from the hub need to be marked as production data.

 

Development

DATA HUB: The data hub houses all global data needed to execute the Anaplan use case. The data hub often houses complex calculations and readies data for downstream models.

DEVELOPMENT MODEL: The development model is built to the 80/20 rule. It is built upon a global process; region-specific functionality is added in the deployment phase. The model is built to receive data from the data hub.

DATA INTEGRATION: During this stage, Anaplan Connect or a third-party tool is used to automate data integration. Data feeds are built from the source system into the data hub and from the data hub to downstream models.

PERFORMANCE TESTING: The application is put through rigorous performance testing, including automated and end-user testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform.

 

Deployment

DATA HUB: The data hub is refreshed with the latest information from the source systems. The data hub readies data for downstream models.

DEPLOYMENT MODEL: The development model is copied and the appropriate data is loaded from the data hub. Region-specific functionality is added during this phase.

DATA INTEGRATION: Additional data feeds from the data hub to downstream models are finalized. The integrations are tested and timed to establish baseline SLAs. Automatic feeds are placed on timed schedules to keep the data up to date.

PERFORMANCE TESTING: The application is again put through rigorous performance testing.

 

Expansion

DATA HUB: The need for additional data for new use cases is often handled by splitting the data hub into regional data hubs. This helps the system perform more efficiently.

MODEL DEVELOPMENT: The models built for new use cases are developed and thoroughly tested. Additional functionality can be added to the original models deployed.

DATA INTEGRATION: Data integration is updated to reflect the new system architecture. Automatic feeds are tested and scheduled according to business needs.

PERFORMANCE TESTING: At each stage, the application is put through rigorous performance testing. These tests mimic real-world usage and exceptionally heavy traffic to see how the system will perform.

The content in this article has not been evaluated for all Anaplan implementations and may not be recommended for your specific situation.
Please consult your internal administrators prior to applying any of the ideas or steps in this article.
Comments

One major aspect of building a data hub is defining the proper flow of data from source systems through your Anaplan models. With the advent of ALM, solution architects must determine the appropriate path of data from the data hub to the final production environment. In many cases the data should flow through the Development instance, especially if the data is structural in nature and there is no need to add, delete, or import directly in the Production model. Further, I have noticed that some of my clients move whole data sets from their data hub into their Dev/Prod models and scrub the full data set there before extracting the unwanted data from their Production model. The client then experiences large swings in memory usage as well as performance issues. The data scrubbing should be done in the data hub model, not in downstream models.

Potential use case for DCA with large data hubs and complex model landscapes: DCA is a very useful tool to implement in combination with a data hub where model reservations are required for scope selection. To avoid any situation where the incorrect model scope is imported from the hub, DCA can be set up to grant write access only when the hub selection matches the required model. Otherwise the scope selection cannot be imported, which avoids any major issue where the wrong data is imported into a model.