Community perspective: Data Hub organization

KBeltz

This article is intended to help standardize Data Hub organization across the Anaplan platform. The methodology is aimed at technical teams, not end users: it should help you build and navigate your Data Hub model and troubleshoot problems with individual data sources efficiently.

Functional areas

To begin, set up the functional areas Dashboards, Data, Inputs, System, Calculations, and Outputs. You may not use all of them. Use the DISCO format labeling for all lists and modules in the Hub. For larger Data Hubs, I will create functional areas specific to certain sets of data. For example, I have functional areas called DATA-ORACLE, DATA-WORKDAY, and DATA-SFDC to highlight the sources.


Quick note about naming conventions in this document

Module, list, and action naming conventions are left up to the model builder. I like to streamline the groupings in alignment with DISCO: DAT## name/description, INP## name/description, SYS## name/description, CAL## name/description, and OUT## name/description. This simple structure makes navigating Anaplan and communicating with colleagues much more efficient. It also balances what end users like (names and descriptions) with what data scientists like (short-hand classifications). Lastly, the structure can grow as more data sources are added and your Anaplan use cases expand.
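To see why the short-hand pays off when you have to search or script against the model, here is a quick Python sketch (the object names are invented for illustration) that groups exported module and action names by their DISCO-style tag, so that everything attached to, say, DAT024 can be pulled up in one pass:

    import re
    from collections import defaultdict

    # Hypothetical names exported from a Data Hub model.
    objects = [
        "DAT024 SAP Sales History Properties",
        "10.01 DAT024 Populate SAP Sales History FL from Sales_History.csv",
        "DAT010 Item Master Properties",
        "SYS01 Mass Delete",
    ]

    # A DISCO-style tag is the prefix plus a two- or three-digit number.
    tag = re.compile(r"\b(DAT|INP|SYS|CAL|OUT)\d{2,3}\b")

    by_tag = defaultdict(list)
    for name in objects:
        found = tag.search(name)
        if found:
            by_tag[found.group(0)].append(name)

    # Everything related to DAT024 surfaces with a single lookup.
    print(by_tag["DAT024"])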

Real-life example

Imagine this common scenario while model building in a multiple-use-case model with 50 different data imports. Julia, on the data integration team, reaches out to Shaki, the Anaplan solutions architect, and says, “I’m having a problem with the historic sales data load.” Shaki will want to know which modules, actions, and processes are associated with Julia’s question before they begin to troubleshoot. The model builder, Mikey, is unavailable.

An unorganized model:

In an unorganized model, Julia and Shaki will spend time searching for the relevant actions and modules by name and description. Did Mikey spell and label everything the same way each time, given the character limits on action and module names in Anaplan? Did the model builder bundle the actions together? Minutes will be spent just finding the actions, processes, and modules.

An organized model:

In an organized model, Julia will reach out and say, “I’m having problems with DAT024.” The team can quickly search for “DAT024” across all the modules, lists, and processes. Valuable minutes and headaches are saved when everyone is aligned on the language.

Setting up a new file

Each data file follows a strict labeling process so that its actions, processes, lists, and modules are easy to find and navigate when communicating about it. In this example, we will load an automated file called Sales History.

  • Create a list called DAT024 SAP Sales History FL
    • Avoid subsets and list properties. Some integration apps require every field to be mapped, and subsets and properties can end up treated as required fields, which causes integration problems unless you add extra coding. It is best practice to use a property module instead.
  • Create a module called DAT024 SAP Sales History Properties and add the respective line items from the file, the parsing logic demonstrated in Data Hubs: Purpose and peak performance, and organizational line items that use a Style
    • Dimension the module by DAT024 SAP Sales History FL
    • For organization, I like to use line items with a No Data format and Styling, for example: --DATA LOAD--, --PARSING--, and --OTHER HELPERS--
  • Create the action to import into the list 
    • Map the UID field to DAT024 SAP Sales History FL and to the code
    • Rename the action to 10.01 DAT024 Populate SAP Sales History FL from Sales_History.csv
  • Create the action to import into the property module
    • Map the UID to DAT024 SAP Sales History FL and the column headers to the line items
    • Set the line items to “Match on names or codes”
    • Rename the action to: 10.02 DAT024 Load Sales History Properties from SAP_Sales_History.csv
  • Create the action to import into the data module
    • Only include data line items such as amount, units, and date; do not include list properties
    • Rename action to: 10.03 DAT024 Load Sales History Data from SAP_Sales_History
  • Create a process
    • Name the process: DAT024_Load_SAP_Sales_History
    • Place the three actions in the process: first populate the list, then load the properties, then load the data
    • Avoid special characters in all process names. Special characters can cause unintended behavior in API calls that requires extra backend work, so it's best to avoid them altogether (see the small helper sketch after this list)
    • Use underscores instead of spaces everywhere an integration tool could touch the name
    • If you build any additional lists inside the Data Hub (such as a hierarchy needed in the Data Hub), include those actions in the process as well.
      • Example: 2.2 Populate P1 Engine from DAT02 SV
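Because the process and action names above are exactly what integration scripts will reference, a tiny helper like the following can be used to check or generate integration-safe names up front. This is only a sketch in Python, not an Anaplan feature; it mirrors the underscore and special-character rules from the list above:

    import re

    def integration_safe_name(name: str) -> str:
        """Replace spaces with underscores and strip anything that is not
        a letter, digit, underscore, or period."""
        name = name.strip().replace(" ", "_")
        return re.sub(r"[^A-Za-z0-9_.]", "", name)

    # A readable description becomes a process-friendly name.
    print(integration_safe_name("DAT024 Load SAP Sales History"))
    # -> DAT024_Load_SAP_Sales_History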


  • Put notes in the actions describing which step of the process they belong to and whether any data, such as metadata, needs to be loaded beforehand.
  • The last step is to set the Admin file in Anaplan. The Admin file is just the file with the field names and no corresponding rows of data, which is important for system integrations (a small sketch of generating one follows this list).
    • When the API calls the process, it will still upload the newest file when the file name and structure match. 
    • When the structure does not match, the API uploads the file with no data rows as the default. 
    • This prevents stale data from being loaded into Anaplan, makes bugs easier to troubleshoot, and causes the Anaplan load logs to show “0 records uploaded”, which should alert your integration team via notifications and logs.
  • Manage Import Data Sources → Upload New File → set to Admin
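One way to produce the header-only Admin file is to copy just the first row of the real source file before uploading it through Manage Import Data Sources. A minimal Python sketch, assuming the file names used in this example (the admin file name itself is a placeholder):

    # Write an "Admin" version of the source file: header row only, no data rows.
    source_path = "SAP_Sales_History.csv"
    admin_path = "SAP_Sales_History_admin.csv"

    with open(source_path, "r", encoding="utf-8") as source:
        header = source.readline()

    with open(admin_path, "w", encoding="utf-8") as admin:
        admin.write(header)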


Create a mass delete module and action

There will be a need, especially in early cycles of testing, to wipe and replace data in the Data Hub. Sometimes data needs to be wiped and replaced every time it is loaded (on-hand inventory, in-transit, open orders). I’ve wasted enough time going into lists to delete members (an awfully slow interface!) and creating one-off line items and actions to delete members. To avoid this, take an extra minute and set up the SYS01 Mass Delete module and mass delete actions; a brief integration-script sketch follows the list below. You will thank yourself later!

  • Create a module called SYS01 Mass Delete
    • Add a line item for each list you have that is loaded from a data file
    • In this case, make a line item called DAT024 SAP Sales History
    • Make each line item a subsidiary view
      • This is an acceptable use of subsidiary views; the module has one purpose and the formulas are all auditable
    • Set the formats to Boolean
    • Set the formulas to “TRUE”
  • Create a “Delete from List using Selection” action
    • Call it DAT024 Delete All Sales History
  • This module is a subsidiary view exception
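If the wipe-and-replace happens as part of an automated cycle, one option is to put the delete action at the front of the load process and kick the whole process off from an integration script. The sketch below uses Python's requests library; the workspace, model, and process IDs are placeholders, and the endpoint path reflects my recollection of Anaplan's v2 integration API, so verify it against the current API documentation before relying on it:

    import requests

    # Placeholder identifiers; look these up in your own workspace and model.
    WORKSPACE_ID = "your-workspace-id"
    MODEL_ID = "your-model-id"
    PROCESS_ID = "your-process-id"  # e.g. the ID behind DAT024_Load_SAP_Sales_History
    AUTH_TOKEN = "your-auth-token"  # obtained from Anaplan's authentication service

    headers = {
        "Authorization": f"AnaplanAuthToken {AUTH_TOKEN}",
        "Content-Type": "application/json",
    }

    # Kick off the process; the delete action can be its first step so stale
    # rows are removed before the fresh file is loaded.
    url = (
        "https://api.anaplan.com/2/0/workspaces/"
        f"{WORKSPACE_ID}/models/{MODEL_ID}/processes/{PROCESS_ID}/tasks"
    )
    response = requests.post(url, headers=headers, json={"localeName": "en_US"})
    response.raise_for_status()
    print(response.json())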

Map metadata to catch exceptions

In an ideal world, data should be cleansed prior to reaching the Data Hub. From my experience, this is not always possible. Simple steps and reports can be set up in the Data Hub to catch these exceptions and alert administrators and the integration teams to faulty data.

  • Whenever possible, load metadata as its own data files. In this example, we are loading three metadata sources: DAT010 Item Master, DAT011 Location Master, and DAT012 Customer Master. Build the actions and processes for metadata the same way as for DAT024 SAP Sales History.
  • In the DAT024 SAP Sales History module, finish parsing the Code. In this use case, we care about the Item, Location, and Customer codes. For each parsed text value, use the FINDITEM function.
  • Under the section --OTHER HELPERS--, create a line item called Exception Filter.
    • The idea of this formula is to catch all instances where there may be bad data.
    • In this use case, we check whether Posting Date, DAT010 Item Master FL, DAT011 Location Master FL, and DAT012 Customer Master FL have values (a rough pre-load equivalent in Python follows this list).
      • If any of these fields does not have a value, then Exception Filter = TRUE.
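The same check can also be run on the raw file before it ever reaches Anaplan. Here is a rough Python equivalent of the Exception Filter logic, checking only that the raw fields are populated; the column names are illustrative and should be matched to your actual file layout:

    import csv

    # Columns that must be populated for a row to be considered clean.
    REQUIRED_FIELDS = ["Posting Date", "Item", "Location", "Customer"]

    exceptions = []
    with open("SAP_Sales_History.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            missing = [field for field in REQUIRED_FIELDS
                       if not (row.get(field) or "").strip()]
            if missing:
                # Mirrors Exception Filter = TRUE in the Data Hub module.
                exceptions.append((row.get("Code", ""), missing))

    for code, missing in exceptions:
        print(f"{code}: missing {', '.join(missing)}")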


  • Create a saved view called “Exception Grid” that shows the Code, the parsed text, the FINDITEM results, any other data points you are checking, and the Exception Filter line item.
  • When sending data to spoke models, do not send any data with exceptions (see Spoke Model Filter Stage)
  • Publish the Exception Grid to a Worksheet or Board


Other bits of advice

  • Never set up the Data Hub on a weekly time scale if other use cases require monthly.
  • I prefer UTF-8 file formats.
  • In the spoke models, I use the same DAT naming convention for the landing modules.
  • Data vs. System for master data? It's a grey area. My general rule of thumb is that if the module is loaded from an external file, then it's Data.
  • I don't create hierarchies in the Data Hub unless there is a specific reason to do so, such as filtering the data before it hits the spoke model, manual hierarchies or lists, or in some cases data validation. Instead, I keep saved views in the respective master module for each level.

Final word

I understand there are infinite ways to model and organize. This is one way that has worked for me and my teams. I hope you can take away one or two things to try when you add your next data source to your Data Hub, or at least a few stories to write against your tech debt backlog. Feel free to reach out or comment with any disagreements, questions, etc.!