OEG Best Practice: Data Hubs: Purpose and peak performance
@rob_marshall @JaredDolich thank you both for the prompt feedback. We follow the same approach. The only concern is that if you have 10 sources, you have to maintain 10 extra high-level recon imports to make sure the other 10 low-level imports are working properly. So the concern is: how do I make sure my recon imports are also working correctly? I hope you understand what I mean!
And I also do not see any other option at this point in time.
Without knowing exactly what you have, could you have a list of the Data Sources and then a module by Time (monthly, quarterly, or yearly) into which the high-level transaction totals from the data hub get loaded? On the Data Hub side, you could have the same Data Source list, sum the data into a module by source, create a view, and then use that view for the import to the spoke/target model.
Hi @rob_marshall!
In my example, I load data from 10 end models into the data hub, consolidate the data in the hub, and send it to another system.
If I want to include a recon process in this flow, I have to set up 30 imports:
- 10 low-level imports, one from each end model, into the list
- 10 low-level imports, one from each end model, into the module
- 10 high-level imports, one from each end model, directly into the recon module
I will always have to set up an additional recon import for each source, which creates an extra opportunity for errors.
But clearly there aren't many other options inside Anaplan.
Hope this clarifies what I mean 🙂
It is very possible I am missing something, and if you would like to talk about it, please send me an email ([email protected]) so we can work this out. Here is what I was trying to convey in my previous post:
- In the data hub, create a module by Time and create 10 line items, one for each source. Create a formula for each one to pull the totals.
- Create a view of this data and use this as an import to the spoke model. This way, it will be one action for all data sources.
- On the spoke model side, have a list of all 10 data sources. Create a module dimensionalized by this Data Sources list and Time, and import the data from the data hub into it.
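Addressing the worry about maintaining one recon import per source, here is a minimal Python sketch (not Anaplan syntax) of why the single Hub view scales: every source is totalled by period into one view, and the spoke side compares against it in one place. The source names, periods, amounts, the 0.01 tolerance, and the helpers `recon_totals` and `recon_differences` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical Hub transactional records: (data_source, period, amount)
hub_records = [
    ("Source 01", "Jan 24", 100.0),
    ("Source 01", "Jan 24", 50.0),
    ("Source 02", "Jan 24", 75.0),
]

def recon_totals(records):
    """Hub side: one total per (data source, period), i.e. the module by Time
    with one line item per source, flattened into a single import view."""
    totals = defaultdict(float)
    for source, period, amount in records:
        totals[(source, period)] += amount
    return dict(totals)

def recon_differences(hub_totals, spoke_totals, tolerance=0.01):
    """Spoke side: (source, period) cells where the detailed loads do not
    tie back to the Hub totals."""
    keys = hub_totals.keys() | spoke_totals.keys()
    return {
        key: hub_totals.get(key, 0.0) - spoke_totals.get(key, 0.0)
        for key in keys
        if abs(hub_totals.get(key, 0.0) - spoke_totals.get(key, 0.0)) > tolerance
    }

hub_totals = recon_totals(hub_records)
# Hypothetical spoke-side totals summed from the detailed (low-level) loads
spoke_totals = {("Source 01", "Jan 24"): 150.0, ("Source 02", "Jan 24"): 70.0}
print(recon_differences(hub_totals, spoke_totals))  # {('Source 02', 'Jan 24'): 5.0}
```

Adding an eleventh source adds rows to the same view rather than another import action.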
Again, please reach out if you would like and we can jump on a quick call.
@rob_marshall thanks for the detailed article.
What would be the best practice for using a filtered view from the data hub vs. creating subsets in the spoke models?
I've seen both, but I'm curious what the recommended usage for subsets is.
I am a bit confused by your question, but let me try to answer it. In the data hub, you really shouldn't have subsets, because you can use filtered views as the source for spoke models. With that said, using a properties module in the data hub to define the subsets in the spoke model is absolutely OK. Remember, the data hub should be as basic as possible (flat lists, little to no analytical modules, no hierarchies, etc.), but that doesn't mean it can't supply that information to the spoke models via filters from "property" or "attribute" modules.
Does that help?
Thanks for the answer. My question was geared toward which option to choose: creating a subset in the spoke model vs. using a filtered view from a properties module...
Yeah, I totally misunderstood your question, and to be honest, I am still not 100% certain I understand it. I think the question is creating a subset vs. using a Boolean in a properties module, is that correct? If so, it depends. Great answer, huh? It depends on what you are using it for. A subset is great for getting totals and for displaying only a selected set of list members. You can do the same with a Boolean in a properties module, but the total could be off because the Boolean only filters which list members are shown, not the overall total. With that said, if you need to check whether a member is in a subset within a formula, it is best to use a properties module.
Here is a good trick that gives you the best of both worlds. Many folks will use FINDITEM() to check whether a member is within the subset on the master list's properties module (think of Employee as the master list, with a subset of Active Employees). Instead, on the Employees properties module, create a line item formatted as a Boolean and hardcode it to TRUE. Then, change the Applies To for that line item to the subset. With this done, you can use this line item in formulas and the subset for reporting (dashboards or apps).
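The Boolean-on-subset trick can be modelled in plain Python to show the behavior it relies on. This is an illustration, not Anaplan code; it assumes, as described above, that master-list members outside the subset read the subset-applied Boolean as FALSE. The employee codes, salaries, and helper names are hypothetical.

```python
# Hypothetical master list (Employees) and its subset (Active Employees)
employees = ["E001", "E002", "E003", "E004"]
active_employees = {"E001", "E003"}

# Line item "Active?" on the Employees properties module: Boolean, hardcoded
# TRUE, Applies To = the subset. Only subset members hold the value.
active_line_item = {emp: True for emp in active_employees}

def is_active(emp):
    """What a formula referencing the line item sees: TRUE for subset
    members, FALSE (the Boolean default) for everyone else."""
    return active_line_item.get(emp, False)

# Hypothetical data used by a formula
salaries = {"E001": 100.0, "E002": 80.0, "E003": 90.0, "E004": 70.0}

# Formula use: no FINDITEM() lookup is needed, just the Boolean.
active_payroll = sum(pay for emp, pay in salaries.items() if is_active(emp))

# Reporting caveat: filtering rows with the Boolean hides members, but a
# parent total built over the full list still includes all of them.
visible_rows = [emp for emp in employees if is_active(emp)]
full_list_total = sum(salaries.values())

print(active_payroll, visible_rows, full_list_total)
```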
Hopefully this was the question you asked. If not, we can try again.
Hi @rob_marshall , got a question on transaction details.
I have several attributes in DAT01 (SKU, store, account, etc.). What would be the best way to dimensionalize DAT02 data in the spoke model with these attributes?
I am not exactly sure I understand your question. If you have DAT01, dimensionalized by the transactional list (a concatenated code of your attributes) and Time, which holds your data that changes over time, and another module, DAT02, which is also dimensionalized by the transactional list (but this time without Time), you can go just about anywhere from here. Just because you have the data at this level in your Hub model doesn't mean you have to bring it over at exactly the same granularity to your Spoke model.
So, it is hard for me to understand exactly what you need in your spoke model. If you need a module which is dimensionalized by Account and SKU by time, you can do that from DAT01 and DAT02 (see the "Good Way" from above).
Hope this helps; if not, we can give it another go.
I have my properties module defining transactional key/list + other attributes as below.
And the data module is dimensionalized by the same transaction list and Time.
So if I want to bring the data into the spoke model dimensionalized by Grouping2, do I need to set up a hierarchy in the spoke model with the transactional list and Grouping2?
Thanks for the help.
Ahh... no, you don't. Let's walk through two different scenarios.
Scenario 1: My spoke module has Grouping2 by Time. The great thing about Anaplan imports is that they can aggregate on import. So, in this case, in DAT01 (the one with Time), you would create a line item named Grouping2 and change its Applies To to remove Time, basically setting Time to Not Applicable. This creates a subsidiary view, but it is needed here.
The formula would point to your DAT02 (the properties module without Time) to get the correct Groupings. Now you can create a view. In DAT01, create a view with the transactional list and Time in the rows, with your line items on the column axis.
In the spoke model, decide what you want your target module to be (let's say SKU, Grouping2, and Time). Then, in the target module's import, map the Grouping2 line item (from the source view) to the Grouping2 list, Time to Time, and the value or monies to the line item in the target module. Again, the import will automatically aggregate on the fly.
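Scenario 1 can be mimicked in Python to show what the source view and the aggregating import do together. This is only a sketch of the data flow, not Anaplan syntax; the transaction codes, groupings, values, and the `export_view`/`import_to_spoke` helpers are hypothetical, and summing rows that land on the same target cell stands in for the aggregation on import described above.

```python
from collections import defaultdict

# Hypothetical DAT02 (properties, no Time): transaction code -> Grouping2 member
dat02_grouping2 = {"T1": "Group A", "T2": "Group A", "T3": "Group B"}

# Hypothetical DAT01 (data by transaction and Time): (transaction, period, value)
dat01_values = [
    ("T1", "Jan 24", 10.0),
    ("T2", "Jan 24", 5.0),
    ("T3", "Jan 24", 7.0),
    ("T1", "Feb 24", 3.0),
]

def export_view(values, grouping):
    """Rows = transaction and Time; columns = Grouping2 (the Time-less line
    item looking up DAT02) and Value."""
    return [(txn, period, grouping[txn], value) for txn, period, value in values]

def import_to_spoke(rows):
    """Map Grouping2 -> Grouping2 list, Time -> Time, Value -> line item.
    Rows landing on the same target cell are summed, standing in for the
    aggregation on import."""
    target = defaultdict(float)
    for _txn, period, grouping2, value in rows:
        target[(grouping2, period)] += value
    return dict(target)

spoke_module = import_to_spoke(export_view(dat01_values, dat02_grouping2))
print(spoke_module)
# {('Group A', 'Jan 24'): 15.0, ('Group B', 'Jan 24'): 7.0, ('Group A', 'Feb 24'): 3.0}
```

No transactional list or hierarchy is needed in the spoke: the transaction dimension simply disappears in the aggregation.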
Scenario 2: In the spoke model, I have a hierarchy with SKU rolling up to Location, and I would like to show this in a module by Time and Grouping2. Since you will most likely have the same SKUs across multiple locations, the SKU level will need to be a numbered list. Depending on your naming convention, I would use the code of the Location concatenated with the code of the SKU, with a delimiter between them. This logic should be in DAT02 in the Hub, and you would follow the same exercise as in Scenario 1, but this time, instead of the line item being list formatted, it should hold the code of the hierarchy member. Prior to the import to the spoke model, you will need to create the hierarchy in the spoke model from views of DAT02 in the Hub (L1 Location and L2 SKU). Create a module dimensionalized by the hierarchy, the Grouping2 list, and Time, with a line item named Value. Then import the data.
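For Scenario 2, the code-building logic might look like this in Python. The underscore delimiter, the location and SKU codes, and the helper names are assumptions for illustration; in the Hub, the concatenation would be a text line item in DAT02, and the two returned collections correspond to the L1 Location and L2 SKU views used to build the spoke hierarchy.

```python
# Hypothetical rows from DAT02 in the Hub (no Time)
dat02_rows = [
    {"txn": "T1", "location": "NYC", "sku": "SKU100"},
    {"txn": "T2", "location": "BOS", "sku": "SKU100"},
    {"txn": "T3", "location": "NYC", "sku": "SKU200"},
]

DELIMITER = "_"  # assumption: a character that never appears in the codes

def hierarchy_code(location, sku):
    """Code of the numbered SKU-level member: Location code + delimiter + SKU code."""
    return f"{location}{DELIMITER}{sku}"

def hierarchy_views(rows):
    """L1 Location view (unique codes) and L2 SKU view (code, name, parent)."""
    l1 = sorted({r["location"] for r in rows})
    l2 = sorted({(hierarchy_code(r["location"], r["sku"]), r["sku"], r["location"]) for r in rows})
    return l1, l2

# Text line item per transaction, exported alongside Grouping2, Time, and Value
txn_hierarchy_code = {r["txn"]: hierarchy_code(r["location"], r["sku"]) for r in dat02_rows}

l1_locations, l2_skus = hierarchy_views(dat02_rows)
print(l1_locations)  # ['BOS', 'NYC']
print(l2_skus)
```

SKU100 appears under both BOS and NYC, but each occurrence gets its own code, which is why the SKU level is a numbered list keyed on that code.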
I hope that clears things up. If not, please DM me and we can get this solved.
I had a chance to try it out today and it worked really nicely. Thanks for this detailed explanation.
I just read through your interesting article above. In the case of two available workspaces and one data hub, what would you recommend as best practice in terms of performance: putting the data hub in one workspace and the working models in the other? Note that **** compliance is already handled at a very detailed level here.
Thanks for your opinion.
Yes, put the data hub in its own workspace and the spoke models in a different workspace. This will help with segregation of duties, but also when loading the data hub with large transactional lists.
Great article @rob_marshall! I'm currently working through our models to rebuild with best practices, including the Data Hub because we currently don't use one at all (EEK! I know). How would you handle the following scenario?
We have a list of bonds that is manually tracked by another department in Excel. They provide us the spreadsheet and it is uploaded to our budget model once a year for budget planning. As I rebuild, the user would like to add items to the bond list manually on a dashboard moving forward along with the associated data, as there are only a handful added each year, and formatting the provided file for upload is problematic.
Moving forward, I want the bond list to be preserved in the Hub after new items are added and the budget is published, but it must also be available in the budget spoke model in real time for budget planning. Where would you build the "add new list item" functionality?
If it is in the Budget model, how would we send it back into the Hub for preservation, given that flowing data backwards is not encouraged but there is no other source to pull it from? I'm concerned about building it into the Hub directly because users shouldn't be encouraged to go in there to do things, and an extra step would be needed to send each update back to the budget model for real-time analysis.
Good morning and thank you for the kind words. As with everything, rules or best practices don't work 100% of the time and this might be one of those cases. If the bonds are not coming from a true source and the Budget spoke model is in effect the "system of record" for Bonds, then I would have the users add the bonds in the spoke model and then have an action populating the "Bond Flat" list in the Data Hub. You are correct in that users really shouldn't be in the Data Hub and also correct in that the data should flow in one direction (left to right or Hub to spoke), but this might be a case where that rule should be broken.
One thing to be careful about, please make sure the Bonds have a unique code. This is where having users "master" the data becomes tricky and why a true source system would be best.
Thanks @rob_marshall! I appreciate your insight. I do agree that I have found one of the rare cases where data will need to flow backwards to the Hub, until the department can be persuaded to move their data to a more permanent system. And thank you for the reminder about the unique code! I may try to use the spoke model's numbered list unique code to my advantage with NAME(ITEM('Bond#')) in the export module perhaps? Good food for thought.
Please don't use the unique code of the numbered list, as that number means nothing, and if the list is a production data list and you create a copy of the model, those IDs get reset. Instead, try to find something else to make it unique.
@rob_marshall I'm referring to pulling the data into a standard list in the Hub; therefore the unique code from the source model, when pulled into a text line item with NAME in the export module, could become a unique standard list identifier. I have since learned that a unique ID does exist, so this won't be necessary, but it was a helpful method we used for our COVID expense tracker.
As a newbie to Anaplan, this really helped me!
Could someone please explain how the DAT02 module would eventually be loaded into, say, a Reporting module that has Store, SKU, and Time as its three dimensions? Do I need a third module combining DAT01 "Lists" and DAT02 "Transactions" to create a view that mimics a flat file?
Glad you liked it. To load the data from DAT02 to the spoke/reporting model, simply create new line items for Store and SKU that reference DAT01 Transaction Details. These line items should be list formatted, and their Applies To should be modified to remove Time (so set the Timescale to Not Applicable). This will create a subsidiary view, which is OK for exporting data. Lastly, create a view with the transactional list and Time in the rows and the line items in the columns.
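To picture the flat view this produces, here is a small Python sketch, not Anaplan syntax, that joins the Time-less Store and SKU lookups onto each transaction-by-Time value. The transaction codes, stores, SKUs, values, and the `flat_export_view` helper are hypothetical. It also illustrates why no third module is needed: the view itself already has the shape of a flat file.

```python
# Hypothetical DAT01 Transaction Details (no Time): transaction -> attributes
transaction_details = {
    "S1_SKU100": {"store": "S1", "sku": "SKU100"},
    "S2_SKU100": {"store": "S2", "sku": "SKU100"},
}

# Hypothetical DAT02 Transactions: (transaction, period) -> value
transactions = {
    ("S1_SKU100", "Jan 24"): 12.0,
    ("S2_SKU100", "Jan 24"): 8.0,
    ("S1_SKU100", "Feb 24"): 4.0,
}

def flat_export_view():
    """Rows = transaction x Time; columns = Store and SKU (Time-less lookups
    into DAT01) plus Value, i.e. the flat file the reporting import reads."""
    rows = []
    for (txn, period), value in transactions.items():
        details = transaction_details[txn]
        rows.append({
            "transaction": txn,
            "time": period,
            "store": details["store"],
            "sku": details["sku"],
            "value": value,
        })
    return rows

for row in flat_export_view():
    print(row)
```

In the reporting model, Store, SKU, and Time map to the target dimensions and Value to the line item.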
Hope this helps,
I had a question regarding this best practice for a specific situation. Let's say we have an umbrella company, Company A.
This company has multiple subsidiaries each with their own Anaplan use cases - Companies B, C, D, and E.
Currently, Companies B, C, D, and E each have their own Data Hubs and Spoke Models.
They are in the process of consolidating their contracts/licenses with Anaplan. From a technical perspective there may be a desire to unify the Data Hub and Data Integrations at the Company A level.
Is this considered good practice, or is segregation of duties a more effective way to manage Data Hubs?
Hey bud, sorry it has taken me so long to answer. But the "answer" is difficult given what you laid out above, as there are too many unknowns, such as:
- are the data hubs at the same time granularity?
- do they share master data?
- do they have data segregation issues, e.g., can admins from Companies B, C, and D see data belonging to Company A?
- who is going to own and administer this data hub?
- are the data sources for these data hubs different or will they be consolidated as well?
- is the data from the sources on a similar schedule?
- how big are the current data hubs?
- What shape (healthy or not) are these data hubs? Do they follow best practices or will it be better to rebuild correctly now that they know more about Anaplan?
- Are the use cases for the spoke models the same or completely different?
These are just a few of the questions that should be asked and answered before deciding whether to consolidate into one or keep them separate.
Hope this helps,
Hi @rob_marshall ,
Really appreciate some guidance here.
We are setting up a transaction module in the data hub for, let's say, 80 entities, which would like to refresh their transactional data on different cadences. Some of them like to do it ad hoc, some monthly, some quarterly, etc. We would like to make sure that the data refresh activities (not just incremental refreshes but more like zero out and reload) by different entities do not interfere with each other.
We are using OneCloud as data integration tool.
In this case, would you recommend setting up 80 modules, or one module dimensionalized by entity?
Thank you so much for your help.
Hey, been a long time since we have spoken, hope all is well. Short answer, I would not set up 80 modules because that is not scalable (what if you increase/decrease your entities?). There is another thought of setting up modules based on refresh (Monthly module, Quarterly, etc.), but that might be too complicated as well, and again, what if one entity decides to switch?
Loading data to the data hub as stated above (80 modules, or modules based on refresh rates) creates more integrations, and maintaining those will not be fun, unless you are into that sort of thing. I would suggest loading the data to the data hub with one pull from the source so it is always up to date. Then, create a mapping table by entity that uses Booleans to indicate when each entity wants its data refreshed. Then, use those Booleans to filter the views that pull the data to the spoke model.
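The mapping-table idea could be sketched in Python as follows. Everything here is a hypothetical assumption for illustration: the entity names, the cadence labels, the ad hoc request flags, the assumption that the load job runs at the start of each month, and treating January, April, July, and October as the quarterly refresh months. In Anaplan, `refresh_flag` would be a Boolean line item by entity used to filter the view feeding the spoke model.

```python
from datetime import date

# Hypothetical mapping table: entity -> refresh cadence
refresh_cadence = {
    "Entity 01": "Monthly",
    "Entity 02": "Quarterly",
    "Entity 03": "Ad hoc",
}
# Hypothetical flags a user sets when an ad hoc entity wants a refresh
ad_hoc_requested = {"Entity 03": True}

QUARTER_START_MONTHS = (1, 4, 7, 10)  # assumption

def refresh_flag(entity, run_date):
    """Boolean by entity: should this run push the entity's data to the spoke?
    Assumes the load job runs at the start of each month."""
    cadence = refresh_cadence[entity]
    if cadence == "Monthly":
        return True
    if cadence == "Quarterly":
        return run_date.month in QUARTER_START_MONTHS
    return ad_hoc_requested.get(entity, False)

# The filtered view: only entities whose Boolean is TRUE for this run
due = [e for e in refresh_cadence if refresh_flag(e, date(2024, 2, 1))]
print(due)  # ['Entity 01', 'Entity 03']
```

If an entity switches cadence, only its row in the mapping table changes; no module or integration is added.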
Also, please try to stay away from the wipe and reload of the transactional data. Data will automatically get updated from the feed IF you have a unique key, which you should always have.
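A minimal Python sketch of the keyed update described above (an upsert), assuming each row in the feed carries its unique transaction key; the keys, values, and `load_by_key` helper are hypothetical. Note that an upsert never touches keys that are absent from the feed, so records removed at the source need separate handling.

```python
def load_by_key(target, feed):
    """Upsert keyed on the unique transaction code: existing keys are
    overwritten, new keys are added, and nothing is wiped beforehand."""
    for key, value in feed:
        target[key] = value
    return target

# Hypothetical Hub data before the load, and the incoming feed
hub_data = {"T1": 10.0, "T2": 20.0}
feed = [("T2", 25.0), ("T3", 5.0)]

load_by_key(hub_data, feed)
print(hub_data)  # {'T1': 10.0, 'T2': 25.0, 'T3': 5.0}
```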
If you want to talk about this further, we definitely can and I am happy to do so.
Thank you so much for your guidance, @rob_marshall ! I hope all is well with you as well!
I have not only read the data hub article almost 10 times, but have also read through all the Q&A comments, which cover so many use cases. Thank you again for your quick reply.
Got you. With respect to the data refresh, as the source sometimes removes whole records along with their unique keys, I plan to create a Boolean to check whether a unique key has been removed from the source. If so, I will then delete those members from the list using the Boolean check.
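The removal check described above could be sketched in Python like this. It assumes, hypothetically, that each load is a full extract of the current source keys; if the source sends only changed records, a missing key does not mean the record was deleted. The keys and helper names are illustrative only.

```python
def removed_flags(list_members, source_keys):
    """Boolean per existing list member: TRUE when its unique key is missing
    from the latest full source extract."""
    return {member: member not in source_keys for member in list_members}

def members_to_delete(list_members, source_keys):
    """Members flagged for deletion, sorted for a stable order."""
    flags = removed_flags(list_members, source_keys)
    return sorted(member for member, removed in flags.items() if removed)

# Hypothetical keys currently in the list vs. keys in the latest extract
current_members = {"K1", "K2", "K3"}
latest_extract_keys = {"K1", "K3", "K4"}  # K4 is new and will be added by the load

print(members_to_delete(current_members, latest_extract_keys))  # ['K2']
```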
Hope you have a good weekend!
A quick question: what is meant by 2 data files?
Let's say I have 1 source data file which doesn't have any unique code. Are you suggesting that we should manually create 2 files from the original file, one with the Store + SKU "unique" code and the other with the unique codes + monthly figures?
I'm not sure whether my understanding is correct, but if we need to create 2 separate files, wouldn't that be more time-consuming, manual work (which is prone to error)?
Yes, your understanding of the two files is correct, but this should not be a manual process; it should be automated from your source. When requesting data from the source system, request two files:
- A single file with unique members, this will be for your list
- A single file for your transactional data by said members from the first bullet.
As for time, it actually will not take longer: you are only uploading one extra file, and you will have the same number of actions (load members to the list, load data to the transactional module).
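If the source system can only produce one extract, the split into the two files could happen in the integration step. The Python sketch below shows the idea using the standard `csv` module; the column names, the underscore delimiter, the sample rows, and `split_extract` are hypothetical, and the better option remains having the source system produce both files directly.

```python
import csv
import io

# Hypothetical single extract: one row per Store/SKU/month
SOURCE = """store,sku,month,amount
S1,SKU100,Jan 24,12
S1,SKU100,Feb 24,4
S2,SKU100,Jan 24,8
"""

def split_extract(text, delimiter="_"):
    """Produce the two files from one extract:
    1) unique members (code + attributes) for the transactional list,
    2) transactional data keyed by that same code."""
    members = {}
    data = []
    for row in csv.DictReader(io.StringIO(text)):
        code = f"{row['store']}{delimiter}{row['sku']}"
        members[code] = {"code": code, "store": row["store"], "sku": row["sku"]}
        data.append({"code": code, "month": row["month"], "amount": float(row["amount"])})
    return list(members.values()), data

member_file, data_file = split_extract(SOURCE)
print(member_file)
print(data_file)
```

The member file loads the list first; the data file then loads the transactional module using the same code.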
Hope this clarifies your question.