Add a new Data Hub
What are the best practices when you need to create a new Data Hub because the workspace is limited to 130GB?
I have already built a Data Hub which contains historic data and data validation processes, and I still need to load further years. The only option I see, without deleting the data already loaded, is to create a new Data Hub model on a different workspace and feed the spoke models from both DHs.
I am aware of the best practices and why Anaplan should not be a DW, but I am sure there are clients who are using multiple Data Hubs due to the space limitation of a single workspace, and I would like to know the challenges and best practices in setting this up.
@Manuela_Apostol before you add another workspace, and if you haven't already done so, you might consider reading @rob_marshall's data hub for peak performance article and @DavidSmith's Planual, starting at page 54. I had the same problem with a client. Once I implemented DISCO, I cut my module space down 40% simply by dimensionalizing time in my transaction files. I also moved the text line items that parse the UID to a separate module and used it as a control module, which took out another 20%. With flat modules the size you're talking about, every line item should be heavily scrutinized - avoid text if you can.
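For a rough sense of what that kind of cleanup buys, here's a back-of-envelope sketch in Python. The ~8 bytes per cell figure and the module sizes are illustrative assumptions, not Anaplan's official accounting (text line items cost considerably more than numeric ones):

```python
BYTES_PER_CELL = 8  # assumed rough cost of a numeric cell; text costs more
GIB = 2**30

def module_gib(list_members, line_items):
    """Approximate size of a flat module: one cell per list member per line item."""
    return list_members * line_items * BYTES_PER_CELL / GIB

# Hypothetical flat transaction module: 10M list members, 35 line items.
before = module_gib(10_000_000, 35)

# Moving 7 UID-parsing text line items out of the transaction module
# into a separate control module dimensioned only where needed:
after = module_gib(10_000_000, 28)

savings = 1 - after / before  # 7 of 35 line items removed = 20%
```

The point of the sketch is that in a flat module, every line item multiplies across every list member, so removing even a handful of parse/helper line items scales linearly with your transaction count.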
If you've attempted all these and are committed to building a new workspace, you might also consider using ALM to keep the models in sync. ALM won't affect the data, just the structure. You can implement ALM across workspaces - just remember the target model must have the most recent revision tag from the source model. I've done this once before and only had two issues. First, I had to reset all the import data sources when I moved the model to a different workspace. Second, ALM is a model-level migration, not an object-level one.
Just some ideas for you.
If space is taken up mostly by transactions, then you don't have many options to reduce the size.
I don't see any issues with having 2 data hubs, many are doing it.
As yours is a time cut-off, here's how I would do the transition:
1. Use the current DH one last time to populate all spoke models.
2. Make a copy of the DH and clear it.
3. Load the new transactions.
4. In the spoke models, change the source of the import actions from the old data hub to the new one.
5. All your actions now use the new DH - no need to do more.
I'd prefer that solution because, in theory, you don't need the "old" data hub anymore. If you ever do, you can temporarily change the source back or recreate one action.
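The time cut-off itself can also be enforced before the data ever reaches Anaplan, by routing each source file's rows to the history hub or the new hub. A minimal sketch, assuming a CSV feed with a hypothetical `TransactionDate` column in ISO format (file paths and column name are illustrative, not from the thread):

```python
import csv
from datetime import date

CUTOFF = date(2021, 1, 1)  # hypothetical cut-off between the two hubs

def split_transactions(src_path, old_hub_path, new_hub_path,
                       date_col="TransactionDate"):
    """Route each row to the history (old DH) or ongoing (new DH) load file,
    based on a cut-off date. The column name is an assumption."""
    with open(src_path, newline="") as src, \
         open(old_hub_path, "w", newline="") as old_f, \
         open(new_hub_path, "w", newline="") as new_f:
        reader = csv.DictReader(src)
        old_w = csv.DictWriter(old_f, fieldnames=reader.fieldnames)
        new_w = csv.DictWriter(new_f, fieldnames=reader.fieldnames)
        old_w.writeheader()
        new_w.writeheader()
        for row in reader:
            tx_date = date.fromisoformat(row[date_col])
            # Rows before the cut-off belong to the frozen history hub.
            (old_w if tx_date < CUTOFF else new_w).writerow(row)
```

This keeps the old hub frozen at the cut-off, so only the new hub's feed needs to run on an ongoing basis.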
Thank you so much for your reply @JaredDolich. The article and the Planual are amazing and I have been through them many times. I've also been doing a lot of improvement work with the client to deliver data in the best format. So steps have already been taken; however, I still need more space.
Regarding ALM, this is only available for Enterprise clients, so not an option.
The biggest challenge is the amount of data needed. We are loading transactional data: every month contains about 4-5 million transactions, some months even 12 million, and each transaction brings about 30-35 fields which are all needed. The business is growing, so we expect even more next year 🙂
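To see why these volumes outgrow a 130GB workspace, here is a rough growth projection based on the numbers above. The ~8 bytes per cell figure is a commonly cited approximation for numeric cells, not an official number, and text fields cost considerably more:

```python
# Assumptions from the thread: 4-5M transactions in a typical month,
# peaks of 12M, ~35 fields per transaction.
BYTES_PER_CELL = 8  # assumed; numeric cells only, text is pricier
FIELDS = 35
GIB = 2**30

def monthly_gib(transactions):
    """Approximate workspace cost of one month's transaction load."""
    return transactions * FIELDS * BYTES_PER_CELL / GIB

typical = monthly_gib(4_500_000)   # a typical month, ~1.2 GiB
peak = monthly_gib(12_000_000)     # a peak month, ~3.1 GiB
two_years = 24 * typical           # ~28 GiB of history before any growth
```

Even at the typical rate, two years of history plus ongoing months consume tens of GiB before model structure, helper modules, and text fields are counted, which is why optimization alone eventually stops being enough.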
@Manuela_Apostol you may want to schedule some time with your Anaplan BP.
If you are using the standard edition (the only one that doesn't have ALM), you may not even be able to connect workspaces - cross-workspace links are also restricted in the standard edition.
Candidly, I don't know exactly what a cross-workspace data link entails, but if you can't link workspaces, then your plan to expand your data hub wouldn't work without an upgrade.
Here's a link with the features.
130GB is very big, but not the biggest I've seen!
I have a few questions which could help to solve your issue:
How many different data areas / logs does the data hub contain - for example how many unique files / sources - could these be split into their own data hubs?
At any one point does all the data need to be in the datahub? - could time ranges be used and copies of the datahub be archived?
Does the client have a database / data lake? - then it could be possible to run uploads of the required data to the datahub?
Does your data validation create an exception report (all the changes) which is then refreshed in the original data source?
It is possible to have multiple data hubs and this is something to consider.
I look forward to hearing from you.
Thank you @usman.zia for your help. See my answers below:
How many different data areas / logs does the data hub contain - for example how many unique files / sources - could these be split into their own data hubs? ---> There are two main data sources - History and Daily (which contains transactional data) - and a few other very small loads to set up the lists' structure.
At any one point does all the data need to be in the datahub? - could time ranges be used and copies of the datahub be archived? -> yes, for now I need the 2 years of history plus ongoing months
Does the client have a database / data lake? - then it could be possible to run uploads of the required data to the datahub? -> yes
Does your data validation create an exception report (all the changes) which is then refreshed in the original data source? -> no, the data validation is only for data sent to spoke models
Thank you! It is definitely a good time to restart the discussion about ALM functionality!
That is a very good short term solution given the circumstances!