What are the best practices when a new Data Hub has to be created because of the 130GB workspace size limit?
I have already built the Data Hub, which contains historic data and data validation processes. I still need to load further years. The only option I have without deleting the data already loaded is to create a new Data Hub model in a different workspace and feed the models from both DHs.
I am aware of the best practices and of why Anaplan should not be used as a DW. Even so, I am sure there are clients using multiple Data Hubs because of the space limitation of one workspace, and I would like to know the challenges and best practices for setting this up.
@Manuela_Apostol before you add another workspace, and if you haven't already done so, you might consider reading @rob_marshall's data hub for peak performance article and @DavidSmith's Planual, starting at page 54. I had the same problem with a client. Once I implemented DISCO, I cut my module space down 40% simply by dimensionalizing time in my transaction files. I also moved the text line items that parse the UID to a separate module and used it as a control module. This took out another 20%. With flat modules of the size you're talking about, every line item should be heavily scrutinized - avoid text if you can.
If you've attempted all of these and are committed to building a new workspace, you might also consider using ALM to keep the models in sync. ALM won't affect the data, just the structure. You can implement ALM across workspaces - just remember the target model must have the most recent revision tag in the source model. I've done this once before and only had two issues. First, I had to reset all the import data sources when I moved the model to a different workspace. Second, ALM is a model-level migration, not an object-level one.
Thank you so much for your reply @Jared Dolich. The article and the PLANUAL are amazing and I have gone through them many times. I have also been making a lot of improvements and working with the client to get the data delivered in the best format. So steps have already been taken; however, I still need more space.
Regarding ALM, this is only available for Enterprise clients, so not an option.
The biggest challenge is the amount of data needed. We are loading transactional data: every month contains about 4-5 million transactions, some months even 12 million, and each transaction brings about 30-35 fields, all of which are needed. The business is growing, so we expect even more next year 🙂
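To put those volumes in perspective, here is a rough sketch of the raw cell count they imply. It only multiplies transactions by fields, using the figures quoted above; the function name `monthly_cells` is made up for illustration. It deliberately does not convert cells to GB, because the space a cell takes depends on the line item format (text being the most expensive) and on how the modules are dimensioned.

```python
def monthly_cells(transactions: int, fields: int) -> int:
    """Raw cell count for one month of a flat load: one cell per field per transaction."""
    return transactions * fields


# Figures quoted in the post: 4-5 million transactions per month
# (12 million in peak months), 30-35 fields per transaction.
low = monthly_cells(4_000_000, 30)    # 120,000,000 cells
high = monthly_cells(5_000_000, 35)   # 175,000,000 cells
peak = monthly_cells(12_000_000, 35)  # 420,000,000 cells

# Two years of history at the ordinary monthly range (peak months excluded).
history_low = 24 * low    # 2,880,000,000 cells
history_high = 24 * high  # 4,200,000,000 cells

print(f"per month: {low:,} to {high:,} cells (peak {peak:,})")
print(f"24 months: {history_low:,} to {history_high:,} cells")
```

Even before any per-cell size is applied, two years of history comes to roughly 3-4 billion cells in a flat layout, which is why every one of the 30-35 fields is worth questioning.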
Thank you @usman.zia for your help. See below my answers:
How many different data areas / logs does the data hub contain - for example how many unique files / sources - could these be split into their own data hubs? -> There are two main data sources - History and Daily (which contains transactional data) - and a few other very small loads to set up the list structures
At any one point does all the data need to be in the datahub? - could time ranges be used and copies of the datahub be archived? -> yes, for now I need the 2 years of history plus the ongoing months
Does the client have a database / data lake? - then it could be possible to run uploads of the required data to the datahub? -> yes
Does your data validation create an exception report (all the changes) which is then refreshed in the original data source? -> no, data validation is only for data sent to spoke models
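Since the client has a database / data lake and the hub needs two years of history plus the ongoing months, one option raised by these questions is to keep only that rolling window in the Data Hub and leave older periods in the source system. Below is a minimal sketch of the selection logic, run on the source side before upload. The row layout (a dict with a `"month"` key in `YYYY-MM` form) and both function names are assumptions made for illustration; nothing here is an Anaplan API.

```python
from datetime import date


def months_in_window(today, history_months=24):
    """Return 'YYYY-MM' labels for the current month and the
    history_months months before it, oldest first."""
    index = today.year * 12 + (today.month - 1)  # months since year 0
    labels = []
    for i in range(index - history_months, index + 1):
        year, month0 = divmod(i, 12)
        labels.append(f"{year:04d}-{month0 + 1:02d}")
    return labels


def split_for_hub(rows, today, history_months=24):
    """Split rows into those inside the rolling window (to upload to the hub)
    and those outside it (to stay in the source system / archive)."""
    keep = set(months_in_window(today, history_months))
    in_hub, archive = [], []
    for row in rows:
        (in_hub if row["month"] in keep else archive).append(row)
    return in_hub, archive


# Hypothetical sample rows; real extracts would carry the 30-35 fields.
rows = [
    {"month": "2022-02", "amount": 10.0},
    {"month": "2022-03", "amount": 20.0},
    {"month": "2024-03", "amount": 30.0},
]
in_hub, archive = split_for_hub(rows, date(2024, 3, 15))
print(len(in_hub), "rows for the hub,", len(archive), "row(s) left in the archive")
```

With a window like this, growth in the hub is bounded by the monthly volume rather than by the total history, at the cost of going back to the source whenever an older period is needed.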