Single or Multiple Data Hubs with ALM?
Can I ask the Community what the current best practice is with Data Hubs and ALM? In particular, should we just have one Data Hub in PROD, or take the Data Hub through the ALM process as well?
I can see arguments for and against, but I can't see the current best practice definitively recorded anywhere on the Community.
I don't think there is a consensus, but maybe we can reach one here? 🙂
Personally, I have not put my data hubs through ALM in the past.
Usually, the DH is a relatively simple model without much logic, where the impact of any change is easy to identify and changes are handled only by the most proficient builders.
If you ever have some complex logic in there and need to change it, you can always use ALM temporarily (make a copy, deploy the original, develop in the copy, test, sync, un-deploy).
EDIT: As others have mentioned here, segregation of duties can be a reason why you will have to use ALM on a data hub. This way contractors can develop on the DH and never access the prod data.
In any case, I believe it is your duty as the builder to make the model ready for ALM, even if it is not used right away. Marking lists as production data is very easy for the builder (it probably takes less than 3 minutes), less so for people who haven't built the model.
There are no best practices for it because "it depends".
1. If your Data Hub contains a lot of complexity and data validation, you might want to ensure the logic is kept in a development model. The general rule for all development is that you shouldn't be making wholesale changes in a production model (and this is especially true in the single source of data!!). If your data hub is simple, then you probably don't need to worry about ALM-ing it.
2. We often have multiple data hubs for different reasons, but let's assume they all share the same structure and the split is a size/geographic one. Following point 1 above, it then makes sense to use ALM.
3. Segregation of duties and security is a prime candidate for splitting Dev, Test and Prod, and even keeping them in different workspaces. That applies to Data Hubs as it does to other models, since a Data Hub is just a model anyway!!
Explained by the Professor himself, @DavidSmith4
I have worked with a single Data Hub and ALM.
Changes are less frequent in Data Hub models, as they mostly store transaction-level data or integrate with the source system.
We moved the spoke model to production but didn't move the Data Hub to production. It remained the one and only hub; the data team alone handled it, and no one else had access to edit or upload data.
If we create two Data Hubs, one Dev and the other Production, then testing also becomes a bit challenging, as you now have to sync two models every time you make changes.
I would suggest having a single Data Hub only.
I agree with @Misbah that a DEV model is very valuable when there is a lot of data validation/internal logic. But it should only be done in that case; if the hub is a straightforward data repository, don't bother.
However, I would never split a DH over geographies. If size becomes a significant issue, I'd prefer to split it into logical blocks, as some models would otherwise have to connect to multiple data hubs to compose one list. The best practice should be one data hub if possible.
All the best,
One counterpoint, though: what happens when we are live with a first project (FP&A) that has only one Data Hub and no ALM, and then, as part of connected planning, there is a need to do (let's say) HR planning? It's understood that HR data will not be fed into the existing Data Hub until development is complete, but building there ultimately exposes the Finance production data already in the hub from the FP&A implementation to external vendors/consultants.
Well, if you have many data imports to run from external systems, such as daily jobs loading transactional data and updating key hierarchies, and the production data volume is high, it is better to keep the production hub separate. Make any design changes in a dev copy of the hub, which can then be synced using ALM. You won't need that volume of data in the dev model, and you can test functionality with a sanitized data set.
I think some of the responses have gone off topic, but I would agree with @Misbah as well as @nathan_rudman in that it really depends. One of the major things to look at is segregation of duties. For instance, what if certain people are not allowed to see true production data, like contractors? In this case, you would most definitely need to take the Data Hub through the different lifecycles. Also, by having a Dev/UAT/Prod data hub, you can minimize the volume footprint of the data hubs. Now, you will need to be careful when doing this because you could create some line items that work in Dev, but not in Prod (think isfirstoccurrence() where you have a threshold), but if you keep this in mind, you should be ok.
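To make the "works in Dev, but not in Prod" risk concrete, here is a minimal Python sketch of a pre-sync sanity check. It assumes you can obtain list item counts per environment by some means; the limit, the environment names and the counts are all made up for illustration and are not documented Anaplan figures.

```python
# Illustrative only: this limit is a hypothetical number,
# not a documented threshold for any Anaplan function.
ASSUMED_LIMIT = 1_000_000


def over_limit(item_counts, limit=ASSUMED_LIMIT):
    """Return the environments whose list size exceeds the assumed limit."""
    return [env for env, count in item_counts.items() if count > limit]


# Hypothetical list sizes: a trimmed-down DEV hub versus a full-volume PROD hub.
sizes = {"DEV": 20_000, "UAT": 250_000, "PROD": 3_500_000}
print(over_limit(sizes))
```

The point is only that a line item validated against a small DEV list tells you nothing about PROD volumes, so the check has to use the PROD counts.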
Hope this helps,
My vote goes to ALM. The initial setup might be tough, but it's worth it: you can do a lot of testing before the changes reach the PROD hub, to ensure that no corrupt data is loaded and that imports from the various sources are foolproof.
The hub is the source of truth for all planning models, and we should treat it with great respect 😁
@sjbows I think this question is on par with the question "what is the meaning of life", and I believe the answer is "it depends".
Thanks @nathan_rudman @Misbah @rob_marshall and everyone else who chipped in. These are exactly the issues I was grappling with, so it's good to know we're all thinking the same thing.
I have marked @nathan_rudman's response (just one Hub in PROD, but un-deploy and take back to DEV/TEST if a major change is required) as the solution, because I think it captures the best of both worlds and will work in 90% of cases.
There are some organisations which will have segregation policies that will not allow developer access to PROD under any circumstances, in which case (and only in this case) you should follow the more laborious approach of keeping a permanent DEV version and following the same ALM approach as you would with the other models.
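For anyone who wants the rule of thumb above in one place, here is a small Python sketch of it. The function name, parameter names and return strings are my own labels, not anything from Anaplan; it only encodes the two conditions discussed in this thread.

```python
def recommend_hub_setup(devs_barred_from_prod: bool, major_change: bool) -> str:
    """Encode the thread's rule of thumb for Data Hubs and ALM."""
    if devs_barred_from_prod:
        # Segregation policy: developers may never touch PROD.
        return "permanent DEV hub with full ALM"
    if major_change:
        # Copy the hub, develop and test in the copy, sync, then un-deploy.
        return "single PROD hub, temporary ALM for this change"
    return "single PROD hub, no ALM"


print(recommend_hub_setup(devs_barred_from_prod=False, major_change=True))
```

Segregation of duties is checked first because, when that policy applies, it overrides everything else.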
Can we all agree on this? At least for now? 🙂
I have added it to my answer then.