1. Why do we create a concatenated list to load transactional data into the Data Hub? Instead, we could create individual lists as dimensions and load the data.
2. Why do we create hierarchy lists as flat lists in the Data Hub?
Excellent question. I'll answer these directly, but first I HIGHLY recommend you read this data hub best practice article by @rob_marshall. In this article, @rob_marshall answers both your questions and a lot more that you're likely to be considering as you pursue data hub processes. You might also consider reading @DavidSmith's article on the truth about sparsity. Even if you have to dimensionalize time, which may create a bunch of empty cells, your performance will likely be better AND your space smaller than if you don't dimensionalize time.
Hope this helps. Try to read @rob_marshall's and @DavidSmith's articles. I promise you'll walk away with a completely new perspective.
Firstly, @JaredDolich has provided a first-rate response, but I'm sensing that the explanation may be a little too long and somewhat too technical.
The primary reason comes from the nature of multidimensional data.
Imagine that every data point in a list is duplicated once for each item in any list added as a dimension. A module can therefore become massive very quickly, because each added dimension multiplies the number of possible combinations by the number of items in the new dimension.
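To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It is not Anaplan code, and the list names and sizes are made up purely for illustration.

```python
from math import prod

# Hypothetical list sizes, chosen only to illustrate the multiplication.
dimension_sizes = {"Product": 1000, "Customer": 500, "Region": 50}


def dense_cell_count(sizes):
    """Cells per line item when every list is applied as its own dimension."""
    return prod(sizes.values())


# 1000 x 500 x 50 possible combinations
cells_before = dense_cell_count(dimension_sizes)

# Adding one more dimension with 12 items multiplies the total by 12.
dimension_sizes["Month"] = 12
cells_after = dense_cell_count(dimension_sizes)

print(cells_before, cells_after)
```

With these assumed sizes, three dimensions already give 25 million cells per line item, and a 12-item fourth dimension takes that to 300 million, whether or not any of those combinations ever holds data.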
The problem with this is that, for the vast majority of possible combinations of data points across multiple dimensions, data will never exist. Unlike other applications, Anaplan will retain these cells in memory even if no data is added.
This is what we mean when we refer to sparsity: there are too many spare data reference points. To minimise sparsity, we create a list only for those combinations of dimensions that will ever contain data, with a unique reference for each actual combination. We achieve this by concatenating all the relevant codes to create a unique master code for that single reference. We complement this by creating list properties, which are themselves formatted as the required lists and populated with the individual list items for those dimensions.
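The concatenation idea can be sketched outside Anaplan as well. The Python below is only an illustration under assumed names: the transaction fields, the codes, and the "_" delimiter are all hypothetical, and in a real data hub the properties would be list-formatted rather than plain text.

```python
# Hypothetical transactional rows; only combinations that actually occur appear.
transactions = [
    {"product": "P001", "customer": "C010", "region": "EMEA", "amount": 120.0},
    {"product": "P001", "customer": "C010", "region": "EMEA", "amount": 30.0},
    {"product": "P002", "customer": "C011", "region": "APAC", "amount": 75.5},
]

DELIMITER = "_"  # assumed delimiter; pick one that never appears in the codes


def build_flat_list(rows, key_fields):
    """Return one flat-list item per combination that occurs in the data.

    Each item is keyed by the concatenated master code and keeps the
    individual codes as properties, so the dimensions can be recovered later.
    """
    flat = {}
    for row in rows:
        code = DELIMITER.join(row[field] for field in key_fields)
        flat.setdefault(code, {field: row[field] for field in key_fields})
    return flat


flat_list = build_flat_list(transactions, ["product", "customer", "region"])

for code, properties in flat_list.items():
    print(code, properties)
```

Here the three rows collapse to two list items, one per combination that really exists, instead of one cell for every possible product, customer and region combination.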
Hierarchies are not created in the data hub because we do not run any modelling in the hub, so the functionality they afford us is redundant. Hierarchies consume more memory than flat lists because of the need to aggregate up them, and as they are not required they should not be created.
I hope this adds to @JaredDolich's explanation and provides you with more context.
Hello @tflenker and @abhaymalhotra. This post directly answers the question we raised about concatenating the key for transactional data in the DH. Also, read @rob_marshall's article about the data hub (particularly the sections under: