Data Hub to Spoke Model Performance Question

ebarlow21
edited December 2022 in Modeling

We are building a budgeting and reporting model based on transactions incorporating our chart of accounts. We have built a Data Hub to house the lowest level transactions. Currently we budget using 6 categories with several layers of hierarchy totaling a count of 6,475 items. Summarizing monthly transactions we are looking at loading approximately 374,619 rows of data lines from csv files into the Data Hub annually and accumulating three years. We then will pull that into our spoke model to report on and allow for budgeting and forecasting. We are concerned that our cell count is too high and are struggling with how to lower it. Do you have recommendations to lower this in our spoke model? We have tried to keep our formulas simple and follow best practices. Is it reasonable to have the quantity of categories, hierarchies and data lines in our spoke model in order to complete our budgeting at the lowest level hierarchy? 

We’ve included a screenshot of our spoke model including the module showing 398,697,600 cell count.

Thank your assistance with this.

ebarlow21_0-1666383501552.png

 

Tagged:

Best Answer

  • rob_marshall
    edited January 17 Answer ✓

    @ebarlow21 

     

    Wow, you may have hit the trifecta on potential "opportunities".  First, 781M cells is not that big as each line item has a threshold of roughly 2.1B cells.  With that said, that is what you "can" do, but that does eat into your overall footprint.  I would suggest a couple of things:

    • do you HAVE to have the lowest level of detail within this model?  The reason I ask is usually the detailed data doesn't, but it is used for validation.  With the way the UX is built, this detailed data can be in another model for people to view, it would just be another page within the app (assuming you are using the NUX).
    • a ton of space is often eaten up due to summaries of line items being turned on (they are turned on by default).  Often, these summaries are not needed, so I would review where they are turned on and see if they can be turned off.
    • Another article that may help is Model Optimization which gives you things to look for (link)
    • Speaking of splitting up the modules, you don't have to have everything in one module, think of this as progressive disclosure, only use the dimensions that you really need.  For example, if you are doing variance analysis, you really don't need the Versions dimension, you can use line item subsets and line items for the versions (link).
    • Lastly, I will leave you this article which is really good and talks about sparsity (link)

     

    Hope this helps, and Happy Holidays!

     

Answers

  • abhay.kanik
    edited December 2022

    Do you need the data at transaction level in spoke model for planning? As per my experience transactional data isn't necessary for planning its the aggregated data which can be used adn if thats the same case with you i would recommend to aggregate the data in data hub as per your required hierarchy and then import it into the spoke model.

    also ~400 mil isnt too big for a model.

  • ManjunathKN
    edited December 2022

    Hi @ebarlow21 

     

    If you want to do budgeting only at the lowest level then there should be no problem with your hirearchy as you are going to keep your summary NONE(list/line item summary). Understand the reporting requirement if the values are required at any hirearchy level use that hirearchy to a reporting/output module.

     

    If all of your levels in hirearchy is required to display then it is better you use the calculation module as the output module. Sizing should never be an issue if best practices are followed. 

     

    Thanks,

    Manjunath 

  • MarkWarren
    edited December 2022

    I'd echo this great point from @abhay.kanik , why do you have the transactional data, especially the flat list in the spoke model? What is the data hub doing?
    It should be processing/aggregating the flat data into what is needed in the spokes...

  • ManjunathKN
    edited December 2022

    @ebarlow21 

     

    Your tans01 module is having org list and accounts and why do you named it as trans01 module. 

     

    Firstly, you should not bring the flat list(transaction list) values/properties(line items) again in spoke model. You should do a import from data hub through a saved view of that transaction module. When you bring data from data hub, you should bring it in 2 different modules, one for properties(which does not change by time) and other for data with time.

     

    For calculation purposes you should use data module and for mapping purpose you should use properties module.

     

    Now i see 2 potential mistakes (I cannot confirm without seeing blueprint view) 1. you might be bringing the data for trans01 module from dat01 module, which is wrong and you should do like I explained above.

    2. You might be having/included the text formatted line items in trans01 module with time dimension. 

     

    https://community.anaplan.com/t5/Chapter-2/2-01-13-Separate-Transaction-data-from-Attribute-Property-data/ta-p/80736 

     

    Thanks,

    Manjunath 

  • ebarlow21
    edited December 2022

    Thank you for the previous information. Can you further explain how to aggregate and disaggregate data in the Data Hub?

     

    In addition without using the Transactional Flat list to import, how do we create the import when we have 6 independent (non-hierarchical) lists that we have to map to in the spoke model?

  • rob_marshall
    edited December 2022

    @ebarlow21 

     

    Have you read this article, it is pretty good on explaining the use of Data Hubs.

     

    https://community.anaplan.com/t5/Best-Practices/Data-Hubs-Purpose-and-peak-performance/ta-p/48866

     

     

  • ebarlow21
    edited December 2022

    Hi @rob_marshall ,

    We have read this article and implemented it in our Data Hub, which is working well. Our spoke model is still causing us some issues due to size. We've read all the best practice articles, incorporated many of the recommendations to reduce sparsity and size, but still looking at a very large model. 

     

    Our user story requires account details by organization at the lowest hierarchy level and funds at a higher level. The org list is what is driving the cell count as far as we can tell. We have time and the 3 lists set up so that accounts are in the rows, orgs and funds are in the pages as drop down items. Versions is in Columns - budget actual variance and forecast. 

     

    We have tried to split the funds and orgs up into separate modules, but can't figure out the best way to combine them back for ultimate use in the UX page. We built the model with Orgs and Accounts only and the UX page worked great but we need to add the Fund list in and that gets us upwards of 781M cells. 

     

    Do you have any advice or other articles that may get us over this hump?

    Thanks and Happy Holidays.

    Erin Barlow