I’ve been running the Model Optimisation training in recent weeks, and one of the benefits of Data Hubs we talk about is that model-to-model imports are faster than exporting to a file and re-importing from that file. The question is always “by how much?”, so here are the details of a test we did in April.
Performance test of 700k rows, 12 data points: loading data from a file vs a module view. The file was already in the cloud, so this is a “like for like” test.
File = 36s
Module = 5s
So, this backs up the Best Practice of using Data Hubs and staging models to minimise the import times to production models.
Also, the format of the data makes a slight difference. The previous test was 12 data points for 700k rows. This test was 1,003,966 rows for a single data point.
File vs Module
Boolean = 9s vs 5s
Date = 9s vs 5s
Formatted List = 12s vs 6s
Number = 9s vs 4s
Text = 10s vs 5s
The difference is all in the read time: gathering the data from a file is slower than from a module. The actual import time is pretty much the same between the types.
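To put a number on "by how much", here is a minimal sketch of the arithmetic in Python. The timings are the ones quoted in the two tests; the names (`first_test`, `second_test`, `speedup`, `summarise`) are made up for illustration and are not part of Anaplan. The timings were reported in whole seconds, so treat the ratios as approximate.

```python
# Timings in seconds as quoted in the post: (file import, module import).
# The dictionary and function names are illustrative assumptions.
first_test = {"12 data points, 700k rows": (36, 5)}
second_test = {
    "Boolean": (9, 5),
    "Date": (9, 5),
    "Formatted List": (12, 6),
    "Number": (9, 4),
    "Text": (10, 5),
}


def speedup(file_s, module_s):
    """Return how many times faster the module import was than the file import."""
    return file_s / module_s


def summarise(timings):
    """Map each test label to its file/module ratio, rounded to 2 decimals."""
    return {name: round(speedup(f, m), 2) for name, (f, m) in timings.items()}


if __name__ == "__main__":
    for label, timings in (("Test 1", first_test), ("Test 2", second_test)):
        for name, ratio in summarise(timings).items():
            print(f"{label} | {name}: module import {ratio}x faster")
```

On these figures the module import comes out roughly 7x faster in the 12 data point test and about 2x faster for each single data point format.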
Finally, I would add that it is critical to make sure you are using a saved view on the source module and that the view is efficient; that has a direct impact on the import time. Every module-to-module import "opens" the saved view before transferring the data to the target. So ensure you only have what you need in the source (don't have 50 line items in the source if you are only importing 3), and ensure you are using efficient filters (one boolean per axis).
Always great to see the clear and concise analysis you present @DavidSmith
It would be great to see a step by step Model Optimisation guide for existing models using the best practice outlined in the Planual at some point in the future, outlining areas of the "biggest wins", such as saving model space, time on imports etc. backed up by this type of analysis!
David, something we learned in the modeling certification 2 was to make sure the common dimensions are in the same order so the hypercube indexes them efficiently. Will this be a factor when importing from module (saved view) to module (target) or does this only apply to line item formulas? If so, how do we know if the saved view is in the right order?
Thanks @DavidSmith. I've been using the Tabular Multiple Column layout when I have more than 3 dimensions for export out of Anaplan, but I never considered it for a module-to-module import. I'm going to give that a try. Is there a reason why we can't use a boolean filter on the time dimension? I have a client that wants a module-to-module import of only the unelapsed days. I haven't found a good way to do that since there are 4 dimensions: date, 30 minute time increment, product, and partner. Because I have to import all days, it's quite a big module (6GB) and it takes 25 minutes to import all that data.
Incidentally, this would be a great topic for the modeling cert 3.