Import from File vs Module

I’ve been running the Model Optimisation training in recent weeks, and one of the benefits of Data Hubs we talk about is that model-to-model imports are faster than exporting to a file and re-importing from that file.
The question is always “by how much?”, so here are the details of a test we did in April.

 

Performance test of 700k rows with 12 data points, loading the data from a file vs a module view. The file was already in the cloud, so this is a “like for like” test.
File = 36s, Module = 5s
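
To put a number on “by how much?”, here is a minimal sketch of the arithmetic using the two timings above. It assumes "700k" means exactly 700,000 rows, and the names (ROWS, rows_per_second, etc.) are just for illustration.

```python
# Timings quoted in the post; "700k" is assumed to mean exactly 700,000 rows.
ROWS = 700_000
FILE_SECONDS = 36    # import from a file already in the cloud
MODULE_SECONDS = 5   # import from a module view


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput over the whole import."""
    return rows / seconds


speedup = FILE_SECONDS / MODULE_SECONDS
file_rate = rows_per_second(ROWS, FILE_SECONDS)
module_rate = rows_per_second(ROWS, MODULE_SECONDS)

print(f"Module import is {speedup:.1f}x faster than the file import")
print(f"File: {file_rate:,.0f} rows/s, Module: {module_rate:,.0f} rows/s")
```

These are averages from a single test, so treat them as a rough guide rather than a benchmark.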

So, this backs up the Best Practice of using Data Hubs and Staging models to minimise the import times to production models.
The format of the data also makes a slight difference.
The previous test was 12 data points for 700k rows. This test was 1,003,966 rows for a single data point of each format:
File vs Module
Boolean = 9s vs 5s
Date = 9s vs 5s
Formatted List = 12s vs 6s
Number = 9s vs 4s
Text = 10s vs 5s
The difference is all in the read time: gathering the data from a file is slower than from a module. The actual import time is pretty much the same between the types.
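
As a rough illustration, the per-format timings above can be turned into file-to-module ratios. This is only a sketch over the figures quoted in this post; the dictionary and function names are made up for the example.

```python
# Seconds per import for 1,003,966 rows of a single data point,
# as (file, module) pairs taken from the figures in the post.
TIMINGS = {
    "Boolean": (9, 5),
    "Date": (9, 5),
    "Formatted List": (12, 6),
    "Number": (9, 4),
    "Text": (10, 5),
}


def speedups(timings: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return how many times faster the module import is for each format."""
    return {fmt: file_s / module_s for fmt, (file_s, module_s) in timings.items()}


ratios = speedups(TIMINGS)
for fmt, ratio in ratios.items():
    file_s, module_s = TIMINGS[fmt]
    print(f"{fmt:<15} file {file_s:>2}s  module {module_s}s  ratio {ratio:.2f}x")
```

On these numbers the module import is roughly twice as fast for every format, compared with about seven times for the 12 data point test.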

 

Finally, I would add that it is critical to make sure you are using a saved view on the source module and that the view is efficient; this has a direct impact on the import time. Every module-to-module import "opens" the saved view before transferring the data to the target. So ensure you only have what you need in the source (don't have 50 line items in the source if you are only importing 3), and ensure you are using efficient filters (one boolean per axis).

David

Answers

  • Always great to see the clear and concise analysis you present @DavidSmith

     

    It would be great to see a step-by-step Model Optimisation guide for existing models, using the best practice outlined in the Planual, at some point in the future. It could outline the areas of the "biggest wins", such as saving model space and time on imports, backed up by this type of analysis!

     

     

  • @DavidSmith 

     

    Thanks for sharing!

  • @CallumW 

    Thanks for the feedback

     

    To your point, yes, it is something we should put together, although quite often the priorities "depend"

     

    But, we'll aim to get a guide together

    David

  • David, something we learned in the modeling certification 2 was to make sure the common dimensions are in the same order so the hypercube indexes them efficiently. Will this be a factor when importing from module (saved view) to module (target) or does this only apply to line item formulas? If so, how do we know if the saved view is in the right order?

  • @JaredDolich 

     

    If you look at the initial mapping screen, it "looks" like a file, in a column format

    So, for imports, the dimension order is not relevant, because there is no "calculation" happening on the import

    The important thing, as mentioned, is the saved view efficiency

     

    I would add that sometimes the view needs to be nested, and this can be slow to render. In this case, look at using the third export option (Tabular Multiple Column)

    [Screenshot: 2019-11-08_14-17-50.png]

     

    You might find that this is quicker; we have seen significant improvements in cases where the view was slow to render

    This is an exception, but an option if needed

    David

  • Thanks @DavidSmith. I've been using Tabular Multiple Column when I have more than 3 dimensions for exports out of Anaplan, but I never considered it for a module-to-module import. I'm going to give that a try. Is there a reason why we can't use a boolean filter on the time dimension? I have a client that wants a module-to-module import of only the unelapsed days. I haven't found a good way to do that since there are 4 dimensions: date, 30-minute time increment, product, and partner. Because I have to import all days, it's quite a big module (6GB) and it takes 25 minutes to import all that data.

     

    Incidentally, this would be a great topic for the modeling cert 3. 

  • You should be able to use a filter for the time dimension

    Nested rows can be quite heavy to render, but if you are just going module to module, you might not need to nest them

    Often the native, unfiltered module view is really quick to render, so unless you've got a lot of data to filter out, it might be quicker just to use that

    As I said, open the view in Anaplan.  If that takes a long time then your import will also take a long time.

    If you can get the view to open efficiently, then your import will be faster

    Hope that helps

    David