Import from File vs Module
I’ve been running the Model Optimisation training in recent weeks, and one of the benefits of Data Hubs we talk about is that model-to-model imports are faster than exporting to a file and re-importing from it.
The question is always “by how much?”, so here are the details of a test we did in April.
The performance test loaded 700k rows with 12 data points from a file and from a module view. The file was already in the cloud, so this is a “like for like” test.
File = 36s, Module = 5s
So, this backs up the Best Practice of using Data Hubs and Staging models to minimise the import times to production models.
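To put the headline numbers in perspective, here is a minimal sketch that turns them into a speedup factor and an approximate throughput. The row count is taken as 700,000 (the test only says "700k"), and the function and variable names are purely illustrative.

```python
# Figures from the April test; ROWS is approximate ("700k rows").
ROWS = 700_000
FILE_SECONDS = 36
MODULE_SECONDS = 5


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput over the whole import."""
    return rows / seconds


# How many times faster the module import was than the file import.
speedup = FILE_SECONDS / MODULE_SECONDS
file_rate = rows_per_second(ROWS, FILE_SECONDS)
module_rate = rows_per_second(ROWS, MODULE_SECONDS)

print(f"File:    {file_rate:,.0f} rows/s")
print(f"Module:  {module_rate:,.0f} rows/s")
print(f"Speedup: {speedup:.1f}x")
```

On these figures the module import is roughly seven times faster than the file import.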
The format of the data also makes a slight difference:
The previous test was 12 data points for 700k rows. This test was 1,003,966 rows for a single data point.
File vs Module
Boolean = 9s vs 5s
Date = 9s vs 5s
Formatted List = 12s vs 6s
Number = 9s vs 4s
Text = 10s vs 5s
The difference is all in the read time: gathering the data from a file is slower than from a module. The actual import time is pretty much the same between the types.
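The per-format timings above can be compared side by side with a short script. This is only a sketch over the figures quoted in this post; the dictionary layout and the `compare` helper are hypothetical names chosen for the example.

```python
# (file_seconds, module_seconds) for 1,003,966 rows of a single data point,
# copied from the timings listed above.
TIMINGS = {
    "Boolean": (9, 5),
    "Date": (9, 5),
    "Formatted List": (12, 6),
    "Number": (9, 4),
    "Text": (10, 5),
}


def compare(timings: dict) -> dict:
    """Return seconds saved and speedup factor for each format."""
    return {
        fmt: {"saved_s": file_s - module_s, "speedup": file_s / module_s}
        for fmt, (file_s, module_s) in timings.items()
    }


results = compare(TIMINGS)
for fmt, r in results.items():
    print(f"{fmt:<15} saved {r['saved_s']}s  ({r['speedup']:.2f}x faster)")
```

For a single data point the module import comes out about twice as fast as the file import for every format, saving between 4 and 6 seconds.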
Finally, it is critical to use a saved view on the source module and to make sure that view is efficient, because it has a direct impact on the import time. Every module-to-module import "opens" the saved view before transferring the data to the target, so ensure you only have what you need in the source: don't have 50 line items in the source if you are only importing 3. Also ensure you are using efficient filters (one boolean per axis).