Question

I want to create an S3 connection to ADO where monthly data is stored in individual CSV files. Around month end, data will be uploaded multiple times, replacing the existing CSVs. When this happens, I want to replace the existing data for that month with the new file. I don't want to have to create a primary key for every dataset just to do an incremental load, so I wanted to check: will a replace reload all of the data for every month, or just the files that have been uploaded?

In addition, if we do have to create a primary key for the data, can it be a combination of columns, or does it have to be a single column?

JonFerneau · Accepted Answer

Hi @SHayes,

If you use a Replace in ADO, it wipes the entire dataset table and loads only the new data from S3 into it. Incremental loads only what has changed, based on Primary Keys and Cursor columns. If you do not want to create a Primary Key for each dataset, you could look into doing an Append in ADO. This keeps all of the existing data in the ADO dataset table and adds all of the rows from your new file in S3, so a re-uploaded month would sit alongside its earlier version rather than replace it. The table will of course keep growing and take up more storage in ADO, so it may not be the best solution.
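To make the difference between the three load modes concrete, here is a minimal Python sketch that mimics them on plain in-memory rows. It is not ADO code and does not call any ADO API: the function names, the sample rows, and the `("month", "account")` key are all hypothetical, and the incremental version is simplified to a key-based upsert that ignores Cursor columns. The multi-column key is only there to show the idea and does not confirm what ADO accepts as a Primary Key.

```python
def replace_load(table, new_rows):
    """Replace: discard everything in the table and keep only the new rows."""
    return list(new_rows)


def append_load(table, new_rows):
    """Append: keep every existing row and add the new rows after them."""
    return list(table) + list(new_rows)


def incremental_load(table, new_rows, key_columns):
    """Simplified incremental load: upsert rows by key (cursor columns ignored)."""
    merged = {tuple(row[c] for c in key_columns): row for row in table}
    for row in new_rows:
        merged[tuple(row[c] for c in key_columns)] = row
    return list(merged.values())


# Hypothetical data: two months already loaded, then February is re-uploaded.
existing = [
    {"month": "2024-01", "account": "A", "amount": 100},
    {"month": "2024-02", "account": "A", "amount": 200},
]
reupload = [{"month": "2024-02", "account": "A", "amount": 250}]

replaced = replace_load(existing, reupload)      # only the re-uploaded February row
appended = append_load(existing, reupload)       # January plus both February versions
upserted = incremental_load(existing, reupload, ["month", "account"])  # February updated

print(len(replaced), len(appended), len(upserted))
```

In this toy version, Replace loses January, Append keeps a stale February row next to the new one, and only the keyed load ends up with exactly one current row per month.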

If possible, I would work on a monthly cadence: load a month's file from S3 into the ADO dataset table, then load this table into your planning models. When the next month's file arrives in S3, do a Replace on the ADO dataset table with the new month's data and again send it to the planning models. The idea is that ADO is not the data lake or data warehouse where data persists forever, but the preliminary landing zone for data on its way to the Anaplan planning models.
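The monthly cadence above can be pictured with a small Python sketch. Again, this is only an illustration under assumptions, not ADO or Anaplan code: `run_monthly_cycle` and `model_store` are hypothetical names, and storing one slice per month downstream is an assumption about how the planning model keeps history.

```python
def run_monthly_cycle(model_store, month, month_file_rows):
    """One cycle: Replace the landing table, then push it downstream."""
    # Replace: the landing table holds only the rows from this file.
    landing_table = list(month_file_rows)
    # Assumed downstream behaviour: the model keeps one slice per month,
    # so a re-upload for the same month overwrites that month's slice.
    model_store[month] = landing_table
    return landing_table


model_store = {}
run_monthly_cycle(model_store, "2024-01", [{"account": "A", "amount": 100}])
run_monthly_cycle(model_store, "2024-02", [{"account": "A", "amount": 200}])
# February is re-uploaded around month end with corrected figures.
landing = run_monthly_cycle(model_store, "2024-02", [{"account": "A", "amount": 250}])

print(sorted(model_store), landing)
```

Under these assumptions the landing table only ever contains the latest file, while the history lives downstream, which is why no Primary Key is needed on the landing side.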