Data Import Integration

edited April 2023 in Modeling

Hello, I have a question about data import integration. When data is imported from a database or another source, how does Anaplan remove duplicate data from WS02? Are there any specific requirements on the sequence of the data, or does that depend on the organization?

How does Anaplan remove duplicate data?


  • Hi @wbroughton

    I hope this helps. When loading data into Anaplan, regardless of the data source, you have to specify a "primary key" or unique code. This key or code is used either to create the list items or to map the data in a module to the correct row (list item).

    Now, focusing on loading data into a list: when creating list items, that key or unique code is one of the following:

    1. the name (if it is not a numbered list)
    2. the code (best practice)
    3. in some extreme cases, a combination of properties

    Which of these Anaplan uses to uniquely identify each item in the list is defined by you during the data load process.

    It is important to keep the above in mind so we can answer your question: How does Anaplan remove duplicate data?

    Anaplan does not "remove" duplicated data. When loading data into a list, Anaplan either rejects the duplicated data or updates it. It depends on what you understand as duplicated data:

    1. If duplicated data means "the same record (that is, the same key or unique code) is present more than once in the data source", then Anaplan will reject data. It can reject either the entire data set or only the specific rows where the key or code is duplicated. In the latter case, Anaplan creates the list item from the first occurrence (reading the data source from top to bottom) and rejects the remaining rows with the same code (but does not "remove" them). Again, this behaviour can be set during the data load process.
    2. If duplicated data means "the Anaplan list already contains a key or unique code that we are loading from the source", then Anaplan updates the attributes attached to that code. For instance, let's say you have the list "product SKU" in Anaplan, and it already contains the SKU with code 1001, called "Coffee 250g", belonging to the parent item "Coffee". Now you load a new data file that includes a product SKU with code 1001, but named "Milk 500ml" with parent "Milk". The list item "1001" gets updated: it is renamed "Milk 500ml" and moved under the new parent.

    There are a few more nuances to this, but I hope the explanation above helps you to better understand how Anaplan deals with duplicated data (in any list from any workspace).
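The two cases described in this reply can be sketched in a few lines of Python. This is only a toy model of the behaviour as explained here, not Anaplan's implementation or API: the names `ListItem` and `import_into_list` are made up for illustration, the code is assumed to be the unique key, and the "reject only the duplicated rows" setting is assumed.

```python
from dataclasses import dataclass


@dataclass
class ListItem:
    code: str
    name: str
    parent: str


def import_into_list(existing, rows):
    """Toy model of a list import keyed on 'code' (hypothetical helper).

    existing: dict mapping code -> ListItem (the list before the import)
    rows:     source rows as dicts with 'code', 'name' and 'parent'
    Returns the rejected rows as (row_number, code) pairs.
    """
    seen_in_source = set()
    rejected = []
    for row_number, row in enumerate(rows, start=1):
        code = row["code"]
        if code in seen_in_source:
            # Case 1: the code repeats within the source file.
            # The first occurrence wins; later rows are rejected.
            rejected.append((row_number, code))
            continue
        seen_in_source.add(code)
        # Case 2: if the code already exists in the list, this overwrites
        # its name and parent; otherwise it creates a new item.
        existing[code] = ListItem(code, row["name"], row["parent"])
    return rejected


sku_list = {"1001": ListItem("1001", "Coffee 250g", "Coffee")}
source_rows = [
    {"code": "1001", "name": "Milk 500ml", "parent": "Milk"},
    {"code": "1002", "name": "Tea 100g", "parent": "Tea"},
    {"code": "1002", "name": "Tea 200g", "parent": "Tea"},
]
rejected = import_into_list(sku_list, source_rows)
```

After this run, item 1001 has been renamed and re-parented, item 1002 has been created from its first row, and the third source row is reported as rejected rather than silently dropped.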

  • It's also worth noting that for module data imports, duplicate data is treated differently depending on the format of the line item being loaded: for numeric data, any matching values are summed together, whereas for other data types, having multiple values for the same cell results in an error.
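As a rough illustration of that difference, here is a small Python sketch. It is a simplified model, not Anaplan code: `import_into_module` and the cell keys are hypothetical, and keeping the first value when a non-numeric cell is duplicated is an assumption, since the reply only says that an error results.

```python
def is_numeric(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def import_into_module(rows):
    """Toy model of a module import (hypothetical helper).

    rows: iterable of (cell_key, value) pairs, where cell_key identifies
          one cell, e.g. (list item code, line item name).
    Returns (cells, errors): the loaded cells and the keys that errored.
    """
    cells = {}
    errors = []
    for key, value in rows:
        if key not in cells:
            cells[key] = value
        elif is_numeric(value) and is_numeric(cells[key]):
            # Numeric line item: matching values are summed together.
            cells[key] += value
        else:
            # Other formats: a second value for the same cell is an error.
            # Assumption: the first value is kept.
            errors.append(key)
    return cells, errors


rows = [
    (("1001", "Sales"), 10),
    (("1001", "Sales"), 5),
    (("1001", "Status"), "Open"),
    (("1001", "Status"), "Closed"),
]
cells, errors = import_into_module(rows)
```

Here the two Sales rows add up to a single value of 15, while the second Status row is recorded as an error.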