Data Integration Challenge: Seeking Guidance on Anaplan Platform



I am currently working on a project that involves integrating external data sources into the Anaplan platform. I recently completed a Splunk certification, and I am facing challenges with efficiently importing and synchronizing large datasets.

Here are a few key points to consider:

  1. Data Volume: The datasets I'm dealing with are substantial, and I'm looking for recommendations on best practices for handling large volumes of data within Anaplan.
  2. Integration Tools: What integration tools have you found most effective for connecting external data sources to Anaplan? I'm exploring different options and would appreciate insights from the community.
  3. Performance Optimization: Are there specific techniques or strategies for optimizing performance during data imports? Any advice on improving the efficiency of data synchronization processes would be valuable.
  4. Error Handling: How do you manage errors during the data integration process? I'm interested in learning about robust error-handling practices to ensure data accuracy and integrity.
  5. Real-time Updates: Is it possible to achieve near real-time updates when integrating data into Anaplan? If so, what approaches or technologies have you successfully implemented?

Thank you in advance!


    • @emma02: below are some considerations.
    1. Data volume: consider building a Data Hub model and, if possible, put the Data Hub model in a separate workspace (this also helps performance). Data preparation is quite important: try to have the data as clean as possible before importing it into Anaplan. If possible, do not use Anaplan as a data-cleaning tool, even though Anaplan has these capabilities. Although Anaplan can handle quite a large amount of data, do not treat it as a data lake. Treat Anaplan as a multi-dimensional CPM/EPM application for which the data lake or data warehouse is a data source. That said, the principles for importing data into Anaplan are similar to those for loading data into a data warehouse.
    2. Integration tools: I think one of the most robust (also in error handling) is Anaplan Connect. Keep in mind that whatever the external sources are, the import into Anaplan is made via a CSV/TXT file. All the integration tools connect to the data source and transform the source dataset into a CSV/TXT file in order to upload and import the data into Anaplan. If the data source is a cloud solution, you can consider CloudWorks. Except for CloudWorks, all the other integration tools need to be triggered from outside Anaplan. Consider organizing the import actions using Anaplan processes. Usually, importing the data into Anaplan is just a first step that needs to trigger additional steps/imports in the Anaplan model.
    3. Performance optimization: follow the best practices for building formulas in Anaplan. Keep in mind that Anaplan calculates on the fly, applying all the formula dependencies of the imported data. Consider building incremental (delta) load mechanisms to reduce the volume of data imported at any one time. Try to get a clear understanding of how Anaplan import actions work and make good use of flags like "ALL Items" and "Matched Item".
    4. Error Handling: I think the trickiest errors to handle are the data rows discarded from an external file. This is where Anaplan Connect has built-in functionality that can help a lot. Import errors or warnings do not leave any trace in the Anaplan history logs. Once the data is in Anaplan, you can build integrity checks between the imported data and the master data. To be able to show these kinds of errors, the imported data needs to be loaded only as basic data types (like numbers and text), but this adds the need to convert text into the other data types, which can slightly decrease performance. I think you need to understand how the REST API works and what errors the import actions return. Just be aware that, for example, the "ignored cells" in an Anaplan import are not considered errors and cannot be traced very well.
    5. Real-Time updates: keep in mind that import actions, when launched in Anaplan, block the model for ALL users until the import action finishes and all the formula dependencies are calculated. Consider periodically scheduled imports triggered from outside Anaplan.
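
To make points 1, 3 and 4 a bit more concrete, here is a minimal Python sketch of the kind of preparation that could run outside Anaplan, before the CSV file is handed to whichever integration tool you use. It is only an illustration under assumptions: the function names, the `sku`/`qty` columns and the sample rows are all hypothetical, nothing here calls Anaplan itself, and this simple delta does not detect rows deleted at the source.

```python
import csv
import hashlib
import io


def split_by_master(rows, key, master_keys):
    """Separate rows whose key exists in the master data from rows that would be rejected."""
    valid = [r for r in rows if r[key] in master_keys]
    rejected = [r for r in rows if r[key] not in master_keys]
    return valid, rejected


def row_fingerprint(row):
    """Stable hash of a row's column/value pairs, used to detect changed records."""
    joined = "\x1f".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compute_delta(previous_rows, current_rows, key):
    """Return the rows of current_rows that are new or changed versus previous_rows."""
    seen = {r[key]: row_fingerprint(r) for r in previous_rows}
    return [r for r in current_rows if seen.get(r[key]) != row_fingerprint(r)]


def chunked(rows, size):
    """Split rows into consecutive lists of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data: yesterday's extract, today's extract, known master keys.
previous = [{"sku": "A1", "qty": "10"}, {"sku": "B2", "qty": "5"}]
current = [
    {"sku": "A1", "qty": "10"},  # unchanged, so not re-imported
    {"sku": "B2", "qty": "7"},   # changed quantity
    {"sku": "C3", "qty": "1"},   # new row
    {"sku": "ZZ", "qty": "4"},   # key missing from master data
]

valid, rejected = split_by_master(current, "sku", {"A1", "B2", "C3"})
delta = compute_delta(previous, valid, "sku")
print(to_csv(delta, ["sku", "qty"]))
# sku,qty
# B2,7
# C3,1
print([r["sku"] for r in rejected])
# ['ZZ']
```

The rejected rows can be logged or sent back to the source team before the import runs, which gives you the trace that the import itself does not keep, and `chunked` can be used to split a large delta into several smaller files if a single upload is too big.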

    Hope it helps