Unlocking High-Efficiency Data Uploads in Anaplan: A Multithreading Approach

AnaplanOEG
edited March 8 in Best Practices

In cloud-based business planning, the ability to upload large datasets quickly and reliably is essential to keeping plans current. This post introduces multithreading as a way to significantly increase data upload speeds to Anaplan. Note that this approach applies specifically to data uploads; it does not extend to the execution of import actions.

Optimizing Performance

The essence of this initiative is to expedite uploads in a smart and efficient manner. By adhering to proven best practices, we achieve quicker data uploads while ensuring resources are used optimally. The strategies employed include:

  • Multithreading: This technique enables the simultaneous parallel upload of data chunks, significantly boosting upload speeds.
  • Data Chunking: The Anaplan API supports data chunks from 1 MB to 50 MB. We segment the upload dataset into manageable chunks within this range, facilitating smoother uploads.
  • Compression: Post-segmentation, these chunks are compressed using GZip. This step substantially reduces data size, further enhancing the upload pace.

These methodologies are crucial in optimizing the upload process, ensuring the rapid and efficient transfer of extensive datasets to Anaplan.
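The three techniques above can be sketched in Python. The chunk-splitting and compression helpers below use only the standard library; `upload_chunk` is a hypothetical placeholder for the Anaplan file-chunk API call, so treat this as an illustrative sketch rather than the repository's actual code.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(path, chunk_size_mb=10):
    """Read the source file and yield chunks of at most chunk_size_mb
    (Anaplan accepts chunk sizes between 1 MB and 50 MB)."""
    chunk_bytes = chunk_size_mb * 1024 * 1024
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            yield chunk

def compress_chunk(chunk):
    """GZip-compress a chunk before upload to reduce transfer size."""
    return gzip.compress(chunk)

def upload_chunk(index, chunk):
    # Hypothetical placeholder: a real implementation would PUT the
    # compressed chunk to the Anaplan file-chunk endpoint with the
    # appropriate authentication headers.
    ...

def upload_file(path, thread_count=10):
    """Split, compress, and upload all chunks in parallel."""
    chunks = [compress_chunk(c) for c in split_into_chunks(path)]
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        futures = [pool.submit(upload_chunk, i, c)
                   for i, c in enumerate(chunks)]
        for f in futures:
            f.result()  # propagate any upload errors
```

Because each chunk upload is an independent HTTP call, `ThreadPoolExecutor` lets several chunks be in flight at once, which is where the speedup comes from.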

The settings.json file

This project has a settings.json file, which contains key configurations necessary for customizing and optimizing the upload process. Highlights of this configuration include:

  • Workspace & Model IDs: Specify workspaceId and modelId to direct data to the correct location in Anaplan.
  • Authentication Mode: Options include Basic, Device-based OAuth, or Cert Auth, with a strong recommendation for OAuth for enhanced security. Refer to our guide on Device-based OAuth for more details.
  • Thread Count: Set the number of concurrent uploads with threadCount. Given Anaplan's limit of 200 concurrent threads per tenant, a cap of 199 is advised to reserve a thread to refresh the access token.
  • Compression Toggle: compressUploadChunks lets users switch compression on or off and observe its impact on upload speed.
  • Upload Chunk Size: Determine the size of upload chunks in megabytes through uploadChunkSizeMb.
  • REST API Retries: retryCount ensures API resilience by retrying failed API calls up to the specified limit.
  • API Base URIs: uris configuration facilitates compatibility with new Anaplan data centers.
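Putting the settings above together, a settings.json might look like the sketch below. The key names are taken from the bullets in this post; placeholder values are used for the IDs, and the exact schema (including the authentication settings) should be confirmed against the repository's README.

```json
{
  "workspaceId": "<your-workspace-id>",
  "modelId": "<your-model-id>",
  "threadCount": 10,
  "compressUploadChunks": true,
  "uploadChunkSizeMb": 10,
  "retryCount": 3,
  "uris": {
    "auth": "https://auth.anaplan.com",
    "api": "https://api.anaplan.com"
  }
}
```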

Benchmarking

In benchmark comparisons with Anaplan Connect using a 500 MB CSV file, the results are compelling: a nearly threefold increase in upload speed, primarily due to multithreading. The table also underscores the considerable performance gain from compression.

Tool                           | Chunk Size | Upload Time | Compression | Threads
Anaplan Connect                | 10 MB      | 32 seconds  | On (Fixed)  | 1 (Fixed)
Multithreading Python Project  | 10 MB      | 12 seconds  | On          | 10
Multithreading Python Project  | 10 MB      | 175 seconds | Off         | 10

Dive Deeper

For those keen on implementing this solution, visit the GitHub repository: anaplan-multithreading-example. Here, you'll find the complete project alongside a detailed README for setup instructions. Your feedback and contributions are highly welcomed!

Important Considerations

While we are excited about the potential of this project, it's crucial to acknowledge that Anaplan does not officially support this code. Users should understand that, although highly beneficial, the project operates independently of Anaplan's support framework.

Beyond Uploads

Though focused on uploads, the principles and methodologies applied here can similarly enhance the download process following an Anaplan Export Action. By adopting a parallel approach to data handling, both uploads and downloads can achieve substantial speed improvements.

Author: Quin Eddy, @QuinE - Director of Data Integration, Operational Excellence Group (OEG)

Comments

  • Awesome!!! This would definitely help in improving the overall data load process for organizations with larger data sets.

  • Really great example on efficient data load to Anaplan!