Data Integration - cheat sheet / decision tree
I have gone through the various materials available on data integration and completed the related courses.
Nevertheless, with minimal practical experience in the field, I still feel a bit lost.
What I am looking for is some kind of summary of the pros and cons of the individual data integration methods, or some kind of decision
tree to help select the best integration option for the customer.
Do you guys have some materials on this topic in a structured form? If not, it'd be great if you could just outline how
you approach this topic in practice when engaging with a client.
Thank you so much!
You know, the more I work with Anaplan, the more I lean towards building out a center of excellence, especially for matters related to data integration.
If for no other reason than to keep your data hub clean and the interactions between the data hub and spoke applications efficient. These can get out of control very fast without a strategy.
I think the first conversation I would have with the client is to talk about how to build a best practice data hub.
Start with @rob_marshall's brilliant best practice post on data hub performance.
Data integration will be a natural evolution of the discussion, and that might be when you introduce the idea of a center of excellence.
Start with manual data integration. As Anaplan likes to say, manually loading data will not stop your project but bad data or problems with data integration will.
Just have a strategy. Plenty of good best practices out there.
@szechovsky let's keep the conversation going. You will find some of the brightest minds are on this Community Site. @jnoone, @ben_speight, @alexpavel, @scott.smith, and @kavinkumar are data integration pros and consistently get me out of a tight spot with data integration. Search on their names to see the articles and posts they've written. @jesse_wilson and @chase.hippen are Python pros. Read their best practice articles when you're ready.
When you're ready for a study checklist, let us know. I can drop about 20 links to get you going.
Thanks for your quick reply. I couldn't agree more that something like a CoE would be an appropriate way to help Anaplanners tackle topics like this.
In the meantime, however, I'll check your link on the data hub, and I look forward to responses from the data integration pros you mentioned.
I'm also happy to check the links you proposed to drop here. Please, go ahead. 🙂
Here's a reading list to get you on your way.
There's a lot to learn - so please, rely heavily on this Community site for nuances and any challenges you face. It's truly a gift that so many people here are willing to help.
On Demand Videos
As you begin to practice, pay particular attention to imports and exports. They're nuanced, especially when you are required to create the chunks manually. Practice these until you get them right. For a benchmark, I probably invested 20-30 hours each really getting the hang of importing and exporting. Try different strategies: start with basic authentication, then move on to a certificate. The certificate is the right way in my opinion because you don't have to worry about user IDs and expiring passwords.
- Introduction to data integration
- Data Integration Basics (303)
- Anaplan Connect (301)
- Hyperconnect (if you use Informatica); great video
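For what it's worth, the manual chunking mentioned above the video list can be practiced without touching Anaplan at all. Here is a minimal Python sketch that only shows the splitting step. The function name, the sample payload, and the chunk size are all made up for illustration, and the actual upload calls are left out, so check the API documentation for those.

```python
from typing import Iterator


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Hypothetical payload standing in for a CSV file to be imported.
payload = b"Region,Amount\n" + b"EMEA,100\n" * 5

# A tiny chunk size just to make the split visible; real chunks would be far larger.
chunks = list(split_into_chunks(payload, chunk_size=16))

# Reassembling the chunks in order must give back the original file.
assert b"".join(chunks) == payload
```

The point of the last check is that chunk order matters: each chunk would be sent in sequence, and the receiving side puts them back together in that same order.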
From there you should start learning how to use the API. This is the fundamental logic all non-manual data integration methods use.
- Start by learning Postman - it's a free download and it will help you learn how to use APIs without having to do any programming.
- @Jason_C wrote examples for Postman in this post. He's brilliant - I used these to learn myself and it was perfect.
- Once you master Anaplan Connect and Postman, it's time to move on to some really fun API work with Python. I would start with this post by @chase.hippen. This has to be one of the best Python posts out there. I pay homage to Chase every day I use Python.
The third thing I would recommend is to follow the Master Anaplanner coursework on data integration. Click on this link and scroll down; you'll find 10 links to the data integration topics that all Master Anaplanners must understand. Some of them use ETL tools that are hard to come by, but the user guides contain some amazing tips on how to leverage the APIs. So it's definitely worth it!
Lastly, use this Community site. Most data integration topics have already been answered - so you can start by searching this site. But if you're in a hurry, or you just can't find the topic you're looking for, ask! You'll probably get 3-4 answers in the first hour! If you discover anything new or something you think others would benefit from knowing, post it! You'll get tons of kudos.
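If you want a first Python exercise before reading those posts, here is a minimal sketch of what basic authentication looks like at the HTTP level. It builds the standard Basic `Authorization` header and nothing more. The function name and the credentials are hypothetical, and no Anaplan endpoint is called, so treat it as a warm-up rather than a working integration.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header from a user/password pair."""
    # The Basic scheme is "user:password", base64-encoded.
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header("user", "pass")
print(headers)
```

This is the same header Postman generates when you pick Basic Auth on the Authorization tab, which makes it easy to compare the two side by side.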
From my experience, Anaplan data integration falls into four main categories:
Manual: Users upload and download data to and from Anaplan manually, via a dashboard button or via the Anaplan-provided Excel add-ins.
Anaplan Connect: Uses Windows batch files to upload and download data, normally automated with Windows Task Scheduler.
Third-Party Connectors: Multiple connectors are available, such as the MuleSoft connector, the Informatica connectors, etc.
RESTful API: Anaplan exposes API endpoints which can be used to import and export data in and out of Anaplan.
As for choosing a method: it depends on many things, such as the customer's capabilities, scope, budget, scalability requirements, and security policies. For example, if your use case involves multiple models and lots of data points segregated geographically, manual loads or Anaplan Connect are not the best options. On the other hand, if your use case is simple but company policy mandates that it go through, say, Informatica, then you have to use the Informatica connector or the REST API. If you are doing a proof of concept (POC), it's quicker to go manual.
It is not an easy question to answer, but from my experience:
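Since the original question asked for a decision tree, here is a rough Python sketch of the rules of thumb in the paragraph above. It is only an illustration: the class, the field names, the order of the checks, and the fallback to Anaplan Connect are my own assumptions, and a real decision would also weigh budget, security policies, and the customer's skills.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical summary of a customer's situation."""
    is_proof_of_concept: bool = False
    mandated_etl_tool: bool = False        # e.g. company policy requires Informatica
    many_models_or_regions: bool = False   # multiple models, geographically split data


def suggest_method(case: UseCase) -> str:
    """Return a starting suggestion; the check order is an assumption."""
    if case.mandated_etl_tool:
        # Policy wins, even for simple use cases.
        return "Third-party connector or REST API"
    if case.many_models_or_regions:
        # Manual loads and Anaplan Connect are not the best fit here.
        return "Third-party connector or REST API"
    if case.is_proof_of_concept:
        # Quickest way to get data in for a POC.
        return "Manual"
    # Assumed default: the simplest automated option.
    return "Anaplan Connect"


print(suggest_method(UseCase(is_proof_of_concept=True)))
print(suggest_method(UseCase(mandated_etl_tool=True)))
```

The checks are ordered so that hard constraints (policy, scale) are evaluated before convenience (POC speed). Reorder them if your client's priorities differ.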
- I always prefer to go automated rather than manual.
- I prefer to do most of my data transformations outside Anaplan.
- Where applicable, I always try to have an ETL layer before anything goes into or out of Anaplan.
- I always try to have a single source of truth for data coming in and going out.
The API is gaining a lot of traction, as it can be used by many tools and even by scripts in languages like Python. With everything moving to the cloud, use of APIs is becoming very common.
Hope this helps.
Here is a diagram I put together for basic integration decision making. Hope it helps.