Clearing/updating old records while uploading data into Anaplan without knowing what to clear?

Hi All,

 

I have an issue that needs to be addressed for a client. I do have a solution, but I would like to reach out to this amazing community and get a different viewpoint if possible.

 

Problem statement: we need to clear an item from a module, but there is no simple way to identify which item to clear.

 

Below is a screenshot of what we get, let's say today (I have also attached an Excel version).

 

Import file: this is the file we get from the source system. It goes into the data staging module in the data hub.

Data staging: in this module we create an ID concatenation of customer, store, product, date, and price as below (I know price should not be part of the ID, but there is no other way to achieve uniqueness, as the price changes within the day and we need to capture QTY/price).

Data final: this is a list where we load the IDs, so essentially we keep a history of old records. We do have processes to remove old records in the spoke model, but let's not go there.
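Outside Anaplan, the ID concatenation built in the data staging step amounts to joining the fields with an underscore. A minimal Python sketch (field names and the helper are illustrative, not Anaplan syntax):

```python
# Build the staging ID described above: customer_store_product_date_price.
# Price is included only because it is the sole way to keep the key unique
# when the price changes within a day.
def build_id(customer, store, product, date, price):
    # join all fields with "_" to form the staging ID
    return "_".join(str(f) for f in (customer, store, product, date, price))

key = build_id("A", 1, "Y", "4/12/2019", 30)  # matches the example key below
```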

 

clipboard_image_2.png

 

Tomorrow we receive the file below.

Please notice that transaction #3 has an updated price: think of it as yesterday there was an error and today it's fixed. IT cannot add a flag to the file saying what has been updated, or a reference to the old record.

 

clipboard_image_6.png

 

Issue: we need to somehow clear the transaction that was loaded yesterday, i.e.

A_1_Y_4/12/2019_30
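Conceptually, if both yesterday's loaded IDs and today's incoming IDs were available side by side outside Anaplan, the stale record falls out of a simple set difference. A hypothetical Python sketch (the key sets are illustrative, built from the example data):

```python
# Keys loaded yesterday vs. keys arriving today (illustrative data).
yesterday = {"A_1_X_4/12/2019_30", "A_1_Y_4/12/2019_30", "B_2_Z_4/12/2019_15"}
today     = {"A_1_X_4/12/2019_30", "A_1_Y_4/12/2019_33", "B_2_Z_4/12/2019_15"}

to_clear = yesterday - today   # stale keys that should be removed from the list
to_load  = today - yesterday   # new/updated keys to add
```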

 

Solution:

 

clipboard_image_5.png

 

Any suggestions or ideas will be really appreciated.

 

Karan

 

Best Answer

  • I think this post has gone off on a different tangent. I will mark this as closed for now.

     

    I do want to share a sentiment, though. Best practices are best practices, and we should stick to them wherever possible to get the best outcome. But guess what? The world is not perfect. As a consultant I can advise on best practices, but I can't impose a best practice as a solution.

     

    Thanks for everyone's input  

Answers

  • @karank ,

     

    Crazy question, but why do you need the history of what was loaded on the day?  If the values have been updated, then why is the history important?  Honestly, not only should you not be concatenating the transactional data with the Customer_Store and Product Date, but you really shouldn't be concatenating the Product Date with the Customer_Store, as this inflates your list, leading to poor performance. Instead, create a module with the Customer_Store list by Time at the day level with two line items: Price and Quantity.  By doing it this way, not only will your model be smaller (because the Customer_Store list will be way smaller), but the model will perform better because the data is dimensionalized instead of sitting in a long list.  Additionally, you wouldn't have to delete old data.

     

    As for the delta load, you actually don't really need IT to give that to you (although that would be the best route), but you can create three new line items in the above module:

    • Previous Amount
    • Previous QTY
    • Load? which is a boolean having this formula: Amount<>Previous Amount or QTY <> Previous QTY

    The process to load data would then have the following actions:

    • Copy Amount to Previous Amount and QTY to Previous QTY where the Load? line item is true
    • Load the new data to the module

    Now, you know exactly what has changed and can use this to load to your spoke model.
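Outside Anaplan, the change detection behind the Load? line item amounts to comparing each incoming row against its stored previous values. A plain-Python sketch (keys and values are illustrative, not Anaplan syntax):

```python
# Previous snapshot vs. incoming load, keyed by Customer_Store (illustrative).
previous = {"A_1": {"amount": 300, "qty": 10},
            "A_2": {"amount": 640, "qty": 20}}
incoming = {"A_1": {"amount": 300, "qty": 10},
            "A_2": {"amount": 660, "qty": 20}}

# Load? is true where Amount or QTY differs from the previous snapshot.
load_flags = {
    key: row["amount"] != previous.get(key, {}).get("amount")
         or row["qty"] != previous.get(key, {}).get("qty")
    for key, row in incoming.items()
}
```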

     

    Hope this helps,

     

    Rob

  • @karank,

    Just a side note.

    @rob_marshall makes an informed point about dimensionalizing time. I have implemented Anaplan for some retailers with very large product lists, and once I read @DavidSmith's post on the truth about sparsity, it changed a lot for my clients, not least a faster-performing model.

    And of course, @rob_marshall's post on data hub purpose and peak performance. Both posts will serve you well, and probably answer your question above.

    Check it out!

  •  Hi @rob_marshall 

     

    Let me give you some context 

     

    The data in question is by store, by group (customer in this case), by PRODUCT, by DATE (it's not "product date" but PRODUCT and DATE).

    In a day a store can sell, say, 10 units of product A at price 20 until 1 PM. After 1 PM the price goes up to 30 because some cap was exhausted, so we need to know the quantities sold at that price. The group-by on price and quantities is performed in the data integration layer.

    The data file coming to Anaplan today is as follows:

     

    #1 date 4/12/19 customer A store 1 product X price 30 quantities 10 (note the # list ID; it is generated by the data integration layer)

    #2 date 4/12/19 customer A store 1 product X price 32 quantities 20 (note the # list ID; it is generated by the data integration layer)
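The grouping the data integration layer performs can be sketched in plain Python, with values taken from the example above (raw transaction rows are illustrative):

```python
# Sum quantities per (date, customer, store, product, price), so an intraday
# price change yields one output row per price.
from collections import defaultdict

raw_transactions = [
    ("4/12/19", "A", 1, "X", 30, 4),
    ("4/12/19", "A", 1, "X", 30, 6),   # same price: merged into one row
    ("4/12/19", "A", 1, "X", 32, 20),  # price changed during the day
]

grouped = defaultdict(int)
for date, customer, store, product, price, qty in raw_transactions:
    grouped[(date, customer, store, product, price)] += qty
# grouped now holds the two rows sent to Anaplan: price 30 -> 10, price 32 -> 20
```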

     

    Tomorrow the store realises that the price of 32 was incorrect (it should have been 33), so they resend the data to us as follows:

    #1 date 4/12/19 customer A store 1 product X price 30 quantities 10

    #2 date 4/12/19 customer A store 1 product X price 33 quantities 20

     

     "Customer_Store list by Time at the day level with two line items: Price and Quantity" will not work. It is not unique: I need to store quantities for both prices, and in this solution one would override the other.

     

    Previous Amount
    Previous QTY
    Load? which is a boolean having this formula: Amount<>Previous Amount or QTY <> Previous QTY

    This will also not work, because I do not have previous values; notice the numbered list IDs:

    Today:

     

    #1 date 4/12/19 customer A store 1 product X price 30 quantities 10 (note the # list ID; it is generated by the data integration layer)

    #2 date 4/12/19 customer A store 1 product X price 32 quantities 20 (note the # list ID; it is generated by the data integration layer)

     

    Tomorrow:

    #1 date 4/12/19 customer A store 1 product X price 30 quantities 10

    #2 date 4/12/19 customer A store 1 product X price 33 quantities 20

     

    If I snapshot price and quantity and compare, it will not work for scenarios where I have received a completely different line. I can't change this to a non-numbered list, as price cannot be part of the list code; it can change to anything.

     

    The only thing I can think of is to get the lowest-level data from the data warehouse and do the group-by in Anaplan, but that is a lot of data and I want to avoid it if possible.

     

    I hope I have made the problem statement clearer.

  • Thanks @JaredDolich 

     

    I agree with you regarding @DavidSmith's and @rob_marshall's articles on sparsity and the data hub respectively. Very informative and insightful indeed.

  • @karank ,

     

    I would think there has to be something in the source system which distinguishes the "morning" numbers from the afternoon numbers.  How does the source system deal with this?

     

    Rob

  • If I go down to source-system ID level, the data volume grows tenfold, and besides, I don't need it at that level. To avoid that, we did the group-by on price in the data integration layer. Upside: data volume reduced a lot. Downside: I don't have a unique ID.

    Surely I am not the first one to face this. There has to be a better way of doing it.
  • @karank ,

     

    I wasn't actually advocating bringing in the system-level ID data; it was a question of how the source system makes both records unique (the morning prices vs. the afternoon prices) and whether you can use that instead of the date and transactional data.

  • The source system is a transactional system, so I would assume it has some sort of unique transaction ID. Maybe the below will clarify things further.

     

    The yellow transactions are the ones with a changed/updated price. You could ask how a transaction price can change, but that's another story that doesn't belong on this forum.

     

    clipboard_image_0.png

     

     

  • @rob_marshall 

    I think for the source system to make yesterday's vs. today's records unique, it will still need the date and transactional data to form a surrogate key.  In that case, it will be the same as Anaplan creating a combined key of customer, product, date, and transactional data.

  • @andre.lie ,

     

    Obviously, I don't know the source system and how it is structured, but in my 25+ years of database design, having worked with PeopleSoft, SAP BW, SAP HANA, Remedy, JD Edwards, and others (I am showing my age here), having the transactional data be part of the foreign key is not considered best practice.

  • @rob_marshall,

     

    I did not mean to doubt your thinking or experience; I agree it is not best practice to include transaction data as part of the key in the source system.  I just thought it might be a feasible workaround in Anaplan if nothing else could be changed.