Group by/sum alongside multiple attributes
I'm facing the below problem (probably there is an easy solution, however I'm too much immersed into it that prevents me to find it). 🙂
I have an excel database where I have several attributes and a value in the columns. I created a unique key manually in order to upload it into Anaplan. So now I have in Anaplan exactly what I have in the excel. I made some transformations (i.e.: in the excel I have the lowest cost element level, while in Anaplan I need a few lever higher cost element category), up till now it was an easy task.
Now, I would like to group/sum the values from some of the lines where all the attributes are equal (i.e.: values in column A/B/C/D are the same for each lines) and leave the rest as it is.
I tried to create a unique key by concatenating the different attributes into one field, then putting them into a list in order to make the aggregation possible using the 'sum' function. My issue is that the key created like this is quite long, thus I cannot create a list from this key and I cannot figure out any other solution to do this type of aggregation.
I attached a simplified example, maybe it's more talkative than my explanation above.
Any help would be highly appreciated!
In your excel, create a column to indicate row number and use it to create a number list with the row number as code. Then load the data into your module. Once loaded, I think you need to split the unique key into two lists.
Create two new line items Text 1 and Text 2 in the module. Text 1 is concatenation of column 1-4 and Text 2 of column 5-9, for example. Use these line items to create List 1 and List 2. Once created, add additional two list formatted line items 'Text 1 List' formatted as List 1, 'Text 2 List' as List 2. Finally, use SUM and LOOKUP to get the group total line item. You can then use ISFIRSTOCCURRENCE function to display the valid rows
This is a useful short video about creating Codes for attributes
Let me start by asking ...Have you created lists in Anaplan or you just uploaded the data in the excel sheet in a module in Anaplan?0
Well, I'd say partially: still in the excel, I created a unique key for all the lines, then I created a list in Anaplan, however I uploaded the data from the excel into a module where I have the properties (columns) as line items and having as dimension only this list.
Afterwards, some of the uploaded properties are converted into list formatted line items (e.g.: cost centers, cost elements, posting time scope), the rest remain text formatted.0
Thank you for the info. Let's think about this in a multidimensional way. First, we need to differentiate between structural data (metadata like Cost Center) and attributes.
At first glance, one would assume that you have the following entities or metadata (your A/B/C/D columns as you mentioned)
A. Cost Centers
B. Cost Element
D. DocuHeader Text(??) what is that? is it for formatting?
What's the significance of NameOffsetAccount and Assignment?0
In these respects, I'd say all the columns are metadata/structural data. NameOffsetAcc and Assignment are also properties that the customer wants to see (also the DocuHeaderText). So, none of them are for formatting purposes, but all of them keep useful information for the end users.
Let's assume that each line refers to an invoice, and the different columns are storing data from the invoice (CC, CE where it has been posted, vendor, purchase order number, etc.), but there are for example accruals where multiple lines can be grouped into one line, because all the properties are the same for them.0
I highly doubt that all the columns are structural data 🙂
I'll give you the short answer but I still would like to help you change the way you are approaching this.
The "Group By or Sum" as you asked, in Anaplan is achieved by using the SUM function.
Please read about it here.
You will need to remove the date from the Key you generated along with anything that is an attribute and load this transactional data in a module dimensioned by time.
Then create your summary module (where you will have the summed data) dimensioned by time as well as any item we identified as a structure or metadata (you will have to create and populate these lists in advance)
Then use the SUM function as described in the Anapedia description above.
Back to the way you are setting the data ...
I see that data in columns such as Name, NameOffsetAccount, and Assignment (Columns G/H/I) are most likely attributes; meaning that for each "Time Period, Cost Center, Cost Element L5, Vendor, and DocuHeader Text" combination you will have 1 value for these columns. That means they are attributes, not "Structures", and don't need to be represented in a list and become a separate dimension.
Does this look like a statement that describes the use case?
We have different Cost Centers each has the same (or almost the same) list of Cost Elements.
We also have a list of vendors who submit invoices for different time periods.
We need to summarize the "value" or the amount in these invoices by Cost Center, Cost Element, Vendor and DocuHeader text (whatever that is)
As is you have 4 dimensions/lists not including time. That is a lot
Question .... are vendors expected to submit invoices under any Cost Center and Cost Elements or there is a different group of vendors for Utilities other for maintenance,... etc0
And this article of why removing Time from the code is so important:
Actually, that’s the way how I finally solved my problem. 🙂
Unfortunatelly, I’ve spent hours on it and see your post just after it. Nevertheless, the point is that it works now. 🙂
And this one also looks to be a possible solution. I guess, it works even if I split the unique key into 2...3... lists? Because some of the fields in the columns might have data up to 50 characters (as far as I know the maximum length of a list item is 60 characters).0
Yes, indeed. However, the principle itself is clear, but the source data includes cca. 8000 lines/month, that I can reduce to 6500 lines by doing the groupping that was in the original post. The problem is that in each and every month there are maximum 5-10 lines that are identical and repeating, therefore by adding time as a dimension, I’ll have a lot of empty cells (every month cca. 6500 new empty cells).0