ECE - calculation issues when using sum
I've started to test the ECE, created a model with some very large lists for a price-volume-mix analysis, which involves sku x customer level detail, previously have had to fold a lot of dimensions for this.
To test, I created some very large modules (44 trillion cells) and it worked great - but when I started to do some calculations I started having performance issues - e.g. trying to do a sum formula my model crashed. Anyone else having issues with SUM or other functions?
Comments
-
How did you set up your lists? Not sure if you did this or not, but you don't have to set them up the same way you do in the current engine with the delimiters (CodeA_CodeB_CodeC). Instead, just a create a module dimensionalized by List A, LIst B, and List C. This may help in decreasing the size of the lists.
Hope this helps,
Rob
0 -
Hi Rob
I'm not sure this is the issue. I have a module that is dimensionalized by two lists - SKU (150k members) and Customer (18k members), plus two line items (will need to add more though). The lists are both flat lists (for simplicity, for now), not numbered, and loaded like this: Name= Code_Description, Parent: none, Code: Code . When I created the module, it created fine (large, but ECE worked). Then I went to bring in sales by customer x sku from an invoice staging module, and the sum in that formula is what crashed my model and I had to have it rolled back. Is there anything I can do to prevent this?
Thanks
Aly
1 -
what's the dimensionality of the target module?
David
0 -
The target module is SKU and Customer. The source module is Transaction ID, which includes line items for customer and sku.
Thanks
Aly
1 -
I have spoke with a number of the team on this, and think it would be good to make clear some of the issues you have run into here Aly:
Currently x: [SUM: y] is functioning correctly in ECE, but it is not yet optimized to work sparsely. This is why you saw the very slow performance you did with SUM. This highlights a difference between sparse storage and sparse calculation. What you can see at the moment is that ECE has sparse storage in that the sparse data you have in a 44 quadrillion cell Line Item is stored very efficiently. ECE is intended to store and calculate natively sparse data efficiently, however at the implementation of SUM does not take into account that most of the cells aren’t populated. That means the SUM calculation is actually iterating over 44 quadrillion cells rather than just the populated ones.
We plan to re-work the implementation of SUM in ECE to optimize the calculation for sparsity. We expect that to make a very significant difference in the scenario you are attempting, but this will take some time. We are going to re-prioritise this though, as you and others have hit this (this is exactly the kind of things we want to find in Alpha), and as a result we think this is more important than some of the other performance and functionality work that we had planned. This will take some time to implement though - we can update here when we have more info on timescales.
(By the way - it is worth mentioning that the dimensionality is 44 quadrillion and not 44 trillion as you said above - this is probably because the blueprint view column width is limited, so you can’t see the size of the number - something that we are seeking to fix in time)
5 -
out of interest, how does it perform if you import from the source module to the target, rather than use the SUM?
this highlights the differences of approach to housing data - we could import directly into a large multi-dimensional module
David
2 -
I haven't tried it yet, but that would be the best workaround - I've not had any problems with data imports so far so it should work. It just means that there's an extra step in the ETL process to bring data into Anaplan, as the source data is not structured that way. Thanks for the suggestion!
1 -
thanks - let us know how you get on.
We do expect large SUMs to be used, but in some (??) cases, a direct import might be the most appropriate structure for the new engine.
Part of the Alpha program will be to highlight differences in approach, tips and tricks, pros and cons for different approaches etc,
Thanks for the ongoing participation; it is so valuable
David
1 -
I was also one of the users who encountered the challenge with the SUM formula. The second formula I tried within my highly dimensioned module included a Boolean logic where an ITEM formula was nested within a PARENT formula. It got hung up similar to the SUM formula.
For me, there will be two primary functions I would see as ideal...
(1) The ability to use SUM calcs. SUM calcs will be the most common way I would engage having this much dimensionality. As a user, I would want to upload a raw data table and then assign and aggregate dimensionality to it via a system module
(2) I would want to use logic and DCA to control what users see and don't see. It would need to be done with logic rather than running imports and such and will include with high level of dimensionality.
Thanks. This is an exciting direction for the product and I'm hoping the formulas component gets worked out.
0 -
Hi
If you split the ITEM out from the parent formula, do you still see the same issue?
Also, I'm interested in "I would want to upload a raw data table and then assign and aggregate dimensionality to it via a system module" - Is it not an option to import it directly into a highly dimensioned data module?
The raw data table is how we currently have to support that level of data but ECE will allow a different approach.
Cheers
David
1 -
Good point ECE will be a big shift in thinking. I can't wait we have had to do some serious model gymnastics to get some driver calculations working with a 5 dimensional concatenated list at Orkla. It would be easy to do in ECE.
0 -
Hi David,
Let me address your questions...
Q - If you split the ITEM out from the parent formula, do you still see the same issue?
A - Yes, I just tested and am running into the same issue. It's been churning for over 45 minutes.
Q - Is it not an option to import it directly into a highly dimensioned data module?
A - To a degree, yes... but the data set is incomplete.. The data will not come out of our source system with all of the required attributes/dimensionality and, ideally, we would assign these using a SYSMOD combined with a SUM formula. For example, my source data will provide the basics like Time, Amount, Company Code, Cost Center and Account but would not have characteristics like Function and several other company-specific ways in which we dimension our data. The dimensionality would allows users to perform an "any by any" kind of query where they can set the criteria for each dimension. It also allows us to set up any report fairly easily and these reports can take advantage of level-selection.
I'm guessing that the issue is the zeroes are all "counted" when formulas come into play and it is too much to process.
Hopefully that helps. Happy to connect if you'd like to discuss. I meet with one of the engineers tomorrow.
Regards,
Brad
1 -
A workaround would be to load into a transactional table so validation and error correction could be performed then have a process to load the data into a multi-dimensional module from the transactional one rather than using sum
0 -
Hey Brad. I've restored your model to the previous state. Looking forward to chatting to you tomorrow.0
-
For the benefit the rest of the group, this plays into the following best practice of defining the formula at the lowest level of dimensionality as appropriate
Formulas using ITEM will only recalculate on the list in question, so that part of the formula should be housed in a module only dimensioned by the list
the following article describes this; it talks about text joins, but the same applies for Boolean conditions
https://community.anaplan.com/t5/Best-Practices/Formula-Optimization-in-Anaplan/ta-p/41663
David
0 -
@aly.mccourt - and others in this thread (incl @CommunityMember113856)
Please note @jason.**** 's post here: https://community.anaplan.com/t5/Enhanced-Calc-Engine-Alpha/ECE-updates/ba-p/87547 which mentions an update we have made to the ECE Alpha today. We have updated the Alpha environment with a new implementation of SUM which should work much more efficiently for SUM with large target dimensionality. You shouldn't need to do anything different to what you were trying before (which was locking up). Please let us know how it goes.
Dave.
1