Module sparsity leading to size and performance impact. We are looking for alternative approach.

Question

Dear community,

I am having issues with modules I created to analyze the planning results. Due to the high sparsity of the data model, I get a lot of cell combinations, most of them empty but leading to high memory and performance impact. I will explain the model setup that leads to the issue, hoping someone knows how to enable the analysis without having the impact performance.

The model is used to assign Sales Representatives to Customers and Product Groups in the form of:

Customer A, Product Group 1 --> Sales Rep X

Customer A, Product Group 2 --> Sales Rep Y

Customer B, Product Group 1 --> Sales Rep X

Customer B, Product Group 2 --> Sales Rep Z

and so on.

The module 1 for the planning therefore applies to the lists Customer and Product Group and has the Sales Rep as a line item.

So far, so good. Now after the assignments have been done, we want to give the planners the possibility to show the workload per Sales Rep in the form of:

Sales Rep X, Customer A, Product Group 1 --> Sales Value, Count

Sales Rep X, Customer B, Product Group 1 --> Sales value, Count

Sales Rep Y, Customer A, Product Group 2 --> Sales value, Count

Sales Rep Z, Customer B, Product Group 2 --> Sales value, Count

and so on.

To enable this analysis, we had to create a module 2 which applies to the lists Sales Rep, Customer and Product Group and has the invoiced value as well as the Count (whether this combination is assigned into Module 1). Then we filtered the output of this module to all entries which have a count of greater than 0.

However, this leads to module 2 generating a lot of combinations which are empty, because of course not all Reps are serving all customers and not all custoers are offering all products.

In the example above, we have the following line counts:

Module 1: 2 Customers * 2 Products = 4 Cells

Module 2: 2 Customers * 2 Products * 2 Sales Reps = 8 Cells

Now in our case we have a lot of customers and Reps and also some more product groups, leading to the following numbers:

Module 1: 74 Mio. Cells

Module 2: 3,700 Mio. Cells

The model is now at about 5 billion cells in total, consuming about 20 GB mostly due to the high sparsity module 2 described above. This is dangerously close to the internal limits of Anaplan and also impacts performance.

I currently do not see a way to enable the analysis without having a module which multiplies all my lists producing high sparsity. Therfore I reach out to the community for alternative ideas.

Thanks for any suggestions!

gespet · Answer

Yes, I thought about this as a first step since the product group / customer assignment is quite stable. I think we would then loose the functionality of automatically filtering when selecting a customer or product group from the numbered list or being able to use customer or product group as a page selector. But this might be ok since we could work around this topic.

The bigger driver for the sparsity issue is the relation Customer / Sales Rep. We have 112,000 customers and 900 Reps. Each customer has 1 - 3 Sales Reps assigned only which would lead worst case to 300 k combinations but in the current setup already to more than 99,000 k.

But as written before, reducing the Product Group / Customer combinations will also make an impact and might on its own save us 1/3 of the current cell combinations.

rob_marshall · Answer

@gespet,

Maybe you can do a combination of what you already have as well as what @Michelle Sutherland stated by having the Products role up to Customers in a numbered list (which is defined by the porducts the customer actually has) and then dimensionalize that by the assignments.  In doing this, you will reduce the sparsity of not having all products intersecting all customers because not every customer will own every product.  So, instead of having all 3 lists in the dimension, you will only have the two.  This way, you will still have the assignment "reporting" be dynamic, but still lowering the overall size of the model/module.

Hope this helps,

Rob

gespet · Answer

Thanks for the suggestion. I already thought about this and I am also using Numbered Lists a lot in other models. The reason why I did not use it here is because in the current setup, module 2 is automatically updated whenever an assignment in module 1 is changed. With your solution, I would have to set up a process that either runs automatically each defined timeframe or give the users the possibility to execute it themselves after they performed the changes. I was hoping to keep this from them. But if this is the only way to keep the module size at an acceptable level, I will have to evaluate this for the next target setting round.