The Truth About Sparsity: Part 2

In part one I defined sparsity, shared how combined hierarchy lists have been a common technique to avoid it, and dispelled some of its associated myths. I also discussed some issues you will encounter when using combined hierarchies. In part two I will discuss the modeling considerations for handling sparsity and the best approach to take to deliver an efficient model.

Let’s start by looking at a few considerations of using multi-dimensional structures rather than combined hierarchies.

Simplicity

Quoting Tim Peters from the "Zen of Python," simple is better than complex. Anaplan is designed to be simple to use, so we should aim to keep the modeling experience as simple as we can. 

Consider the effort and complexity of creating a combined hierarchy list, mapping data to and from this list, and running actions to create and update the list anytime a valid combination changes (see Dynamic). Compare that with the simplicity of creating a multi-dimensional module. We should look to use native functionality wherever possible—my adage is model naturally.

Flexibility

Multi-dimensional modules are flexible. Using filters, end users can view all of the products sold by customers or all of the customers that sell each product, just by pivoting the data. 

However, when you create a combined hierarchy list, you have to decide which is the most common element that the users want or need. You are forcing the user to view the data in a defined format. To replicate the functionality from above, you will have to create a second combined hierarchy list to display the data in the other way. This means duplication of data and modules, leading to models that are larger and more complex.
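As a rough illustration in Python (not Anaplan syntax), the sketch below treats a two-dimensional module as a dictionary keyed by (customer, product). The customer names, product names, and sales values are hypothetical. Both views come from the same data simply by choosing which dimension to group on, whereas a combined list would fix one of these orders in advance.

```python
from collections import defaultdict

# Hypothetical sales data keyed by (customer, product); a stand-in for a
# module dimensioned by two separate lists.
sales = {
    ("Customer A", "Product 1"): 100,
    ("Customer A", "Product 2"): 50,
    ("Customer B", "Product 1"): 75,
}


def pivot(cells, axis):
    """Group the cells by one dimension (0 = customer, 1 = product)."""
    view = defaultdict(dict)
    for key, value in cells.items():
        other = key[1 - axis]
        view[key[axis]][other] = value
    return dict(view)


by_customer = pivot(sales, axis=0)  # products sold by each customer
by_product = pivot(sales, axis=1)   # customers that sell each product
```

No second copy of the data is needed to get the second view, which is the duplication a second combined hierarchy list would introduce.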

Efficiency

Anaplan is designed to work with large data sets in a multi-dimensional form. It is very efficient in calculating across common dimensions, especially where the order of dimensions is the same (see Best Practice for Dimension ordering for more detail). This is why we advocate the D.I.S.C.O. methodology (Best practice for Module design), which groups like structures together.

Moving away from this approach increases model inefficiency: combined hierarchy lists often require mapping to and from other parts of the model, leading to a greater number of calculations and decreased efficiency of the model as a whole. They are not natural structures. 

Size

In general, yes, multi-dimensional modules can be larger than those using combined hierarchy lists (although, in Part 1, I gave an example where this isn’t true). So it’s very likely that, where you have large data sets, you’ll not be able to have multi-dimensional modules everywhere (see Balanced Approach for further discussion). However, here are more of my adages: 

  • Sparse and fast is better than dense and slow.
  • Big and fast is better than small and slow.
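To put the size trade-off in numbers, here is a minimal Python sketch of the cell-count arithmetic. The list sizes and the number of valid combinations are hypothetical, and the count is per line item, ignoring time and summary levels.

```python
from math import prod


def dense_cell_count(list_sizes):
    """Cells in a module dimensioned by separate lists: product of the sizes."""
    return prod(list_sizes)


def combined_cell_count(valid_combinations):
    """Cells in a module on one combined list: one per valid combination."""
    return valid_combinations


# Hypothetical figures: 1,000 customers, 200 products, 5,000 valid pairs.
dense = dense_cell_count([1000, 200])
combined = combined_cell_count(5000)
populated_share = combined / dense  # fraction of the dense module in use
```

The multi-dimensional module is larger here, but size alone says nothing about how quickly it calculates.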

One of my colleagues says that end users never complain about a model being too big; they complain about performance. As I mentioned earlier, small models can often be slower because of the calculation inefficiencies described above.

From an end-user perspective, a fast model is better than a slow model. It’s better to have multiple models that are very fast and efficient than a single, smaller and inefficient model delivering a poor user experience through slow performance.

With Application Lifecycle Management (ALM) and workspace-to-workspace connectivity, it is now easier to manage and maintain models that are split than it was in the past. 

There are many other techniques that can optimize the speed of calculations (Reduce calculations for better performance) and affect the size of a model, but that’s not the focus of this article. Managing sparsity through combined hierarchies is one technique, but it shouldn’t be the default approach; explore other options first. I will discuss this further in the Balanced Approach section below.

Easy to Maintain

With a multi-dimensional module, when a new item is added to or removed from a list (or subset), the module is automatically updated: there are no additional steps to take. However, with a combined hierarchy list, each time a new combination occurs or when a combination is no longer valid, an update process must run to ensure the list (or lists) are up to date.
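The difference in upkeep can be sketched in Python (an illustration only, not how Anaplan stores data). The list members are hypothetical, and the explicit `add` call stands in for whatever update process maintains the combined list.

```python
from itertools import product

customers = ["Customer A", "Customer B"]
products = ["Product 1", "Product 2"]


def module_cells():
    """Cells of a module dimensioned by both lists: the full cross product."""
    return set(product(customers, products))


# A combined list holds only the valid pairs and is stored separately.
combined_list = {("Customer A", "Product 1"), ("Customer B", "Product 2")}

products.append("Product 3")  # a new item is added to the product list

new_in_module = ("Customer A", "Product 3") in module_cells()
new_in_combined = ("Customer A", "Product 3") in combined_list

# The combined list only catches up once an explicit update step runs.
combined_list.add(("Customer A", "Product 3"))
```

The module picks up the new product as soon as the list changes, while the combined list stays out of date until the update step has run.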

This leads me to the last benefit of multi-dimensional modules: they are dynamic.

Dynamic

Anaplan is a very powerful modeling platform enabling users to plan and analyze data in real-time. Multi-dimensional modules enable elements of the calculations to be turned on and off with the selection of one checkbox. 

Consider this example: a planner has various promotions for a product which they want to model against different scenarios to see the effects.  

The modules are dimensioned by promoted products, scenarios, and time. Not all promotions are applicable for all scenarios, so there’s some sparsity in the structure. However, the results of the what-ifs can be re-calculated and viewed in real-time by turning them on or off, as desired.
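A minimal Python sketch of the idea, assuming a hypothetical "include" Boolean per scenario and promotion and made-up uplift values (this is not Anaplan formula syntax):

```python
# Hypothetical uplift per promotion, and an "include" Boolean per
# (scenario, promotion) pair that the planner can toggle.
uplift = {"Promo 1": 120.0, "Promo 2": 80.0, "Promo 3": 40.0}
include = {
    ("Base", "Promo 1"): True,
    ("Base", "Promo 2"): False,
    ("Upside", "Promo 1"): True,
    ("Upside", "Promo 2"): True,
    ("Upside", "Promo 3"): True,
}


def scenario_total(scenario):
    """Sum the uplift of every promotion switched on for the scenario."""
    return sum(
        value
        for promo, value in uplift.items()
        if include.get((scenario, promo), False)
    )


before = scenario_total("Base")      # only Promo 1 is included
include[("Base", "Promo 2")] = True  # the planner ticks one checkbox
after = scenario_total("Base")       # Promo 2 now counts as well
```

Changing one Boolean changes the result without touching any list structure, which is what keeps the what-if analysis immediate.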

Now consider the alternative where a combined hierarchy list is created: Scenarios > Promotions > Promoted Products

Yes, this will be smaller, but what happens when the user changes their mind and wants to add a promotion to, or remove a promotion from, the scenario? A user-driven action would have to be run to restructure the hierarchies every time. This would be very inefficient and disruptive for the end users, and it would lose the dynamism that makes Anaplan so special.

Balanced Approach

Every circumstance and set of requirements is different, so the answer to the question about the best approach to use is often, “it depends.” I mentioned a balanced approach, and this is where judgment plays a part. 

Creating combined hierarchies to handle model size is okay for some calculations but is not appropriate for flexible and dynamic multi-dimensional modeling. Nor is it appropriate to have many large multi-dimensional modules that are difficult to navigate and that make models unnecessarily large.

Appropriate is an important word—remember, Anaplan can perform calculations using the appropriate data structures for different elements of the model—the same structures or hierarchies don’t need to be used throughout the entire model. 

We recommend using the D.I.S.C.O. methodology for building models (Best practice for module design), using different structures for inputs, calculations, and outputs. This is where the balanced approach comes in. You need to decide the most appropriate structure for input, calculation, and output modules, considering performance, model size, usability, and maintenance (PLANS - This is how we model).

It may be necessary to optimize a Calculation module using combined hierarchies (possibly with all summary options turned off). This could deliver a poor user experience, but if the combined hierarchy structure is fairly static (e.g. only updated at the start of the planning cycle) and if the end users are not expected to use this module on dashboards, then it is the correct approach to use; the result would be a small and efficient module. Sometimes, to deliver a flexible, intuitive end-user experience, you need to build input and output or reporting modules that are sparse and multi-dimensional. 

Booleans are your friend

I want to re-emphasize that Booleans are the most efficient format of data you can use. Computers are built on 1s and 0s, and a Boolean takes up one-eighth of the space of a number or text value. Using Booleans in multi-dimensional structures is a very efficient way of modeling in a simple, flexible way without disproportionately increasing the size of the model.
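As a quick worked example in Python, the byte sizes below are assumptions chosen only to match the one-to-eight ratio quoted above, and the cell count is hypothetical.

```python
# Assumed cell sizes, chosen only to match the 1:8 ratio quoted above.
BYTES_PER_NUMBER = 8
BYTES_PER_BOOLEAN = 1


def line_item_bytes(cell_count, bytes_per_cell):
    """Approximate storage for one line item across all of its cells."""
    return cell_count * bytes_per_cell


cells = 1_000_000  # hypothetical cell count for a sparse module
number_bytes = line_item_bytes(cells, BYTES_PER_NUMBER)
boolean_bytes = line_item_bytes(cells, BYTES_PER_BOOLEAN)
```

Under these assumptions, a Boolean line item over a large, sparse structure costs one-eighth of what the same structure would cost as a number.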

Summary

To build a large model well, you will have to use a combination of the techniques described here and in part one.

Please pause and take a breath before modeling: don’t revert to the default behavior of eliminating sparsity at all costs. Model naturally, in line with Anaplan’s strengths. Consider flexibility, the end-user experience, and the impact that updating structures through actions will have on that experience. Remember, it’s your end users who are using the model!

Hopefully, the points outlined above will help you make the best decisions when building the most efficient, flexible, and powerful model for your end users. Don’t be scared of multi-dimensionality: it’s Anaplan DNA!

Comments

  • Great post - the "include" configuration between scenarios and products is very powerful and gives end users much more flexibility. This method of boolean configurations can be applied across so many use cases (allowing users to toggle between formulas, customize their own hierarchical rollups quickly, etc.) and also facilitates faster buy-in from users accustomed to the unstructured nature of Excel, since they still maintain control over the model.

  • The more I use Booleans to drive calculations, the more improvement I see in how models perform.

    Very nice article, David.

  • @sunil_maddi 

    Thanks for the feedback - Excellent news.  Keep it up!

  • What a great article David!

    Indeed, we always hear about "killing sparsity at any price," but I think we sometimes forget the most important aim: the user experience.

    Many thanks for that 

  • @Romain_Colin 

    thank you

    That is exactly the point, so I'm glad that came across. "It depends" is something I say so often, and considering all aspects, pros and cons, is key to good design.

    David