The Truth About Sparsity: Part 2
In part one I defined sparsity, shared how combined hierarchy lists have been a common technique to avoid it, and dispelled some of its associated myths. I also discussed some issues you will encounter when using combined hierarchies. In part two I will discuss the modeling considerations for handling sparsity and the best approach to take to deliver an efficient model.
Let’s start by looking at a few considerations of using multi-dimensional structures rather than combined hierarchies.
Quoting Tim Peters from the "Zen of Python," simple is better than complex. Anaplan is designed to be simple to use, so we should aim to keep the modeling experience as simple as we can.
Consider the effort and complexity of creating a combined hierarchy list, mapping data to and from this list, and running actions to create and update the list anytime a valid combination changes (see Dynamic). Compare that with the simplicity of creating a multi-dimensional module. We should look to use native functionality wherever possible—my adage is model naturally.
Multi-dimensional modules are flexible. Using filters, end users can view all of the products sold by customers or all of the customers that sell each product, just by pivoting the data.
However, when you create a combined hierarchy list, you have to decide which is the most common element that the users want or need. You are forcing the user to view the data in a defined format. To replicate the functionality from above, you will have to create a second combined hierarchy list to display the data in the other way. This means duplication of data and modules, leading to models that are larger and more complex.
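To make the pivoting point concrete, here is a minimal Python sketch (it is not Anaplan syntax). It assumes a hypothetical data set keyed by customer and product; the names `sales` and `pivot` and all figures are invented for illustration. The same cells yield both views without building a second structure.

```python
from collections import defaultdict

# Hypothetical sales volumes keyed by (customer, product); illustrative data only.
sales = {
    ("Acme", "Widget"): 120,
    ("Acme", "Gadget"): 45,
    ("Bolt", "Widget"): 80,
}

def pivot(cells, axis):
    """Group cell values by one dimension of the (customer, product) key."""
    other = 1 - axis
    view = defaultdict(dict)
    for key, value in cells.items():
        view[key[axis]][key[other]] = value
    return dict(view)

by_customer = pivot(sales, axis=0)  # products listed under each customer
by_product = pivot(sales, axis=1)   # customers listed under each product
```

In this analogy, a combined Customer > Product list corresponds to storing only `by_customer`; the product-first view would need a second list that is built and maintained separately.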
Anaplan is designed to work with large data sets in a multi-dimensional form. It is very efficient in calculating across common dimensions, especially where the order of dimensions is the same (see Best Practice for Dimension ordering for more detail). This is why we advocate the D.I.S.C.O. methodology (Best practice for Module design), which groups like structures together.
Moving away from this approach increases model inefficiency: combined hierarchy lists often require mapping to and from other parts of the model, leading to a greater number of calculations and decreased efficiency of the model as a whole. They are not natural structures.
In general, yes, multi-dimensional modules can be larger than those using combined hierarchy lists (although, in Part 1, I gave an example where this isn’t true). So it’s very likely that, where you have large data sets, you’ll not be able to have multi-dimensional modules everywhere (see Balanced Approach for further discussion). However, here are more of my adages:
- Sparse and fast is better than dense and slow.
- Big and fast is better than small and slow.
One of my colleagues says that end users never complain about a model being too big; they complain about performance. As I mentioned earlier, small models can often be slower because of the calculation inefficiencies described above.
From an end-user perspective, a fast model is better than a slow model. It’s better to have multiple models that are very fast and efficient than a single, smaller and inefficient model delivering a poor user experience through slow performance.
With Application Lifecycle Management (ALM) and workspace-to-workspace connectivity, it is now easier to manage and maintain models that are split than it was in the past.
There are many other techniques that can optimize the speed of calculations (Reduce calculations for better performance) and affect the size of a model, but that’s not the focus of this article. Managing sparsity through combined hierarchies is one technique, but it shouldn’t be the default approach—explore other options first. I will discuss this further in the Balanced Approach section below.
Easy to Maintain
With a multi-dimensional module, when a new item is added to or removed from a list (or subset), the module is automatically updated: there are no additional steps to take. However, with a combined hierarchy list, each time a new combination occurs or an existing combination is no longer valid, an update process must run to keep the list (or lists) up to date.
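The maintenance difference can be illustrated with a rough Python analogy, under stated assumptions: `module_cells` stands in for a multi-dimensional module, `refresh_combined_list` stands in for the update process, and all list items are hypothetical. In Anaplan the module resizes by itself; here the cross product is simply recomputed on demand.

```python
from itertools import product

def module_cells(products, customers):
    """Every product/customer intersection, as a multi-dimensional module provides."""
    return set(product(products, customers))

def refresh_combined_list(transactions):
    """Stand-in for the update action: rebuild the list from valid combinations."""
    return {(p, c) for p, c in transactions}

products = ["Widget", "Gadget"]
customers = ["Acme", "Bolt"]
transactions = [("Widget", "Acme"), ("Gadget", "Bolt")]
combined_list = refresh_combined_list(transactions)

# A new product is added to the list and sold to an existing customer.
products.append("Gizmo")
transactions.append(("Gizmo", "Acme"))

cells = module_cells(products, customers)
in_module = ("Gizmo", "Acme") in cells                    # available straight away
in_combined_before = ("Gizmo", "Acme") in combined_list   # missing until refreshed
combined_list = refresh_combined_list(transactions)       # the extra step
in_combined_after = ("Gizmo", "Acme") in combined_list
```

The new combination is usable in the cross product as soon as the list item exists, whereas the combined list only contains it after the refresh step has run.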
This leads me to the last benefit of multi-dimensional modules: they are dynamic.
Dynamic
Anaplan is a very powerful modeling platform enabling users to plan and analyze data in real-time. Multi-dimensional modules enable elements of the calculations to be turned on and off with the selection of one checkbox.
Consider this example: a planner has various promotions for a product which they want to model against different scenarios to see the effects.
The modules are dimensioned by promoted products, scenarios, and time. Not all promotions are applicable for all scenarios, so there’s some sparsity in the structure. However, the results of the what-ifs can be re-calculated and viewed in real-time by turning them on or off, as desired.
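The checkbox-driven what-if described above can be sketched in Python as follows. This is an analogy rather than Anaplan formula syntax, and the scenario names, promotion names, and unit figures are all hypothetical. One Boolean per scenario and promotion decides whether that promotion contributes to the result.

```python
scenarios = ["Base", "Aggressive"]
promo_uplift = {"Promo A": 100, "Promo B": 50}  # hypothetical extra units per promotion
base_units = 1000

# One Boolean per scenario/promotion intersection: the "checkbox".
active = {(s, p): False for s in scenarios for p in promo_uplift}
active[("Aggressive", "Promo A")] = True

def scenario_units(scenario):
    """Base units plus the uplift of every promotion switched on for the scenario."""
    return base_units + sum(
        uplift for promo, uplift in promo_uplift.items() if active[(scenario, promo)]
    )

before = scenario_units("Aggressive")     # only Promo A is switched on
active[("Aggressive", "Promo B")] = True  # tick one more checkbox
after = scenario_units("Aggressive")      # result reflects the change immediately
```

Some intersections stay `False` and are never used, which is the sparsity the text mentions, but changing the what-if needs no structural change: only a Boolean flips.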
Now consider the alternative where a combined hierarchy list is created: Scenarios > Promotions > Promoted Products
Yes, this will be smaller, but what happens when the user changes their mind and wants to add a promotion to, or remove a promotion from, the scenario? A user-driven action would have to be run to restructure the hierarchies every time. This would be very inefficient, disruptive for the end users, and lose the dynamism that makes Anaplan so special.
Balanced Approach
Every circumstance and set of requirements is different, so the answer to the question about the best approach to use is often, “it depends.” I mentioned a balanced approach, and this is where judgment plays a part.
Creating combined hierarchies to handle model size is okay for some calculations but is not appropriate for flexible and dynamic multi-dimensional modeling. Neither is having many large multi-dimensional modules that are difficult to navigate and make models unnecessarily large.
Appropriate is an important word. Remember, Anaplan can perform calculations using the appropriate data structures for different elements of the model; the same structures or hierarchies don’t need to be used throughout the entire model.
We recommend using the D.I.S.C.O. methodology for building models (Best practice for module design), using different structures for inputs, calculations, and outputs. This is where the balanced approach comes in. You need to decide the most appropriate structure for input, calculation, and output modules, considering performance, model size, usability, and maintenance (PLANS - This is how we model).
It may be necessary to optimize a Calculation module using combined hierarchies (possibly with all summary options turned off). This could deliver a poor user experience, but if the combined hierarchy structure is fairly static (e.g. only updated at the start of the planning cycle) and if the end users are not expected to use this module on dashboards, then it is the correct approach to use; the result would be a small and efficient module. Sometimes, to deliver a flexible, intuitive end-user experience, you need to build input and output or reporting modules that are sparse and multi-dimensional.
Booleans are your friend
I want to re-emphasize that Booleans are the most efficient format of data you can use. Computers are built on 1s and 0s, and a Boolean cell takes up one-eighth of the space of a number or text cell. Using Booleans in multi-dimensional structures is a very efficient way of modeling in a simple, flexible way without disproportionately increasing the size of the model.
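As a rough worked example of that ratio, the Python sketch below compares the size of the same module held as numbers and as Booleans. The byte sizes are assumptions chosen to match the one-eighth figure above (8 bytes per number, 1 byte per Boolean), and the dimension sizes are hypothetical.

```python
import math

# Assumed cell sizes in bytes; the 8:1 ratio follows the one-eighth figure in the text.
BYTES_PER_NUMBER = 8
BYTES_PER_BOOLEAN = 1

def module_bytes(dimension_sizes, bytes_per_cell):
    """Cell count is the product of the dimension sizes; size is cells x bytes per cell."""
    return math.prod(dimension_sizes) * bytes_per_cell

dims = [1_000, 500, 12]  # hypothetical products x customers x months
numeric_size = module_bytes(dims, BYTES_PER_NUMBER)
boolean_size = module_bytes(dims, BYTES_PER_BOOLEAN)
ratio = numeric_size // boolean_size
```

Under these assumptions the 6,000,000-cell module takes 48,000,000 bytes as numbers and 6,000,000 bytes as Booleans, so a sparse Boolean module can be eight times larger in cell count for the same footprint.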
To build a large model well, you will have to use a combination of the techniques described here and in part one.
Please pause and take a breath before modeling: don’t revert to the default behavior of eliminating sparsity at all costs. Model naturally, in line with Anaplan’s strengths. Consider flexibility, the end-user experience, and the impact that action-driven updates to structures will have on that experience. Remember, it’s your end users who are using the model!
Hopefully, the points outlined above will help you make the best decisions when building the most efficient, flexible, and powerful model for your end users. Don’t be scared of multi-dimensionality: it’s Anaplan DNA!