OEG Best Practice: Best practices for module design
Hopefully, by now you've heard of the D.I.S.C.O. methodology for model and module design, but do you fully understand how to put it into practice? D.I.S.C.O. is part of the wider P.L.A.N.S. standard for Anaplan model building and falls under the L for Logical. D.I.S.C.O. provides a logical structure for one to use when designing Anaplan models.
Before we dive deeper into the details, let me just take a moment to recap the P.L.A.N.S. standard.
P.L.A.N.S. stands for Performance, Logical, Auditable, Necessary, and Sustainable.
It is the standard by which all Anaplan models should be built. It goes deeper than best practices because these are principles that should be adopted throughout the whole model design and build. They are the cornerstones of the “way we build” and should be considered at all times throughout the project. There will be exceptions, and in a small number of cases it may be necessary to deviate from the principles. This is okay, but only after due consideration of all other options. The answer to many design questions is “it depends.” There will always be trade-offs and compromises, but considering the principles discussed below will help you present the pros and cons of the different approaches to your end users during the design phase.
So...D.I.S.C.O. You may have heard people talking about “D.I.S.C.O.ing” their models, so let’s have a look further into what this actually means.
At the heart of Anaplan models are line items, which contain the information needed within the model. All calculations are performed at the line item level. Modules are, in essence, containers to hold these line items. There is no performance difference between a single module with thirty line items and three modules with ten line items each. What is important is like-for-like structures; Anaplan works at its best when calculations are performed over the same dimensionality. While this is not possible all of the time, aligning the calculations as much as possible will optimize the performance of the model.
This is the essence of D.I.S.C.O. You should look to create groups of modules, each having a sole purpose and a common structure: a logical split of the information within the model. Each type of module has different characteristics, so combining all of the different types of structure into the same module makes it difficult to understand and can lead to duplication and inefficiencies within the calculations. A common saying is, “just because you can, doesn’t mean you should,” and this applies here. Yes, you can group all sorts of calculations in all sorts of modules; Anaplan will let you, but should you?
When we think about models, we often think of three main parts of the model: Inputs, Calculations, and Outputs. This is the flow of data from left to right, or top to bottom.
Users enter data, some calculations are performed, and results are output.
In most models, there is also a fourth element, Data: where does the information start from? Finally, there is a set of information holding the model together (the "glue"), also known as the System information.
So, D.I.S.C.O. stands for Data, Inputs, System, Calculations, and Outputs.
Before building the models, think about the structures needed within the model, and group them together within those five categories. These are not functional areas; they are more like sub-functional areas.
Let’s have a look into these in more detail:
Most models need to have data on which to perform calculations. Data modules support Inputs, Calculations, and Outputs. Data modules may contain the latest chart of account data, employee roster, list of opportunities or assignments, or transactional history.
It is unlikely end-users will be viewing the data directly (unless there is a requirement for transactional analysis), so one should consider the optimum dimensional structure to hold this data. Does it need to be user-friendly? Do we need summary totals? Do we actually need hierarchical structures? Does it even need to reside in the model itself? Although Data modules can reside in the model, it is best practice to have commonly used data and structures in a data hub. There are many advantages to having a data hub, as discussed in the article, "Data Hubs: Purpose and peak performance."
Don’t daisy chain
The data should be referenced wherever needed, but try to avoid referencing the data and then referencing the result. For example:
Let’s say we have a line item called Volumes and we feed this into an Input module for reference. We also reference Volumes in the Calculation modules and Output Reports.
If each module references the one before it, the Volumes line item in the Outputs module depends on the Calculations module line item, which in turn depends on the Input module line item, which depends on the Data module. This is a long dependency chain that has to be calculated in sequence.
Now consider the alternative, in which the Input, Calculation, and Output line items each reference the Volumes line item in the Data module directly.
All three downstream line items can then be calculated in parallel, reducing the dependency chain and maximizing the efficiency of the Anaplan calculation engine.
This is a guiding principle of efficient model design: store or calculate once and reference many times.
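The difference between the two Volumes layouts described above can be sketched as a small dependency graph. This is only an illustration in Python, not a model of the Anaplan engine itself: the `chain_depth` helper and the `Module.Line item` names are hypothetical, and the sketch simply measures how many sequential steps each line item sits away from the source data.

```python
from functools import lru_cache


def chain_depth(dependencies: dict[str, list[str]]) -> dict[str, int]:
    """Return the length of the longest dependency chain behind each line item."""

    @lru_cache(maxsize=None)
    def depth(item: str) -> int:
        sources = dependencies.get(item, [])
        if not sources:
            return 0  # source data: nothing has to be calculated first
        return 1 + max(depth(source) for source in sources)

    return {item: depth(item) for item in dependencies}


# Daisy chain: each module references the previous module's result.
daisy_chain = {
    "Data.Volumes": [],
    "Input.Volumes": ["Data.Volumes"],
    "Calculation.Volumes": ["Input.Volumes"],
    "Output.Volumes": ["Calculation.Volumes"],
}

# Direct references: every downstream module reads the Data module.
direct = {
    "Data.Volumes": [],
    "Input.Volumes": ["Data.Volumes"],
    "Calculation.Volumes": ["Data.Volumes"],
    "Output.Volumes": ["Data.Volumes"],
}

daisy_depths = chain_depth(daisy_chain)
direct_depths = chain_depth(direct)

print(max(daisy_depths.values()))   # 3 sequential steps
print(max(direct_depths.values()))  # 1 step; the three items are independent
```

In the direct layout no downstream line item waits on another, which is the property that allows them to be calculated in parallel.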
Input modules are the initial point of interaction for end users. They should be designed for ease of use and flexibility of calculations, and then optimized for use on dashboards. If possible, limit the number of “result” calculations within these modules. Sometimes there will be no alternative, but presenting the results on the same dashboards from a different module is better. End-user flexibility may require multiple dimensions that are sparse, so there will often be a trade-off between module size and user flexibility. On the one hand, combining dimensions is more efficient in terms of size and calculation speed, but it can give the user a less intuitive experience. On the other hand, native multiple dimensions give the user ultimate flexibility and simplicity but have the downside of size and sparsity. A flexible, multi-dimensional approach can also frustrate users who have to navigate through multiple “unwanted or invalid” combinations. This effect can now be mitigated using Dynamic Cell Access, but (as mentioned previously) there is no real right or wrong answer, just alternatives. However, the structures you decide on for Inputs are not necessarily the same as those you will need for the Calculations (see below for more details).
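The size side of this trade-off comes down to cell counts. The sketch below is a rough illustration in Python with made-up lists (`products`, `customers`) and a made-up set of valid combinations; real models will have very different numbers, and the sketch says nothing about the user-experience side of the decision.

```python
# Hypothetical lists and valid combinations, for illustration only.
products = ["P1", "P2", "P3", "P4"]
customers = ["C1", "C2", "C3"]
valid_combinations = {("P1", "C1"), ("P2", "C1"), ("P3", "C2"), ("P4", "C3")}

# Native multi-dimensional module: one cell per product/customer pair,
# whether or not the pair is ever used.
multi_dim_cells = len(products) * len(customers)

# Combined-dimension module: one cell per valid combination only.
combined_cells = len(valid_combinations)

# Share of the multi-dimensional module that holds no meaningful data.
sparsity = 1 - combined_cells / multi_dim_cells

print(multi_dim_cells, combined_cells)  # 12 4
print(f"{sparsity:.0%}")                # 67%
```

The same arithmetic applies per line item, so the gap widens as line items or further dimensions are added.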
The concept of system modules is probably the most unfamiliar to most. It is perhaps apt that these modules provide the S in D.I.S.C.O., located perfectly in the middle. System modules are the glue holding the model together. Earlier, I talked about how Anaplan calculations are optimized through similar dimensionality. That is all well and good, but what about when the dimensionality differs (which it inevitably will)? System modules hold the key to this transformation, providing the modules to map structures between one another. As with Data modules, System modules should be referenced by all other modules.
Without going into a lengthy debate on list properties versus module line items, the current best practice is to use modules and line items in preference to list properties wherever possible, except when specific functionality dictates otherwise.
Examples of System modules are:
Create a System module for each key hierarchical list and create line items for the key attributes. These should include line items for all of the parents within the list. Consider the following formula buried in another module:
The latter is much simpler to understand. Whenever any static details are needed for P4 Products, the System module should be the place to look and reference.
These are some of the key modules that are needed to link, transform, and map data between modules. SUM and LOOKUP formulas should use these modules. It is very likely the same mappings will be needed time and again, so have the mapping in one place and reuse it.
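The idea of holding a mapping in one place and reusing it can be sketched outside Anaplan as follows. This is a Python analogy rather than Anaplan syntax: the `PRODUCT_TO_FAMILY` mapping stands in for a System module line item, and the product names, family names, and `sum_by_family` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a System module: the product-to-parent mapping
# is defined once and referenced by every aggregation that needs it.
PRODUCT_TO_FAMILY = {
    "Apples": "Fruit",
    "Pears": "Fruit",
    "Carrots": "Vegetables",
}


def sum_by_family(values: dict[str, float]) -> dict[str, float]:
    """Aggregate product-level values to family level using the shared mapping."""
    totals: defaultdict[str, float] = defaultdict(float)
    for product, value in values.items():
        totals[PRODUCT_TO_FAMILY[product]] += value
    return dict(totals)


volumes = {"Apples": 10.0, "Pears": 5.0, "Carrots": 7.0}
revenue = {"Apples": 20.0, "Pears": 15.0, "Carrots": 7.0}

# Both aggregations reuse the same mapping rather than rebuilding it.
volume_by_family = sum_by_family(volumes)
revenue_by_family = sum_by_family(revenue)

print(volume_by_family)   # {'Fruit': 15.0, 'Vegetables': 7.0}
print(revenue_by_family)  # {'Fruit': 35.0, 'Vegetables': 7.0}
```

If a product moves to a different family, only the mapping changes; every calculation that references it picks up the change.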
Time settings modules
The first module you should build is a time module. Anything to do with time should reside in this module, or similar modules if there are multiple granularities of time within the model. It is very likely that within calculations you will need to know which periods are historic and which are future. You may need to filter out quarter totals, or only show the next three months or the last six months. If this information is contained in one place, it is easy to control and amend, and it increases efficiency.
It is also the key to not over-calculating, as discussed in the article "Reduce Calculations for Better Performance."
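As a rough illustration of what a time settings module holds, the Python sketch below computes a handful of period flags once so that other calculations can simply reference them. The flag names, the monthly granularity, and the choice to treat the current month as neither historic nor future are assumptions made for this example, not Anaplan conventions.

```python
from datetime import date


def month_index(d: date) -> int:
    """Convert a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def time_settings(periods: list[date], current: date) -> dict[date, dict[str, bool]]:
    """Compute the time flags once, in one place, for every period."""
    now = month_index(current)
    settings = {}
    for period in periods:
        offset = month_index(period) - now  # months from the current period
        settings[period] = {
            "is_historic": offset < 0,
            "is_future": offset > 0,
            "in_next_3_months": 1 <= offset <= 3,
            "in_last_6_months": -6 <= offset <= -1,
        }
    return settings


periods = [date(2024, month, 1) for month in range(1, 13)]
settings = time_settings(periods, current=date(2024, 6, 1))

# Downstream logic references the flags instead of recalculating them.
next_three = [p.month for p, flags in settings.items() if flags["in_next_3_months"]]
print(next_three)  # [7, 8, 9]
```

Changing the current period in one place updates every flag, which is the control and efficiency benefit described above.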
It should go without saying that the fewer calculations that need to be performed, the faster the model will be. Calculations should be performed once and referenced many times. This leads nicely on to the heart of the model: Calculations.
This is where the magic happens! Calculation modules are very different from Input modules. Users rarely need visibility into the detail of the calculations. These modules should be optimized for calculations. Two easy techniques are combining dimensions and turning off summary options whenever possible. In fact, when creating a calculation module, it is recommended that you turn off summaries whenever you add or create line items. Most of the time, calculations are done at the detail level and are referenced by other detailed items. You can read more about this in our article about reducing calculations.
When designing the model, think about the common calculation structures. This will help you define the calculation modules needed. It is likely you will need a number of calculations of a similar structure, but it is easier to audit and understand if these are broken up into separate modules.
It is worth mentioning at this point that it is best practice to use subsidiary views sparingly. In modules where the majority of the line items are subsidiary views, it is very difficult for the user to understand what is actually being calculated. There are often groups of calculations with the same functionality that would be better suited to a module of their own. One point to re-emphasize is that there is no negative performance impact in moving these line items into a separate module. In fact, it is likely the calculations are repeated in other modules because there is no clear structure to hold them, so separating the line items into separate modules is likely to benefit performance.
The article "Formula Structure for Performance" outlines the performance benefits of breaking up calculations.
There are also other benefits that fit into the P.L.A.N.S. standard. The calculations are more logical, they are easy to audit and understand, only the necessary calculations are created (no duplication), and finally, the calculations and model become more sustainable and easier to maintain: one place for calculations, one place to check, amend, and add new ones.
Remember: calculate once, reference many times.
So, on to the right-hand side, or the bottom of the model map: the Outputs. Similar to Input modules, these are, for the most part, designed for end users to consume the calculations. To provide the best user experience, these modules may include the separate dimensions (that may have been combined in the Calculation modules). Using the optimization techniques above gives you the capacity to provide this flexibility. In an ideal world there should be no dataflows out of these modules; remember the daisy chain example from above. Output modules can also provide an export format to be consumed by other applications. In these cases, ensure that the structures are optimized for the specific requirement. Don’t include data that is not needed and make sure any filters set are efficient.
Here are a few final considerations:
Keep module and line item names as short as practical; this will help keep formulas readable. Include alphanumeric codes in module and hierarchical list names (for example, P4 Products) for ease of reference.
For current guidance on naming convention best practice, visit the Anapedia article on the subject.
Can you explain the module purpose or line item calculation in one short sentence? If not, consider splitting the formula into smaller pieces. In any case, use the Notes fields to explain the purpose and the “why.” This will help others (and you later) to understand the calculation logic.
Below is a table summarizing the most likely (and ideal) characteristics of the different module types:
[Table not recoverable from the source. Surviving entries: "End User (View only)"; "End User (rarely)"; "Most Common Dimensionality"; "(Hierarchy + Line items)"; "Preferred Summary Options"; "Inputs", "Calculations", "Outputs".]
Adopting the D.I.S.C.O. methodology for your models should give you a well-designed model that is easy to follow, understand, and amend at a later date.
Author: David Smith.