OEG Best Practice: Inside the Hyperblock
- What are blocks?
- How blocks are calculated – General Lists
- How blocks are calculated – Native Time/Native Versions
- Hyperblock performance
- Summary block counts
- Should I worry about blocks?
- A reason why we shouldn’t focus on just the block count
- Focus Points
We recommend this article to Experienced Model Builders and Solutions architects.
What are blocks?
The Hyperblock is an in-memory calculation engine that can index and understand the dependencies between the model objects and calculations, based on the connections they share. When a user enters or changes a value, the Hyperblock understands which calculations need to be updated and in what sequence. In addition, the Hyperblock can perform calculations in parallel, meaning large, complex calculations can be performed within seconds.
The Hyperblock is built up from individual blocks, each of which contains the cells for one combination of custom lists, time periods, and/or versions. Each block can contain up to 2,147,483,647 (2^31 - 1) cells at the most detailed level.
See: Cell count limit on line item blocks - Anaplan Technical Documentation.
How blocks are calculated – General Lists
Within the Hyperblock, all calculations are done in blocks, at the line item level. Think of blocks as levels in a hierarchy: all detailed (lowest level) list members together form one block, while each summary (aggregated) member forms its own individual block.
For example, if you have a line item dimensioned by a list consisting of seven levels, all members at the most detailed level (L7) form a single block, each member in L6 forms its own block, each member in L5 forms its own block, and so on up to the very top level member.
Level 7 block: Total blocks = 1
Level 6 block: Total blocks = 131
Level 2 block: Total blocks = 9
As you go up the hierarchy, you can count the total number of blocks created by the line item.
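The counting rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described (one shared block for all detailed members, plus one block per aggregated member); the function name `count_list_blocks` and the small cost center hierarchy are hypothetical and not taken from any Anaplan API or model.

```python
def count_list_blocks(parent_of):
    """Count blocks for one custom list.

    parent_of maps each member to its parent (None for the top member).
    All leaf members share a single block; every parent member gets its own.
    """
    parents = {p for p in parent_of.values() if p is not None}
    leaves = [m for m in parent_of if m not in parents]
    return (1 if leaves else 0) + len(parents)


# Hypothetical three-level hierarchy, for illustration only.
hierarchy = {
    "All Cost Centers": None,
    "EMEA": "All Cost Centers",
    "Americas": "All Cost Centers",
    "CC-100": "EMEA",
    "CC-200": "EMEA",
    "CC-300": "Americas",
}

blocks = count_list_blocks(hierarchy)
print(blocks)  # 3 parent members + 1 shared detail block = 4
```

Equivalently, when only member counts are known, the block count is the total number of members minus the number of detailed members, plus one.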
When you combine multiple lists in a line item (for example, the Applies To is L7 Cost Center, L4 Geographies), it works in exactly the same manner: the detailed members of both L7 Cost Center and L4 Geographies create one block, and then every aggregated member in each list creates its own block.
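As a rough sketch of how the counts combine, the snippet below multiplies the per-list block counts together, which is how the module example later in this article arrives at its total. The helper names and the member counts used for both lists are hypothetical, chosen only to make the arithmetic concrete.

```python
from math import prod


def blocks_for_list(total_members, detailed_members):
    # One block per aggregated member, plus one shared block for all detailed members.
    return (total_members - detailed_members) + 1


def blocks_per_line_item(dimensions):
    # Assumption: every block of one list combines with every block of the other lists.
    return prod(blocks_for_list(total, detailed) for total, detailed in dimensions)


# Hypothetical member counts as (total, detailed) for each list.
cost_centers = (631, 406)   # 225 aggregated members + 1 detail block = 226
geographies = (40, 30)      # 10 aggregated members + 1 detail block = 11

combined = blocks_per_line_item([cost_centers, geographies])
print(combined)  # 226 * 11 = 2486
```

Exactly one of these blocks holds the detailed members of both lists; all the others involve at least one aggregated member.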
How blocks are calculated – Native Time/Native Versions
Native Time and Native Versions are different from custom lists in that every detailed member is its own block (Actual, Forecast, Budget, 2021 Jan, 2022 Feb, etc.), as is every aggregated member (Quarter, Half Year, Fiscal Year, All Periods). One of the reasons for this is to avoid circular references with time-based calculations, as well as to allow Opening and Closing balances.
This also allows the use of certain time functions (NEXT, PREVIOUS, LAG, LEAD, OFFSET) as well as certain Version functions (NEXTVERSION, PREVIOUSVERSION).
To calculate the number of blocks, each member in the timescale must be accounted for. For example, let’s say we have the following module:
- Timescale: 3 years at the Week level
- Versions: 4
- List: L7 with 631 total members, but only 406 at the detailed level
- Line items: 20, with summary turned on
Let’s break this down by lists:
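- Time: 214 blocks by multiplying 71*3 + 1 (All Periods)
- Versions: 4 versions = 4 blocks
- List: 226 blocks, from 631 - 406 = 225 summary members plus 1 block for all detailed members
Multiplying these together (214 * 4 * 226), this one module will have 193,456 blocks per line item. If you have 20 line items with the same dimensionality, you will have created 3,869,120 blocks.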
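The arithmetic for this module can be reproduced as a short script. The split of the 71 time members per year into weeks, months, quarters, half-years, and a year total is an assumption made here for illustration (the article only states the total of 71); the other figures come from the module description above.

```python
# Assumed composition of the 71 native time members per year at week level.
WEEKS, MONTHS, QUARTERS, HALF_YEARS, YEAR_TOTAL = 52, 12, 4, 2, 1
periods_per_year = WEEKS + MONTHS + QUARTERS + HALF_YEARS + YEAR_TOTAL  # 71

years = 3
time_blocks = periods_per_year * years + 1   # + 1 for All Periods -> 214
version_blocks = 4                           # each native version is its own block
list_blocks = (631 - 406) + 1                # 225 summary members + 1 detail block -> 226

blocks_per_line_item = time_blocks * version_blocks * list_blocks
total_blocks = blocks_per_line_item * 20     # 20 line items, same dimensionality

print(blocks_per_line_item)  # 193456
print(total_blocks)          # 3869120
```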
If we use the same dimensionality, but use a custom list to represent Time, the number of blocks will decrease. When using a custom Time list, every detailed member, in this case at the week level, is contained in a single block.
Again, using the same dimensionality but substituting a custom Versions list (more information on custom versions) for Native Versions while keeping Native Time, our blocks will decrease even more.
And finally, if we use a custom list for both versions and time, our total blocks will decrease even more, but with that, our blocks will be very large because we are storing the exact same amount of data.
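To make the comparison concrete, here is a hedged sketch that recomputes the block count per line item for the four combinations. Two assumptions are made that the article does not state: the custom Time list is assumed to mirror the native timescale (214 members, of which 52 * 3 = 156 weeks are detailed), and the custom Versions list is assumed to be flat with no top level, so it contributes a single block. Actual counts depend on how the custom lists are built.

```python
LIST_BLOCKS = (631 - 406) + 1            # 226, as in the module example

NATIVE_TIME_BLOCKS = 71 * 3 + 1          # 214: every period is its own block
NATIVE_VERSION_BLOCKS = 4                # every version is its own block

# Assumption: custom Time list with the same 214 members, 156 of them detailed weeks.
CUSTOM_TIME_BLOCKS = (214 - 156) + 1     # 59: one shared block for all weeks
# Assumption: flat custom Versions list without a top level.
CUSTOM_VERSION_BLOCKS = 1

scenarios = {
    "native time, native versions": (NATIVE_TIME_BLOCKS, NATIVE_VERSION_BLOCKS),
    "custom time, native versions": (CUSTOM_TIME_BLOCKS, NATIVE_VERSION_BLOCKS),
    "native time, custom versions": (NATIVE_TIME_BLOCKS, CUSTOM_VERSION_BLOCKS),
    "custom time, custom versions": (CUSTOM_TIME_BLOCKS, CUSTOM_VERSION_BLOCKS),
}

blocks = {
    name: time_b * version_b * LIST_BLOCKS
    for name, (time_b, version_b) in scenarios.items()
}

for name, count in blocks.items():
    print(f"{name}: {count} blocks per line item")
```

Under these assumptions the counts fall from 193,456 to 53,336, then 48,364, then 13,334, while the amount of data stored stays the same, so each remaining block is correspondingly larger.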
So, is having fewer blocks always the goal, and does it correlate to a more performant model?
Hyperblock performance
The Hyperblock can perform calculations in parallel at the block level; it calculates its blocks with Hyper-threading to create as many parallel calculations as possible, based on cell count.
Note: The exceptions are functions that run single-threaded: RANK, RANKCUMULATE, ISFIRSTOCCURRENCE, and CUMULATE with 3 parameters (using a list).
The more cells a block has, the more threading will be applied, up to the physical limitations of the CPU. If blocks have very complex or large calculations, those threads have to do more work and therefore take longer. This means that if we create blocks with too high a cell count and a complex formula, performance may be affected. However, the Hyperblock takes care of this in most scenarios by doing a selective calculation on only the cells affected by the change instead of all the cells in the line item (and so on through the DAG, the Directed Acyclic Graph).
Because threading is based on cell count, blocks with a very small cell count will not have as many threads applied to them. Therefore, a block with a single cell, such as a top-level summary on a single dimension, would have no threading applied. If that cell must do a lot of work summarising values over a large dimension, it could take a long time to complete (because no threading is applied). This is why we have Planual rule 1.05-07 Avoid Top Level for large flat lists.
A general rule with Anaplan is that the duration of a calculation is proportional to the number of blocks. It simply means that the more work we have, the longer it takes. Within those blocks, some will take a lot longer to calculate than others (the larger blocks); but on the whole, when averaged out, this rule applies. This also applies to how complex or inter-connected the model is; the greater number of connections among line items means a longer time spent working through the DAG (the method Anaplan uses to determine a calculation chain).
Summary block counts
Line item summaries can dramatically increase the cell count of a line item and the block count.
Adding a summary on a line item can increase the work the Hyperblock needs to do, which leads us to Planual rule 2.03-01 Turn Summary options off by default.
Here is a quick example to illustrate why adding a summary generates so many blocks.
A module is dimensioned by Account and City lists and Versions. The lowest level of City and Account form the biggest block, and there are separate blocks for the aggregated levels at Country, Region, and Channel for the two hierarchies and finally the two top levels of the hierarchies. As seen in this example, having a summary method applied increases the block count 12-fold for each native version defined.
There are 11 summary blocks for the main Account/City block. When we add versions, we duplicate that many blocks for each version, so a second version doubles the number of summary blocks.
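The 12-fold figure can be reproduced by enumerating the level combinations. The sketch below assumes that Country and Region belong to the City hierarchy and Channel to the Account hierarchy (the article does not say which level belongs to which list), and it counts one block per combination of levels, as the example does, which corresponds to a single member at each aggregated level. The top-level names are hypothetical.

```python
from itertools import product

# Assumed level structure of the two hierarchies, lowest level first.
city_levels = ["City", "Country", "Region", "All Cities"]
account_levels = ["Account", "Channel", "All Accounts"]

level_combinations = list(product(account_levels, city_levels))
detail_block = ("Account", "City")
summary_blocks = [combo for combo in level_combinations if combo != detail_block]

print(len(level_combinations))  # 12 blocks with summaries on
print(len(summary_blocks))      # 11 of them are summary blocks

# Each native version repeats the whole set of blocks.
versions = 2
print(len(level_combinations) * versions)  # 24 blocks in total
print(len(summary_blocks) * versions)      # 22 summary blocks
```

With summaries turned off, only the single Account/City block per version would remain.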
A more detailed example of the impact of Versions on block count, many of which will be summary blocks, comes from the article To Version or Not to Version?
Should I worry about blocks?
No, the Hyperblock will handle calculations efficiently in most scenarios and create an efficient size and number of blocks. The building of a model should follow the PLANS methodology, with a lot of emphasis on the N, Necessary. Try to calculate as little as possible to achieve the desired outcome. To do this you will need to use S in DISCO to create reusable System modules to avoid repeated calculations and potentially duplicated blocks. Simplify calculations so each block calculates quicker.
We often suggest splitting out complex calculations into new line items to reduce complexity. You will quite rightly argue that this creates more blocks and more blocks mean more work to do. This is correct, but the benefit of simpler calculations is that the extra volume of blocks gets calculated quicker than fewer, more complex blocks. Not all blocks are created equal!
Do worry about summary blocks, though, and get into the habit of turning the summary off when you create a line item. This way, summaries will only get added when you know they are needed. Try to avoid top levels on lists where possible. By that we mean don’t just add a top level by default to all lists; consider its use.
A reason why we shouldn’t focus on just the block count
The thing to focus on is how the calculation performs when the complex formula is split out. If it takes less time, does it matter that more blocks are created? There’s always a balance to be struck between size and performance. As always, test to see if it has a benefit, and test in isolation where possible. Again, for an example see the article To Version or Not to Version?
That said, we should be cautious about the number of blocks or cells that some calculations have to work over, especially calculations that have to execute in a sequential way. POST, CUMULATE, LAG, LEAD, and OFFSET are examples of functions that operate over a timeline in a specific order, so having a lot of time periods means a lot of blocks to execute that calculation over.
If summaries are also applied, they additionally execute across that time range after the cells calculate. To keep excess summary blocks in check, try to keep time ranges as small as possible: calculate over fewer time periods or at a coarser granularity (for example, months rather than weeks), and avoid unnecessary historical data.
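As a rough illustration of how much the timescale matters, the sketch below counts native time blocks for a few timescale choices. It assumes quarter, half-year, and year totals plus All Periods are all enabled, and 52 weeks per year; the function name `native_time_blocks` is hypothetical.

```python
def native_time_blocks(years, weekly, all_periods=True):
    """Count native time blocks: every period and every time summary is its own block."""
    # Assumption: months, quarters, half-years, and a year total exist in both calendars.
    per_year = (52 if weekly else 0) + 12 + 4 + 2 + 1
    return per_year * years + (1 if all_periods else 0)


weeks_3y = native_time_blocks(years=3, weekly=True)    # 71 * 3 + 1
months_3y = native_time_blocks(years=3, weekly=False)  # 19 * 3 + 1
months_1y = native_time_blocks(years=1, weekly=False)  # 19 * 1 + 1

print(weeks_3y, months_3y, months_1y)  # 214 58 20
```

Because the time block count multiplies with the block counts of every other dimension, moving from weeks to months or trimming the number of years reduces the total block count by the same factor.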
Another aspect to consider is the dimension order. Try to keep dimensions in a similar order between source and target. This aligns the block indexes so that data is read in a predictable sequence when the processor pre-fetches the data into cache. Using the default system-applied order of Applies To is the best way to do this. This is shown in more detail in the article Dimension Order - Anaplan Community.
Focus Points
The key points are to understand what the calculation is doing, how it operates, and how many cells it will calculate over, so that you can decide what is absolutely necessary and keep the number of calculations to a minimum. See PLANS - This Is How We Model - Anaplan Community.
- Avoid unnecessary summaries
  - Turn off when creating line items
- Reduce summaries
  - Smaller time-range
  - Fewer Periods
  - Flat lists
- Use System modules to avoid repetition
  - Do calculations once, reference many times
- Reduce complexity
  - Split out complexity, more but simpler blocks
- Be careful with time series functions
  - Smaller time-range
  - Fewer Periods
  - Months not Weeks