OEG Best Practice: To version or not to version?
One of the most common questions I get asked is when to use native versions versus using a general list to replicate versions. As is often the case, the answer is: it depends. Let me run through some key areas to help you with the decision.
Functionality
What functionality do you need? We always advocate using native Anaplan functionality where possible, if that provides the most logical, simple, and understandable formulas and structures.
So, what built-in functionality does using versions give us?
- Switchover functionality: automatic rolling forecasts using the latest actuals
- Version-specific functions: CURRENTVERSION(), PREVIOUSVERSION(), NEXTVERSION(), ISCURRENTVERSION(), ISACTUALVERSION()
- Formula scope: to restrict a formula so that it applies only to the actual version, to all versions except actual, or to the current version only
- Version formula: allowing specific formulas for different versions. This uses line item subsets and is only applicable for numeric formatted line items.
- Edit to/from dates: to allow read/write access to ranges of dates within a version
- Bulk copy: copy from one version to another for the whole model (although this applies to all lists, too)
- Formulas: within the versions list (although we don’t recommend this)
- Compare: lets you compare data values against a base case. You can choose what to compare the data against—a different version, a previous version of the current module, a list, or a time period.
- Structural data: versions are synchronized to deployed models as part of Application Lifecycle Management (ALM)
You can also use versions for iterative calculations. Each version is contained in its own calculation block, so there is a built-in intelligence of dependencies for previous and next (in a similar way to time) so you can use the formulas from above to iterate through the calculation if needed. If you need to use versions in this way, it is best practice to set the Actual? flag to true for the first version, and the Current? flag to true for the last version.
When referencing a source module with versions as a dimension in a target module without versions, the “current” version is automatically returned. This is why you should set the current version as the last version when using the iterative calculation technique described above.
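To make the iterative pattern concrete, here is a minimal Python sketch (Python is used purely for illustration, since Anaplan formulas cannot run outside the platform, and the function and variable names are hypothetical). It mirrors the way a line item can reference PREVIOUSVERSION() so that each version's value is derived from the one before it:

```python
# Illustrative only: each "version" depends solely on the previous one,
# mimicking a PREVIOUSVERSION() reference chaining through the list.

def iterate_versions(seed, n_versions, step):
    """Return one value per version, each derived from the prior version."""
    values = [seed]
    for _ in range(n_versions - 1):
        values.append(step(values[-1]))
    return values

# Example: five versions, each uplifting the previous figure by 10%.
results = iterate_versions(100.0, 5, lambda prev: prev * 1.1)
```

In the real engine each version sits in its own calculation block, so these previous/next dependencies are resolved in order automatically, much as PREVIOUS() is over time.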
Finally, an additional benefit to using switchover with versions is a reduction in cell count for the line item(s) using the switchover functionality. The cells for the "actual" periods are not counted as part of the calculation, thus reducing the line item's size and the memory needed.
Limitations
There is a lot of functionality that versions bring, but there are also some limitations (mostly due to the block structure, which I will explain later):
- You cannot set versions as a format for line items.
- Linked to the above, you cannot use LOOKUP or SUM when referencing a version.
- Similarly, you cannot SUM out of a module by version (as per time, due to the block constructs).
- You cannot create version subsets.
- Versions are structural and so cannot be deleted in development models without affecting production models (as part of ALM). This can potentially lead to larger than desired development models.
In summary, versions bring functionality that lists do not have, but there are limitations that do not apply to general lists.
Performance
The first aspect of PLANS is "performance", so we shouldn’t neglect that. Probably the biggest consideration here will be the number of versions. This is where the lack of subsets can have a big impact on size and performance. There are no parent levels within versions, but adding versions to modules with other large dimensions (and timescales) will add to the aggregation points and memory blocks needed. Remembering D.I.S.C.O., if all of the calculations and data points do not need all of the versions, then there will be a lot of extra calculations, memory, and size being consumed.
As with most aspects of PLANS and the Planual, we’ve done some testing to give you the “lab” results to quantify the impact. We tested the calculation volume and time for a simple model with native versions and a second model replacing versions with a general list of “fake” versions.
Test model details
Timescale: 1 year by week with a monthly and full-year total.
General Lists:
- Region: 200 Cities>20 Countries>5 Regions (with a top level)
- Channel: 100 Accounts>10 Channels (with a top level)
Versions: 50
Modules: City by Account by time by version/fake version
Line Item Formula:
- Data
- Cumulate (Data) – summaries on
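For readers unfamiliar with CUMULATE, it produces a running total along the time dimension. A minimal Python equivalent (illustrative only; the data values are made up):

```python
# CUMULATE(Data) sketched as a running total over weekly time periods.
from itertools import accumulate

data = [10, 20, 5, 15]               # one value per week (made-up figures)
cumulated = list(accumulate(data))   # running total: [10, 30, 35, 50]
```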
Results
This shows how big the calculation is or how much work the engine needs to do. The results are quite similar, with native versions having a smaller calculation volume overall.
However, when we look at how long it took to perform the calculations, the results are very different.
Why is that? When we look more deeply and analyze how many data tasks the engine processed, the results show clearly why the native versions model took longer to fully calculate on model opening.
The native versions model has nearly 50x more tasks to perform than the fake versions model. What is causing there to be such a big difference?
The answer lies in the block structure of the calculation engine. Each version (and time) member is contained in its own block. When we break down the other hierarchies, they also split into blocks. The lowest level of City and Account form the biggest block, and there are separate blocks for the aggregated levels at Country, Region, and Channel for the two hierarchies and finally the two top levels of the hierarchies.
The blocks are made up as follows:
- Cities: 27 = 1 for the lowest level of City, 20 for Country, 5 for Region, 1 for top level
- Accounts: 12 = 1 for Accounts, 10 for Channel, 1 for top level
- Time: 65 = 52 weeks, 12 months and a full year
- Versions: 50 = 1 block per version
The total number of blocks is the product of all of the blocks from the lists, time, and version blocks.
You can see that in the fake versions model—because there are no versions—there is only one block for fake versions (no parents), so the block count is 50x smaller; the same multiplier as the number of tasks.
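The block arithmetic above can be reproduced directly (Python is used here purely as a calculator; the counts all come from the test model described earlier):

```python
# Block counts per dimension, from the test model above.
cities   = 1 + 20 + 5 + 1    # lowest level, Country, Region, top level = 27
accounts = 1 + 10 + 1        # Accounts, Channel, top level = 12
periods  = 52 + 12 + 1       # weeks, months, full year = 65

native_blocks = cities * accounts * periods * 50   # one block per version
fake_blocks   = cities * accounts * periods * 1    # fake versions: one block

# The fake versions model has 50x fewer blocks -- the same multiplier
# seen in the task counts.
assert native_blocks == 50 * fake_blocks
```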
This multiplier is also backed up when we look at the number of subtasks created as part of the calculation. A large block can be split into smaller tasks that run in parallel over many CPU threads. The smaller blocks for the versions can only be split once. This is a factor in the efficiency of the calculation.
Although the volume of calculation for the fake version model is slightly larger, it can be executed more efficiently across the processing threads of the engine (this currently stands at 144 as the maximum threads on the hardware). Think of the tasks as boxes and only one box can be moved at one time. It's much quicker for 144 people to move 2,549 boxes than 124,853 boxes, even though the combined weight of the boxes is the same.
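As a rough illustration of the analogy (a simplification, since real task scheduling is more complex than one box per trip), with 144 threads the number of "trips" needed scales with the task count:

```python
# Simplified model of the box-moving analogy: 144 movers, one box each
# per round. Task counts are those measured in the tests above.
import math

threads = 144
fake_tasks, native_tasks = 2_549, 124_853

fake_rounds   = math.ceil(fake_tasks / threads)    # 18 rounds of moving
native_rounds = math.ceil(native_tasks / threads)  # 868 rounds of moving
```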
Decision time
So how do we decide? It depends on how critical the built-in version functionality is. It is possible, in almost all cases, to replicate the functionality using lists, Boolean flags, and alternative techniques such as Dynamic Cell Access, but these will require more modeling and complexity. Also, one needs to consider just how many versions are needed. If only a small number of versions are required (e.g. fewer than five), then the performance impacts described above will be minimal, and the functionality benefits and simplicity may sway you towards native versions. However, a large number of versions can have a detrimental impact on performance, particularly if used in conjunction with long timescales and deep hierarchies. In this case, a fake version list would be preferable. Finally, one must consider the maintenance and updates of the versions (specifically where ALM is in use). If versions are added regularly and the number of versions starts to increase, this will increase the size of the development model. However, versions are not likely to be the largest list within the model, and it is better to cut down other, larger production lists to keep the development model small.
So, in conclusion, the decision will rest on balancing the simplicity of built-in functionality versus complexity, efficiency, and performance. With this in mind, the answer could well be a mix of both. One of Anaplan’s big strengths is the ability to use appropriate and different structures in the same model. It is also a fundamental element of Anaplan's best practice (D.I.S.C.O.), so using both native and fake versions within a model is fine.
As with most Anaplan modeling problems, there is no one right or wrong answer, but I hope this has helped give you some guidance and insight into some of the inner workings of an Anaplan Hyperblock engine.
Author David Smith.
Contributing author Mark Warren.
Comments
-
Often we "save" the key output metrics into a module with fake versions and compare the current version against that line item, but for the actuals/main forecast it's native versions.
As you say, it really does all depend. When we had a small workspace (and were more novice) we tried not to have versions at all; rather, the planning was done using additional line items for inputting. But now we can see how best to merge versions and non-versions across the model landscape whilst still allowing users to select and save comparatives almost at will.
7 -
Great write up David. Thanks for producing this...it is a nice representation of how often there isn't a 'right' answer, just pros and cons to be considered.
3 -
Great write up. The other consideration we’ve run into a few times is ITEM functionality not working on the native Versions. The same goal can be accomplished with lookups or other system modules, but in a pinch custom Versions can support that without additional setup.
4 -
That is very enlightening. I always thought native versions were more efficient, albeit they ended up giving you a bigger model because of the lower flexibility.
I usually do a mix of both, basing my model on the "dynamic/fake" versions while using the native ones for a few input modules where input and formula on the same line item are required.
4 -
@adam_bimson_5 Exactly. PLANS is all about balancing Performance, Usability and Sustainability and understanding the pros and cons of each. What we are doing now is giving some of the performance implications to supplement the other two
@nathan_rudman @andrewtye Agreed, the "answer" is often a mix
Yes, I replied to a post on exactly that yesterday; the reason being you cannot set Versions as a format
3 -
We use a combination of both. Most of the time we work with "Alt Versions". But we use Standard Versions for some imports because you can filter imported data based on the date thanks to the "Edit From" and "Edit To" dates.
Also we call them "Alt Version" instead of Fake because, in France, people often mispronounce "Fake".
7 -
Thanks for this information and performance tests between native and fake versions. We use both of them in different models. Now if I get questions why I use fake versions from my colleagues I can show this article.
2 -
David,
As per the comments above these detailed analyses are so useful. I would be really interested in you doing an investigation in a similar vein but looking at real time vs dummy time.
I have a particular interest on the impact of daily time-scales, as there is a point at which using these on real time caused real performance issues, but the tipping point is not that obvious so it would be helpful to have some analysis on this.
Thanks,
Sean
4 -
Great question, leave it with me. With CPX coming up, it might be a few weeks, so bear with me.
I suspect we will see something similar because Time blocks are structured the same as Versions. This is also why long, daily timescales can cause performance issues.
The challenge is that the specific Time functionality (PREVIOUS, OFFSET, LAG) is only available through real timescales and unlike Versions, it is mostly impossible to replicate.
That said, again, for very long, granular timescales, I would consider using fake time for the majority of the modules, and then use real time, with Time Ranges for the "time specific" calculations.
As part of the new Learning journey, we are going to be adding a module on mapping time to fake time and back again (it's easier than you think to do and maintain).
But as with Versions, in a lot of cases, a mix of the two techniques is probably the most optimal solution.
Also, as an aside, if you haven't seen it, take a look at this post where I analysed the differences between PREVIOUS, LOOKUP, OFFSET, LAG etc.
David
3 -
Hi,
I tend to use a combination of native Anaplan versions and fake versions (FV)
Real versions are one of the most powerful Anaplan features for me due to switchover (space saving, and just the general awesomeness of being able to actualise data with the change of a dropdown date), as well as the bulk copy.
I tend to only keep a limited number of real versions (ACT, Live FC, Budget, and possibly one previous FC). This is to limit the impact on size (as you know, there is no subset functionality for versions)
I then use FV in my reporting modules where I then add multiple "previous versions" without impacting the size of other modules where versions were used as well as have the ability to use the LOOKUP function for versions (assists in version comparison)
Andr
5 -
Thanks for posting David! I really loved this article as it is a question we have struggled with often. Most of our models leverage a non-native version as they did not have a need for most of the built-in functionality (switchover, edit to/from, version formulas). On our initial implementation we found that non-native versions worked better for our needs, as we needed modules to create an output based on a designated version in order to interface data to our financial system. Using a non-native version allowed us to use LOOKUP formulas when calculating the output values. Kudos for the Idea Exchange post to add the ability to select native versions: https://community.anaplan.com/t5/Idea-Exchange/Ability-to-select-versions-as-a-list-formatted-line-item-for-use/idi-p/40198 We also have instances where we leverage both native and non-native versioning, so we can get the best of both worlds. We still need to use the LOOKUP to create our interface data, but we also want to take advantage of switchover and edit to/from restrictions.
2 -
Hi,
Nice article and very useful insight!
We highly appreciated the power of the specific functionalities of the native versions (i.e. formula scope, switchover).
However, we used the native versions only when the specific native version was requested.
For historized data and variance reporting between different versions, we always used non-native versions (normal lists).
The main reason to switch from native versions to normal-list versions was the limitation on using LOOKUP, even though SELECT is available.
Several Idea Exchange posts have been raised to change this. It would be highly appreciated if it were solved in some way! 🙂
Please kudo also if you are interested in this enhancement:
thanks
Alex
0 -
Hello,
Great and insightful article. Thanks for putting together.
We tend to exclusively use native versions because of the simplicity of the switchover, version formula, and bulk copy functionalities. As discussed above, we do tend to keep our # of versions at a minimum (5-6 max) because of space and processing speed, but we always archive models with previous versions so that, if needed, we can unarchive and pull older versions.
Reading through some of the replies, it seems like fake versions are used for two main reasons: 1. to be able to "save off" versions into a fake version list, and 2. to use lookups for version comparisons.
In our models, we've used native version functionality by doing the following:
- In our final output modules, we designate all line items with 'Current Version' (the live forecast versions) formula scope and do a Bulk Copy. This ensures that those locked forecast outputs won't change regardless of forecast tweaks or formula changes.
- We use the 'Compare' functionality for version to version comparisons but this functionality is EXTREMELY limited. We can only set up these 'compares' within a module view and need to re-publish to a dashboard every time we want to compare against a different version. Please please please update this functionality to allow end users to use 'compare' on a dashboard!
Thanks,
Jared
1 -
thanks for sharing
I do like using formula scope with the Current Version to archive forecasts, as just switching the "current" flag removes the formula from the "previous current version" and leaves the data behind!
Good point about Compare - not widely used, probably because of the limitations! Post your thoughts on the Idea Exchange.
David
0 -
Great article @DavidSmith
The main use of a 'fake' versions list in my practice has been where the users want to create a version every month and can end up with almost 15 versions in one planning cycle, leading to a large increase in model size.
As the native versions list cannot be 'subsetted' (if that's even a word), the use of a fake versions list allows this, letting users maintain the model and inputs at a far reduced size.
I really liked your explanations relating to how the blocks process calculations and possible performance issues that may arise.
Thanks,
Usman
0 -
Hi, Great Article, lots of thoughts after reading this one.
One question though: how do fake versions compare to native versions performance-wise if we set all line items' Formula Scope to "Current Only"?
My understanding would be that :
1) Only the Current Version is then calculated; the 49 remaining ones consume cell volume only, and not much calculation memory.
2) As a consequence, performance won't be an issue in that case.
1 -
Great question
I will test and find out
You can't utilise Formula Scope with fake versions though, so it wouldn't be a true like-for-like test, but we can test versions with all, and versions with formula scope.
David
1 -
@david.savarin Nice and acute observation. thanks!
Indeed, when formula scope is set to "Current Version", the line item cells for the other versions become "input" cells. So the expected behavior is that CPU calculation power is used ONLY for the cells of the version ticked as "current version".
For all the other versions, the only calculation that will apply is the summary property of the line items, which also needs to be applied for the "Current version" as well.
As @DavidSmith noticed: "fake versions" do not have such "formula scope".
So, in my opinion, the expected result of the tests should reveal that for the same amount of cells, using native versions with formula scope "Current version" should be more performant than using "fake versions" where the formula is applied and calculated for every version.
Some observations:
- The "Current version" formula scope makes sense mostly when the other versions are historical copies of the "Current version".
- Take into account that the "Current Version" scope cannot be used for line items where the summary property is "Formula".
Alex
1 -
I have run the analysis using the above model and set formula scope for Current Version only
For the Fake versions model I used a single boolean to check and had a formula like this:
IF Fake Version Details.'Not Current?' THEN 0 ELSE CUMULATE(Data)
The results were almost exactly the same as above
The volume of the calculations for Fake Versions were 2x larger, but the duration of the calculations were 6x shorter
The checking for the "not current" prevents the calculation going further (a benefit of the early exit condition)
I also checked the effect of the early exit (checking for the most common condition of not current vs current)
As expected, checking for the IF "Not Current" THEN 0 ELSE ... was 1.5x faster than IF "Current" THEN Cumulate() ELSE 0
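The early-exit principle generalises beyond Anaplan. A hedged Python sketch (the function names are hypothetical) of why testing the most common, cheap condition first pays off:

```python
# Early exit: in a 50-version list only one version is "current", so
# 49 of 50 evaluations take the cheap branch and skip the expensive work.

def cumulate(data):
    """Expensive branch: a running total, like CUMULATE(Data)."""
    total, out = 0, []
    for x in data:
        total += x
        out.append(total)
    return out

def line_item(is_current, data):
    # IF Not Current? THEN 0 ELSE CUMULATE(Data)
    if not is_current:
        return [0] * len(data)   # common case exits immediately
    return cumulate(data)
```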
One other element to add to the overall discussion is size and switchover. Using switchover will reduce the model size, as the "actual" periods don't take up space. So if size is a primary consideration, bear that in mind. As ever, it is a balance between performance, size, sustainability, usability, etc.
I hope that helps
David
1 -
I was curious to see the real life impact of using fake versioning vs. native versioning on a completed model. So, I duplicated a model that had been built with fake versions, changed every module where fake versions had been used to native versions and created the same number of versions in each model.
The only difference between the two models was native vs fake versions.
Using the same number of "versions":
The Fake Versioned model has 2.7m cells and a size of 78mb
The Native Versioned model has 2.7m cells and a size of 1009mb
Clearly despite the size saving impact of switchover, there are significant size (i.e. memory) implications (as well as calculation time implications) when using native versioning.
Thoughts?
Tom
8 -
Except native versioning can be set to calculate on the current version or actual versions only. Doing so should drastically reduce the calculation burden. I still miss statistics to make the point, though.
0