You may have heard about a model called a data hub, but perhaps you aren’t confident that you understand the fundamentals, primary functions, or considerations when architecting one. There are three main advantages to incorporating a data hub:
Single source of truth: Stores all transactional data from the source system.
Data validations: Ensures all data is correct and valid before the data gets to the spoke model(s).
Performance: It is always faster to load data from a model rather than a file.
Additionally, the administrator can ensure the correct granularity of data in the spoke model(s) when using a data hub. For example, the source system may only contain transactional data at the daily level, but the planners may need the data aggregated to the month. The data hub can summarize the data and export only the data needed.
The following information is designed to further define a data hub and support you in your journey of building your own.
Table of Contents
Definition of the Data Hub
First, we need to define what a data hub is. This can be split into four sections:
Use cases: The data hub should be the first model built, whether you have a single use or multiple use cases. The data should be automatically refreshed on a schedule, whether it is nightly, weekly, monthly, etc., from the source system—often an Enterprise Data Warehouse (EDW). All modules and views that create hierarchies or lists should be stored in the data hub, which enables your models in having one version of truth, as well as reducing the duplication of data.
Model connectivity: Anaplan Connect , one of our 3 rd party vendors (Informatica Cloud, Dell Boomi, Mulesoft, or SnapLogic), or our REST API can be used to automate the loading of data to the data hub from the source system, as well as transferring data from the data hub to the spoke model(s). Additionally, transitional data should not be loaded directly into the spoke module, especially if there is a large volume of data.
Functions: Often, simple ETL (Extract, Transform, and Load) functions can be utilized within your data hub to transform the data for the spoke model(s). This is helpful when consolidating data from multiple sources where you have different “codes” and need a mapping module to ensure the correct data gets mapped correctly.
Team: The management of the data hub should have a designated team of experts who understand what data is stored in the data hub (to ensure duplication doesn’t happen), as well as the how and when the data gets loaded.
Anaplan Architecture with a Data Hub
There are several ways your Anaplan architecture could look, depending on the number of workspaces you currently have and the type of security your company requires. The following are illustrations of common architectures.
Master Hub Model: Across Workspaces
The most common, and recommended, architecture is when the data hub is in its own workspace. Not only does this have the advantage of not interfering with the other models, but it also adds an additional security layer, with a segregation of duties. In this view, the Anaplan Workspace Admin(s) can limit the access to the data hub workspace to only the people who require it.
Master Hub Model: Within a Workspace
The simplest depiction is where your data hub is within the same workspace as your spoke models. While this can be accomplished, it is not best practice as there is no segregation of duties and there is a possibility, upon heavy loads from the source system, of performance issues. Additionally, when adding users, the Anaplan Workspace Administrator (Admin) would need to ensure users don’t have access to the data hub, as well as any users of the data hub not having access to the spoke models
Multiple Data Hubs
Finally, the data hub doesn’t necessarily have to be the only model in the workspace. You can have additional data hubs, if needed.
Factors to Consider When Implementing a Data Hub
There are six main elements to think about when architecting a data hub:
Exporting data to spoke model(s).
One of the cornerstones of The Anaplan Way is data (process, model, and deployment being the others), which is critical to all implementations. You will need to know what data is needed for a certain use case. Consider the following, common, data questions that need to be answered in order to be successful:
What granularity of the data is needed?
How much history is needed? How much history do you have?
Does the source system only have transactional data, but the use case needs the data at the month level? Can the source system do the aggregation for you?
After all data questions have been answered, shift your focus to the source system and consider the following:
Consider the source system. Where is the data coming from? What is the source system, and is it a trusted environment? Is it Excel? Typically, you should stay away from Excel as the “source” because Excel cannot be audited.
Define the data source owners. Who has access to this data? Who is preparing it? Are they part of the project? These are often-overlooked questions that are critical to success. Ideally, the data source owners need to be part of the project from the start to understand the file specifications and prepare the initial load of the data, as well as towards the end of the project to do a final load of the data.
Define file specifications. How many files will be needed? Typically, you will need master data, as well as transactional data. Instead of having one file with all of this data, determine if the data can be split between different files (one for transactional, one for the unique members of the master data). It will be better for Anaplan (for performance reasons) to split these to reduce warnings during the data load process.
Analyze the data. Understand what makes each record unique (date/period and transactional amounts should not be part of this), and make sure the data owners don’t give you everything (Select * From Employee) when you only need five columns. Remember, it is better to ask for additional columns midway through the project than getting all columns in the beginning and only using a select amount.
Consider custom codes in the source system. Find more on this in the transactional lists section. This is a great trick for transactional data. After you have analyzed the data to understand what makes each record/row unique, concatenate the “codes” of the metadata into one transactional code, but remember, you will need to be under the 60-character threshold.
Define the schedule. When is the data available? Is the data on a certain schedule? What is the schedule required with this use case?
Determine the ETL medium to be used. Will Anaplan Connect be sufficient, will one of our 3 rd parties be used, or will a more custom application be needed, such as REST API? Does your company already have this experience inhouse, or will training be required? These will need to be factored into all data stories.
Usually, the largest lists are those containing transactional data. There can be millions of transactional ID’s with several list properties defined. First, properties should not be defined on a transactional list (or any list, except for Display Name, as they do get accounted for in the workspace memory). Secondly, instead of loading metadata to list properties (Cost Center and Account as properties), try to figure out a way to incorporate them into the code. If the transactional data is defining a transactional amount at the intersection of Cost Center and Account for a particular month, attempt to use the code of the Cost Center and the code of the Account concatenated together (0100_57000). Not only will this decrease your list size, but it will also create a healthier model.
In the below example, the model builder did not create a custom code, but rather used a combination of properties to make the record unique, which included the date/period, as well as the transactional amount. Notice the original number of records vs. the number of records after a custom code was created.
By incorporating the date/time period, as well as the transactional amount, it inflated the list size exponentially based on the number of years that were loaded. Doing this not only caused the model to be bigger, but also caused poor model opening performance. See the Appendix for a simple worked example to explain further.
Learn more about sparsity in the two-part series The Truth about Sparsity: Part 1 and The Truth About Sparsity: Part 2 .
Similar to transactional lists, flat lists are not part of a hierarchy and are a series of records grouped in a list, like Products, Companies, Cost Centers, or Employees. These are your “legends” or “anchor” for all metadata about this unique record. Again, the only property that should be defined is a Display Name, if needed. It is best practice, from a model builders’ perspective, to suffix the name with “Flat” or “- Flat”. This helps identify whether the list is part of a hierarchy or flat list (Employee – Flat, Cost Center – Flat, Product – Flat). These lists can be used for data validation, which will be described later in this article.
Ideally, you should have three types of modules in the data hub:
Transactional: A Transactional module will store the transactional data by the time series, whether that be by day, week, month, quarter, or year. The only data, or line items, should be transactional data. No other line items should be defined. Additionally, to keep the size down, make sure the summaries on the line items are turned off, or None, as there is no reason to sum the data within the module.
System: System (SYS) modules, or the “S” in DISCO , do not have time associated with them and should only be dimensionalized by the same list (Employee Flat, Cost Center Flat, Product Flat). These modules store the metadata or attributes about the list item that doesn’t change over time, for example the employee’s start date. Another example of a SYS module would be any kind of mapping that is required, whether it be SYS Time Filter module or a mapping from one source system to another.
Export modules: If the data from the source system is being loaded at a lower granularity than needed in the spoke model(s), export modules can aggregate the data to the specified need (month, quarter, or year level), which will lead to more efficient data load performance to the spoke model(s). Additionally, it is better to only load the granularity of data needed instead of loading all data to the spoke model, but only using a portion of it.
Loading Data vs. Using Formula’s in SYS Modules
If you can devise a custom code where all of the attributes of the data are accounted for, you can greatly increase the performance of your data load, especially on very large data volumes. It is actually faster to use formulas to derive the data from the custom code than it is to load the data. Why? A couple of reasons. First, when data is loaded, the load is triggering the change log, and every change is being recorded in the model history . Second, loading data to another module is an additional action. If you didn’t need that action, you would save processing time.
In the example below, the exact same data was loaded four different ways:
Import Properties to a List: A list was created with all attributes, including the transactional data, and was loaded to list properties (not best practice and against DISCO).
Import to List, Attribute, and Trans: A list was created, the transactional data was loaded to a transactional module, and all of the attributes were loaded to a SYS Attribute module.
Import to List, Trans, Calculate Attribute: A list was created, the transactional data was loaded to a transactional module, but the SYS Attribute model was calculated using two different methods:
One Line Item: Using FINDITEM() with several functions parsing out the information from within the FINDITEM(). For example, FINDITEM(Cost Center, RIGHT(LEFT(Trans Details.Code, '2nd Group’), 3)).
Multiple Line Items: Parsing of the member spread across multiple line items and using FINDITEM() with only the list and code as the parameter. First, you do the parsing to get the correct piece of the code (one line item), and then the FINDITEM() of that code (2 nd line item).
Notice, the best performing data load was the last one, Import to List, Trans, Calculate Attribute (multiple line items), where the parsing out of the data was spread over multiple line items. This is due to the fact that the data load was able to take advantage of Anaplan’s multithreading capabilities. The worst performing data load occurred when data was loaded to the Attribute module because, due to the sheer size of the data, a save had to be performed.
Exporting to Spoke Models
One of the most important concepts to remember when exporting data is to use a view from a module. Lists should not be exported because you lose control over what you export. It is either all or nothing. By using views, you can employ a filter (should always be a Boolean) to render exactly which data needs to be exported. If you need more than one filter, combine both into one line item and use that line as the filter. You will have much better performance if you are only using one Boolean line item as a filter vs. having multiple filters defined.
Another important concept to remember is to only export detailed information, as there is no reason to export parent information (quarter, year, etc.). Not only will you get warnings when exporting parent information, but the performance of the export will decrease because the system will have to create a debug log. The goal is to make sure a debug log is not created, all green checks, so if there ever is an issue, you will know it truly is an issue that needs attention.
Line items in the data hub formatted as text should not be exported as text, but actually as list formatted line items in the spoke model (text->list formatted line item). The goal is to reduce the number of text formatted line items in the spoke model.
Some say they need to do validation in the spoke model, therefore they need to import the data as text. Actually, this is false, because the validation should have already been done in the data hub, so there should be no need to do the validation again.
Lastly, you should think about what really needs to be exported. Do you really need to export historical data that hasn’t been changed? Instead, just export the newly loaded data, or delta data. This can be accomplished by using one of two methods:
From the source system, request IT to only send the updated information, not the full load every time. Additionally, request IT to create a column in the source file with a hardcoded value of “TRUE.” This will tell Anaplan which row is new or has been updated and can be used as a filter for an export. Just know, before the import of the source data gets loaded, make sure the first action within the process clears out the previous true records (set this up via a view using a filter where the view only shows members with a value of true).
Utilize the current period function to only export the current period data. In the SYS Time Filter module, create a line item named Current Period with the formula CURRENTPERIODSTART(). In the export views, filter the data on this line item.
Tips and Tricks
A few of tips and tricks to be aware of include the following:
Hierarchies should not be in the data hub.
Analytical modules should not be in the data hub.
Do not delete and reload lists.
Data Validations Model
Why should hierarchies not be in the data hub? To answer that question, you need to understand why hierarchies are used in the first place. Essentially, hierarchies are only needed to aggregate data for analytical purposes, and since users will not normally login to the data hub, the lists essentially take up space. With that said, it is perfectly okay to create the hierarchies for testing purposes to ensure your actions from the meta modules are building the hierarchies correctly, but as soon as the actions are working correctly and have been verified, you can remove the list structures from the data hub. A case can be made that certain implementations may need the hierarchies created in the data hub for validation purposes of several sources. If this is the case in your implementation, just be sure to only use the hierarchies for validation purposes.
In addition to the above, there are two more reasons to not have hierarchies built in the data hub—cluttered data, and spoke models that pull data from the lists.
Data hubs need to be clean and clutter free to ensure optimal performance, which also makes it easier for the administrators to understand exactly what data is stored in the data hub. Additionally, when you have lists—especially hierarchical lists—spoke model builders will sometimes build their lists from the lists within the data hub instead of from a view. It is best practice to always build lists from views from within a module so the action can benefit from filters (there are no filters when importing from lists).
Analytical modules should not be in the data hub since end users don’t normally access the data hub. There really isn’t a reason to have products by versions by time in the data hub, that belongs to the spoke model. Remember, the data hub should only be used to store data from the source system(s).
Within your nightly data load process, do not delete and reload data, including the list structures. If you have a proper code, you shouldn’t need to do this. Additionally, not only does this impact the overall performance of the process (adding an additional action to delete the list, which then deletes all data associated with that list), but the process is essentially filling up the change log with the exact same data that it had before the delete. When a certain threshold is surpassed, the model will require a save, thus taking up even more time. Ultimately, you are forcing the model to re-aggregate all of the data, instead of just the new data.
Lastly, if you know you will have to do a lot of transformations on your data (consolidating multiple source systems or your data is not clean), think about creating a Data Validations model. This model’s sole purpose would be to clean the data and then feed the data to the data hub, thus keeping the transformations to a minimum in the data hub as well as keeping the data hub clean.
Use Case: Transaction Data is by Store and SKU and Month
The code for the Transaction list is a three-part code Store_SKU_Month
Attributes for Store, SKU and Month are imported as Text and matched against the Store list, SKU list and Time period respectively
An additional line item is needed for the Store and SKU code (for export).
This is the screenshot of the bad way:
Notice the repetition of the attributes. STR07 and SKU031 are repeated each month.
Two data files
Unique combinations of Store and SKU (two-part code)
Store SKU code by month for the quantity.
The transaction details are stored in a module dimensioned by Transactions
The Store and SKU attributes are calculated using the “_” delimiter
The quantity is stored in a module dimensioned Transactions and by month
The additional line item is needed for the Store and SKU code (for export). This is a subsidiary view in the module as it is not dimensionalized by Time.
These are the screenshots of the good way:
Below lists out the breakdown of the model in terms of List size, Line items and the associated member usage of the various structures. The main reasons for the improvement are because lists themselves account for approximately 500b for each member and also there is repetition of the attributes per “month” in the transaction data (as mentioned above).
Hopefully, this article has shed some light on data hubs, how they should be used, and what you can do to ensure they perform at their peak level. Remember, analyze the data to understand what makes the row unique and use that as the code. Every list should have a code—every list!
Line item subsets are one of the most powerful and efficient features in Anaplan, yet one of the least understood. The COLLECT() function is probably the only “black box” function within Anaplan as is it not immediately apparent what it is doing or where the source values are coming from. In the following article, I will cover how to understand line item subsets more easily, and also explain their many uses, some of which do not need COLLECT() at all.
For more information on creating line item subsets see Line Item Subsets in Anapedia.
A line-item subset is a list of items drawn from one or more line items from one or more modules. Put simply, it converts line items into a list on which calculations can be performed. There are some restrictions:
Line item subsets can only contain numeric formatted line items.
Only one line item subset can be used as a dimension in a module.
Although line items can contain formulas, the items in a line item subset can only aggregate to a simple subtotal.
Styles on the line items are not transferred over to the line item subset.
Line item subsets can be used for many different areas of functionality. For the examples used, I have based them on the final model from the new Level 1 training. Download the model and follow the instructions to practice on the same structures.
These examples are deliberately simplified, but I hope you find these insightful and easy to transfer into your models to simplify the formulae and provide more flexibility to your users.
Table of Contents:
Calculations on Calculations
This is the classic use of line item subsets. A source module contains line items, and subsequently, you need to perform additional calculations on these line items. While in some cases this can be managed through complex formulae, normally these workarounds break most of the best practice guidelines and should be avoided.
Use Case example:
The source module contains forecast data with line items for the profit and loss lines in U.S. dollars. We need to convert these values into local currency based on the Country dimension.
The source modules are as follows:
The first step is to create the line item subset, and for this report, we only want summary values.
In the settings tab, choose Line Item Subsets and click insert.
We recommend prefixing with LIS: the name of the module and simple description.
Clicking on the Line Item Subset header item (in settings) will display the Line Item Subsets screen.
Click on the newly created line item subset and the … and select the module(s) required; in this case, it is REP03.
Select which line items you wish to include in the line item subset.
Now that the line item subset has been created, it is available to be used in a module.
Create a module with the following dimensions:
LIS: REP03 Currency
Add the following line items:
In the Base Currency line item, enter the formula: COLLECT()
Note that the values are the same as those in REP03 and the line items are now shown in list format (no formatting). Also note that these values are from the Forecast version, as the target module does not have versions, so the Current Version is used as the source automatically.
Add the following formulae to the remaining line items to complete the calculation.
Exchange Rate = 'DATA02 Exchange Rates'.Rate[LOOKUP: 'SYS03 Country Details'.Currency Code]
Local Currency = Base Currency / Exchange Rate
Note that the Exchange Rate line item should be set as a Subsidiary view (excluding the line item subset from the applies to) because we are showing it on the report for clarity. If this display was not required, the calculation could be combined with the Local Currency formula.
You can also use a line item subset to help with the transformation between source and target modules.
Use Case example:
We want to summarize costs (from the reporting P&L) into Central and Locally controlled costs.
Create a list (Controllable Costs) containing two members.
Create a line item subset (as before) using just REP03 as the source module.
Create a staging module with the following dimensions:
LIS: REP03 Cost Reporting
Add a line item (Data) and enter COLLECT() as the formula.
Set the Summary method to None; we do not need subtotals in this module.
Create a mapping module, dimensioned by LIS: REP03 Cost Reporting.
Add a line item (Mapping) formatted as the Controllable Costs list.
Map the lines as applicable.
Create a reporting module with the following dimensions.
Add a line item called Costs.
Add the formula: 'REP07 Cost Reporting Staging'.Data[SUM: 'SYS14 Cost Mapping'.Mapping]
We use the SUM formula because the source dimension and the mapping dimension are the same. So, “If the source is the same, it’s a SUM.”
Multiple Source Modules
Line item subsets can contain line items from multiple modules. There is a caveat though; all modules must share at least one common dimension/hierarchy and/or have a Top Level set for non-matching dimensions.
Use case example:
Based on user-entered settings, we want to compare the values from two time periods for metrics from three different modules and calculate the absolute and % variances.
The source modules all share a common dimension:
REV03 Margin Calculation: G2 Countries, P2 Products, Month
EMP03 Employee Expenses by Country: G2 Countries, Month
OTH01 Non-Employee Expenses: G3 Location, E1 Departments, Month
Note: G3 Location has a G2 Country as a parent
The module for the user parameters is:
And the metrics required are:
We could solve this problem without using a line item subset:
Create a list (Reporting Metrics) containing the list items above.
Create a module with the following dimensions.
The formula for Month 1 is:
IF ITEM(Reporting Metrics) = Reporting Metrics.Margin THEN 'REV03 Margin Calculation'.Margin[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1'] ELSE IF ITEM(Reporting Metrics) = Reporting Metrics.Salary THEN 'EMP03 Employee Expenses by Country'.Salary[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1'] ELSE IF ITEM(Reporting Metrics) = Reporting Metrics.Bonus THEN 'EMP03 Employee Expenses by Country'.Bonus[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1'] ELSE IF ITEM(Reporting Metrics) = Reporting Metrics.Rent THEN 'OTH01 Non Employee Expenses'.Rent[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1'] ELSE IF ITEM(Reporting Metrics) = Reporting Metrics.Utilities THEN 'OTH01 Non Employee Expenses'.Utilities[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1'] ELSE 0
I won’t repeat the formula for Month 2, as it is effectively the same, just referencing the Month 2 line item in the source.
You can see, that even for a small set of metrics, this is a large complex formula, going against best practices. So, let’s not do that.
Create the line item subset as before.
For multi-module line item subsets, it is best practice to use Multi> to represent the various modules.
Open the line item subset and chose the three modules.
Create a staging module (this is best practice following the DISCO principle), with the following dimensions.
LIS: Multi>Variance Reporting
Add a line item (Data) and enter COLLECT() as the formula.
Set the Summary method to None; we do not need subtotals in this module.
Create a mapping module, dimensioned by Reporting Metrics.
Add a line item formatted LIS: Multi>Variance Reporting.
Map the lines accordingly.
In the reporting module from above, change the Month 1 and Month 2 line item formulae to.
'REP05 Variance Report Staging'.Data[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1', LOOKUP: 'SYS12a Reporting Metrics Mapping'.Mapping]
'REP05 Variance Report Staging'.Data[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 2', LOOKUP: 'SYS12a Reporting Metrics Mapping'.Mapping]
Note, this time we are using LOOKUP rather than SUM because the source dimension doesn’t match the dimension of the mapping module.
I think you’ll agree that the formula is much easier to read and it is more efficient.
However, we can do even better; but note that there now are two ‘lookups’ in the formula. The more “transformations” there are in the formulae, the more work the engine needs to do. We can remove one of these by changing the target module dimensionality.
Copy the reporting module from above.
Remove the formulae for Month 1 and Month 2.
Replace Reporting Metrics with LIS: Multi>Variance Reporting as the dimension (applies to).
Add the following formulae for Month 1 and Month 2 respectively.
Month 1 = 'REP05 Variance Report Staging'.Data[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 1']
Month 2 = 'REP05 Variance Report Staging'.Data[LOOKUP: 'SYS11 Time Variance Reporting'.'Month 2']
Note, only one lookup is needed in the formula.
Another use case that line item subsets can be used for is filtering. And this functionality has nothing to do with staging data or mapping modules. It is possible to filter line items and these can also be filtered based on other dimensions too.
Use Case example:
Based on user-entered settings, for the reporting module (REP03) we want to show different line items for each year and version.
We already have set up the Years to Versions filter module
We now want to set up the user-driven parameters. To ensure that the users’ settings do not affect each other, we need to use the system generated Users’ list.
Create a line item subset based on REP03
Select all line items
Create a new module with the following dimensions:
LIS: REP03 Filters
Add a single line item (Show?) formatted as a Boolean
Enter values as you wish
Note that Employee expenses and Other Costs are not available to check. This is because, in REP03, they are a simple aggregation and are shown as Parents of the other line items.
So, how do we resolve this? You can “trick” the model by turning these setting off.
The subtotals are now available to check in the filter module.
It is worth noting, be careful when doing this. If you are using the line item subsets as a dimension in a data entry module, the totals will not calculate correctly. See Final Thoughts for more details.
To set up the filter
In REP03, set the following filters
The module will now filter line items and years when the version page selector is changed.
Note the subtotals work correctly in this module because it is not data entry.
Dynamic Cell Access
Line item subsets can be used in conjunction with Dynamic Cell Access to provide very fine-grained control over data; again, without any mapping modules or COLLECT() statements
Use Case Example:
In the following module
The following rules apply:
Bonus % is set by the central team so it needs to be read only.
All metrics for Exec are not allowed to be edited.
Car Allowances are not applicable for Production.
Phone Allowances are not applicable for Production, Finance or HR, and the allowances for Sales should be read only.
To set up the access:
Create a line item subset based on EMP01
Select all line items
Create an Access Driver module with the following dimensions:
LIS: EMP01 DCA
Add two Boolean formatted line items
Enter the values as below
Now in EMP01 assign the Read Access and Write Access drivers to the module
The module now looks like this:
Line Items Subsets with Line Item Subsets
I mentioned at the outset that you can lose formatting when using a line item subset. However, in some cases, it is possible to keep formatting along with calculations
Use Case Example:
Using the values from REP03, we want to classify Sales and Costs and then calculate a cost % to Sales. Yes, we could do this in the module itself as a separate line item, but we also want to be able to reclassify the source line items from a dashboard using mappings rather than change the blueprint formula. We also want to maintain formatting.
For this example, I have just changed the styles to illustrate the point
Create a line item subset based on REP03.
Create a staging module with the following dimensions:
LIS: REP03 Cost%
Add a line item call Data and enter COLLECT() as the formula and set the Summary method to None.
Create a second line item subset based on REP10 (the target module).
Create a mapping module dimensioned by the LIS: REP03 Cost%
Create a line item formatted as LIS: REP10
Map the lines accordingly
In the target module set following formula for both Sales and Costs line items (Yes, it is the same formula for both!)
'REP09 LISS vs LISS - Staging'.Data[SUM: 'SYS20 Cost% Mapping'.Mapping]
Note the formatting is preserved.
Finally, I want to mention a piece of functionality that is not well known but very powerful; Version Formula. Utilizing line item subsets in conjunction with versions, Version Formula extends the ‘Formula scope” functionality. It is possible to control formulae using Formula Scope, but there are limited options.
Use Case example:
Let’s assume that we have actuals data in one module, the budget data in another and we want to enable the forecast to be writeable. The current version (in the versions setting) is set to Forecast
For this example, there is only one line item in the target module, but this functionality allows the flexibility to set different rules per version for each line item
Create a line item subset based on the above and select the line item(s).
Now in the blueprint view of the target module click Edit>Add Version Formula.
Now choose the Version to which the formula applies.
You will now see a different formula bar at the top of the blueprint view.
Enter the following formula:
'DATA01 P&L Actuals & Budget'.Revenue
Repeat the above for Budget with the following formula:
'REV03 Margin Calculation'.Revenue
Note that now at the top, you can see that there is a Version Formula set.
We mentioned the aggregation behavior and the ‘Is Summary’ setting earlier. Let me show you how this and the construction of the formulas affect the behavior of the line item subset
We will use the following module as an example. This module is only used to set up the line item subset, so no dimensions are needed.
Note that the subtotal formulae are simple aggregations.
This means the subtotal lines:
Calculate correctly when used as a dimension in a module.
Are not available for data entry.
The following module is dimensioned by the line item subset to highlight 1. and 2. above.
If we decide we don’t want the Employee costs in the line item subset, two things happen:
The indentation changes for the detailed cost lines because they are now not part of a parent hierarchy on display.
The Costs subtotal doesn’t calculate. This is because the Costs subtotal needs the intermediate subtotals to exist within the line item subset.
To mitigate the latter point there are two remedies.
Include the subtotals and hide them – The lines are still calculating and taking space.
If possible, adjust the formula structure.
Remove the subtotals formula.
Add in the Costs formula as to use the detailed items; no intermediate totals.
Re-add the subtotal formulas.
Note the 'Parent' and 'Is Summary' settings, the Costs subtotal now calculates correctly.
If we change the formulae to be something other than simple addition, you will see that calculation is fine in the source module,
but not in the line item subset module.
Why is this?
Remember the 'Is Summary' setting we changed in the Filters section when we adjusted the formula the 'Is Summary' is now unchecked
This means that the line item subset doesn’t treat the line as a calculation, hence the data entry 0 shown instead.
If your costs need to be positive (as in this example), it is possible to calculate correctly using a ratio formula. This works for normal line items/lists as well as line item subsets. See Changing the sign for Aggregation for more details
How do we keep our users in the Anaplan platform to do their work which requires a high level of advanced customization, faster and more easily than their previous Excel environment? The solution is called “Smart Filters”. Check it out !
Thinking through the results of a modeling decision is a key part of ensuring good model performance—in other words, making sure the calculation engine isn’t overtaxed. This article highlights some ideas for how to lessen the load on the calculation engine.
Formulas should be simple; a formula that is nested, or uses multiple combinations, uses valuable processing time. Writing a long, involved formula makes the engine work hard. Seconds count when the user is staring at the screen. Simple is better. Breaking up formulas and using other options helps keep processing speeds fast.
You must keep a balance when using these techniques in your models, so the guidance is as follows:
Break up the most commonly changed formula
Break up the most complex formula
Break up any formula you can’t explain the purpose of in one sentence
Formulas with many calculated components
The structure of a formula can have a significant bearing on the amount of calculation that happens when inputs in the model are changed. Consider the following example of a calculation for the Total Profit in an application. There are five elements that make up the calculation: Product Sales, Service Sales, Cost of Goods Sold (COGS), Operating Expenditure (Op EX), and Rent and Utilities. Each of the different elements is calculated in a separate module. A reporting module pulls the results together into the Total Profit line item, which is calculated using the formula shown below. What happens when one of the components of COGS changes? Since all the source components are included in the formula, when anything within any of the components changes, this formula is recalculated. If there are a significant number of component expressions, this can put a larger overhead on the calculation engine than is necessary.
There is a simple way to structure the module to lessen the demand on the calculation engine. You can separate the input lines in the reporting module by creating a line item for each of the components and adding the Total Profit formula as a separate line item. This way, changes to the source data only cause the relevant line item to recalculate.
For example, a change in the Product Sales calculation only affects the Product Sales and the Total Profit line items in the Reporting module; Services Sales, Op EX, COGS and Rent & Utilities are unchanged. Similarly, a change in COGS only affects COGS and Total Profit in the Reporting module.
Keep the general guidelines in mind. It is not practical to have every downstream formula broken out into individual line items.
Plan to provide early exits from formulas
Conditional formulas (IF/THEN) present a challenge for the model builder in terms of what is the optimal construction for the formula, without making it overly complicated and difficult to read or understand. The basic principle is to avoid making the calculation engine do more work than necessary. Try to set up the formula to finish the calculations as soon as possible.
Always put first the condition that is most likely to occur. That way the calculation engine can quit the processing of the expression at the earliest opportunity.
Here is an example that evaluates Seasonal Marketing Promotions:
The summer promotion runs for three months and the winter promotion for two months.
There are more months when there is no promotion, so this formula is not optimal and will take longer to calculate.
This is better, as the formula will exit after the first condition more frequently.
There is an even better way to do this. Following the principles from above, add another line item for no promotion.
And then the formula can become:
This is even better because the calculation for No Promo has already been calculated, and Summer Promo occurs more frequently than Winter Promo.
It is not always clear which condition will occur more frequently than others, but here are a few more examples of how to optimize formulas:
FINDITEM formula The Finditem element of a formula will work its way through the whole list looking for the text item, and if it does not find the referenced text, it will return blank. If the referenced text is blank, it will also return a blank. Inserting a conditional expression at the beginning of the formula keeps the calculation engine from being overtaxed.
IF ISNOTBLANK(TEXT) THEN FINDITEM(LIST,TEXT) ELSE BLANK
IF BLANK(TEXT) THEN BLANK ELSE FINDITEM(LIST,TEXT)
Use the first expression if most of the referenced text contains data and the second expression if there are more blanks than data.
LAG, OFFSET, POST, etc. If in some situations there is no need to lag or offset data, for example, if the lag or offset parameter is 0. The value of the calculation is the same as the period in question. Adding a conditional at the beginning of the formula will help eliminate unnecessary calculations:
IF lag_parameter = 0 THEN 0 ELSE LAG(Lineitem, lag_parameter, 0)
IF lag_parameter <> 0 THEN LAG(Lineitem, lag_parameter, 0) ELSE 0
The use of formula a or b will depend on the most likely occurrence of 0s in the lag parameter.
Booleans Avoid adding unnecessary clutter for line items formatted as BOOLEANS. There is no need to include the TRUE or FALSE expression, as the condition will evaluate to TRUE or FALSE.
IF Sales > 0 then TRUE ELSE FALSE
The Planual provides a systematic set of standards for model building on the Anaplan platform. The rules in it are designed produce the most efficient, usable, and scalable Anaplan models, while dramatically increasing performance for models in all contexts. We highly recommend that all model builders familiarize themselves with these standards, and start incorporating them into their model-building practices. (The results will be significant!)
Reducing the number of calculations will lead to quicker calculations and improve performance. However, this doesn’t mean combining all your calculations into fewer line items, as breaking calculations into smaller parts has major benefits for performance.
Learn more about this in the Formula Structure article.
How is it possible to reduce the number of calculations? Here are three easy methods:
Turn off unnecessary Summary method calculations.
Avoid formula repetition by creating modules to hold formulas that are used multiple times.
Ensure that you are not including more dimensions than necessary in your calculations.
Turn off Summary method calculations
Model builders often include summaries in a model without fully thinking through if they are necessary. In many cases, the summaries can be eliminated. Before we get to how to eliminate them, let’s recap on how the Anaplan engine calculates.
In the following example we have a Sales Volume line-item that varies by the following hierarchies:
This means that from the detail values at SKU, City, and Channel level, Anaplan calculates and holds all 23 of the aggregate combinations shown below—24 blocks in total.
With the Summary options set to Sum, when a detailed item is amended (represented in the grey block), all the other aggregations in the hierarchies are also re-calculated. Selecting the None summary option means that no calculations happen when the detail item changes. The varying levels of hierarchies are quite often only there to ease navigation, and the roll-up calculations are not actually needed, so there may be a number of redundant calculations being performed. The native summing of Anaplan is a faster option, but if all the levels are not needed it might be better to turn off the summary calculations and use a SUM formula instead.
For example, from the structure above, let’s assume that we have a detailed calculation for SKU, City, and Channel (SALES06.Final Volume). Let’s also assume we need a summary report by Region and Product, and we have a module (REP01) and a line item (Volume) dimensioned as such.
REP01.Volume = SALES06 Volume Calculation.Final Volume is replaced with REP01.Volume = SALES06.Final Volume[SUM:H01 SKU Details.Product, SUM:H02 City Details.Region]
The second formula replaces the native summing in Anaplan with only the required calculations in the hierarchy.
How do you know if you need the summary calculations? Look for the following:
Is the calculation or module user-facing?
If it is presented on a dashboard, then it is likely that the summaries will be needed. However, look at the dashboard views used. A summary module is often included on a dashboard with a detail module below; Effectively, the hierarchy sub-totals are shown in the summary module, so the detail module doesn’t need the sum or all the summary calculations.
Detail to Detail
Is the line item referenced by another detailed calculation line item? This is very common, and if the line item is referenced by another detailed calculation the summary option is usually not required. Check the Referenced by column and see if there is anything referencing the line item.
Calculation and staging modules
If you have used the D.I.S.C.O. module design, you should have calculation/staging modules. These are often not user-facing and have many detailed calculations included in them. They also often contain large cell counts, which will be reduced if the summary options are turned off.
Can you have different summaries for time and lists?
The default option for Time Summaries is to be the same as the lists. You may only need the totals for hierarchies, or just for the timescales. Again, look at the downstream formulas.
The best practice advice is to turn off the summaries when you create a line item, particularly if the line item is within a Calculation module (from the D.I.S.C.O. design principles).
Avoid Formula Repetition
An optimal model will only perform a specific calculation once. Repeating the same formula expression multiple times will mean that the calculation is performed multiple times. Model builders often repeat formulas related to time and hierarchies. To avoid this, refer to the module design principles (D.I.S.C.O.) and hold all the relevant calculations in a logical place. Then, if you need the calculation, you will know where to find it, rather than add another line item in several modules to perform the same calculation.
If a formula construct always starts with the same condition evaluation, evaluate it once and then refer to the result in the construct. This is especially true where the condition refers to a single dimension but is part of a line item that goes across multiple dimension intersections. A good example of this can be seen in the example below:
START() <= CURRENTPERIODSTART() appears five times and similarly START() > CURRENTPERIODSTART() appears twice.
To correct this, include these time-related formulas in their own module and then refer to them as needed in your modules.
Remember, calculate once; reference many times!
Taking a closer look at our example, not only is the condition evaluation repeated, but the dimensionality of the line items is also more than required. The calculation only changes by the day, as per the diagram below:
But the Applies To here also contains Organization, Hour Scale, and Call Center Type.
Because the formula expression is contained within the line item formula, for each day the following calculations are also being performed:
And, as above, it is repeated in many other line items.
Sometimes model builders use the same expression multiple times within the same line item. To reduce this overcalculation, reference the expression from a more appropriate module; for example, Days of Week (dimensioned solely by day) which was shown above. The blueprint is shown below, and you can see that the two different formula expressions are now contained in two line items and will only be calculated by day; the other dimensions that are not relevant are not calculated.
Substitute the expression by referencing the line items shown above.
In this example, making these changes to the remaining lines in this module reduces the calculation cell count from 1.5 million to 1500.
Check the Applies to for your formulas, and if there are extra dimensions, remove the formula and place it in a different module with the appropriate dimensionality .
The following is a g uide for the new Statistical Forecasting Calculation Engine Models (monthly and weekly). It includes enablement videos, practice data import exercise, model documentation, and specific steps when using the model for implementations .
1. Enablement Videos & Practice Exercise
Intro and Overview Video
Model overview and review of new key features.
Initial Model & Data Import Steps
Steps on how to setup model, product hierarchy, customer list and multi-level forecast analysis.
Practice Exercise—Import data to setup stat forecast
Two sets of load files included to practice setup for single level product set or multi-level product set w/ customers, product and brand level.
Start on "Initial App Setup" dashboard and load either Single OR Multi Level files into model, and use Import video as guide if needed.
.Zip File Attached
Lucidchart Process Maps
Lucidchart Process Map document includes High-Level process flow for end-user navigation and detailed tabs for each section.
**Details & links also on "Training & Enablement" dashboard.
High-Level Process Map PDF
High-level process map PDF format.
Forecast Methods PDFs
High-level version with forecast algorithms list and overview.
Detailed version which includes a slide for each forecast method, m ethod overview, advantages/disadvantages, equation and graph example output.
**These slides are also included on "Forecast Methods Overview & Formulas" dashboard.
3. Implementation Specifics
Training & Enablement Dashboard
Training & Enablement dashboard contains details on process map navigation.
Initial Model Setup
Initial Setup: current model staged with chocolate data from data hub, execute CLEAR MODEL action prior to loading customer-specific data.
Changing Model Time Scale— align Native & Dynamic Time Settings
If a Time Settings change is required, need to review Initial App Setup dashboard to align Native Time with Dynamic Time setup in model.
Monthly Update Process
After initial setup, use Monthly Data History Upload dashboard to update prior period actuals and settings .
Single Level vs. Multi-Level Forecast Setup
Two implementation options & when to use:
Single Level Forecast: Forecast at one level of product hierarchy (i.e. all stat forecasts calculated at Item level). Most use cases will leverage single level forecast setup.
Multi-Level Forecast : Ability to forecast at different levels of the product hierarchy (i.e. Top Item | Customers, Item and Brand level can all have stat forecast generated). This requires a complex forecast reconciliation process, review "Multi-Level Forecast Overview" dashboard if this process is needed.
Follow troubleshooting tips on Training & Enablement dashboard if having issues with stat forecast generating before reaching out for support.
Model Notes & Documentation
Module Notes—includes DISCO classification and module purpose.
"Do Not Modify" Items
Module notes contain DO NOT MODIFY for items that should not be changed during the implementation process.
User Roles & Selective Access
Demo, Demand Planner, Demand Planning Manager ro les can be adjusted
After Selective Access process run on Flat List Management dashboard; then users can be given access to certain product groups/brands etc.
Details on daily batch processing and how to prepare a roadmap of your batch processes – files, queries, import actions/processes in Anaplan (see attachment).
Intro & Model Intro and Overview Video.
Data Import and Setup Steps.
5. Model Download Links
Monthly Statistical Forecasting Calculation Engine
Weekly Statistical Forecasting Calculation Engine
PLANS is the new standard for Anaplan modeling—“the way we model.” This covers more than just the formulas and includes and evolves existing best practices around user experience and data hubs. It is a set of rules on the structure and detailed design of Anaplan models. This set of rules will provide both a clear route to good model design for the individual Anaplanner and common guidance on which Anaplanners and reviewers can rely when passing models amongst themselves.
In defining the standard, everything we do will consider or be based around:
Performance – Use the correct structures and formula to optimize the Hyperblock
Logical – Build the models and formula more logically – See D.I.S.C.O. below
Auditable – Break up the formula for better understanding, performance, and maintainability
Necessary – Don’t duplicate expressions. Store and calculate data and attributes once and reference them many times. Don't have calculations on more dimensions than needed
Sustainable – Build with the future in mind, thinking about process cycles and updates
The standards will be based around three axes:
Performance - How do the structures and formula impact the performance of the system?
Usability/Auditability - Is the user able to understand how to interact with the functionality?
Sustainability - Can the solution be easily maintained by model builders and support?
We will define the techniques to use that balance on the three areas to ensure the optimal design of Anaplan models and architecture.
As part of model and module design, we recommend categorizing modules as follows:
Data – Data hubs, transactional modules, source data; reference everywhere
Inputs – Design for user entry, minimize the mix of calculations and outputs
System – Time management, filters, list attributes modules, mappings, etc.; reference everywhere
Calculations – Optimize for performance (turn summaries off, combine structures)
Outputs - Reporting modules, minimize data flow out
Why build this way?
Fewer repeated calculations
Optimized structures and formulas
Data and calculations reside in logical places
Model data flows can be easily understood
Model structure can be easily understood
Simplified formula (no need for complex expressions)
Formulas and structures are not repeated
Data is stored and calculated once, referenced many times, leading to efficient calculations
Models can be adapted and maintained more easily
Expansion and scaling simplified
Formula Optimization in Anaplan
Formula Structure for Performance
The Truth About Sparsity: Part 1
The Truth About Sparsity: Part 2
Data Hubs: Purpose and Peak Performance
To Version or Not to Version?
Line Item Subsets Demystified
Best Practices for Module Design
Data Hubs: Purpose and Peak Performance
Formula Structure for Performance
Reduce Calculations for Better Performance
Formula Optimization in Anaplan
Dynamic Cell Access Tips and Tricks
Dynamic Cell Access - Learning App
Personal Dashboards Tips and Tricks
Time Range Application
Ask Me Anything (AMA) sessions
The Planual Rises
The process of designing a model will help you:
Understand the customer’s problem more completely.
Bring to light any incorrect assumptions you may have made, allowing for correction before building begins.
Provide the big-picture view for building. (If you were working on an assembly line building fenders, wouldn’t it be helpful to see what the entire car looked like?)
Table of Contents:
Understand the Requirements and the Customer’s Technical Ecosystem when Designing a Model
When you begin a project, gather information and requirements using a number of tools. These include:
Statement of Work (SOW): Definition of the project scope and project objectives/high-level requirements.
Project Manifesto: Goal of the project – big-picture view of what needs to be accomplished.
IT ecosystem: Which systems will provide data to the model and which systems will receive data from the model? What is the Anaplan piece of the ecosystem?
Current business process: If the current process isn’t working, it needs to be fixed before design can start.
Business logic: What key pieces of business logic will be included in the model?
Is a distributed model needed?
High user concurrency.
Security where the need is a separate model.
Regional differences that are better handled by a separate model.
Is the organization using ALM, requiring split or similar models to effectively manage development, testing, deployment, and maintenance of applications? (This functionality requires a premium subscription or above.)
User stories: These have been written by the client—more specifically, by the subject matter experts (SMEs) who will be using the model.
Why do this step?
To solve a problem, you must completely understand the current situation. Performing this step provides this information and the first steps toward the solution.
Results of this step:
Understand the goal of the project.
Know the organizational structure and reporting relationships (hierarchies).
Know where data is coming from and have an idea of how much data clean-up might be needed.
If any of the data is organized into categories (for example, product families) or what data relationships exist that need to be carried through to the model (for example, salespeople only sell certain products).
What lists currently exist and where are they are housed.
Know which systems the model will either import from or export to.
Know what security measures are expected.
Know what time and version settings are needed.
Document the User Experience
Front-to-back design has been identified as the preferred method for model design. This approach puts the focus on the end-user experience. We want that experience to align with the process so users can easily adapt to the model. During this step focus on:
User roles. Who are the users?
Identifying the business process that will be done in Anaplan.
Reviewing and documenting the process for each role.
The main steps. If available, utilize user stories to map the process. You can document this in any way that works for you. Here is a step-by-step process you can try:
What are the start and end-points of the process?
What is the result or output of the process? What does each role need to see/do in the process?
What are the process inputs and where do they come from?
What are the activities the user needs to engage in? Verb/object—approve request, enter sales amount, etc. Do not organize during this step. Use post-its to capture them.
Take the activities from step 4 and put them in the correct sequence.
Are there different roles for any of these activities? If no, continue with step 8. If yes, assign a role to each activity.
Transcribe process using PowerPoint ® or Lucid charts. If there are multiple roles, use swim lanes to identify the roles.
Check with SMEs to ensure accuracy.
Once the user process has been mapped out, do a high-level design of the dashboards.
Include: Information needed.
What data does the user need to see?
What the user is expected to do or decisions that the user makes.
Share the dashboards with the SMEs. Does the process flow align?
Why do this step?
This is probably the most important step in the model-design process. It may seem as though it is too early to think about the user experience, but ultimately the information or data that the user needs to make a good business decision is what drives the entire structure of the model.
On some projects, you may be working with a project manager or a business consultant to flesh out the business process for the user. You may have user stories, or it may be that you are working on design earlier in the process and the user stories haven’t been written. In any case, identify the user roles, the business process that will be completed in Anaplan, and create a high-level design of the dashboards. Verify those dashboards with the users to ensure that you have the correct starting point for the next step.
Results of this step:
List of user roles.
Process steps for each user role.
High-level dashboard design for each user role.
Use the Designed Dashboards to Determine What Output Modules are Necessary
Here are some questions to help you think through the definition of your output modules:
What information (and in what format) does the user need to make a decision?
If the dashboard is for reporting purposes, what information is required?
If the module is to be used to add data, what data will be added and how will it be used?
Are there modules that will serve to move data to another system? What data and in what format is necessary?
Why do this step?
These modules are necessary for supporting the dashboards or exporting to another system. This is what should guide your design—all of the inputs and drivers added to the design are added with the purpose of providing these output modules with the information needed for the dashboards or export.
Results of this step:
List of outputs and desired format needed for each dashboard.
Determine What Modules are Needed to Transform Inputs to the Data Needed for Outputs
Typically, the data at the input stage requires some transformation. This is where business rules, logic, and/or formulas come into play:
Some modules will be used to translate data from the data hub. Data is imported into the data hub without properties, and modules are used to import the properties. Reconciliation of items takes place before importing the data into the spoke model.
These are driver modules that include business logic, rules.
Why do this step?
Your model must translate data from the input to what is needed for the output.
Results of this step:
Business rules/calculations needed.
Create a Model Schema
You can whiteboard your schema, but at some point in your design process, your schema must be captured in an electronic format. It is one of the required pieces of documentation for the project and is also used during the Model Design Check-in, where a peer checks over your model and provides feedback.
Identify the inputs, outputs, and drivers for each functional area.
Identify the lists used in each functional area.
Show the data flow between the functional areas.
Identify time and versions where appropriate.
Why do this step?
It is required as part of The Anaplan Way process. You will build your model design skills by participating in a Model Design Check-in, which allows you to talk through the tougher parts of design with a peer.
More importantly, designing your model using a schema means that you must think through all of the information you have about the current situation, how it all ties together, and how you will get to that experience that meets the exact needs of the end-user without fuss or bother.
Result of this step:
A model schema that provides the big-picture view of the solution. It should include imports from other systems or flat files, the modules or functional areas that are needed to take the data from current state to what is needed to support the dashboards that were identified in Step 2. Time and versions should be noted where required. Include the lists that will be used in the functional areas/modules.
Your schema will be used to communicate your design to the customer, model builders, and others. While you do not need to include calculations and business logic in the schema, it is important that you understand the state of the data going into a module, the changes or calculations that are performed in the module and the state of the data leaving the module, so that you can effectively explain the schema to others.
For more information, check out 351 Schemas. This 10-to-15-minute course provides basic information about creating a model schema.
Verify That the Schema Aligns with Basic Design Principles
When your schema is complete, give it a final check to ensure:
It is simple.
“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage to move in the opposite direction.” ― Ernst F. Schumacher
“Design should be easy in the sense that every step should be obviously and clearly identifiable. Simplify elements to make change simple so you can manage the technical risk.” — Kent Beck
The model aligns with the manifesto.
The business process is defined and works well within the model.
As a business operations manager on the Anaplan on Anaplan (AoA) team—an internal team, focused on bringing Connected Planning to life within Anaplan—I help to oversee our internal Anaplan model ecosystem and assist in the solutioning and development of Anaplan models across all of our functional business groups.
As Anaplan's largest customer, one of the numerous requirements we must address is user access and security. Utilizing Anaplan's user roles functionality typically gets the job done for granting users access to specific models. Occasionally, we must go one step further and leverage Anaplan's selective access feature. Roles and selective access are powerful tools and address our needs nearly all of the time. However, as we scale our own use of Anaplan, we have begun to encounter the need to provision user's access to lists based on multiple criteria, rather than just a single condition.
In Real Life
A real-life user provisioning challenge we’ve encountered is in our headcount planning model. As this model provides real-time reporting on our employees, there are inherent sensitivities and considerations around who can see information for specific employees—taking into consideration visibility to things like compensation and personally identifiable information (PII). We have multiple use cases built out within the model, including recruiting capacity and analysis, attrition reporting, hiring reporting, etc., and the access to specific employee data depends on the end user of the model.
Sample employee roster: Joey manages Usain, Eluid, and Meb; Americas Geo; HR Cost Center.
In this example model, we have our complete employee roster included. If an HR business partner accesses the model, we want them to see only employees that are tagged to the functional area they support (e.g. finance, sales). Additionally, if a business manager goes into the model, they should only see information for employees where they are the manager, or employees downstream on their management chain.
But wait! If the HR business partner is in Europe, they shouldn’t be able to see PII fields for their employees. Do you see how this could get complicated quickly? Additionally, some dashboards that contain non-sensitive employee information are perfectly fine to open up broadly to all users, while others contain sensitive data we need to provision.
So, how do we handle this? We can’t provision access by roles because all of the aforementioned users need access to the same modules/dashboards as it relates to the employees they manage. Additionally, no single user should be able to see all data for all employees. Selective access could be considered as a solution, but given the levels of complexity and multiple logical drivers—as well as the requirement to not hide reporting of non-sensitive data for employees—that option also has limitations.
Enter Dynamic Cell Access (DCA). Since DCA allows us to base read/write access off of formulae logic, it offers us the ability to layer on multiple levels of logic ahead of deciding whether or not someone should be able to read or write on a particular item in a list. It’s dynamic (who would have thought with that name?), which means it adjusts live as data within the model changes. Additionally, it offers us the flexibility to apply the provisioning logic to the exact modules we want to, rather than blanket provision users across the model.
DCA In Action
The following is a high-level example of how to leverage the power of DCA:
Load employee roster data into Anaplan, ensuring the data contains the employee email—the same email that is used to log in to Anaplan. This allows for the mapping of Anaplan users to the employee roster.
Set up a System module with the ‘applies-to’ list of the user list. User meta-data staging module: Rows represent model users (Joey, in this example) and the line items represent meta-data off of the roster module.
Within this module, we can join the employee roster data and the user list to map the employee’s meta-data to their Anaplan user profile (e.g. cost center, location, management chain, etc.)
Using a series of Boolean line items, we can write whatever logic we want to base our DCA on. In our example, this could include: Is HR business partner? Is Euro? Basically, this is a staging module for all of the employee meta-data we want to leverage to create our DCA drivers.
Set up a second System module with the ‘applies-to’ list of whatever list you want to apply DCA against, as well as the user list. In our case, this would also be our employee roster list.
Create a series of Boolean line items, testing different attributes of the User System module we just set up against the meta-data of the employees. An example would be (Employee Cost Center = User’s Cost Center). DCA logic module for the employee roster list (rows in this module): Line items represent the logic used to determine whether the user (Joey— in the page selector) can see the employee. The key here is to consolidate all of your logic into a single “Master” line item, which is on the far right.
Daisy chain your conditions together as desired, with the end result being a master Boolean line item, which is the driver for whether or not a particular user has read or write access to a particular item within the list. In this dashboard you can see that the information is masked for those employees that did not meet all of the criteria identified in the master DCA line item.
Select which modules you’d like to apply DCA to. The nice thing about DCA is you can go down to the line item level to map the master Boolean driver against.
The incredible power of the process described above is not only the complete control over and ability to customize your user provisioning, but also that as new roster data is loaded into Anaplan, the DCA automatically adjusts itself to account for changes. So, if someone changed cost centers or a manager on an employee changed, the formulas that we set up above would be referencing the new employee meta-data, and would automatically adjust the DCA drivers, allowing for a much more hands-off, sustainable approach to user provisioning.
Another inadvertent benefit we discovered with using this methodology is that Anaplan treats cells that are blank as a result of DCA drivers as being blank for filtering purposes. So, if you want to set up a dashboard that auto-filtered employees for the end user based on the logic above, you just have to add a line item hardcoded to contain values for every list item, and then filter that line item for not-blanks on your dashboards. Then you have a dynamic filter based on the user that is viewing the model…pretty slick!
Take this one step further and filter for not-blanks on a line item that will always contain data for an employee, and you get completely custom reporting based on which end user is viewing the dashboards.
Filters can be very useful in model building and are widely used, but they can come at the expense of performance—often very visible to users through their use on dashboards.
Performance can also hit imports and exports, which in turn may lead to the blocking of other activity, causing a poor perception of the model.
There are some very simple guidelines to design well-performing filters:
Using a Single Boolean Filter on a Line Item That Does Not Have Time or Versions Applied and Does Not Have a Summary Is Fastest
Try to create a Boolean line item that incorporates all the filter criteria you want to apply. This allows you to re-use the line item and combine a series of Boolean line items into a single Boolean for use in the filter. For example, you may want to filter on three data points: Volume, Product Category, and Active Status. Volume is numeric, Product Category is a list formatted line item matching a user selection, and Active Status is a Boolean.
Create a line item called Filter with the following formula: Volume > Min Vol AND Product Cat = User Selection.Category AND Active Status Here’s a very simple example module to demonstrate this:
A Filter line item is added to represent all the filters we need on the view. Only the Filter line needs to be dimensioned by Users. A User Selection module dimension only by Users is created to capture user-specific filter choices:
Here’s the data before we apply the filter: Here's the data with the filter applied:
A best practice suggestion would be to create a filter module and line items for each filter part. You may want other filters and you can then combine each filter as needed from this system module. This should reduce repetition and give you control over the filters to ensure they can all be Boolean.
What Can Make a Filter Slow?
The Biggest Performance Hit for Filters Is When Nesting Dimensions on Rows. The performance loss is significantly increased by the number of nested dimensions and the number of levels they contain. With a flat list versus nested dimensions (filtering on the same number of items) the nested filter will be slower. This was tested with a 10,000,000 list versus 2 lists of 10 and 1,000,000 items as nested rows; the nested dimension filter was 40% slower.
Filtering on Line Items With a Line Item Summary Will Be Slow.
A numeric filter on 10,000,000 items can take less than a second, but with a summary will take at least five seconds.
Multiple Filters Will Increase Time.
This is especially significant if any of the preceding filters do not lower the load because they will take additional time to evaluate. If you do use multiple filter conditions, try to order them so the most effective filters are first. If a filter doesn’t often match on anything, evaluate whether it's even needed.
Hidden Levels Act as a Filter.
If you hide levels on a composite list, this acts like a filter before any other filter is applied. The hiding does take time to process and will impact more depending on the number of levels and the size of the list.
Avoid Nested Rows for Export Views
Using nested rows can be a useful way to filter a complex set of data for export, but, as discussed above, the filter performance here can be poor.
The best way around this is to pivot the dimensions so there is only one dimension on rows and use the Tabular Multi Column export option with a Filter Row based on Boolean option. Some extra filter tips include the following:
Filter duration will affect saved views used in imports, so check the saved view open time to see the impact. This view open time will be on every use of the view, including imports or exports.
If you need to filter on a specific list, create a subset of those items and create a new module dimensioned by the subset to view that data.
Imports are blocking operations: To maintain a consistent view of the data, the model is locked during the import, and concurrent imports run by end-users will need to run one after the other and will block the model for everyone else. Exports are blocking for data entry while the export data is retrieved, and then the model is released. During the blocking phase, users can still navigate within the model.
Rule #1 Carefully Decide If You Let End-Users Import (And Export) During Business Hours
Imports executed by end-users should be carefully considered, and if possible, executed once or twice a day. Customers more easily accept model updates at scheduled hours for a predefined time—even if it takes 10+ minutes—and are frustrated when these imports are run randomly during business hours.
Your first optimization is to adjust the process and run these imports by an administrator at a scheduled time, and then let the user based know about the schedule.
Rule #2 Use a Saved View
The first part of any import (or export) is retrieving the data. The time it takes to open the view directly affects the time of the import or export.
Always import from a saved view—NEVER from the default view. And use the naming convention for easy maintenance.
Ensure the view is using optimized filters with a single Boolean value per axis.
Hide the line items that are not needed for import; do not bring extra columns that are not needed.
If you have done all of the above, and the view is still taking time to complete, consider using the Tabular Multi Column export and filter "in the way out." This has been proven to improve some sluggish exports.
Rule #3 Mapping Objective = Zero Errors or Warning
Make sure your import executes with no errors or warnings as every error takes processing time. The time to import into a medium-to-large list (>50k) is significantly reduced if no errors are to be processed.
In the import definition, always map all displayed line items (source→target) or use the "ignore" setting. Don't leave any line item unmapped.
Rule #4 Watch the Formulas Recalculated During the Import
If your end-users encounter poor performance when clicking a button that triggers an import or a process, it is likely due to the recalculations that are triggered by the import, especially if the action creates or moves items within a hierarchy.
You will likely need the help of the Anaplan Model Optimization team to identify what formulas are triggered after the import is done and to get a performance check on these formulas to identify which one takes most of the time. Usually, those fetching many cells such as SUM, LOOUKP, ANY, or FINDITEM are likely to be responsible for the performance impact. Speak to your Business Partner for more details on the Model Optimization services available to you.
To solve such situations, you will need to challenge the need for recalculating the formula identified each time a user calls the action.
Often, for actions such as creations, moves, assignments done in WFP or Territory Planning, many calculations used for reporting are triggered in real-time after the hierarchy is modified by the import, and are not necessarily needed by users until later in the process.
The recommendation is to challenge your customer and see if these formulas can be calculated only once a day, instead of each time a user runs the action. If this is acceptable, you'll need to rearchitect your modules and/or formulas so that these heavy formulas get to run through a different process run daily by an administrator and not by each end-users. If not, you will need to look at the formulas more closely to see what improvements can be made. Remember, breaking formulas up often helps performance.
Rule #5 Don't Import List Properties
Importing list properties takes more time than importing these as a module line item. Review your model list impacted by imports, and look to replace list properties with module line items when possible. Use a system module to hold these for the key hierarchies, as per D.I.S.C.O.
Rule #6 Get Your Data Hub
Hub and spoke: Setup a data hub model, which will feed the other production models used by stakeholders.
It will prevent production models to be blocked by a large import from an external data source. But since data hub to production model imports will still be blocking operations, carefully filter what you import, and use the best practices rules listed above.
All import, mapping/transformation modules required to prepare the data to be loaded into planning modules can now be located in a dedicated data hub model and not in the planning model. This model will then be smaller and will work more efficiently.
Try and keep the transaction data history in the data hub with a specific analysis dashboard made available for end users; often, the detail is not needed for planning purposes, and holding this data in the planning model has a negative impact on size, model opening times, and performance.
As a reminder of the other benefits of a data hub not linked to performance:
Better structure, easier maintenance: data hub helps keep all the data organized in a central location.
Better governance: Whenever possible put this Data Hub on a different workspace. That will ease the separation of duties between production models and meta-data management, at least on actual data and production lists. IT departments will love the idea to own the data hub and have no one else be an administrator in the workspace.
Lower implementation costs: A data hub is a way to reduce the implementation time of new projects. Assuming IT can load the data needed by the new project in the data hub, then business users do not have to integrate with complex source systems but with the Anaplan data hub instead.
Rule #7 Incremental Import/Export
This can be the magic bullet in some cases. If you export on a frequent basis (daily or more) from an Anaplan model into a reporting system, or write back to the source system, or simply transfer data from one Anaplan model to another, you have ways to only import/export the data that have changed since the last export.
Use a Boolean line item to identify records that have changed and only import those.
Dimension Order Affects Calculation Performance
Ensuring consistency in the order of dimensions will help improve the performance of your models. This consistency is relevant for modules and individual line items. Why does the order matter? Anaplan creates and uses indexes to perform calculations. Each cell in a module where dimensions intersect is given an index number.
Here are two simple modules dimensioned by Customer and Product. In the first module, Product comes first and Customer second, and in the second module, Customer is first and Product is second.
In this model, there is a third module that calculates revenue as Prices * Volumes.
Anaplan assigns indexes to the intersections in the module. Here are the index values for the two modules. Note that some of the intersections are indexed the same for both modules: Customer 1 and Product 1, Customer 2 and Product 2, and Customer 3 and Product 3, and that the remainder of the cells has a different index number. Customer 1 and Product 2 is indexed with the value of 4 in the top module and the value of 2 in the bottom module.
The calculation is Revenue = Price * Volume.
To run the calculation, Anaplan performs the following operations by matching the index values from the two modules.
Since the index values are not aligned, the processor scans the index values to find a match before performing the calculation.
When the dimensions in the module are reordered, these are the index values:
The index values for each of the modules are now aligned. As the line-items of the same dimensional structure have an identical layout, the data is laid out linearly in memory. The calculation process accesses memory in a completely linear and predictable way. Anaplan’s microprocessors and memory sub-systems are optimized to recognize this pattern of access and to pre-emptively fetch the required data.
How does the dimension order become different between modules? When you build a module, Anaplan uses the order that you drag the lists onto the Create Module dialog. The order is also dependent on where the lists are added. The lists that you add to the 'pages' area are first, then the lists that you add to the 'rows' area, and finally the lists added to the 'columns' area.
It is simple to re-order the lists and ensure consistency. Follow these steps:
On the Modules pane, (Model Settings>Modules) look for lists that are out of order in the Applies To column. Click the Applies To row that you want to re-order, then click the ellipsis.
In the Select Lists dialog, click OK.
In the Confirm dialog, click OK.
The lists will be in the order that they appear in General Lists.
When you have completed checking the list order in the modules, click the Line Items tab and check the line items. Follow steps 1 through 3 to re-order the lists.
Subsets and Line Item Subsets
One word of caution about Subsets and Line Item subsets. In the example below, we have added a subset and a Line Item Subset to the module:
The Applies To is as follows:
Clicking on the ellipsis, the dimensions are re-ordered to:
The general lists are listed in order first, followed by subsets and then line item subsets. You still can reorder the dimensions by double-clicking in the Applies to column and manually copying or typing the dimensions in the correct order.
Largest vs. Smallest?
This is the normal follow up question, and unfortunately, the answer is "it depends." Through research we have found that it all depends on the data within the module. Also, it can get very confusing if subsets are used; the Customer list might be bigger than the Products list, but if a subset of Customers is used that is smaller than Products, then what?
Also, we don't advocate ordering the lists in the General Lists setting in size order; the lists should be ordered in hierarchical order top to bottom, so, by definition, that will be smallest to largest. So our advice is be consistent. Think about how you describe the problem. Does the business talk about Customer by Product, or Products for Customers? Agree to a convention, and stick to it.
The calculation performance only relates to the common lists between the source(s) and the target. The order of separate lists in one or other doesn’t have any bearing on the calculation speed.
The Anaplan Optimizer aids business planning and decision making by solving complex problems involving millions of combinations quickly to provide a feasible solution.
Optimization provides a solution for selected variables within your Anaplan model that matches your objective based on your defined constraints. The Anaplan model must be structured and formatted to enable Optimizer to produce the correct solution.
You are welcome to read through the materials and watch the videos on this page, but Optimizer is a premium service offered by Anaplan (Contact your Account Executive if you don't see Optimizer as an action on the settings tab). This means that you will not be able to actually do the training exercises until the feature is turned on in your system.
The training involves an exercise along with documentation and videos to help you complete it.
The goal of the exercise is to setup the optimization exercise for two use cases; network optimization and production optimization. To assist you in this process we have created an optimization exercise guide document which will walk you through each of the steps. To further help we have created three videos you can reference:
An exercise walk-through
A demo of each use case
A demo of setting up dynamic time
Follow the order of the items listed below to assist with understanding how Anaplan's optimization process works:
Watch the use case video which demos the Optimizer functionality in Anaplan
Watch the exercise walkthrough video
Review documentation about how Optimizer works within Anaplan
Attempt the Optimizer exercise
Download the exercise walkthrough document
Download the Optimizer model into your workspace
How to configure Dynamic Time within Optimizer
Download the Dynamic Time document
Watch the Dynamic Time video
Attempt Network Optimization exercise
Attempt Production Optimization exercise
This article provides the steps needed to create a basic time filter module. This module can be used as a point of reference for time filters across all modules and dashboards within a given model.
The benefits of a centralized Time Filter module include:
One centralized governance of time filters.
Optimization of workspace, since the filters do not need to be re-created for each view. Instead, use the Time Filter module.
Conforms with the D.I.S.C.O. methodology as a 'System' module. More on D.I.S.C.O. can be found here.
Step 1: Create a new module with two dimensions—time and line items. The example below has simple examples for Weeks Only, Months Only, Quarters Only, and Years Only.
Step 2: Line items should be Boolean formatted and the time scale should be set in accordance to the scale identified in the line item name.
The example below also includes filters with and without summary methods, providing additional views depending on the level of aggregation desired.
Once your preliminary filters are set, your module will look something like the screenshot below.
Step 3: Use the pre-set Time Filters across various modules and dashboards. Simply click on the filters icon in the toolbar, navigate to the time tab, select your Time Filter module from the module selection screen, and select the line item of your choosing. Use multiple line items at a time to filter your module or dashboard view.
Dynamic Cell Access (DCA) controls the access levels for line items within modules. It is simple to implement and provides modelers with a flexible way of controlling user inputs. Here are a few tips and tricks to help you implement DCA effectively.
Access control modules
Any line item can be controlled by any other applicable Boolean line item. To avoid confusion over which line item(s) to use, it is recommended that you add a separate functional area and create specific modules to hold the driver line items. These modules should be named appropriately (e.g. Access – Customers > Products, or Access – Time etc.). The advantage of this approach is the access driver can be used for multiple line items or modules, and the calculation logic is in one place. In most cases, you will probably want read and write access. Therefore, within each module it is recommended that you add two line items (Write? and Read?). If the logic is being set for Write?, then set the formulas for the Read? line item to NOT WRITE? (or vice-versa). It may be necessary to add multiple line items to use for different target line items, but start with this a default.
You may not need to create a module that mirrors the dimensionality of the line item you wish to control. For example, if you have a line item dimensioned by customer, product, and time, and you wish to make actual months read-only, you can use an access module just dimensioned by time. Think about what dimension the control needs to apply to and create an access module accordingly.
What settings do I need?
There are three different states of access that can be applied: READ, WRITE, and INVISIBLE or hidden. There are two blueprint controls (read control and write control) and there are two states for a driver (TRUE or FALSE). The combination of these determines which state is applied to the line item. The following table illustrates the options:
Only the read access driver is set:
Read Access Driver
Target Line Item
Only the write access driver is set:
Write Access Driver
Target Line Item
Both read access and write access drivers are set:
Read Access Driver
Write Access Driver
Target Line Item
Revert to Read*
*When both access drivers are set, the write access driver takes precedence with write access granted if the status of the write access driver is true. If the status of the write access driver is false, the cell access is then taken from the read access driver status.
The settings can also be expressed in the following table:
WRITE ACCESS DRIVER
READ ACCESS DRIVER
Note: If you want to have read and write access, it is necessary to set both access drivers within the module blueprint.
Think about how you want the totals to appear. When you create a Boolean line item, the default summary option is NONE. This means that if you used this access driver line item, any totals within the target would be invisible. In most cases, you will probably want the totals to be read-only, so setting the access driver line item summary to ANY will provide this setting. If you are using the Invisible setting to “hide” certain items and you do not want the end user to compute hidden values, then it is best to use the ANY setting for the access driver line item. This means that only if all values in the list are visible then the totals show; otherwise, the totals are hidden from view.