Planual Explained - Day 5
Rule "1.05-03": Article 1, Chapter 5, Rule 3 – “Avoid using List Properties”. List properties are essentially the same as line items but come with many limitations, so keep it simple and have one place for calculations, i.e., module line items. Using list properties goes against the P and A of PLANS.
Here is how it was done in the pre-Planual era: whenever we needed an attribute on a list, we created list properties.
What is wrong with this method? There are a lot of disadvantages to using list properties; a few are listed below:
- List properties are believed to have a negative impact on model performance, especially on model opening times.
- There is no floating formula editor for list properties, meaning you cannot activate the formula bar and use point-and-click.
- You cannot filter on list items or property values in Lists.
Here is how it should be done the Planual way: create a System Module for each list that has properties associated with it. This becomes the centralized place to store the list's attributes. It is worth mentioning that these system modules should not have any additional dimensionality: no Time, no Versions, no other lists.
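As an analogy outside Anaplan, the centralized-attributes idea can be sketched in a few lines of Python (names and values hypothetical): a single keyed structure plays the role of the SYS module, and every consumer references it rather than keeping its own copy.

```python
# Hypothetical sketch: a SYS module as one keyed store of list attributes.
from dataclasses import dataclass

@dataclass
class ProductAttributes:              # each field ~ one line item
    display_name: str
    region: str
    active: bool

# "SYS Products" -- dimensioned by the Products list only:
# no Time, no Versions, no other lists.
sys_products = {
    "P001": ProductAttributes("Widget A", "EMEA", True),
    "P002": ProductAttributes("Widget B", "APAC", False),
}

def display_name(code: str) -> str:
    # Downstream modules reference this single source,
    # much like a LOOKUP against the SYS module in Anaplan.
    return sys_products[code].display_name
```

Because every downstream consumer goes through the one store, an attribute change happens in exactly one place, which is the auditability benefit the rule is after.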
Exceptions: The Planual grants 5 exceptions where list properties can be used; here is a detailed explanation of each.
- 1.05-03a Reference module line items: This exception exists because there are scenarios where you want to reference module line items in list properties. For example, the Display Names of numbered lists are driven by module line items.
- 1.05-03b Numbered lists and related actions: Actions like ASSIGN need list properties in place before you can use them. Read more: Assign
- 1.05-03c Exports: When exporting from Anaplan, list properties can be used as export labels. Let's see the difference when we export with each of these options.
Without Row Labels – by default, the list Name will be available in the labels.
Here is the output file
With all Row Labels, i.e., Name, Code, and a list property called Order.
Here is the output file
- 1.05-03d Conditional Dashboard Navigation: In classic dashboards, list properties are used to help navigate from one dashboard to another. For example, based on the value of the Order list property, different dashboards are opened.
Read more on the Anapedia link below: Open Dashboard
- 1.05-03e Dependent Dropdowns: List properties are needed to create dependent dropdowns. Here is a glimpse of what that means:
Here is the micro lesson on how to create dependent dropdowns Dependent Dropdowns - Anaplan Community
That post includes a heavy dose of information 👌
I do have a question though.
I was told the reason we should replace list properties with system modules is that when the model loads, line items in modules are calculated and stored in memory, whereas list properties are only calculated when we actually reference the list, causing a performance delay.
Is that correct? If not, what is the correct reason?
I think you are right. Any calculation that happens within Anaplan happens in memory (as far as I know), and with lists there is a delay. (I didn't mention the reason, though.) Overall, it should have been point 1 of the "What is wrong with this method?" section.
List properties are loaded into memory as well, so there is no delay. Think of list properties as line items, as that is essentially what they are. Why not use them? Basically, a couple of reasons:
- there is currently no way to control their summary method
- writing formulas in modules/line items is much easier than in list properties
- back in the day, there was no Reference By on lists, so you didn't know where they were being used (that has since changed)
- List properties add to the overall size of the list upon load
- SYS modules allow you to store all of the data in one centralized place for other model builders to use. The same could be said of the list itself, but it is better to centralize in a module.
Hope this helps,
I'm wholeheartedly against using list properties for all the reasons you mentioned above concerning auditability. I just wanted to confirm the performance explanation I had received regarding loading into memory.
Appreciate the clarification.
Excellent @rob_marshall, thanks for sharing a few more reasons.
I had also learned that there are some performance impacts from using list properties. Are we saying that there is no performance degradation from using list properties?
Yes, there could be performance impacts if the properties are formatted as numbers, as they will always aggregate because you cannot turn summaries off. Also, you are adding to the overall size of the list, which is the first thing built on model open. And if you have subsets defined, you are directly impacting the size of the lists, as the properties are duplicated for every subset.
I've a question on one of your reasons to change List properties to System Modules namely:
- List properties add to the overall size of the list upon load
If we have a large module using a list (or a couple of lists) with list properties, will the model shrink significantly for each list property removed and moved into a system module? Or, in other words, is the cell count of a module related to how many properties a list has?
Thanks in advance,
Great question, and the answer is: it depends. Why? Do you have any subsets on the lists? If so, the system treats each subset like a separate list, so you are compounding the number of properties (they are saved with the list as well as with the subsets), whereas in a module they would not be.
Basically, this rule comes from people referencing properties incorrectly: the list has a property, and then in a module they "copy" that property as-is, which leads to double or triple counting. Take a look at the below:
List defined with 2 properties
So now you are double counting the cost of those properties, because they are defined in exactly the same way in both the list and the module. To compound this, most people do this "double counting" in multiple modules. So it is best to define them in one place, like a SYS module, and just use/reference the line items.
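The sizing impact of subset duplication and double counting can be put into rough numbers. This is a back-of-the-envelope sketch with hypothetical member and property counts, taking the duplication behavior described above at face value, and assuming roughly 8 bytes per populated cell (Anaplan's approximate workspace sizing guideline).

```python
# Illustrative arithmetic only: member/property counts are hypothetical,
# and ~8 bytes per populated cell is Anaplan's rough sizing guideline.
members = 1_000_000
properties = 20
subsets = 3
bytes_per_cell = 8

# Stored once, in a SYS module dimensioned only by the list:
sys_module = members * properties * bytes_per_cell

# Stored as list properties (repeated per subset) AND "copied" as-is
# into one module -- the double counting described above:
on_list = members * properties * (1 + subsets) * bytes_per_cell
double_counted = on_list + sys_module

print(sys_module // 2**20, "MB vs", double_counted // 2**20, "MB")
# -> 152 MB vs 762 MB
```

The exact figures do not matter; the point is that the property-based layout multiplies the same data by the number of subsets plus every module that copies it, while the SYS module stores it once.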
Does this help?
Great explanation, thank you.
Yes, our lists contain multiple subsets, but the list properties are only being used for LOOKUPs and SUMs. They are not referenced directly by module line items or "double counted", as you cleverly called it. So in this case, is the model still more bloated than if we were to separate the list properties out into SYS modules? (I understand the other benefits of system modules, i.e., auditability, filtering, formula usability, etc.)
Is it more bloated? No, it is not, but it is definitely cleaner to use a SYS module for the attributes because:
- it is easier to "add to" a module than the list
- whether you use those properties for double counting or for SUMs/LOOKUPs, they are still stored in the subsets, which adds to the memory footprint
- auditability, keeping all metadata in one place, as you mentioned
- it is easier to write formulas in line items vs. list properties
- it keeps the size of the list down, which can impact performance on model opening (on model open, the Hyperblock creates the lists, then the modules, and then kicks off all formulas)
- Line items show up in the model map whereas list properties don't
For the above reasons, line items are considered best practice vs. using attributes on a list.
I agree totally.
Thanks again for your replies.
I'm inheriting some models that do not conform to this best practice. There is one list in particular, containing 12 million items, that loads everything into properties and then references those properties in a data module that performs additional calculations. Import time for this list is over an hour (within a process that can take 8 hours).
I thought that eliminating the duplication of data would speed things along, but decided to do some testing first to confirm my assumptions. So, I created two new lists and related modules.
Test 1 (mimic what I'm inheriting):
- load everything directly into the list
- reference each Property in a Line Item in the data module
- Import time: 7-8.25 minutes
- Workspace consumption: 9.83GB
Test 2 (best practice):
- load only the primary key into the list
- load all data directly into Line Items in the data module
- Import time: 13-20 minutes (about 1 minute for the list and 12-19 for the module)
- Workspace consumption: 7.74GB
I thought this would be low-hanging fruit to improve the performance of this import process as well as model load time, model size, and presumably model performance. What I'm seeing instead is a surprising hit to the very import performance I'm targeting to improve. This calls into question the model load time and model performance impacts, leaving only model size as a guaranteed, though marginal (less than expected), improvement.
Note that this is a flat load list, no hierarchy, so there are no summary-level items aside from the Top Level that would bloat the list.
@rob_marshall is my methodology flawed? Were my expectations inaccurate? Other thoughts?
I see where you are coming from, and the extra time is from the additional action kicking off (in the 2nd test, you have to have 2 actions: 1 to load the list and a 2nd to load the line items).
A couple of questions:
- what is the unique key of the list?
- Do you have Time/Period as part of the code or part of the list properties? If so, removing the Time/Period from the list will give you the biggest performance increase. Take a look at this article if you haven't already (https://community.anaplan.com/discussion/48866/oeg-best-practice-data-hubs-purpose-and-peak-performance/p1), where it speaks to having Time as part of the code/property.
- Along with the bullet point above, do you have transactional data as property?
- Regarding the loading of data being longer in the 2nd test, what are you loading to the modules? Can that data be logically figured out based on the code?
- When loading data, are you loading everything as a TEXT format? Text will perform worse than list-formatted line items.
- In the 2nd test, was this the same model as Test 1? I ask because there could have been a model save in the second test/2nd action which is inflating those numbers.
Hopefully, the above will help.
Thank you for quick response.
- These data are being pulled from a Data Hub into a spoke.
- The primary key for Test 1 is the numbered list Name (#123456789) from the Data Hub list, loaded into a text-formatted property in the spoke list, with uniqueness on the import set to "combination of properties" using just that one property. This mimics the setup for the production list.
- The primary key for Test 2 strips the '#' from the beginning of the numbered list Name from the Data Hub and uses the resulting number (123456789) as the CODE in the spoke list.
- There are no Time formatted properties nor line items, though 3 Date formatted ones.
- I'm not quite sure what you mean by transactional data in this context. The data are SFDC demand exports. Not as transactional as GL data. Uniqueness going into the Data Hub is complex, but coming from Data Hub to spoke is simple as noted above.
- Most of the Properties/Line Items are currently Text formatted.
- Test 1 has 16 Text formatted, 2 Number formatted, 2 List formatted, and 3 Date formatted properties.
- Test 2 has 15 Text formatted, 2 Number formatted, 2 List formatted, and 3 Date formatted line items (plus the Code for the list).
- I'm keeping everything apples to apples.
- List import comes from a view containing only the Code.
- Module imports matches on Code and the view contains exactly the required columns.
- There's no filtering applied to either view.
- Test 2 has two imports pulling from the same source module in Data Hub.
- Tests are being performed in the same dev model as the original list.
- I performed half a dozen iterations of the testing, thus the time ranges.
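As a rough sanity check on the workspace delta in those tests, here is an editorial back-of-the-envelope sketch. It assumes roughly 8 bytes per populated cell (Anaplan's approximate sizing guideline) and that the difference between the two tests is mostly the removed list properties; both assumptions are simplifications.

```python
# Back-of-the-envelope only: ~8 bytes/cell is Anaplan's rough guideline.
members = 12_000_000
properties = 23      # Test 1: 16 text + 2 number + 2 list + 3 date
bytes_per_cell = 8

saved_gb = members * properties * bytes_per_cell / 2**30
print(round(saved_gb, 2))  # ~2.06 GB, near the observed 9.83 - 7.74 = 2.09 GB
```

The estimate landing close to the observed 2.09 GB difference suggests the workspace saving really is the list properties themselves, even though the import time moved the other way.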
- These data are being pulled from a Data Hub into a spoke.
Thanks for the clarification that SFDC is your "source" to the data hub and you are loading from Data Hub to Spoke.
- With that, you can use the RecordID (the 18-character internal SFDC identifier), which will make it unique. This will free you from using Combination of Properties, which is bad for performance. Why? Every time Anaplan is about to load a member into that list, it must read the entire list to make sure the member is not duplicated. In your case, it is doing this read about 12 million times. This is the main reason we frown on using Combination of Properties.
- Transactional data – I used this term to convey some kind of "monies" being stored in the list, like 10 or 12 or 20. Basically, the metric you are bringing in.
- What do the dates signify that you are loading?
- If this is a spoke model, why use Text properties vs. list-formatted line items?
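The Combination of Properties point above can be illustrated with a small sketch (plain Python, not Anaplan's actual internals): uniqueness via a property combination means comparing against existing members on every insert, while a keyed code is a hashed, near-constant-time lookup.

```python
# Illustration only -- not Anaplan's actual implementation.
def load_with_combination_check(existing_rows, new_row):
    """Uniqueness via 'combination of properties': linear scan per insert."""
    for row in existing_rows:
        if row == new_row:        # compare the full property combination
            return False          # duplicate -> skip
    existing_rows.append(new_row)
    return True

def load_with_code(existing_codes, code):
    """Uniqueness via a code: hashed membership test, O(1) amortized."""
    if code in existing_codes:
        return False
    existing_codes.add(code)
    return True
```

Over roughly 12 million members, the scan approach performs on the order of n comparisons per insert (quadratic work across the whole load), which is why combination of properties is discouraged for large lists.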
As I mentioned before (or maybe I wasn't clear on this), the only way you are going to get a true apples-to-apples test is to have one model for Test 1 and one model for Test 2, because the change log records all of the inserts and, if you are using the same model, all of the deletes as well. I am betting there was a save during the 2nd test which added to the overall time.
There is no denying that running one action will be faster than multiple actions (list and SYS module), but the model open time will be better without the list properties.
@rob_marshall I appreciate you trying to get to the root causes of the original performance problems. However, my questions are about the A/B testing itself. On that front:
- Test 2 actually executes first in my test process. If there is a save delaying it, the save is endemic to Test 2, not caused by it following Test 1. From the multiple iterations I've run (including a 3rd test scenario executing last), I don't think a save is skewing any results.
- The list create/update in Test 2 takes only about 1 minute (it's basically the same for the create and the update, though the update is slightly, but not materially, slower). So the list create/update populating all of the properties in Test 1 executes in roughly half the time of just the module load action in Test 2. This surprised me.
Thank you for clarifying that the model open time should be better despite the import being worse.
On the underlying data front:
- Noted about the SFDC 18 character RecordID. We are working to switch to that, though there's a wrinkle that doesn't pertain to this A/B testing.
- There are 2 number fields. Nothing is happening with them during these tests. Nothing is downstream of either test module. They are Age and Volume and are used in and downstream of the production list/module.
- Similarly, the 3 date fields are not doing anything in the A/B testing, but two of them do get converted to periods in the production module.
There are a number of optimizations that I'm investigating:
- eliminate a bunch of unused properties. This is baked into the testing results I cited. The 3rd test scenario mentioned above included a full copy of the production list including another dozen or so properties that we don't use in the spoke. It performed slightly closer to Test 1 than to Test 2.
- eliminate a bunch of recently deprecated line items in the module.
- convert to the CODE used in Test 2 and eliminate the "combination of properties".
- convert eligible Text properties to List formatted properties.
- eliminate the module and list entirely from the spoke, and pull in the required summaries from the Data Hub instead. (biggest effort, greatest potential impact, feasibility uncertain)
In the article I posted earlier, there is a graphic showing that importing to the list and properties is faster (from a time perspective) than having multiple actions within the process, but model opening is also longer than when "calculating" the attributes from the code.
In your case, you might not be able to calculate the attributes from the code since you might be moving to RecordID. One last thing to consider: in your spoke, do you really need all 12 million records, or can you aggregate the data to a higher level?