
1.06-02 Don't use subsets on large lists

Lists and subsets take up space within a model, so if you need multiple subsets of the same list, consider whether they would be better as separate lists. This is especially true if the subsets do not overlap and are fed from a Data Hub. For overlapping subsets, or where there is a need to “consolidate” values back to the master list, subsets are a valid construct for model efficiency.
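Whether subsets overlap can be checked mechanically before deciding. A minimal sketch, assuming each subset is represented by the set of item codes it contains; the subset names and item codes below are hypothetical, not read from a real model:

```python
from itertools import combinations


def overlapping_pairs(subsets: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return every pair of subset names that share at least one item."""
    return [
        (a, b)
        for a, b in combinations(sorted(subsets), 2)  # each unordered pair once
        if subsets[a] & subsets[b]                    # non-empty intersection
    ]


# Hypothetical subsets of one master list, keyed by subset name.
subsets = {
    "Retail": {"P001", "P002", "P003"},
    "Wholesale": {"P004", "P005"},
    "Online": {"P003", "P006"},
}

print(overlapping_pairs(subsets))
```

Here only `Online` and `Retail` share an item (`P003`). An empty result would mean none of the subsets overlap, which is the case where separate lists are worth considering.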

Comments

Rule 1.06-02: Don't use subsets on large lists

It is better to create a separate list if the subset covers more than 75% of the list. Creating subsets on large lists goes against the “Performance” part of PLANS.
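The 75% guideline can be written as a simple check. A minimal sketch; the function name and the choice to pass raw item counts are illustrative assumptions, and the threshold is the 75% figure from the rule:

```python
def prefer_separate_list(subset_items: int, list_items: int, threshold: float = 0.75) -> bool:
    """True when the subset covers more than `threshold` of its parent list."""
    if list_items <= 0:
        raise ValueError("list_items must be positive")
    return subset_items / list_items > threshold


print(prefer_separate_list(8_000_000, 10_000_000))  # 80% of the list
print(prefer_separate_list(2_000_000, 10_000_000))  # 20% of the list
```

The comparison follows the rule's "more than 75%" wording, so a subset of exactly 75% returns `False`; change `>` to `>=` if you treat 75% itself as the cut-off.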

Here is how it was done in the pre-Planual era: without checking the size of the lists, we used to create subsets, thinking they saved space and helped optimize the model. Little did we know that such a large subset could cause a performance hit while saving no space at all. For example, List A has 10,000,000 transactions and a subset with 75% occupancy; the subset was created on the assumption that it saves space for the 25% of list items it excludes.

What is wrong with this method? First, we need to understand what subsets really are. Subsets are essentially lists within lists. Subset items consume as much space as list items do (roughly 500 bytes per item), even if the list or subset is not used as a dimension in any module. When a large list with a top level and a subset is used in modules, performance suffers, because the system has to aggregate data not only for the list but also for the subset, and re-aggregate in every module where that list and subset are used as dimensions. Performance also takes a hit when you add or remove subset items in such lists.

There is also a myth that all subsets help optimize space. That is not true. Here is the analysis:

A list with 10,000,000 items takes up 5,000,000,000 bytes, roughly 4.7GB. If we add a subset with 75% occupancy of the original list, the subset holds 7,500,000 items and consumes an additional 3,750,000,000 bytes, roughly 3.5GB. The list, which originally consumed 4.7GB, now consumes about 8.2GB (4.7GB for the original list and 3.5GB for the subset). Model builders have to make a judicious call on whether that subset will save 3.5GB over the course of model building, which in turn depends on how many times the subset is referenced and at how many intersections. Let’s see what happens when this list or subset is used as a dimension in a module.
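The arithmetic above can be reproduced directly. A minimal sketch, assuming the flat ~500 bytes per item quoted earlier and binary gigabytes (1 GB = 1024³ bytes), which is what makes 5,000,000,000 bytes come out at about 4.7GB:

```python
BYTES_PER_ITEM = 500  # approximate per-item cost quoted in the article
GIB = 1024 ** 3       # the GB figures in the text are binary gigabytes


def list_bytes(items: int) -> int:
    """Approximate space taken by a list or subset with `items` members."""
    return items * BYTES_PER_ITEM


list_items = 10_000_000
subset_items = int(list_items * 0.75)  # 75% occupancy -> 7,500,000 items

base = list_bytes(list_items)
extra = list_bytes(subset_items)

print(f"List:   {base:,} bytes (~{base / GIB:.2f} GB)")
print(f"Subset: {extra:,} bytes (~{extra / GIB:.2f} GB)")
print(f"Total:  {base + extra:,} bytes (~{(base + extra) / GIB:.2f} GB)")
```

Run as written, it reports about 4.66, 3.49 and 8.15 GB, which the text rounds to 4.7GB, 3.5GB and 8.2GB.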

| Line item | Format | Space per cell | If list used (bytes) | If subset used (bytes) | Difference (MB) |
|---|---|---|---|---|---|
| Line item 1 | Number | 8 bytes | 80,000,000 | 60,000,000 | 20 |
| Line item 2 | Number | 8 bytes | 80,000,000 | 60,000,000 | 20 |
| Line item 3 | Time Period | 4 bytes | 40,000,000 | 30,000,000 | 10 |
| Line item 4 | Time Period | 4 bytes | 40,000,000 | 30,000,000 | 10 |
| Line item 5 | List | 4 bytes | 40,000,000 | 30,000,000 | 10 |
| Total | | | 280,000,000 | 210,000,000 | 70 |

Note: Based on a simple module with a single dimension (the 10,000,000-item list or the 7,500,000-item subset).
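Each row of the table is the number of dimension items multiplied by the bytes each cell needs. A minimal sketch that reproduces the table, assuming a module dimensioned only by the list (or only by the subset), counting cells only with no other overhead, and reading the MB column as decimal megabytes:

```python
ROWS = 10_000_000        # items in the full list
SUBSET_ROWS = 7_500_000  # items in the 75% subset
MB = 1_000_000           # the table's MB column uses decimal megabytes

# (line item, format, bytes per cell) as listed in the table
LINE_ITEMS = [
    ("Line item 1", "Number", 8),
    ("Line item 2", "Number", 8),
    ("Line item 3", "Time Period", 4),
    ("Line item 4", "Time Period", 4),
    ("Line item 5", "List", 4),
]


def module_bytes(items: int) -> int:
    """Total bytes for a single-dimension module: one cell per item per line item."""
    return sum(items * cell_bytes for _, _, cell_bytes in LINE_ITEMS)


for name, fmt, cell_bytes in LINE_ITEMS:
    on_list, on_subset = ROWS * cell_bytes, SUBSET_ROWS * cell_bytes
    print(f"{name:<12} {fmt:<12} {on_list:>12,} {on_subset:>12,} {(on_list - on_subset) // MB:>4}")

total_list, total_subset = module_bytes(ROWS), module_bytes(SUBSET_ROWS)
print(f"{'Total':<25} {total_list:>12,} {total_subset:>12,} {(total_list - total_subset) // MB:>4}")
```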


As you can see, using the subset in this module saved 70MB of space across 5 line items. To break even, the subset has to save 3.5GB, which depends on how many line items and modules are dimensioned by it.
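A rough break-even count follows from the two figures above. A minimal sketch, assuming every module that switches to the subset saves the same 70MB as the example module (real modules will differ by line-item count, formats and other dimensions) and working in bytes throughout:

```python
import math

SUBSET_OVERHEAD = 7_500_000 * 500  # bytes the subset itself costs (~3.5GB)
SAVING_PER_MODULE = 70_000_000     # bytes saved by the five-line-item example module


def modules_to_break_even(overhead: int, saving_per_module: int) -> int:
    """Smallest number of comparable modules whose savings cover the subset's own cost."""
    return math.ceil(overhead / saving_per_module)


print(modules_to_break_even(SUBSET_OVERHEAD, SAVING_PER_MODULE))
```

With these inputs the subset needs about 54 modules like the one in the table before it stops costing space overall.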

Here is how it should be done the Planual way: for large lists, create a separate list instead of a subset.

Advantages:

  1. The system does not have to aggregate data for both the list and the subset in the modules that use them.
  2. Only one list is affected on import.


About the Author
  • Community Manager

    Anaplan Community Team!
