OEG Best Practice: Filter best practice
Filters can be very useful in model building and are widely used, but they can come at the expense of performance—often very visible to users through their use on dashboards.
Performance can also hit imports and exports, which in turn may lead to the blocking of other activity, causing a poor perception of the model.
There are some very simple guidelines to design well-performing filters:
Using a single Boolean filter on a line item that does not have time or versions applied and does not have a summary is fastest
Try to create a Boolean line item that incorporates all the filter criteria you want to apply. This allows you to re-use the line item and combine a series of Boolean line items into a single Boolean for use in the filter.
For example, you may want to filter on three data points: Volume, Product Category, and Active Status. Volume is numeric, Product Category is a list formatted line item matching a user selection, and Active Status is a Boolean.
Create a line item called Filter with the following formula:
Volume > Min Vol AND Product Cat = User Selection.Category AND Active Status
Here’s a very simple example module to demonstrate this:
A Filter line item is added to represent all the filters we need on the view. Only the Filter line needs to be dimensioned by Users.
A User Selection module dimension only by Users is created to capture user-specific filter choices:
Here’s the data before we apply the filter: Here's the data with the filter applied:
A best practice suggestion would be to create a filter module and line items for each filter part. You may want other filters and you can then combine each filter as needed from this system module. This should reduce repetition and give you control over the filters to ensure they can all be Boolean.
What can make a filter slow?
The biggest performance hit for filters is when nesting dimensions on rows.
The performance loss is significantly increased by the number of nested dimensions and the number of levels they contain. With a flat list versus nested dimensions (filtering on the same number of items) the nested filter will be slower. This was tested with a 10,000,000 list versus 2 lists of 10 and 1,000,000 items as nested rows; the nested dimension filter was 40% slower.
Filtering on line items with a line item summary will be slow.
A numeric filter on 10,000,000 items can take less than a second, but with a summary will take at least five seconds.
Multiple filters will increase time.
This is especially significant if any of the preceding filters do not lower the load because they will take additional time to evaluate. If you do use multiple filter conditions, try to order them so the most effective filters are first. If a filter doesn’t often match on anything, evaluate whether it's even needed.
Hidden levels act as a filter.
If you hide levels on a composite list, this acts like a filter before any other filter is applied. The hiding does take time to process and will impact more depending on the number of levels and the size of the list.
Avoid nested rows for export views
Using nested rows can be a useful way to filter a complex set of data for export, but, as discussed above, the filter performance here can be poor.
The best way around this is to pivot the dimensions so there is only one dimension on rows and use the Tabular Multi Column export option with a Filter Row based on Boolean option.
Some extra filter tips include the following:
- Filter duration will affect saved views used in imports, so check the saved view open time to see the impact. This view open time will be on every use of the view, including imports or exports.
- If you need to filter on a specific list, create a subset of those items and create a new module dimensioned by the subset to view that data.
Author Mark Warren.