Predictive Insights (PI) - Model Tracking and Performance

NatalieC
edited June 2023 in Best Practices

Introduction

Predictive Insights model performance tracking can be broken out into model tracking and model performance evaluation. The goal of both is to ensure that the model continues to perform as expected once it is in use and remains relevant for the business use case it was built for. Doing so involves looking at the model’s score or rank distribution as well as analyzing conversion at the relevant stages. If the model is not performing well or as expected after evaluation, a new model is needed.

In tracking model performance, there are two components to consider:

Model Tracking

Model tracking is the monitoring of a model’s score or rank distribution, which should follow the same distribution as when the model was first built. Tracking should be done on an ongoing basis when possible, ideally monthly, or either weekly or quarterly depending on the volume of accounts.

To track the score and rank distribution of a model, take all unique accounts scored since the model was created (or since the last review) and check the distribution of score and rank. The accounts used in the analysis should not include any found in the original customer and prospect datasets; they should be new to the model. The results should closely follow the distribution of the model when it was first built.

Figure 1: Rank distribution should be similar to model's when first built

If the distribution shows significant deviation from the original, review the accounts to see if there are any differences that can explain the change. Possible causes include a different account source or different account stages. If such changes do exist, it is a good time to consider whether the model is still relevant. Keep in mind that the accounts in the prospect dataset should reflect the accounts you plan to score against the model. If the source or stage of prospects has changed significantly since the model was built, making the accounts being scored vastly different from those in the original prospect dataset, the model as it stands might no longer be applicable to the current business need and a new model might be warranted.
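
As a concrete illustration, the comparison can be scripted along the following lines. This is a minimal sketch in Python/pandas: the file name, column names, rank labels, original distribution values, and the 10-percentage-point threshold are all assumptions for the example, not product defaults.

```python
# A minimal sketch, assuming the newly scored accounts are exported to a CSV
# with one row per account and columns "account_id" and "rank".
import pandas as pd

scored = pd.read_csv("scored_accounts.csv").drop_duplicates(subset="account_id")

# Share of newly scored accounts in each rank
current = scored["rank"].value_counts(normalize=True).sort_index()

# Rank distribution recorded when the model was first built (illustrative numbers)
original = pd.Series({"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40})

comparison = pd.DataFrame({"original": original, "current": current}).fillna(0.0)
comparison["difference"] = comparison["current"] - comparison["original"]
print(comparison)

# Flag ranks whose share has shifted by more than 10 percentage points (assumed threshold)
deviations = comparison[comparison["difference"].abs() > 0.10]
if not deviations.empty:
    print("Significant deviation from the original distribution:")
    print(deviations)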

Performance Evaluation

Overview

Model performance evaluation is the detailed review and analysis of accounts’ performance since the model was built. The exact metric used to evaluate performance will depend on the use case of the model, but common metrics include the number of converted accounts and the conversion rate. Secondary metrics such as velocity and ARR can be considered as well. Depending on the volume of accounts and opportunities, it can be beneficial to perform checks monthly, quarterly, or semi-annually.

Data and Setup

To evaluate the performance of the model, the following dimensions are needed:

Required:
  • Account name/ID
  • Current stage and date
  • Score and/or rank

Optional:
  • Timestamp
  • Historical stages and dates
  • Created date
  • Firmographics/Segments

Figure 2: Performance Data Module

While not all dimensions are required, having the additional dimensions will provide a more accurate analysis of model performance. If available, the analysis should be based on each account’s score at the time it was worked; otherwise, scoring accounts at the time of analysis could lead to less accurate results due to changes in score.

Only accounts that were not used in the customer dataset for model creation should be considered for the analysis. Since there can be multiple contacts and opportunities per account, there are several ways to include them in the calculations. For the purpose of analysis, we recommend considering an account as worked if at least one of its contacts is worked; in other words, the analysis should be done at the account level and not at the contact or lead level.
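
A minimal sketch of this account-level rollup is shown below, assuming a contact- or lead-level export with hypothetical columns account_id, was_worked, score, rank, and stage, plus a separate file listing the account IDs used in the original customer dataset. The file and column names are assumptions for the example.

```python
# A minimal sketch of rolling a contact/lead-level export up to the account level.
import pandas as pd

contacts = pd.read_csv("contact_level_export.csv")
training_account_ids = set(pd.read_csv("model_training_accounts.csv")["account_id"])

accounts = (
    contacts.groupby("account_id")
    .agg(
        worked=("was_worked", "any"),  # account counts as worked if any contact was worked
        score=("score", "first"),      # score/rank are account-level values, so take one
        rank=("rank", "first"),
        stage=("stage", "first"),
    )
    .reset_index()
)

# Exclude accounts that were part of the customer dataset used to build the model
accounts = accounts[~accounts["account_id"].isin(training_account_ids)]
```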

Analysis

To start, consider the use case the model was built for. The calculations for the various use cases are the same, but the exact metric and criteria will differ based on the model use case. If the model was built for prospect to closed won, the focus will be on the number of conversions from prospect to closed won by rank. Similarly, if the model was built for prospect to MQL, the metric to consider will be conversion by rank of accounts from prospect to MQL.
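
The conversion criterion is the only piece that changes between use cases. The sketch below illustrates one way to express it; the use case names and stage values are assumptions and should be replaced with your CRM’s actual stages.

```python
# A minimal sketch: define what "converted" means for each model use case.
def is_converted(stage: str, use_case: str) -> bool:
    if use_case == "prospect_to_closed_won":
        return stage == "Closed Won"
    if use_case == "prospect_to_mql":
        # Accounts that progressed past MQL also count as converted
        return stage in {"MQL", "SQL", "Opportunity", "Closed Won"}
    raise ValueError(f"Unknown use case: {use_case}")

print(is_converted("Closed Won", "prospect_to_closed_won"))  # True
print(is_converted("SQL", "prospect_to_mql"))                # True
```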

Keep in mind that if accounts have been acted on based on prioritization, you would expect higher-ranked accounts to outperform lower-ranked ones, since more time and effort are put into opportunities and contacts from high-ranked accounts. Nevertheless, the analysis can still provide valuable insight into the general trends and performance of the model.

Take the example of propensity to convert to closed won. The goal is to produce tables similar to the ones below:

Figure 3: Performance Results - Conversion

Figure 4: Performance Results - ARR and Velocity

Analysis – Metrics

The tables above show conversion rate by rank for prospect accounts to closed won. The metrics are defined below, with a sketch of the calculation after the list.

  • Closed Won – the total number of unique won accounts
  • All Accounts – the total number of unique accounts scored for each rank
  • Conversion – the percentage of accounts that closed for each rank
    • Conversion = ( Closed Won / All Accounts )
  • Lift – conversion of a given rank relative to the overall conversion rate
    • Lift = ( Conversion for given Rank / Overall Conversion Rate )
  • ARR – average deal size or another metric that shows the value of an account
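
A minimal sketch of the calculation, assuming an account-level table with hypothetical columns rank, converted (boolean), and arr (the file and column names are illustrative):

```python
# A minimal sketch of the conversion, lift, and ARR metrics by rank.
import pandas as pd

accounts = pd.read_csv("account_level_performance.csv")

by_rank = accounts.groupby("rank").agg(
    closed_won=("converted", "sum"),     # Closed Won: unique won accounts per rank
    all_accounts=("converted", "size"),  # All Accounts: unique accounts scored per rank
    avg_arr=("arr", "mean"),             # ARR: average deal size per rank
)

# Conversion = Closed Won / All Accounts
by_rank["conversion"] = by_rank["closed_won"] / by_rank["all_accounts"]

# Lift = Conversion for given Rank / Overall Conversion Rate
overall_conversion = accounts["converted"].mean()
by_rank["lift"] = by_rank["conversion"] / overall_conversion

print(by_rank.sort_index())
```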

If a model is performing well, we should expect higher-ranked or higher-scored accounts to have higher conversion rates. Often we would also expect higher-ranked accounts to perform better in terms of velocity and ARR, even though those were not the main use case of the model. If possible, compare the metrics before and after the model was built to see whether higher-ranked accounts perform better than previous baselines, such as the overall conversion rate for sales before the model was in use. If a pre-model baseline is not available, this is a good time to start tracking model performance for comparison in the future.

Though the example shows metrics by rank, if the default rank cutoffs have been changed, it might make more sense to conduct the analysis by score instead. Grouping the scores into equal bins, e.g. 91 - 100, 81 - 90, etc., can provide a more accurate analysis, while using rank makes for an easier setup.
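
A minimal sketch of the score-bin variant, again assuming an account-level table with hypothetical score (0 - 100) and converted columns:

```python
# A minimal sketch of grouping accounts into equal score bins instead of ranks.
import pandas as pd

accounts = pd.read_csv("account_level_performance.csv")

bins = list(range(0, 101, 10))  # edges for 1-10, 11-20, ..., 91-100
labels = [f"{lo + 1}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]
accounts["score_bin"] = pd.cut(
    accounts["score"], bins=bins, labels=labels, include_lowest=True
)

by_bin = accounts.groupby("score_bin", observed=True).agg(
    all_accounts=("converted", "size"),
    closed_won=("converted", "sum"),
)
by_bin["conversion"] = by_bin["closed_won"] / by_bin["all_accounts"]
print(by_bin)
```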

Even if there are not yet enough closed won accounts for the analysis, a review of performance earlier in the sales funnel should reflect similar trends: looking at prospect to MQL or late-stage opportunities should still show higher-ranked accounts with better conversion. This is especially useful for offerings with longer sales cycles, and evaluating the model’s performance early on allows time to make adjustments as needed.

If the analysis shows the model is performing as expected, you can still consider refreshing the model to ensure it is current. Otherwise, if performance is flat or lower-ranked accounts are outperforming higher-ranked ones, review both the model and the accounts to see if there are adjustments that need to be made when creating a new model.

One thing to keep in mind is that there could be factors impacting performance that are not related to the model itself. For instance, for a propensity-to-close model, if late-stage accounts other than prospects were scored, that would affect the scores and the implementation of the model.