How the data collection is used in forecast training and models

EvgyK
edited April 2023 in Groups

This article helps PlanIQ practitioners understand how the data in a PlanIQ "data collection" is used to train and run a forecast model. It also explains how PlanIQ users can switch an existing forecast to a different data source.

 

What is the "data collection"?

A data collection is exactly what the name implies: a collection of data. At present, data collections reference data that is hosted in Anaplan and pulled into PlanIQ through each collection's associated export actions. When a data collection is created, PlanIQ exports the relevant data sets from Anaplan and analyzes them. This analysis lets PlanIQ confirm that the data is properly formatted, check it for possible issues or errors, and identify which algorithms and forecast horizons the data can support. This process eliminates much of the manual data preparation that more technical intelligent forecasting tools require.

This checkpoint lets PlanIQ users confirm that everything is in order before running a forecast. If the analysis finds any issues, the user can correct the data in Anaplan and re-run data assembly for the data collection to resolve them.
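The exact checks PlanIQ runs during data assembly are not documented in this article. As a rough illustration only, the Python sketch below shows the kind of formatting checks such an analysis might perform, assuming a collection can be represented as a list of (item, period, value) records. The record layout and the `analyze_collection` helper are hypothetical and are not part of any PlanIQ or Anaplan API.

```python
from collections import defaultdict


def analyze_collection(records, expected_periods):
    """Illustrative pre-forecast checks on (item, period, value) records.

    Hypothetical helper, not a PlanIQ API. Returns a dict mapping an issue
    type to its details; an empty dict means no issues were found.
    """
    issues = {}
    seen = set()
    duplicates, non_numeric = [], []
    periods_by_item = defaultdict(set)

    for item, period, value in records:
        key = (item, period)
        if key in seen:
            duplicates.append(key)  # same item/period exported twice
        seen.add(key)
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            non_numeric.append(key)  # blank or text where a number is expected
        periods_by_item[item].add(period)

    # Periods each item should have but does not (gaps in the history)
    expected = set(expected_periods)
    missing = {
        item: sorted(expected - periods)
        for item, periods in periods_by_item.items()
        if expected - periods
    }

    if duplicates:
        issues["duplicates"] = duplicates
    if non_numeric:
        issues["non_numeric"] = non_numeric
    if missing:
        issues["missing_periods"] = missing
    return issues


if __name__ == "__main__":
    sample = [
        ("A", "2023-01", 10), ("A", "2023-02", 12),
        ("B", "2023-01", 5), ("B", "2023-01", 6), ("B", "2023-02", None),
    ]
    print(analyze_collection(sample, ["2023-01", "2023-02", "2023-03"]))
```

In the sample, item B has a duplicated period and a blank value, and both items are missing 2023-03: the kinds of problems a user would fix in Anaplan before re-running data assembly.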

While analyzing the data collection, PlanIQ also records the structure of its data (the dimensions and line items) and expects that structure to remain fixed for the lifetime of the data collection. We recommend using Anaplan views to simplify managing the data structure.
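To make the "fixed structure" rule concrete, the Python sketch below compares a recorded snapshot of a collection's dimensions and line items with the current one and lists any differences. The `CollectionStructure` type and `structure_changes` function are hypothetical illustrations; PlanIQ's internal representation of a collection's structure is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionStructure:
    """Hypothetical snapshot of a data collection's shape."""
    dimensions: tuple[str, ...]
    line_items: tuple[str, ...]


def structure_changes(recorded: CollectionStructure,
                      current: CollectionStructure) -> list[str]:
    """List differences between the recorded and the current structure.

    An empty list means the structure is unchanged; anything else means
    the data no longer matches what the collection was created with.
    """
    changes: list[str] = []
    for label, old, new in (
        ("dimension", recorded.dimensions, current.dimensions),
        ("line item", recorded.line_items, current.line_items),
    ):
        changes += [f"{label} removed: {n}" for n in sorted(set(old) - set(new))]
        changes += [f"{label} added: {n}" for n in sorted(set(new) - set(old))]
    return changes


if __name__ == "__main__":
    recorded = CollectionStructure(("Product", "Time"), ("Sales",))
    current = CollectionStructure(("Product", "Region", "Time"), ("Sales", "Units"))
    for change in structure_changes(recorded, current):
        print(change)
```

If the list is non-empty, the guidance in this article applies: rather than editing the existing data collection, create a new one with the new structure and train new forecast models on it.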

What happens with the data when the "forecast model" is being created?

When a forecast model is created, PlanIQ trains the algorithms on historical data. During this process, PlanIQ pulls the latest data from the data collection, so even if the data collection was created some time ago, PlanIQ always trains its models on the most up-to-date actuals.

A tip for building PlanIQ into your planning process: if you run a series of rolling forecasts, train all the models at the same "point in time" in the rolling forecast, so that they are all trained on the same actuals.

 

What happens with the data when the forecast action is run?

When a forecast action runs, PlanIQ pulls the latest data from the data collection. As new actuals are added to the source model in Anaplan, each forecast is generated using that data.

As part of running a forecast action, PlanIQ compares its previous forecasts with the new actuals to check that model accuracy stays consistent. If PlanIQ detects a degradation in its model quality metric, you receive a warning message when you run the forecast action (see the Anapedia article on PlanIQ metrics).
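PlanIQ's own quality metric and warning threshold are covered in the Anapedia article and are not reproduced here. The Python sketch below swaps in MAPE (mean absolute percentage error) as a stand-in metric to show the general idea: score the previous forecast against the actuals that arrived later, and warn when that error is noticeably worse than the error measured at training time. The `baseline_mape` input and the 20% `tolerance` are hypothetical choices, not PlanIQ settings.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, skipping periods where the actual is zero."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def degradation_warning(baseline_mape, actuals, previous_forecast, tolerance=1.2):
    """Return a warning string if the previous forecast scored worse than baseline.

    baseline_mape: error measured when the model was trained (hypothetical input).
    tolerance: multiplier on the baseline before warning (hypothetical 20% slack).
    Returns None when accuracy is within tolerance.
    """
    current = mape(actuals, previous_forecast)
    if current > baseline_mape * tolerance:
        return (f"Warning: forecast error rose from {baseline_mape:.1f}% "
                f"to {current:.1f}% on new actuals")
    return None


if __name__ == "__main__":
    new_actuals = [100, 200]
    print(degradation_warning(10.0, new_actuals, [110, 180]))  # within tolerance
    print(degradation_warning(10.0, new_actuals, [150, 100]))  # degraded
```

Comparing against a training-time baseline, rather than a fixed absolute threshold, means a series that is naturally hard to forecast does not trigger a warning on every run; only a change in accuracy does.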

 

What if I want to train a new forecast model?

There might be several reasons to train a new forecast model, including: 

  • A long time has passed since the previous model was used, and you want to retrain it with the latest data.
  • PlanIQ has flagged a change in forecast quality, and you want to retrain the model to review and update its quality metrics.
  • You want to forecast from a new data collection and therefore need to retrain the model on the new data.

To retrain a model, you must create a new version of it, which you can do by duplicating the forecast model in the PlanIQ interface. PlanIQ pre-populates all the configuration options, so you only need to give the new model a name. To change the model's data source, point the duplicate at a different data collection. Once the new model has been trained, if your forecast action is already operationalized or scheduled, edit the action and point it at the new forecast model. Voila!

What happens when I need to change the structure of the data collection?

PlanIQ assumes that the structure of the data in a data collection is fixed; it cannot be changed. To use data with a different structure or format, create a new data collection and then use it to train new forecast models.
