PlanIQ – Considerations Before Starting to Forecast
Before starting to forecast with PlanIQ, there are a few topics we recommend considering around the data, properties of the forecast, and forecast evaluation. While it is important to define clear goals for each forecasting scenario, PlanIQ's forecast accuracy will mostly be determined by the quality of the available data. Addressing issues in the data, as well as making pertinent decisions about the forecast, can help maximize PlanIQ's value and avoid producing undesired and suboptimal results.
Below are the main topics to consider before starting to forecast with PlanIQ. Keep the business problem you are trying to solve in mind as you work through the list.
When deciding on the length of the forecast horizon (the length of time into the future for which a forecast is generated), avoid simply basing it on the allowed horizon for the chosen algorithm. While it is a good starting point, be sure to consider both the business use case and the available data. Start with the forecast horizon needed to achieve your goal and work from there. If projections for one year out are needed, verify that there is enough historical data to support it. If using related data, it's recommended to include forward-looking data that covers the forecast horizon. Reference the forecast horizon guide on Anapedia to learn more about the supported horizons for each algorithm.
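As a rough illustration of the checks described above, the following sketch compares a desired monthly horizon against the available history and the forward-looking related data. The function names and the `min_history_ratio` threshold are hypothetical and chosen only for this example; they are not PlanIQ settings or limits, so consult the forecast horizon guide for the actual supported values.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Count monthly periods from start to end, inclusive of both months."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def check_horizon(history_start, history_end, related_end,
                  horizon_months, min_history_ratio=2.0):
    """Compare the desired horizon with the data that is available.

    min_history_ratio is an assumed rule of thumb for this sketch,
    not a documented PlanIQ requirement.
    """
    history_months = months_between(history_start, history_end)
    # Months of related data that extend past the last historical month.
    future_related_months = months_between(history_end, related_end) - 1
    return {
        "history_months": history_months,
        "enough_history": history_months >= min_history_ratio * horizon_months,
        "related_covers_horizon": future_related_months >= horizon_months,
    }


report = check_horizon(
    history_start=date(2021, 1, 1),
    history_end=date(2023, 12, 1),
    related_end=date(2024, 6, 1),
    horizon_months=12,
)
print(report)
```

In this example the history looks sufficient for a 12-month horizon under the assumed ratio, but the related data stops six months short of it.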
Related Data and Attributes
Related data (variables that correlate with or affect the forecasted time series) and attributes (categorical information that groups historical data by their shared characteristics) are valuable information that could improve the quality and accuracy of forecasts. Before deciding whether to use such data as part of your forecast, consider the following.
The most important thing is the data itself. Consider whether the data adds information that is relevant to the forecast. For instance, related data that displays patterns corresponding to the historical data could lead to more accurate forecasts. However, related or attribute data with constant or near-constant values has limited value and might limit the algorithms you can use. PlanIQ's deep learning algorithms, CNN-QR and DeepAR+, could also be negatively impacted by too few or too many unique attribute values. The number of related data series or attributes also matters, and more isn't always better. More information on using attributes can be found on Anaplan Community.
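One way to screen related data and attributes before importing them is sketched below. It flags series dominated by a single value and counts the unique values of an attribute. The helper names and the 95% threshold are assumptions made for this example, not PlanIQ rules.

```python
from collections import Counter


def dominant_share(values):
    """Fraction of observations taken by the single most common value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def near_constant_series(series_by_name, max_dominant_share=0.95):
    """Names of series that are constant or near-constant.

    max_dominant_share is an assumed threshold for illustration only.
    """
    return [
        name
        for name, values in series_by_name.items()
        if dominant_share(values) >= max_dominant_share
    ]


def attribute_cardinality(attribute_by_item):
    """Number of distinct attribute values across all forecast items."""
    return len(set(attribute_by_item.values()))


related = {
    "promo_flag": [0] * 20,                 # never changes
    "price": [9.99, 10.49, 9.99, 11.00] * 5,  # varies over time
}
colour = {"sku_1": "red", "sku_2": "red", "sku_3": "blue"}

print(near_constant_series(related))
print(attribute_cardinality(colour))
```

A series flagged here is a candidate for removal, and a very low or very high attribute cardinality is a prompt to review that attribute.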
Mean and Variance of Related Data
The nature of the mean and variance of the related data is also worth considering. Is the entire available dataset representative of your business now and in the future? Old patterns do not necessarily represent more recent behavior of the forecast items, so including all available related data might result in a less accurate forecast. If there have been significant changes in your operation that affect related values, it might make sense to use only more recent data.
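A simple way to see whether older data still resembles recent data is to compare the mean and variance of the two windows. The sketch below does this with the standard library. The function name and the window split are illustrative assumptions; what counts as a significant shift depends on your business.

```python
from statistics import mean, pvariance


def compare_windows(values, recent_periods):
    """Summarize the older and the most recent part of a series separately."""
    older, recent = values[:-recent_periods], values[-recent_periods:]
    return {
        "older_mean": mean(older),
        "recent_mean": mean(recent),
        "older_variance": pvariance(older),
        "recent_variance": pvariance(recent),
    }


# Hypothetical related series whose level jumps halfway through.
series = [10, 12, 11, 9, 10, 12, 20, 22, 21, 19, 20, 22]
summary = compare_windows(series, recent_periods=6)
print(summary)
```

Here the variance is unchanged but the mean has shifted by 10 units, which suggests the older half may no longer represent current behavior.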
Zeros in Related Data
Also worth considering is the number of zeros in the related data. Too many zeros will prevent a forecast from being generated or will result in a poor forecast. It's important to understand what causes the high proportion of zeros so that you can apply the appropriate solution. If the data is too sparse (that is, many of the values in the time series are zero), consider the frequency and/or granularity of the data and of the desired forecast. Please refer to the sections on data frequency, data granularity, and incomplete data for more information.
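The share of zeros per item can be measured before any data is loaded. The sketch below is a minimal example; the 50% threshold is an assumption for illustration and not a PlanIQ limit.

```python
def zero_share(values):
    """Fraction of observations that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def flag_sparse(series_by_item, max_zero_share=0.5):
    """Items whose zero share exceeds the (assumed) threshold."""
    shares = {item: zero_share(v) for item, v in series_by_item.items()}
    return {item: s for item, s in shares.items() if s > max_zero_share}


data = {
    "item_a": [0, 0, 0, 5, 0, 0, 0, 0, 2, 0],
    "item_b": [3, 4, 0, 5, 6, 2, 0, 4, 3, 5],
}
print(flag_sparse(data))
```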
The forecast frequency should also be decided before starting the forecasting process in PlanIQ. Consider both the business goal and whether the historical and related data sets support the frequency needed. If the forecast frequency needed is different from what the data sets can support, consider the lowest-resolution frequency at which the forecast would still achieve your business goal. Other considerations for data frequency include:
Sparsity of Data Set
If a data set is too sparse at the current frequency, either overall or for certain forecast items, this can lead to lower-quality forecasts and difficulty in using certain evaluation metrics. If aggregating the data to a lower frequency is a possible alternative given your business needs, forecasting with denser data will generally reduce variance and improve the forecast quality. If only certain forecast items are affected by a sparsity issue, consider removing those items or creating a separate forecast model for them.
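The effect of aggregating to a lower frequency can be previewed with a small script. The sketch below sums hypothetical daily values into calendar months and compares the share of zeros before and after. The data and helper names are made up for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_to_month(daily):
    """Sum daily values into (year, month) buckets."""
    monthly = defaultdict(float)
    for day, value in daily.items():
        monthly[(day.year, day.month)] += value
    return dict(monthly)


def zero_share(values):
    """Fraction of observations that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical item that sells 5 units every tenth day for 60 days.
start = date(2024, 1, 1)
daily = {
    start + timedelta(days=i): (5.0 if i % 10 == 0 else 0.0)
    for i in range(60)
}
monthly = aggregate_to_month(daily)

print(zero_share(daily.values()))    # sparse at daily frequency
print(zero_share(monthly.values()))  # dense at monthly frequency
```

The same totals are preserved, but the monthly series no longer contains zeros, which is the trade-off between resolution and density described above.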
Frequency Mismatch of Historical and Related Data
In the presence of related data, PlanIQ will forecast at the frequency of the related data. If the historical and related data are on different time scales, PlanIQ can aggregate the historical data to match the related data only if the related data is on a less frequent time scale. Otherwise, PlanIQ will not change the historical data to match the related data. Consider the business goal for the forecast. Will forecasting at the related data frequency make sense, or are you able to break out the historical data? If neither is an option, does forgoing related data produce acceptable forecasts? Note that related data can be aggregated in Anaplan before being brought into PlanIQ.
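The rule described above can be written down as a small decision helper. This sketch only restates the text as code. The frequency names, the ranking, and the function are hypothetical and do not correspond to any PlanIQ API.

```python
# Higher rank means a less frequent (coarser) time scale.
FREQUENCY_RANK = {"day": 0, "week": 1, "month": 2, "quarter": 3, "year": 4}


def resolve_frequency(historical, related):
    """Describe what happens when historical and related frequencies differ."""
    h, r = FREQUENCY_RANK[historical], FREQUENCY_RANK[related]
    if h == r:
        return f"forecast at {historical}"
    if r > h:
        # Related data is less frequent, so historical data can be aggregated.
        return f"historical data aggregated to {related}"
    # Related data is more frequent than historical data.
    return "mismatch: aggregate related data in Anaplan first, or drop it"


print(resolve_frequency("day", "month"))
print(resolve_frequency("month", "week"))
```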
For more on managing null values, refer to the best practice guide on Anaplan Community.
The granularity of the data sets presents challenges similar to those of data frequency. The main difference is that granularity concerns the level at which items are forecasted rather than the time scale or frequency. Instead of choosing between forecasting at a week or month level, granularity is about choosing between, for example, the SKU and the product category level. The main issue is the same: does the data support forecasting at the level the business use case calls for, and if not, can the granularity of the data sets be changed while still achieving the forecasting goal? Keep in mind that unless you can use the forecast at the lower granularity or transform it back into a usable higher granularity, it would be best to keep the data as is.
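The sketch below illustrates one possible way to change granularity: roll SKU-level history up to a category, then split a category-level forecast back to SKUs in proportion to each SKU's historical share. Proportional splitting is a technique chosen here for illustration, not something PlanIQ is stated to do, and all names and numbers are hypothetical.

```python
def roll_up(series_by_sku, category_by_sku):
    """Sum SKU series period by period into one series per category."""
    totals = {}
    for sku, values in series_by_sku.items():
        category = category_by_sku[sku]
        if category not in totals:
            totals[category] = list(values)
        else:
            totals[category] = [a + b for a, b in zip(totals[category], values)]
    return totals


def split_by_share(category_forecast, series_by_sku, skus):
    """Split one category forecast value across SKUs by historical share."""
    history_totals = {sku: sum(series_by_sku[sku]) for sku in skus}
    grand_total = sum(history_totals.values())
    return {
        sku: category_forecast * total / grand_total
        for sku, total in history_totals.items()
    }


history = {"sku_a": [1, 0, 2], "sku_b": [0, 3, 0], "sku_c": [4, 4, 4]}
categories = {"sku_a": "shoes", "sku_b": "shoes", "sku_c": "hats"}

print(roll_up(history, categories))
print(split_by_share(10.0, history, ["sku_a", "sku_b"]))
```

If no sensible way exists to split the category forecast back to the level the business needs, that is a sign to keep the original granularity.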
Incomplete data, like sparsity, deals with missing data. The difference is that here the data is missing because it is unavailable or because the business has changed. Common causes of incomplete data are new product introduction/cold start scenarios, obsolete items, and a lack of related data in the forecast horizon.
New Product Introduction/Cold Start
Forecasting for new products or other items with no historical data limits the algorithms that can be used to DeepAR+ or CNN-QR. It also calls for the use of attributes and has other requirements. For more details, see the new product introduction page on Anaplan Community.
Whether to include obsolete products or items no longer necessary in historical data should be based on several factors:
- Potential value of the historical data for the item
- Reason the item is no longer needed
- If there are similar or replacement items to be forecasted
Also, obsolete items will count toward quota consumption. Weigh these factors carefully, because including such items could produce a weaker forecast by adding irrelevant information. Generally, unless there are replacement items for the obsolete products, it's better to remove them than to keep them in the data.
No Related Data for Forecast Horizon
Related data can be a valuable source of information that adds to the accuracy and quality of the forecast. Not all algorithms can take advantage of it, and except for CNN-QR, those that do support it require forward-looking (future) related data values.
Historical and related data sets with many outliers will be difficult to predict. A quick check is to plot the data sets and look for outliers visually. How to address outliers once identified depends on their cause. A data entry error and a value driven by a known factor, such as a holiday, call for different handling. For more on how to detect outliers and suggestions on adjustments, see the dealing with outliers article on Anaplan Community.
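Alongside a visual check, a simple statistical screen can point to candidate outliers. The sketch below uses the common interquartile range (IQR) rule with a multiplier of 1.5. This is a generic technique picked for illustration and is not necessarily the method recommended in the Anaplan Community article.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return (index, value) pairs that fall outside the IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly demand with one suspicious spike.
demand = [10, 12, 11, 13, 12, 11, 95, 10, 12, 13, 11, 12]
print(iqr_outliers(demand))
```

A flagged value still needs a human decision: correct it if it is an entry error, or keep it and explain it with related data if it reflects a real event such as a holiday.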
Evaluating the forecast accuracy will help determine the best algorithm to use and whether data adjustments are needed to support the business use case. It can also be used to compare the performance of PlanIQ against historical baselines, if those are available. When evaluating a forecast, please note the following:
- Backtesting: Hold out test data from the historical data to evaluate how well the forecast performs.
- Accuracy metric: Decide on an appropriate metric based on the business use case and data considerations. Learn more about forecast evaluation on Anaplan Community.
- Weighting of SKUs, product lines or categories: Consider the relative importance of the forecast items in the data set. For instance, if 20% of the items drive 80% of the revenue, you may want to focus on those items when performing the forecast evaluation.
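The three points above can be combined in a small offline experiment. The sketch below holds out the last periods of a series, scores a naive last-value forecast with MAPE, and then weights per-item errors by revenue. MAPE and the naive forecast are used only as familiar examples, and the function names and figures are hypothetical; choose the metric that fits your use case.

```python
def holdout_split(values, test_periods):
    """Split a series into a training part and a held-out test part."""
    return values[:-test_periods], values[-test_periods:]


def naive_forecast(train, periods):
    """Baseline that repeats the last observed value."""
    return [train[-1]] * periods


def mape(actuals, forecasts):
    """Mean absolute percentage error, skipping periods with zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def weighted_error(error_by_item, weight_by_item):
    """Average of per-item errors, weighted by importance (e.g. revenue)."""
    total_weight = sum(weight_by_item[item] for item in error_by_item)
    return sum(
        error_by_item[item] * weight_by_item[item] for item in error_by_item
    ) / total_weight


train, test = holdout_split([100, 110, 120, 100, 110], test_periods=2)
baseline_error = mape(test, naive_forecast(train, len(test)))
print(round(baseline_error, 4))

# Item A drives most of the revenue, so its error dominates the result.
errors = {"item_a": 0.10, "item_b": 0.50}
revenue = {"item_a": 80, "item_b": 20}
print(round(weighted_error(errors, revenue), 4))
```

Comparing a forecast's error with a simple baseline like this one gives a reference point for judging whether the forecast adds value.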