PlanIQ - Deep dive on the algorithms under the hood
- Baseline time series algorithms
- Advanced statistical time series algorithms
- Flexible local algorithms
- Neural network algorithms
Companies develop plans and business strategies to drive decisions and actions in finance, operations, supply chain, and other areas. These plans rest on how the company perceives a future that is always uncertain, and the point of assessing that future is to take actions in the present that better prepare the business for it. To do so, companies collect and analyze historical data, along with other relevant data, to generate predictions. Failing to plan, or planning on inaccurate forecasts, is costly: under-forecasting or over-forecasting leads to misallocated resources such as time and capital, and to missed opportunities. Accurate forecasts, by contrast, help reduce waste, cut costs, minimize storage expenses, maximize resource utilization, and ensure that no sale is lost to insufficient inventory.
In this article, we review the algorithms that power modern time series forecasting in PlanIQ. They range from traditional statistical algorithms, such as Autoregressive Integrated Moving Average (ARIMA), to complex neural network algorithms, such as DeepAR+.
Before describing the algorithms in detail, it is important to understand the types of datasets they accept. In addition to historical values, datasets can include related time series data and item attributes. Related time series data is time-dependent data that correlates with the target values and may improve forecast accuracy; examples include price, promotions, and weather. Item attributes are categorical features that provide context for the items in the historical data. Unlike related time series, item attributes are static: their values remain constant over time, as with an item's category or type.
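To make the three kinds of input concrete, here is a minimal Python sketch of how they relate. The record types (`TargetObservation`, `RelatedObservation`, `ItemAttributes`) and the `build_feature_rows` helper are hypothetical illustrations, not PlanIQ's import format. The point to notice is the join keys: related time series are matched on item and timestamp, while static item attributes are matched on item alone.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layouts for illustration only (not PlanIQ's schemas).

@dataclass(frozen=True)
class TargetObservation:
    item_id: str
    timestamp: date
    value: float  # historical value to forecast, e.g. units sold

@dataclass(frozen=True)
class RelatedObservation:
    item_id: str
    timestamp: date
    price: float  # time-dependent driver that may correlate with the target
    on_promotion: bool

@dataclass(frozen=True)
class ItemAttributes:
    item_id: str
    category: str  # static: one row per item, constant over time

def build_feature_rows(targets, related, attributes):
    """Join the three datasets into one feature row per (item, timestamp)."""
    related_by_key = {(r.item_id, r.timestamp): r for r in related}
    attrs_by_item = {a.item_id: a for a in attributes}
    rows = []
    for t in targets:
        # Related data varies over time, so it is looked up by (item, timestamp).
        rel = related_by_key.get((t.item_id, t.timestamp))
        # Attributes are static, so they are looked up by item alone.
        attr = attrs_by_item.get(t.item_id)
        rows.append({
            "item_id": t.item_id,
            "timestamp": t.timestamp,
            "value": t.value,
            "price": rel.price if rel is not None else None,
            "on_promotion": rel.on_promotion if rel is not None else None,
            "category": attr.category if attr is not None else None,
        })
    return rows

# Usage with made-up data for a single item.
targets = [TargetObservation("sku-1", date(2024, 1, 1), 120.0),
           TargetObservation("sku-1", date(2024, 2, 1), 135.0)]
related = [RelatedObservation("sku-1", date(2024, 1, 1), 9.99, False),
           RelatedObservation("sku-1", date(2024, 2, 1), 8.99, True)]
attributes = [ItemAttributes("sku-1", "beverages")]
rows = build_feature_rows(targets, related, attributes)
```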
Baseline time series algorithms
Baseline time series forecasting algorithms include ARIMA and ETS (Exponential Smoothing). They are commonly used statistical algorithms for time series forecasting and are especially useful for simple datasets with fewer than 100 distinct periods. They work by attempting to ‘explain’ a time series in terms of its own past values, so that the resulting equation can be used to forecast future values. Their advantages are that they are relatively fast and can establish a performance baseline. They are most relevant when simple trend and seasonality are likely to explain most of the variance in the data. Because these models work at the item level, they do not support related data or attributes. They are also not applicable in cold-start scenarios (forecasting with no historical data), and they do not perform hyperparameter optimization.
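As an illustration of explaining a series from its own past, the sketch below implements simple exponential smoothing (SES), the most basic member of the ETS family, with no trend or seasonal component. It is not PlanIQ's implementation: the smoothing weight `alpha` is passed in by hand here, whereas full ETS implementations usually estimate it from the data. The function uses nothing but the item's own history, which is why this family cannot use related data or attributes and has nothing to start from in a cold-start scenario.

```python
def simple_exponential_smoothing(history, alpha, horizon):
    """Forecast `horizon` steps ahead with simple exponential smoothing.

    The level is updated as level_t = alpha * y_t + (1 - alpha) * level_{t-1},
    and every future step is forecast as the final level.
    """
    if not history:
        # No history means no level to smooth: the cold-start limitation.
        raise ValueError("SES needs at least one historical value")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(history[0])  # initialise the level with the first observation
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon  # flat forecast: no trend or seasonal component
```

For example, `simple_exponential_smoothing([10, 12, 11, 13], alpha=0.5, horizon=3)` returns `[12.0, 12.0, 12.0]`. The forecast is flat because plain SES models only a level; adding trend and seasonal terms gives the richer ETS variants.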
Advanced statistical time series algorithms
MVLR (multivariate linear regression) is an advanced statistical forecasting algorithm. It trains a model on a historical dataset and establishes a linear relationship between the input features and the target values.
The underlying assumption of multivariate analysis is that time-dependent features depend not only on their own historical values but also on each other. By exploiting these dependencies, MVLR can generate fast, accurate feature-based forecast models and also provide insight into which drivers most affect forecast results, and how.
Under the hood, the input features MVLR uses are historical data, related data (optional), and synthetic data (created automatically by PlanIQ from either the historical or the related data). Examples of synthetic data include exponential and linear trends, seasonality effects, and lagged values.
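The following sketch shows the general idea of regression on synthetic features: an intercept, a linear trend, one-hot seasonal dummies, a lag-1 value, and an optional related driver, fitted by ordinary least squares. The feature set, the helper names, and the recursive multi-step strategy are assumptions for illustration; the article does not describe how PlanIQ builds or fits its MVLR features. Plain Python keeps the example dependency-free, although a numerical library's least-squares solver is the more robust choice in practice.

```python
import random

def synthetic_features(series, t, season_length, related=None):
    """Hypothetical synthetic feature row for time index t (requires t >= 1)."""
    row = [1.0, float(t)]  # intercept and linear trend
    # One-hot seasonal dummies; season 0 is dropped to avoid collinearity with the intercept.
    row += [1.0 if t % season_length == s else 0.0 for s in range(1, season_length)]
    row.append(series[t - 1])  # lagged target value (lag 1)
    if related is not None:
        row.append(related[t])  # optional related driver, e.g. price
    return row

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a square system a @ x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are (nearly) collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def mvlr_forecast(history, season_length, horizon, related=None):
    """Fit on history, then forecast recursively, feeding each prediction back as the lag.

    If `related` is given, it must cover both the history and the forecast horizon.
    """
    X = [synthetic_features(history, t, season_length, related) for t in range(1, len(history))]
    coefficients = fit_ols(X, history[1:])
    extended = list(history)
    for _ in range(horizon):
        row = synthetic_features(extended, len(extended), season_length, related)
        extended.append(sum(c * x for c, x in zip(coefficients, row)))
    return coefficients, extended[len(history):]

# Usage on a made-up series with trend, a 4-period seasonal pattern, and noise.
rng = random.Random(0)
history = [10 + 0.5 * t + [0, 3, -2, 1][t % 4] + rng.gauss(0, 0.5) for t in range(40)]
coefficients, forecast = mvlr_forecast(history, season_length=4, horizon=4)
```

Because each fitted coefficient is attached to a named feature (trend, season, lag, driver), the coefficients can be read directly, which is one way a regression-based model exposes how strongly each driver moves the forecast.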
Flexible local algorithms
Another category is flexible local algorithms, which includes Prophet. Prophet is based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality. It works best on time series with strong seasonal effects and several seasons of historical data, and it can account for holidays and other known, important, but irregular events. Its advantages are that it is suitable for what-if analysis and is computationally inexpensive. Although Prophet tolerates some missing observations and outliers, it is not suitable for sparse datasets.
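Prophet's published model writes the forecast as y(t) = g(t) + s(t) + h(t) + ε_t, where g is the trend, s the seasonality, and h the holiday effects; each seasonality is a truncated Fourier series, and the default trend is piecewise linear with changepoints. The sketch below only evaluates such an additive model with hand-picked, hypothetical parameters to show how the pieces combine. It does not fit anything and is not Prophet's API.

```python
import math

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """g(t): base growth rate k and offset m; the rate changes by deltas[j] after changepoints[j]."""
    rate, offset = k, m
    for s_j, d_j in zip(changepoints, deltas):
        if t >= s_j:
            rate += d_j
            offset -= s_j * d_j  # keeps the trend continuous at the changepoint
    return rate * t + offset

def fourier_seasonality(t, period, coefficients):
    """s(t) = sum over n of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P)."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(coefficients, start=1):
        angle = 2 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total

def additive_prediction(t, trend, seasonalities, holiday_effects, active_holidays):
    """y(t) = g(t) + s(t) + h(t): trend, plus every seasonality, plus active holiday effects."""
    g = trend(t)
    s = sum(fourier_seasonality(t, period, coefs) for period, coefs in seasonalities)
    h = sum(holiday_effects[name] for name in active_holidays)
    return g + s + h

# Usage with hypothetical parameters: daily data, weekly seasonality, one event.
def trend(t):
    return piecewise_linear_trend(t, k=0.2, m=50.0, changepoints=[30.0], deltas=[-0.1])

weekly = (7.0, [(2.0, 0.5), (0.3, -0.1)])  # period 7, Fourier order N = 2
holiday_effects = {"promo_day": 8.0}
baseline = additive_prediction(35.0, trend, [weekly], holiday_effects, [])
with_promo = additive_prediction(35.0, trend, [weekly], holiday_effects, ["promo_day"])
```

Because the components simply add, a what-if question such as "what if the promotion is dropped?" amounts to removing one term from h(t), as the difference between `with_promo` and `baseline` shows.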
Neural network algorithms
The last category of time series forecasting algorithms is based on artificial neural networks and includes DeepAR+ and CNN-QR. These algorithms use deep learning architectures such as RNNs (recurrent neural networks) or CNNs (convolutional neural networks) to identify patterns in historical data and predict the future. They work best with large historical datasets containing hundreds of time series. DeepAR+ accepts forward-looking related time series and item attributes. CNN-QR accepts item attributes and is the only forecast algorithm that accepts related time series data without future values. Their advantages include the ability to leverage data from similar time series while forecasting, and to use related datasets and item attributes to identify and learn underlying structures. They are also suitable for advanced forecasting scenarios such as sparse datasets, what-if analysis, and cold-start scenarios. Their disadvantages are that they require larger historical datasets and may take longer to train and predict.
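Two ideas from this paragraph can be sketched without a deep learning library. First, a global model learns from many series at once: training examples are context/target windows pooled across all items, each tagged with its item's static attributes, which is how patterns from one item can inform forecasts for similar items. Second, CNN-QR's name refers to quantile regression, and quantile forecasts are commonly scored with the quantile (pinball) loss. Both functions below are illustrative assumptions, not the DeepAR+ or CNN-QR training code.

```python
def make_training_windows(series_by_item, attributes_by_item, context_length, horizon):
    """Pool (context, target) windows from every item into one training set.

    Items shorter than context_length + horizon contribute no windows.
    """
    windows = []
    for item_id, values in series_by_item.items():
        attrs = attributes_by_item.get(item_id, {})  # static, shared by all windows of the item
        last_start = len(values) - context_length - horizon
        for start in range(0, last_start + 1):
            split = start + context_length
            windows.append({
                "item_id": item_id,
                "attributes": attrs,
                "context": values[start:split],           # what the model sees
                "target": values[split:split + horizon],  # what it must predict
            })
    return windows

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile q in (0, 1).

    Under-forecasts are weighted by q and over-forecasts by 1 - q, so a
    high quantile such as 0.9 is pushed above most observed values.
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

# Usage with made-up data: two items in the same category share one training set.
series = {"sku-1": [3, 4, 6, 5, 7, 8], "sku-2": [10, 9, 11, 12, 12, 13]}
attrs = {"sku-1": {"category": "snacks"}, "sku-2": {"category": "snacks"}}
training_set = make_training_windows(series, attrs, context_length=4, horizon=1)
```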
Time series forecasting has attracted growing interest recently, with many technology companies developing and open-sourcing algorithms. Much as performance has improved significantly in fields such as computer vision and natural language processing, the performance of time series forecasting algorithms has improved markedly, and they can now be applied to a wide variety of real-world business problems.
PlanIQ uses both open-source and proprietary algorithms to help customers generate accurate forecasts. Users can choose a specific algorithm for their use case or use Anaplan AutoML, which automatically compares the performance of all algorithms and picks the one that performs best for most of the items.
Note that algorithm performance depends on the specific use case, datasets, context, and historical patterns, so no single algorithm is best in every case. That is why we compare the algorithms to find the best one for each customer's datasets, use cases, and items.
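The selection step can be pictured as a backtest: hold out recent history, forecast it with each algorithm, score each item, and count which algorithm wins on the most items. The sketch below uses mean absolute error and a simple majority vote. Both are assumptions, since the article does not say which metric or tie-breaking rule Anaplan AutoML uses, and the algorithm names and numbers in the usage are made up.

```python
from collections import Counter

def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between held-out actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def pick_algorithm(backtests):
    """Choose the algorithm that has the lowest error on the most items.

    backtests maps item_id -> {algorithm_name: (actuals, forecasts)}.
    Returns (winning_algorithm, wins_per_algorithm).
    """
    wins = Counter()
    for results in backtests.values():
        errors = {algo: mean_absolute_error(a, f) for algo, (a, f) in results.items()}
        wins[min(errors, key=errors.get)] += 1  # per-item winner
    winner, _ = wins.most_common(1)[0]
    return winner, dict(wins)

# Usage with made-up backtest results for three items.
backtests = {
    "sku-1": {"ETS": ([10, 12], [11, 12]), "Prophet": ([10, 12], [10, 15])},
    "sku-2": {"ETS": ([5, 5], [7, 7]), "Prophet": ([5, 5], [5, 6])},
    "sku-3": {"ETS": ([8, 9], [8, 9]), "Prophet": ([8, 9], [6, 9])},
}
winner, wins = pick_algorithm(backtests)
```

Here ETS wins on two of the three items, so it would be selected even though Prophet is better for `sku-2`; that trade-off is inherent in picking one algorithm for most of the items.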
Got feedback on this content? Let us know in the comments below.
Contributing authors: Nitzan Paz, Christophe Keomanivong, Frankie Wolf, Timothy Brennan, Andrew Martin, Oren Tevet, and Evgenya Kontorovich.