Forecasters¶
LSTMForecaster¶
Please refer to BasePytorchForecaster for methods other than initialization.
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN). This forecaster implements the basic version of LSTM, VanillaLSTM, for time-series forecasting. It has two LSTM layers, two dropout layers and a dense layer.
For the detailed algorithm description, please refer to here.
- class zoo.chronos.forecaster.lstm_forecaster.LSTMForecaster(past_seq_len, input_feature_num, output_feature_num, hidden_dim=32, layer_num=1, dropout=0.1, optimizer='Adam', loss='mse', lr=0.001, metrics=['mse'], seed=None, distributed=False, workers_per_node=1, distributed_backend='torch_distributed')[source]¶
Bases:
zoo.chronos.forecaster.base_forecaster.BasePytorchForecaster
Example
>>> #The dataset is split into x_train, x_val, x_test, y_train, y_val, y_test
>>> forecaster = LSTMForecaster(past_seq_len=24, input_feature_num=2, output_feature_num=2, ...)
>>> forecaster.fit((x_train, y_train))
>>> forecaster.to_local() # if you set distributed=True
>>> test_pred = forecaster.predict(x_test)
>>> test_eval = forecaster.evaluate((x_test, y_test))
>>> forecaster.save({ckpt_name})
>>> forecaster.load({ckpt_name})
Build an LSTM Forecast Model.
- Parameters
past_seq_len – Specify the history time steps (i.e. lookback).
input_feature_num – Specify the feature dimension.
output_feature_num – Specify the output dimension.
hidden_dim – int or list, the hidden dimension of each LSTM layer. The value defaults to 32.
layer_num – the number of LSTM layers to use. The value defaults to 1.
dropout – float or list, the dropout probability (i.e. the probability that a neuron is dropped). The value defaults to 0.1.
optimizer – Specify the optimizer used for training. This value defaults to “Adam”.
loss – Specify the loss function used for training. This value defaults to “mse”. You can choose from “mse”, “mae” and “huber_loss”.
lr – Specify the learning rate. This value defaults to 0.001.
metrics – A list of metrics for evaluating the quality of forecasting. For a distributed forecaster, you may only choose from “mse” and “mae”. For a non-distributed forecaster, you may choose from “mse”, “me”, “mae”, “rmse”, “msle”, “r2”, “mpe”, “mape”, “mspe”, “smape”, “mdape” and “smdape”.
seed – int, random seed for training. This value defaults to None.
distributed – bool, whether to initialize the forecaster in a distributed fashion. If True, the internal model will use an Orca Estimator. If False, the internal model will use a PyTorch model. The value defaults to False.
workers_per_node – int, the number of workers you want to use. The value defaults to 1. This parameter only takes effect when distributed is set to True.
distributed_backend – str, select from “torch_distributed” or “horovod”. The value defaults to “torch_distributed”.
Seq2SeqForecaster¶
Please refer to BasePytorchForecaster for methods other than initialization.
Seq2SeqForecaster wraps a sequence-to-sequence model based on LSTM, and is suitable for multivariate and multistep time series forecasting.
- class zoo.chronos.forecaster.seq2seq_forecaster.Seq2SeqForecaster(past_seq_len, future_seq_len, input_feature_num, output_feature_num, lstm_hidden_dim=64, lstm_layer_num=2, teacher_forcing=False, dropout=0.1, optimizer='Adam', loss='mse', lr=0.001, metrics=['mse'], seed=None, distributed=False, workers_per_node=1, distributed_backend='torch_distributed')[source]¶
Bases:
zoo.chronos.forecaster.base_forecaster.BasePytorchForecaster
Example
>>> #The dataset is split into x_train, x_val, x_test, y_train, y_val, y_test
>>> forecaster = Seq2SeqForecaster(past_seq_len=24, future_seq_len=2, input_feature_num=1, output_feature_num=1, ...)
>>> forecaster.fit((x_train, y_train))
>>> forecaster.to_local() # if you set distributed=True
>>> test_pred = forecaster.predict(x_test)
>>> test_eval = forecaster.evaluate((x_test, y_test))
>>> forecaster.save({ckpt_name})
>>> forecaster.load({ckpt_name})
Build a Seq2Seq Forecast Model.
- Parameters
past_seq_len – Specify the history time steps (i.e. lookback).
future_seq_len – Specify the output time steps (i.e. horizon).
input_feature_num – Specify the feature dimension.
output_feature_num – Specify the output dimension.
lstm_hidden_dim – LSTM hidden channel for decoder and encoder. The value defaults to 64.
lstm_layer_num – LSTM layer number for decoder and encoder. The value defaults to 2.
teacher_forcing – Whether to use teacher forcing in training. The value defaults to False.
dropout – Specify the dropout probability (i.e. the probability that a neuron is dropped). This value defaults to 0.1.
optimizer – Specify the optimizer used for training. This value defaults to “Adam”.
loss – Specify the loss function used for training. This value defaults to “mse”. You can choose from “mse”, “mae” and “huber_loss”.
lr – Specify the learning rate. This value defaults to 0.001.
metrics – A list of metrics for evaluating the quality of forecasting. For a distributed forecaster, you may only choose from “mse” and “mae”. For a non-distributed forecaster, you may choose from “mse”, “me”, “mae”, “rmse”, “msle”, “r2”, “mpe”, “mape”, “mspe”, “smape”, “mdape” and “smdape”.
seed – int, random seed for training. This value defaults to None.
distributed – bool, whether to initialize the forecaster in a distributed fashion. If True, the internal model will use an Orca Estimator. If False, the internal model will use a PyTorch model. The value defaults to False.
workers_per_node – int, the number of workers you want to use. The value defaults to 1. This parameter only takes effect when distributed is set to True.
distributed_backend – str, select from “torch_distributed” or “horovod”. The value defaults to “torch_distributed”.
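The teacher_forcing flag refers to a general sequence-to-sequence training technique. The sketch below is a minimal, framework-free illustration of the idea and is not taken from the Chronos source; decode and toy_step are hypothetical names, and toy_step merely stands in for one decoder step.

```python
def decode(first_input, targets, step_fn, teacher_forcing):
    """Run a decoder for len(targets) steps and return its predictions.

    step_fn is a stand-in for one decoder step (hypothetical, not a Chronos API).
    """
    predictions = []
    decoder_input = first_input
    for target in targets:
        prediction = step_fn(decoder_input)
        predictions.append(prediction)
        # With teacher forcing the next input is the ground truth;
        # otherwise the decoder consumes its own previous prediction.
        decoder_input = target if teacher_forcing else prediction
    return predictions


def toy_step(value):
    # Toy "model": always predicts the previous value plus 1.5.
    return value + 1.5


truth = [11.0, 12.0, 13.0]
print(decode(10.0, truth, toy_step, teacher_forcing=True))   # [11.5, 12.5, 13.5]
print(decode(10.0, truth, toy_step, teacher_forcing=False))  # [11.5, 13.0, 14.5]
```

Without teacher forcing the toy model's error accumulates from step to step, while with teacher forcing each step restarts from the true value.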
TCNForecaster¶
Please refer to BasePytorchForecaster for methods other than initialization.
Temporal Convolutional Network (TCN) is a neural network that uses a convolutional architecture rather than a recurrent one. It supports multi-step and multivariate cases. Causal convolutions enable large-scale parallel computing, which gives TCN a shorter inference time than RNN-based models such as LSTM.
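As a rough illustration of what "causal" means here, the pure-Python sketch below computes a 1-D causal convolution in which each output depends only on the current and earlier inputs. It is not the forecaster's implementation; the function name causal_conv1d, the zero padding on the left and the dilation argument are assumptions made for this example.

```python
def causal_conv1d(x, weights, dilation=1):
    """Causal 1-D convolution: out[t] uses only x[t], x[t - d], x[t - 2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # positions before the start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]
# weights[0] multiplies the current step, weights[1] the step `dilation` earlier.
print(causal_conv1d(series, [1.0, 1.0], dilation=1))  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(causal_conv1d(series, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
```

Because no output looks ahead in time, all positions can be computed independently, which is what makes the convolution easy to parallelize.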
- class zoo.chronos.forecaster.tcn_forecaster.TCNForecaster(past_seq_len, future_seq_len, input_feature_num, output_feature_num, num_channels=[30, 30, 30, 30, 30, 30, 30], kernel_size=3, repo_initialization=True, dropout=0.1, optimizer='Adam', loss='mse', lr=0.001, metrics=['mse'], seed=None, distributed=False, workers_per_node=1, distributed_backend='torch_distributed')[source]¶
Bases:
zoo.chronos.forecaster.base_forecaster.BasePytorchForecaster
Example
>>> #The dataset is split into x_train, x_val, x_test, y_train, y_val, y_test
>>> forecaster = TCNForecaster(past_seq_len=24, future_seq_len=5, input_feature_num=1, output_feature_num=1, ...)
>>> forecaster.fit((x_train, y_train))
>>> forecaster.to_local() # if you set distributed=True
>>> test_pred = forecaster.predict(x_test)
>>> test_eval = forecaster.evaluate((x_test, y_test))
>>> forecaster.save({ckpt_name})
>>> forecaster.load({ckpt_name})
Build a TCN Forecast Model.
The TCN forecaster may fall into local optima. Setting repo_initialization to False can alleviate the issue. Changing the random seed is another workaround.
- Parameters
past_seq_len – Specify the history time steps (i.e. lookback).
future_seq_len – Specify the output time steps (i.e. horizon).
input_feature_num – Specify the feature dimension.
output_feature_num – Specify the output dimension.
num_channels – Specify the convolutional layer filter number in TCN’s encoder. This value defaults to [30]*7.
kernel_size – Specify convolutional layer filter height in TCN’s encoder. This value defaults to 3.
repo_initialization – whether to use the paper authors’ initialization. True uses the paper authors’ initialization and False uses the framework’s default initialization. The value defaults to True.
dropout – Specify the dropout probability (i.e. the probability that a neuron is dropped). This value defaults to 0.1.
optimizer – Specify the optimizer used for training. This value defaults to “Adam”.
loss – Specify the loss function used for training. This value defaults to “mse”. You can choose from “mse”, “mae” and “huber_loss”.
lr – Specify the learning rate. This value defaults to 0.001.
metrics – A list of metrics for evaluating the quality of forecasting. For a distributed forecaster, you may only choose from “mse” and “mae”. For a non-distributed forecaster, you may choose from “mse”, “me”, “mae”, “rmse”, “msle”, “r2”, “mpe”, “mape”, “mspe”, “smape”, “mdape” and “smdape”.
seed – int, random seed for training. This value defaults to None.
distributed – bool, whether to initialize the forecaster in a distributed fashion. If True, the internal model will use an Orca Estimator. If False, the internal model will use a PyTorch model. The value defaults to False.
workers_per_node – int, the number of workers you want to use. The value defaults to 1. This parameter only takes effect when distributed is set to True.
distributed_backend – str, select from “torch_distributed” or “horovod”. The value defaults to “torch_distributed”.
TCMFForecaster¶
Analytics Zoo Chronos TCMFForecaster provides an efficient way to forecast high dimensional time series.
TCMFForecaster is based on the DeepGLO algorithm, a deep forecasting model that thinks globally and acts locally. You can refer to the DeepGLO paper for more details.
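The "global" part of this approach rests on a low-rank matrix factorization of the (n, T) data matrix, which is what the rank parameter controls. The sketch below only illustrates the shapes involved, with made-up numbers and a hypothetical matmul helper; it leaves out the temporal convolution networks and the training procedure entirely.

```python
def matmul(a, b):
    """Multiply an (n, k) nested list by a (k, T) nested list."""
    return [[sum(a[i][r] * b[r][t] for r in range(len(b))) for t in range(len(b[0]))]
            for i in range(len(a))]


# n = 4 series, T = 5 time steps, rank k = 2 (the forecaster's default rank is 64).
F = [[1, 0], [0, 1], [1, 1], [2, -1]]   # (n, k): per-series loadings
X = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]  # (k, T): shared basis time series
Y = matmul(F, X)                        # (n, T): every series mixes the k basis series

print(len(Y), len(Y[0]))  # 4 5
print(Y[2])               # [6, 6, 6, 6, 6]
print(Y[3])               # [-3, 0, 3, 6, 9]
```

With n series described by only k shared basis series, the model needs far fewer parameters than fitting each series independently, which is why the approach suits high-dimensional data.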
TCMFForecaster supports distributed training and inference. It is based on the Orca PyTorch Estimator, which runs PyTorch training, evaluation and prediction on Spark in a distributed fashion. You can choose whether or not to enable distributed training and inference.
Remarks:
You can refer to TCMFForecaster installation to install required packages.
Your operating system (OS) is required to be one of the following 64-bit systems: Ubuntu 16.04 or later, or macOS 10.12.6 or later.
- class zoo.chronos.forecaster.tcmf_forecaster.TCMFForecaster(vbsize=128, hbsize=256, num_channels_X=[32, 32, 32, 32, 32, 1], num_channels_Y=[16, 16, 16, 16, 16, 1], kernel_size=7, dropout=0.1, rank=64, kernel_size_Y=7, learning_rate=0.0005, normalize=False, use_time=True, svd=True)[source]¶
Bases:
zoo.chronos.forecaster.abstract.Forecaster
Example
>>> import numpy as np
>>> model = TCMFForecaster()
>>> fit_params = dict(val_len=12,
...                   start_date="2020-1-1",
...                   freq="5min",
...                   y_iters=1,
...                   init_FX_epoch=1,
...                   max_FX_epoch=1,
...                   max_TCN_epoch=1,
...                   alt_iters=2)
>>> ndarray_input = {'id': np.arange(300), 'y': np.random.rand(300, 480)}
>>> model.fit(ndarray_input, **fit_params)
>>> horizon = np.random.randint(1, 50)
>>> yhat = model.predict(horizon=horizon)
>>> model.save({tempdirname})
>>> loaded_model = TCMFForecaster.load({tempdirname}, is_xshards_distributed=False)
>>> data_new = np.random.rand(300, horizon)
>>> model.evaluate(target_value=dict({"y": data_new}), metric=['mse'])
>>> model.fit_incremental({"y": data_new})
>>> yhat_incr = model.predict(horizon=horizon)
Build a TCMF Forecast Model.
- Parameters
vbsize – int, default is 128. Vertical batch size, which is the number of cells per batch.
hbsize – int, default is 256. Horizontal batch size, which is the number of time series per batch.
num_channels_X – list, default=[32, 32, 32, 32, 32, 1]. List containing channel progression of temporal convolution network for local model.
num_channels_Y – list, default=[16, 16, 16, 16, 16, 1]. List containing channel progression of temporal convolution network for hybrid model.
kernel_size – int, default is 7. Kernel size for local models
dropout – float, default is 0.1. Dropout rate during training
rank – int, default is 64. The rank in matrix factorization of global model.
kernel_size_Y – int, default is 7. Kernel size of hybrid model
learning_rate – float, default is 0.0005
normalize – boolean, false by default. Whether to normalize input data for training.
use_time – boolean, default is True. Whether to use time covariates.
svd – boolean, default is True. Whether factor matrices are initialized by NMF.
- fit(x, val_len=24, start_date='2020-4-1', freq='1H', covariates=None, dti=None, period=24, y_iters=10, init_FX_epoch=100, max_FX_epoch=300, max_TCN_epoch=300, alt_iters=10, num_workers=None)[source]¶
Fit the model on x from scratch
- Parameters
x – the input for fit. Only dict of ndarray and SparkXShards of dict of ndarray are supported. Example: {‘id’: id_arr, ‘y’: data_ndarray}, where data_ndarray is of shape (n, T), n is the number of target time series and T is the number of time steps.
val_len – int, default is 24. Validation length. We will use the last val_len time points as validation data.
start_date – str or datetime-like. Start date time for the time-series. e.g. “2020-01-01”
freq – str or DateOffset, default is ‘1H’. Frequency of data.
covariates – 2-D ndarray or None. The shape of ndarray should be (r, T), where r is the number of covariates and T is the number of time points. Global covariates for all time series. If None, only default time covariates will be used while use_time is True. If not, the time covariates used are the stack of input covariates and default time covariates.
dti – DatetimeIndex or None. If None, use default fixed frequency DatetimeIndex generated with start_date and freq.
period – int, default is 24. Periodicity of input time series, leave it out if not known
y_iters – int, default is 10. Number of iterations while training the hybrid model.
init_FX_epoch – int, default is 100. Number of iterations while initializing factors
max_FX_epoch – int, default is 300. Max number of iterations while training factors.
max_TCN_epoch – int, default is 300. Max number of iterations while training the local model.
alt_iters – int, default is 10. Number of iterations while alternate training.
num_workers – the number of workers you want to use for fit. If None, it defaults to num_ray_nodes in the created RayContext or 1 if there is no active RayContext.
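To make the (n, T) layout and the role of val_len concrete, the sketch below splits a small nested list along the time axis so that the last val_len time points of every series become validation data. It is only an illustration of the description above: split_train_val is a hypothetical helper, and how the forecaster uses the validation part internally is not shown here.

```python
def split_train_val(y, val_len):
    """Split an (n, T) nested list along time: the last val_len steps are validation."""
    train = [row[:-val_len] for row in y]
    val = [row[-val_len:] for row in y]
    return train, val


# 3 series (n = 3) with 10 time steps each (T = 10); values are arbitrary.
y = [[series * 100 + t for t in range(10)] for series in range(3)]
train, val = split_train_val(y, val_len=4)

print(len(train), len(train[0]))  # 3 6
print(len(val), len(val[0]))      # 3 4
print(val[0])                     # [6, 7, 8, 9]
```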
- fit_incremental(x_incr, covariates_incr=None, dti_incr=None)[source]¶
Incrementally fit the model. Note that we only incrementally fit X_seq (TCN in global model)
- Parameters
x_incr – incremental data to be fitted. It should be of the same format as input x in fit, which is a dict of ndarray or SparkXShards of dict of ndarray. Example: {‘id’: id_arr, ‘y’: incr_ndarray}, where incr_ndarray is of shape (n, T_incr), n is the number of target time series and T_incr is the number of time steps incremented. You can choose not to input ‘id’ in x_incr, but if you do, the elements of id in x_incr should be the same as id in x of fit.
covariates_incr – covariates corresponding to x_incr. 2-D ndarray or None. The shape of ndarray should be (r, T_incr), where r is the number of covariates. Global covariates for all time series. If None, only default time covariates will be used while use_time is True. If not, the time covariates used are the stack of input covariates and default time covariates.
dti_incr – dti corresponding to the x_incr. DatetimeIndex or None. If None, use default fixed frequency DatetimeIndex generated with the last date of x in fit and freq.
- evaluate(target_value, metric=['mae'], target_covariates=None, target_dti=None, num_workers=None)[source]¶
Evaluate the model
- Parameters
target_value – target value for evaluation. We interpret its second dimension as the horizon length for evaluation.
metric – the metrics. A list of metric names.
target_covariates – covariates corresponding to target_value. 2-D ndarray or None. The shape of ndarray should be (r, horizon), where r is the number of covariates. Global covariates for all time series. If None, only default time covariates will be used while use_time is True. If not, the time covariates used are the stack of input covariates and default time covariates.
target_dti – dti corresponding to target_value. DatetimeIndex or None. If None, use default fixed frequency DatetimeIndex generated with the last date of x in fit and freq.
num_workers – the number of workers to use in evaluate. If None, it defaults to num_ray_nodes in the created RayContext or 1 if there is no active RayContext.
- Returns
A list of evaluation results. Each item represents a metric.
- predict(horizon=24, future_covariates=None, future_dti=None, num_workers=None)[source]¶
Predict using a trained forecaster.
- Parameters
horizon – horizon length to look forward.
future_covariates – covariates corresponding to future horizon steps data to predict. 2-D ndarray or None. The shape of ndarray should be (r, horizon), where r is the number of covariates. Global covariates for all time series. If None, only default time covariates will be used while use_time is True. If not, the time covariates used are the stack of input covariates and default time covariates.
future_dti – dti corresponding to future horizon steps data to predict. DatetimeIndex or None. If None, use default fixed frequency DatetimeIndex generated with the last date of x in fit and freq.
num_workers – the number of workers to use in predict. If None, it defaults to num_ray_nodes in the created RayContext or 1 if there is no active RayContext.
- Returns
A numpy ndarray with shape of (nd, horizon), where nd is the same number of time series as input x in fit.
- is_xshards_distributed()[source]¶
Check whether model is distributed by input xshards.
- Returns
True if the model is distributed by input xshards
- classmethod load(path, is_xshards_distributed=False, minPartitions=None)[source]¶
Load a saved model.
- Parameters
path – The location of the saved forecaster you want to load.
is_xshards_distributed – Whether the model was trained in a distributed fashion with SparkXShards of dict as input.
minPartitions – The minimum partitions for the XShards.
- Returns
the model loaded
MTNetForecaster¶
MTNet is a memory-network based solution for multivariate time-series forecasting. In a multivariate time-series forecasting task, we have several variables observed over time and we want to forecast the values of some or all of these variables at a future time stamp.
MTNet was proposed in the paper A Memory-Network Based Solution for Multivariate Time-Series Forecasting. MTNetForecaster is derived from tfpark.KerasModel, and can use all methods of KerasModel. Refer to tfpark.KerasModel API Doc for details.
For the detailed algorithm description, please refer to here.
- class zoo.chronos.forecaster.mtnet_forecaster.MTNetForecaster(target_dim=1, feature_dim=1, long_series_num=1, series_length=1, ar_window_size=1, cnn_height=1, cnn_hid_size=32, rnn_hid_sizes=[16, 32], lr=0.001, loss='mae', cnn_dropout=0.2, rnn_dropout=0.2, metric='mean_squared_error', uncertainty: bool = False)[source]¶
Bases:
zoo.chronos.forecaster.tfpark_forecaster.TFParkForecaster
Example
>>> #The dataset is split into x_train, x_val, x_test, y_train, y_val, y_test
>>> model = MTNetForecaster(target_dim=1,
...                         feature_dim=x_train.shape[-1],
...                         long_series_num=6,
...                         series_length=2)
>>> x_train_long, x_train_short = model.preprocess_input(x_train)
>>> x_val_long, x_val_short = model.preprocess_input(x_val)
>>> x_test_long, x_test_short = model.preprocess_input(x_test)
>>> model.fit([x_train_long, x_train_short],
...           y_train,
...           validation_data=([x_val_long, x_val_short], y_val),
...           batch_size=32,
...           distributed=False)
>>> predict_result = model.predict([x_test_long, x_test_short])
Build a MTNet Forecast Model.
- Parameters
target_dim – the dimension of model output
feature_dim – the dimension of input feature
long_series_num – the number of series for the long-term memory series
series_length – the series size for long-term and short-term memory series
ar_window_size – the auto regression window size in MTNet
cnn_hid_size – the hidden layer unit for cnn in encoder
rnn_hid_sizes – the hidden layers unit for rnn in encoder
cnn_height – cnn filter height in MTNet
metric – the metric for validation and evaluation
uncertainty – whether to enable calculation of uncertainty
lr – learning rate
loss – the target function you want to optimize on
cnn_dropout – the dropout probability for cnn in encoder
rnn_dropout – the dropout probability for rnn in encoder
- preprocess_input(x)[source]¶
The original rolled features need an extra processing step. Call this method on train_x, validation_x and test_x before passing them to the model.
- Parameters
x – the original samples from rolling
- Returns
a tuple (long_term_x, short_term_x) which are long term and short term history respectively
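The sketch below shows one plausible way a single rolled sample could be divided into long-term and short-term history. It rests on an assumption that is not stated in this page: that each rolled sample holds (long_series_num + 1) * series_length time steps, with the last series_length steps forming the short-term part. split_long_short is a hypothetical helper, not the library's preprocess_input.

```python
def split_long_short(sample, long_series_num, series_length):
    """Split one rolled sample (a list of time steps) into long- and short-term parts."""
    expected = (long_series_num + 1) * series_length
    if len(sample) != expected:
        raise ValueError(f"expected {expected} time steps, got {len(sample)}")
    cut = long_series_num * series_length
    # Long-term memory: long_series_num consecutive chunks of series_length steps.
    long_term = [sample[i:i + series_length] for i in range(0, cut, series_length)]
    # Short-term memory: the most recent series_length steps.
    short_term = sample[cut:]
    return long_term, short_term


sample = list(range(8))  # 8 steps = (3 + 1) * 2
long_term, short_term = split_long_short(sample, long_series_num=3, series_length=2)
print(long_term)   # [[0, 1], [2, 3], [4, 5]]
print(short_term)  # [6, 7]
```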
ARIMAForecaster¶
AutoRegressive Integrated Moving Average (ARIMA) is a class of statistical models for analyzing and forecasting time series data. It consists of 3 components: AR (AutoRegressive), I (Integrated) and MA (Moving Average). In ARIMAForecaster we use the SARIMA model (Seasonal ARIMA), which is an extension of ARIMA that additionally supports the direct modeling of the seasonal component of the time series.
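The "Integrated" component refers to differencing the series, and the seasonal extension adds differencing at the seasonal lag m. The pure-Python sketch below illustrates both on a made-up series; the helper name difference and the data are assumptions for this example, and the forecaster itself estimates d and D automatically rather than requiring this step from the user.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` items shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A linear trend plus a pattern that repeats every 4 steps (m = 4).
pattern = [0, 5, 2, 7]
series = [2 * t + pattern[t % 4] for t in range(12)]

seasonal = difference(series, lag=4)  # seasonal differencing with lag m = 4
print(seasonal)                       # [8, 8, 8, 8, 8, 8, 8, 8]
print(difference(seasonal, lag=1))    # [0, 0, 0, 0, 0, 0, 0]
```

Seasonal differencing removes the repeating pattern and turns the linear trend into a constant, and one further ordinary difference removes that constant.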
- class zoo.chronos.forecaster.arima_forecaster.ARIMAForecaster(p=2, q=2, seasonality_mode=True, P=3, Q=1, m=7, metric='mse')[source]¶
Bases:
zoo.chronos.forecaster.abstract.Forecaster
Example
>>> #The dataset is split into data, validation_data
>>> model = ARIMAForecaster(p=2, q=2, seasonality_mode=False)
>>> model.fit(data, validation_data)
>>> predict_result = model.predict(horizon=24)
Build an ARIMA Forecast Model. Users can customize p, q, seasonality_mode, P, Q, m and metric for the ARIMA model. The differencing term (d) and seasonal differencing term (D) are automatically estimated from the data. For details of the ARIMA model hyperparameters, refer to https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.ARIMA.html#pmdarima.arima.ARIMA.
- Parameters
p – hyperparameter p for the ARIMA model.
q – hyperparameter q for the ARIMA model.
seasonality_mode – hyperparameter seasonality_mode for the ARIMA model.
P – hyperparameter P for the ARIMA model.
Q – hyperparameter Q for the ARIMA model.
m – hyperparameter m for the ARIMA model.
metric – the metric for validation and evaluation. For regression, we support Mean Squared Error (“mean_squared_error”, “MSE” or “mse”), Mean Absolute Error (“mean_absolute_error”, “MAE” or “mae”), Mean Absolute Percentage Error (“mean_absolute_percentage_error”, “MAPE” or “mape”) and Cosine Proximity (“cosine_proximity” or “cosine”).
- fit(data, validation_data)[source]¶
Fit(Train) the forecaster.
- Parameters
data – A 1-D numpy array as the training data
validation_data – A 1-D numpy array as the evaluation data
- predict(horizon, rolling=False)[source]¶
Predict using a trained forecaster.
- Parameters
horizon – the number of steps forward to predict
rolling – whether to use rolling prediction
- evaluate(validation_data, metrics=['mse'], rolling=False)[source]¶
Evaluate using a trained forecaster.
- Parameters
validation_data – A 1-D numpy array as the evaluation data
metrics – A list of metrics for test/valid data.
ProphetForecaster¶
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
For the detailed algorithm description, please refer to here.
- class zoo.chronos.forecaster.prophet_forecaster.ProphetForecaster(changepoint_prior_scale=0.05, seasonality_prior_scale=10.0, holidays_prior_scale=10.0, seasonality_mode='additive', changepoint_range=0.8, metric='mse')[source]¶
Bases:
zoo.chronos.forecaster.abstract.Forecaster
Example
>>> #The dataset is split into data, validation_data
>>> model = ProphetForecaster(changepoint_prior_scale=0.05, seasonality_mode='additive')
>>> model.fit(data, validation_data)
>>> predict_result = model.predict(horizon=24)
Build a Prophet Forecast Model. Users can customize changepoint_prior_scale, seasonality_prior_scale, holidays_prior_scale, seasonality_mode, changepoint_range and metric of the Prophet model. For details of the Prophet model hyperparameters, refer to https://facebook.github.io/prophet/docs/diagnostics.html#hyperparameter-tuning.
- Parameters
changepoint_prior_scale – hyperparameter changepoint_prior_scale for the Prophet model.
seasonality_prior_scale – hyperparameter seasonality_prior_scale for the Prophet model.
holidays_prior_scale – hyperparameter holidays_prior_scale for the Prophet model.
seasonality_mode – hyperparameter seasonality_mode for the Prophet model.
changepoint_range – hyperparameter changepoint_range for the Prophet model.
metric – the metric for validation and evaluation. For regression, we support Mean Squared Error (“mean_squared_error”, “MSE” or “mse”), Mean Absolute Error (“mean_absolute_error”, “MAE” or “mae”), Mean Absolute Percentage Error (“mean_absolute_percentage_error”, “MAPE” or “mape”) and Cosine Proximity (“cosine_proximity” or “cosine”).
- fit(data, validation_data)[source]¶
Fit(Train) the forecaster.
- Parameters
data – training data, a pandas dataframe with Td rows and 2 columns, where column ‘ds’ indicates the date, column ‘y’ indicates the value and Td is the time dimension
validation_data – evaluation data, should be the same type as data
- predict(horizon=1, freq='D', ds_data=None)[source]¶
Predict using a trained forecaster.
- Parameters
horizon – the number of steps forward to predict, the value defaults to 1.
freq – the frequency of the predicted dataframe, which defaults to day (“D”). The frequency can be anything from the pandas list of frequency strings here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
ds_data – a dataframe that has 1 column ‘ds’ indicating date.
- evaluate(data, metrics=['mse'])[source]¶
Evaluate using a trained forecaster.
- Parameters
data – evaluation data, a pandas dataframe with Td rows and 2 columns, where column ‘ds’ indicates the date, column ‘y’ indicates the value and Td is the time dimension
metrics – A list of metrics for test/valid data.
chronos.forecast.tfpark_forecaster¶
- class zoo.chronos.forecaster.tfpark_forecaster.TFParkForecaster[source]¶
Bases:
zoo.tfpark.model.KerasModel, zoo.chronos.forecaster.abstract.Forecaster
Base class for TFPark KerasModel based Forecast models.
Build a tf.keras model. Turns the tf.keras model returned from _build into a tfpark.KerasModel
chronos.forecast.base_forecaster.BasePytorchForecaster¶
- class zoo.chronos.forecaster.base_forecaster.BasePytorchForecaster(**kwargs)[source]¶
Bases:
zoo.chronos.forecaster.abstract.Forecaster
Forecaster base model for lstm, seq2seq and tcn forecasters.
- fit(data, epochs=1, batch_size=32)[source]¶
Fit(Train) the forecaster.
- Parameters
data – The data supports the following formats:
1. a numpy ndarray tuple (x, y): x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_feature_num.
2. a xshard item: each partition can be a dictionary of {‘x’: x, ‘y’: y}, where the shapes of x and y should follow the shapes stated above.
epochs – Number of epochs you want to train. The value defaults to 1.
batch_size – Batch size used for training. The value defaults to 32.
- Returns
Evaluation results on data.
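To make the (num_samples, lookback, feature_dim) and (num_samples, horizon, target_dim) shapes expected by fit concrete, the sketch below rolls a short multivariate series into windows using plain nested lists. The roll helper is hypothetical and only illustrates the layout; in practice the forecaster expects numpy ndarrays (or an xshard item), so nested lists like these would still need to be converted.

```python
def roll(series, lookback, horizon):
    """Turn a list of per-step feature lists into (x, y) sample windows."""
    x, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        x.append(series[start:start + lookback])
        y.append(series[start + lookback:start + lookback + horizon])
    return x, y


# 10 time steps with 2 features each; here every feature is also a target.
series = [[float(t), float(t) * 10] for t in range(10)]
x, y = roll(series, lookback=4, horizon=2)

print(len(x), len(x[0]), len(x[0][0]))  # 5 4 2 -> (num_samples, lookback, feature_dim)
print(len(y), len(y[0]), len(y[0][0]))  # 5 2 2 -> (num_samples, horizon, target_dim)
```

In this example lookback plays the role of past_seq_len and horizon the role of future_seq_len.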
- predict(data, batch_size=32)[source]¶
Predict using a trained forecaster.
If you want to predict on a single node (which is common practice), please call .to_local().predict(x, …)
- Parameters
data – The data supports the following formats:
1. a numpy ndarray x: x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
2. a xshard item: each partition can be a dictionary of {‘x’: x}, where x’s shape should follow the shape stated above.
batch_size – predict batch size. The value will not affect the prediction result but will affect resource cost (e.g. memory and time).
- Returns
A numpy array with shape (num_samples, horizon, target_dim) if data is a numpy ndarray. A xshard item with format {‘prediction’: result}, where result is a numpy array with shape (num_samples, horizon, target_dim) if data is a xshard item.
- predict_with_onnx(data, batch_size=32, dirname=None)[source]¶
Predict using a trained forecaster with onnxruntime. The method can only be used when forecaster is a non-distributed version.
Calling this method without first calling build_onnx is valid. The forecaster will automatically build an onnxruntime session with default settings.
- Parameters
data – The data supports the following formats:
1. a numpy ndarray x: x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
batch_size – predict batch size. The value will not affect the prediction result but will affect resource cost (e.g. memory and time).
dirname – The directory to save onnx model file. This value defaults to None for no saving file.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
- evaluate(data, batch_size=32, multioutput='raw_values')[source]¶
Evaluate using a trained forecaster.
If you want to evaluate on a single node (which is common practice), please call .to_local().evaluate(data, …)
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, please follow this code snippet:
>>> from zoo.orca.automl.metrics import Evaluator
>>> y_hat = forecaster.predict(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat) # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y) # or other customized unscale methods
>>> Evaluator.evaluate(metric=..., y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – The data supports the following formats:
1. a numpy ndarray tuple (x, y): x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_feature_num.
2. a xshard item: each partition can be a dictionary of {‘x’: x, ‘y’: y}, where the shapes of x and y should follow the shapes stated above.
batch_size – evaluate batch size. The value will not affect the evaluation result but will affect resource cost (e.g. memory and time).
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’. This parameter only takes effect when the forecaster is a non-distributed version.
- Returns
A list of evaluation results. Each item represents a metric.
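The two multioutput options can be illustrated with a small hand-written MSE. This sketch assumes the names carry the same meaning as in scikit-learn's regression metrics ('raw_values' keeps one score per output, 'uniform_average' averages them with equal weight) and works on 2-D nested lists for brevity; it is not the forecaster's own evaluation code.

```python
def mse(y_true, y_pred, multioutput="raw_values"):
    """MSE over samples for each output column, optionally averaged across columns."""
    n_outputs = len(y_true[0])
    per_output = []
    for col in range(n_outputs):
        errors = [(t[col] - p[col]) ** 2 for t, p in zip(y_true, y_pred)]
        per_output.append(sum(errors) / len(errors))
    if multioutput == "uniform_average":
        return sum(per_output) / n_outputs
    return per_output


y_true = [[1.0, 10.0], [2.0, 20.0]]
y_pred = [[1.0, 13.0], [4.0, 20.0]]
print(mse(y_true, y_pred, "raw_values"))       # [2.0, 4.5]
print(mse(y_true, y_pred, "uniform_average"))  # 3.25
```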
- evaluate_with_onnx(data, batch_size=32, dirname=None, multioutput='raw_values')[source]¶
Evaluate using a trained forecaster with onnxruntime. The method can only be used when forecaster is a non-distributed version.
Calling this method without first calling build_onnx is valid. The forecaster will automatically build an onnxruntime session with default settings.
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, please follow this code snippet:
>>> from zoo.orca.automl.metrics import Evaluator
>>> y_hat = forecaster.predict(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat) # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y) # or other customized unscale methods
>>> Evaluator.evaluate(metric=..., y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – The data supports the following formats:
1. a numpy ndarray tuple (x, y): x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_feature_num.
batch_size – evaluate batch size. The value will not affect the evaluation result but will affect resource cost (e.g. memory and time).
dirname – The directory to save onnx model file. This value defaults to None for no saving file.
multioutput – Defines aggregating of multiple output values. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
- save(checkpoint_file)[source]¶
Save the forecaster.
Please note that if you only want the pytorch model or onnx model file, you can call .get_model() or .export_onnx_file(). The checkpoint file generated by .save() method can only be used by .load().
- Parameters
checkpoint_file – The location you want to save the forecaster.
- load(checkpoint_file)[source]¶
Restore the forecaster.
- Parameters
checkpoint_file – The checkpoint file from which you want to load the forecaster.
- to_local()[source]¶
Transform a distributed forecaster to a local (non-distributed) one.
Common practice is to use distributed training (fit) and then predict/evaluate with onnx or other frameworks on a single node. To do so, you need to call .to_local() and transform the forecaster to a non-distributed one.
The optimizer is refreshed, so incremental training after to_local might run into problems.
- Returns
a forecaster instance.
- build_onnx(thread_num=None, sess_options=None)[source]¶
Build an onnx model to speed up inference and reduce latency. Calling this method is not required before predict_with_onnx, evaluate_with_onnx or export_onnx_file. It is recommended when you want to:
1. Strictly control the number of threads used during inference.
2. Alleviate the cold start problem when you call predict_with_onnx for the first time.
- Parameters
thread_num – int, the maximum number of threads. The value is set to None by default, in which case no limit is set.
sess_options – an onnxruntime.SessionOptions instance. If you set this to anything other than None, a new onnxruntime session will be built with these options and other settings you assigned (e.g. thread_num…) will be ignored.
Example
>>> # to pre build onnx sess
>>> forecaster.build_onnx(thread_num=1) # build onnx runtime sess for single thread
>>> pred = forecaster.predict_with_onnx(data)
>>> # ------------------------------------------------------
>>> # directly call onnx related method is also supported
>>> pred = forecaster.predict_with_onnx(data)