AutoTS

AutoTSEstimator

Automated TimeSeries Estimator for time series forecasting tasks. AutoTSEstimator will replace AutoTSTrainer in a later version.

class zoo.chronos.autots.autotsestimator.AutoTSEstimator(model='lstm', search_space={}, metric='mse', loss=None, optimizer='Adam', past_seq_len=2, future_seq_len=1, input_feature_num=None, output_target_num=None, selected_features='auto', backend='torch', logs_dir='/tmp/autots_estimator', cpus_per_trial=1, name='autots_estimator', remote_dir=None)[source]

Bases: object

Automated TimeSeries Estimator for time series forecasting tasks. It supports TSDataset and customized data creators as data input, for both built-in models (only “lstm”, “tcn”, “seq2seq” for now) and 3rd party models.

Only backend=”torch” is supported for now. Customized data creators are not yet fully supported by TSPipeline.

>>> # Here is a use case example:
>>> # prepare train/valid/test tsdataset
>>> autoest = AutoTSEstimator(model="lstm",
>>>                           search_space=search_space,
>>>                           past_seq_len=6,
>>>                           future_seq_len=1)
>>> tsppl = autoest.fit(data=tsdata_train,
>>>                     validation_data=tsdata_valid)
>>> tsppl.predict(tsdata_test)
>>> tsppl.save("my_tsppl")

AutoTSEstimator trains a model for time series forecasting. Users can choose one of the built-in models, or pass in a customized PyTorch or Keras model to be tuned with AutoML.
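A customized model is passed as a creation function rather than as a model instance. The following is a minimal sketch for the torch backend. It assumes the config carries keys named after the shape terms used in the model parameter description (past_seq_len, future_seq_len, input_feature_num, output_feature_num); the hidden_dim key and the LinearForecaster class are hypothetical and exist only for illustration.

```python
import torch
from torch import nn


class LinearForecaster(nn.Module):
    """Hypothetical 3rd party model: flatten the lookback window, then project."""

    def __init__(self, past_seq_len, future_seq_len,
                 input_feature_num, output_feature_num, hidden_dim):
        super().__init__()
        self.future_seq_len = future_seq_len
        self.output_feature_num = output_feature_num
        self.net = nn.Sequential(
            nn.Flatten(),  # (num_sample, past_seq_len * input_feature_num)
            nn.Linear(past_seq_len * input_feature_num, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, future_seq_len * output_feature_num),
        )

    def forward(self, x):
        # x: (num_sample, past_seq_len, input_feature_num)
        out = self.net(x)
        # returns: (num_sample, future_seq_len, output_feature_num)
        return out.view(-1, self.future_seq_len, self.output_feature_num)


def model_creator(config):
    """Take a config dict and return a torch.nn.Module.

    The key names are assumptions; "hidden_dim" is a made-up hyperparameter
    that would have to be present in search_space.
    """
    return LinearForecaster(
        past_seq_len=config["past_seq_len"],
        future_seq_len=config["future_seq_len"],
        input_feature_num=config["input_feature_num"],
        output_feature_num=config["output_feature_num"],
        hidden_dim=config["hidden_dim"],
    )


# Shape check with a hand-written config (no tuning involved).
example_config = {
    "past_seq_len": 6,
    "future_seq_len": 1,
    "input_feature_num": 3,
    "output_feature_num": 2,
    "hidden_dim": 16,
}
model = model_creator(example_config)
prediction = model(torch.zeros(4, 6, 3))
print(tuple(prediction.shape))
```

A function like this would be passed as model=model_creator, with search_space set to a dict that supplies the extra hyperparameters.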

Parameters
  • model – a string or a model creation function. A string indicates a built-in model; currently “lstm”, “tcn”, and “seq2seq” are supported. A model creation function indicates a 3rd party model; the function should take a config param and return a torch.nn.Module (backend=”torch”) or a tf model (backend=”keras”). If you use chronos.data.TSDataset as data input, the 3rd party model should take a 3-dim input (num_sample, past_seq_len, input_feature_num), produce a 3-dim output (num_sample, future_seq_len, output_feature_num), and use the same keys in the model creation function. If you use a customized data creator, the output of the data creator should fit the input of the model built by the model creation function.

  • search_space – str or dict. Hyperparameter configurations. For str, you can choose from “minimal”, “normal”, or “large”; each represents a default search_space for our built-in models with a different computing requirement. For dict, read the API docs for each auto model. Some common hyperparameters can be set explicitly through the named parameters of this constructor, so search_space should contain only the parameters that are not keyword arguments of this constructor. If a 3rd party model is used, you must set search_space to a dict.

  • metric – String. The evaluation metric name to optimize. e.g. “mse”

  • loss – String or pytorch/tf.keras loss instance or pytorch loss creator function. The default loss function for pytorch backend is nn.MSELoss().

  • optimizer – String or PyTorch optimizer creator function or tf.keras optimizer instance.

  • past_seq_len – Int or hp sampling function. The number of historical steps (i.e. lookback) used for forecasting. For hp sampling, see zoo.orca.automl.hp for more details. The value defaults to 2.

  • future_seq_len – Int. The number of future steps to forecast. The value defaults to 1.

  • input_feature_num – Int. The number of features in the input. The value is ignored if you use chronos.data.TSDataset as input data type.

  • output_target_num – Int. The number of targets in the output. The value is ignored if you use chronos.data.TSDataset as input data type.

  • selected_features – String. “all” and “auto” are supported for now. For “all”, all features that are generated are used for each trial. For “auto”, a subset is sampled randomly from all features for each trial. The parameter is ignored if not using chronos.data.TSDataset as input data type. The value defaults to “auto”.

  • backend – The backend of the auto model. We only support backend as “torch” for now.

  • logs_dir – Local directory to save logs and results. It defaults to “/tmp/autots_estimator”

  • cpus_per_trial – Int. Number of cpus for each trial. It defaults to 1.

  • name – name of the autots estimator. It defaults to “autots_estimator”.

  • remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running in a cluster, it defaults to “hdfs:///tmp/{name}”.
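To make the roles of past_seq_len and future_seq_len concrete, the sketch below cuts a series into lookback and horizon windows using plain Python lists. It is an illustration of the shape convention only, not the library's implementation; the roll_windows helper is hypothetical, and for simplicity the targets are taken from the same columns as the features.

```python
def roll_windows(series, past_seq_len, future_seq_len):
    """Cut a list of per-step rows into (x, y) samples.

    x[i] holds past_seq_len consecutive rows (the lookback), and y[i] holds
    the future_seq_len rows that immediately follow them (the horizon).
    """
    num_sample = len(series) - past_seq_len - future_seq_len + 1
    x, y = [], []
    for start in range(max(num_sample, 0)):
        split = start + past_seq_len
        x.append(series[start:split])
        y.append(series[split:split + future_seq_len])
    return x, y


# Ten time steps with a single feature each.
series = [[float(t)] for t in range(10)]
x, y = roll_windows(series, past_seq_len=6, future_seq_len=1)

# (num_sample, past_seq_len, feature_num) and (num_sample, future_seq_len, ...)
print(len(x), len(x[0]), len(x[0][0]))  # 4 6 1
print(len(y), len(y[0]), len(y[0][0]))  # 4 1 1
```

A longer lookback gives the model more history per sample but leaves fewer samples for a series of the same length.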

fit(data, epochs=1, batch_size=32, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)[source]

fit using AutoEstimator

Parameters
  • data – train data. For backend of “torch”, data can be a TSDataset or a function that takes a config dictionary as parameter and returns a PyTorch DataLoader. For backend of “keras”, data can be a TSDataset.

  • epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.

  • batch_size – Int or hp sampling function from an integer space. Training batch size. It defaults to 32.

  • validation_data – Validation data. Validation data type should be the same as data.

  • metric_threshold – A trial is terminated once the metric threshold is met.

  • n_sampling – Number of times to sample from the search_space. Defaults to 1. If hp.grid_search is in search_space, the grid will be repeated n_sampling times. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.

  • search_alg – str, any searcher supported by ray tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”)

  • search_alg_params – extra parameters for searcher algorithm besides search_space, metric and searcher mode

  • scheduler – str, all supported scheduler provided by ray tune

  • scheduler_params – parameters for scheduler

Returns

a TSPipeline with the best model.
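The interaction between n_sampling, grid search, epochs, and metric_threshold can be sketched in plain Python. This is an illustration of the rules described in the parameter list, not the search engine itself: expand_trials and run_trial are hypothetical helpers, only grid-searched parameters are modeled, the n_sampling=-1 case is not covered, and a lower metric is assumed to be better (as with “mse”).

```python
import itertools


def expand_trials(grid_params, n_sampling):
    """Repeat the full grid n_sampling times, one config per trial."""
    keys = sorted(grid_params)
    grid = [
        dict(zip(keys, combo))
        for combo in itertools.product(*(grid_params[k] for k in keys))
    ]
    return [dict(cfg) for _ in range(n_sampling) for cfg in grid]


def run_trial(metric_per_epoch, epochs, metric_threshold=None):
    """Return how many epochs a trial trains before it stops.

    metric_per_epoch stands in for the validation metric observed after
    each epoch. The trial stops at the threshold or after `epochs` epochs,
    whichever comes first.
    """
    for epoch, value in enumerate(metric_per_epoch[:epochs], start=1):
        if metric_threshold is not None and value <= metric_threshold:
            return epoch
    return min(epochs, len(metric_per_epoch))


# Hypothetical grid: 2 x 2 combinations, repeated 3 times.
trials = expand_trials({"hidden_dim": [32, 64], "lr": [0.001, 0.01]}, n_sampling=3)
print(len(trials))  # 12

observed = [0.9, 0.5, 0.2, 0.1]
print(run_trial(observed, epochs=10, metric_threshold=0.3))  # 3
print(run_trial(observed, epochs=2, metric_threshold=0.3))   # 2
```

Because the grid is repeated rather than subsampled, the trial count grows multiplicatively with n_sampling when grid search is used.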

get_best_config()[source]

Get the best configuration

Returns

A dictionary of best hyper parameters

TSPipeline

TSPipeline is an E2E solution for time series forecasting tasks. The TSPipeline returned by AutoTSEstimator will replace the original TSPipeline returned by AutoTSTrainer in a later version.

class zoo.chronos.autots.tspipeline.TSPipeline(best_model, best_config, **kwargs)[source]

Bases: object

TSPipeline is an E2E solution for time series analysis (only forecasting tasks for now). You can use TSPipeline to:

  1. Further develop the prototype (predict, evaluate, incremental fit).

  2. Deploy the model to your scenario (save, load).

evaluate(data, metrics=['mse'], multioutput='uniform_average', batch_size=32)[source]

Evaluate the time series pipeline.

Parameters
  • data – data can be a TSDataset or a data creator (to be supported). The TSDataset should go through the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • metrics – list. The names of the evaluation metrics to compute, e.g. [“mse”].

  • multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘uniform_average’.

  • batch_size – Batch size used for prediction. A smaller batch_size takes more time but uses less memory. The parameter is only effective when data is a TSDataset. The value defaults to 32.
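The two multioutput modes differ only in whether per-output scores are averaged. The sketch below shows this for mean squared error on two-dimensional data, assuming the option follows the scikit-learn convention of the same name; the mse helper is hypothetical and the pipeline's own aggregation over its 3-dim outputs may differ in detail.

```python
def mse(y_true, y_pred, multioutput="uniform_average"):
    """Mean squared error over rows, with one column per output."""
    num_sample = len(y_true)
    num_output = len(y_true[0])
    per_output = [
        sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(num_sample)) / num_sample
        for j in range(num_output)
    ]
    if multioutput == "raw_values":
        return per_output  # one score per output column
    if multioutput == "uniform_average":
        return sum(per_output) / num_output  # equal weight for every output
    raise ValueError("multioutput must be 'raw_values' or 'uniform_average'")


y_true = [[1.0, 10.0], [2.0, 20.0]]
y_pred = [[1.0, 12.0], [3.0, 20.0]]

print(mse(y_true, y_pred, multioutput="raw_values"))       # [0.5, 2.0]
print(mse(y_true, y_pred, multioutput="uniform_average"))  # 1.25
```

‘raw_values’ is useful for spotting a single badly predicted target that the average would hide.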

evaluate_with_onnx(data, metrics=['mse'], multioutput='uniform_average', batch_size=32)[source]

Evaluate the time series pipeline with onnx.

Parameters
  • data – data can be a TSDataset or a data creator (to be supported). The TSDataset should go through the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • metrics – list. The names of the evaluation metrics to compute, e.g. [“mse”].

  • multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘uniform_average’.

  • batch_size – Batch size used for prediction. A smaller batch_size takes more time but uses less memory. The parameter is only effective when data is a TSDataset. The value defaults to 32.

predict(data, batch_size=32)[source]

Rolling predict with the time series pipeline.

Parameters
  • data – data can be a TSDataset or a data creator (to be supported). The TSDataset should go through the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • batch_size – Batch size used for prediction. A smaller batch_size takes more time but uses less memory. The parameter is only effective when data is a TSDataset. The value defaults to 32.
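The time and memory trade-off behind batch_size comes from how many forward passes are needed and how many samples are held at once. The following sketch uses hypothetical helpers (iter_batches, batched_predict) and a stand-in predict function; it illustrates batching in general, not the pipeline's internals.

```python
def iter_batches(samples, batch_size=32):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def batched_predict(samples, predict_fn, batch_size=32):
    """Run predict_fn batch by batch and concatenate the results."""
    outputs = []
    for batch in iter_batches(samples, batch_size):
        outputs.extend(predict_fn(batch))
    return outputs


samples = list(range(100))
calls = []


def fake_predict(batch):
    # Stand-in for a model forward pass; records the batch size it saw.
    calls.append(len(batch))
    return [value * 2 for value in batch]


large = batched_predict(samples, fake_predict, batch_size=32)
print(len(calls), max(calls))  # 4 32

calls.clear()
small = batched_predict(samples, fake_predict, batch_size=8)
print(len(calls), max(calls))  # 13 8

print(large == small)  # True
```

The predictions are identical in both runs; only the number of passes and the peak number of samples held per pass change.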

predict_with_onnx(data, batch_size=32)[source]

Rolling predict with the time series pipeline using onnx.

Parameters
  • data – data can be a TSDataset or a data creator (to be supported). The TSDataset should go through the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • batch_size – Batch size used for prediction. A smaller batch_size takes more time but uses less memory. The parameter is only effective when data is a TSDataset. The value defaults to 32.

fit(data, validation_data=None, epochs=1, metric='mse')[source]

Incremental fitting

Parameters
  • data – data can be a TSDataset or a data creator (to be supported). The TSDataset should go through the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • validation_data – Validation data, in the same format as data.

  • epochs – Number of epochs for incremental fitting. The value defaults to 1.

  • metric – The evaluation metric. The value defaults to “mse”.

save(file_path)[source]

Save the TSPipeline to a folder

Parameters

file_path – the folder location to save the pipeline

static load(file_path)[source]

Load the TSPipeline from a folder

Parameters

file_path – the folder location to load the pipeline from