Use AutoML for Time-Series Forecasting


In this guide, we will demonstrate how to use Chronos AutoTS for automated time series forecasting in 4 simple steps.

Step 0: Prepare Environment

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
conda activate zoo
pip install analytics-zoo[automl] # install either version 0.9 or the latest nightly build

Step 1: Init Orca Context

from zoo.orca import init_orca_context

if args.cluster_mode == "local":
    init_orca_context(cluster_mode="local", cores=4) # run in local mode
elif args.cluster_mode == "k8s":
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2) # run on K8s cluster
elif args.cluster_mode == "yarn":
    init_orca_context(cluster_mode="yarn-client", num_nodes=2, cores=2) # run on Hadoop YARN cluster

This is the only place where you need to specify local or distributed mode. View Orca Context for more details.

Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir when running on Hadoop YARN cluster. View Hadoop User Guide for more details.
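
The snippet above reads args.cluster_mode from the command line. Below is a minimal sketch of how such an argument could be parsed; the --cluster_mode option and its default are assumptions for illustration, not part of the code above.

import argparse

# hypothetical argument parsing for the cluster_mode used above
parser = argparse.ArgumentParser()
parser.add_argument("--cluster_mode", type=str, default="local",
                    help="The cluster mode, one of local, k8s or yarn.")
args = parser.parse_args()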

Step 2: Create an AutoTSTrainer

You can then create an AutoTSTrainer as follows.

from zoo.chronos.autots.forecast import AutoTSTrainer

trainer = AutoTSTrainer(dt_col="timestamp",       # the column of the datetime
                        target_col="value",       # the column of the value to forecast
                        horizon=1,                # the number of steps to forecast ahead
                        extra_features_col=None)  # (optional) columns of extra input features
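
AutoTSTrainer.fit in Step 3 takes pandas DataFrames that contain at least the dt_col and target_col configured above. Below is a minimal sketch of preparing train_df and val_df; the column names match the trainer configuration, but the data itself is purely hypothetical (a real example would typically load a dataset such as the NYC taxi CSV instead).

import pandas as pd
import numpy as np

# a hypothetical DataFrame with a datetime column and a value column,
# matching the dt_col and target_col configured for the trainer
df = pd.DataFrame({
    "timestamp": pd.date_range(start="2021-01-01", periods=1000, freq="H"),
    "value": np.random.rand(1000),
})

# simple chronological split into training and validation sets
train_df = df.iloc[:800]
val_df = df.iloc[800:]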

Step 3: Fit with AutoTSTrainer

You can then train on the input data using AutoTSTrainer.fit, with a recipe to specify the search space.

from zoo.chronos.config.recipe import LSTMGridRandomRecipe

ts_pipeline = trainer.fit(train_df, val_df,
                          recipe=LSTMGridRandomRecipe(
                              num_rand_samples=1,   # number of random samples drawn from the search space
                              epochs=1,             # number of training epochs for each trial
                              look_back=6,          # number of past steps used as model input
                              batch_size=[64]),     # candidate batch sizes to search over
                          metric="mse")

Step 4: Further deployment with TSPipeline

You can use the resulting ts_pipeline for prediction, evaluation or (incremental) fitting.

# predict with the best trial
pred_df = ts_pipeline.predict(test_df)

# evaluate the resulting pipeline
mse, smape = ts_pipeline.evaluate(test_df, metrics=["mse", "smape"])
print("Evaluate: the mean square error is", mse)
print("Evaluate: the smape value is", smape)

You can also save and restore the pipeline for further deployment.

# save the pipeline
my_ppl_file_path = ts_pipeline.save("/tmp/saved_pipeline/nyc_taxi.ppl")

# restore the pipeline for further deployment
from zoo.chronos.autots.forecast import TSPipeline
loaded_ppl = TSPipeline.load(my_ppl_file_path)
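
The restored pipeline can then be used in the same way as the original one, for example:

# predict with the restored pipeline
pred_df = loaded_ppl.predict(test_df)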

That’s it. The same code can run seamlessly on your local laptop and on a distributed K8s or Hadoop YARN cluster.

Note: An OrcaContext is only necessary for AutoTSTrainer and is not needed if you only use TSPipeline.
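
When you are finished, you can stop the OrcaContext created in Step 1. A minimal sketch using stop_orca_context from the same zoo.orca module:

from zoo.orca import stop_orca_context

# stop the underlying context started by init_orca_context
stop_orca_context()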