AutoML API

orca.automl.auto_estimator

A general estimator that supports automatic model tuning. It allows users to fit their model and search for the best hyperparameters.

class zoo.orca.automl.auto_estimator.AutoEstimator(model_builder, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, remote_dir=None, name=None)

Bases: object

Example

>>> import torch.nn as nn
>>> from zoo.orca.automl.auto_estimator import AutoEstimator
>>> # model_creator, get_optimizer, data, validation_data and
>>> # create_linear_search_space are user-defined.
>>> auto_est = AutoEstimator.from_torch(model_creator=model_creator,
...                                     optimizer=get_optimizer,
...                                     loss=nn.BCELoss(),
...                                     logs_dir="/tmp/zoo_automl_logs",
...                                     resources_per_trial={"cpu": 2},
...                                     name="test_fit")
>>> auto_est.fit(data=data,
...              validation_data=validation_data,
...              search_space=create_linear_search_space(),
...              n_sampling=4,
...              epochs=1,
...              metric="accuracy")
>>> best_model = auto_est.get_best_model()

static from_torch(*, model_creator, optimizer, loss, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name='auto_pytorch_estimator', remote_dir=None)

Create an AutoEstimator for PyTorch.

Parameters
  • model_creator – PyTorch model creator function.

  • optimizer – PyTorch optimizer creator function, or a PyTorch optimizer name (string). If you pass an optimizer name, specify the learning rate search space with the key “lr” or LR_NAME (from zoo.orca.automl.pytorch_utils import LR_NAME). If no learning rate search space is specified, the default learning rate of 1e-3 is used.

  • loss – A PyTorch loss instance, a PyTorch loss creator function, or a PyTorch loss name (string).

  • logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_estimator_logs”.

  • resources_per_trial – Dict. Resources for each trial, e.g. {“cpu”: 2}.

  • name – Name of the auto estimator. It defaults to “auto_pytorch_estimator”.

  • remote_dir – String. Remote directory to sync training results and checkpoints to. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.

Returns

An AutoEstimator object.

static from_keras(*, model_creator, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name='auto_keras_estimator', remote_dir=None)

Create an AutoEstimator for TensorFlow Keras.

Parameters
  • model_creator – TensorFlow Keras model creator function.

  • logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_estimator_logs”.

  • resources_per_trial – Dict. Resources for each trial, e.g. {“cpu”: 2}.

  • name – Name of the auto estimator. It defaults to “auto_keras_estimator”.

  • remote_dir – String. Remote directory to sync training results and checkpoints to. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.

Returns

An AutoEstimator object.

fit(data, epochs=1, validation_data=None, metric=None, metric_mode=None, metric_threshold=None, n_sampling=1, search_space=None, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)

Automatically fit the model and search for the best hyperparameters.

Parameters
  • data – Training data. If the AutoEstimator is created with from_torch, data can be a tuple of ndarrays, a PyTorch DataLoader, or a function that takes a config dictionary as its parameter and returns a PyTorch DataLoader. If the AutoEstimator is created with from_keras, data can be a tuple of ndarrays. A tuple of ndarrays should be in the form (x, y), where x is the training input data and y is the training target data.

  • epochs – Maximum number of epochs to train in each trial. Defaults to 1. If metric_threshold is also set, a trial stops as soon as it either reaches metric_threshold or has been trained for {epochs} epochs.

  • validation_data – Validation data. It should be of the same type as data.

  • metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.

  • metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better and “min” means a smaller one is better. You must specify metric_mode if you use a customized metric function; it is optional if you use a built-in metric in zoo.orca.automl.metrics.Evaluator.

  • metric_threshold – A trial is terminated when the metric threshold is met.

  • n_sampling – Number of times to sample from the search_space. Defaults to 1. If hp.grid_search is in search_space, the grid is repeated n_sampling times. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.

  • search_space – A dict describing the search space, typically built from the sampling specs in zoo.orca.automl.hp.

  • search_alg – str. Any searcher supported by Ray Tune, i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”.

  • search_alg_params – Extra parameters for the search algorithm besides search_space, metric and the searcher mode.

  • scheduler – str. Any scheduler supported by Ray Tune.

  • scheduler_params – Parameters for the scheduler.
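A customized metric only needs the func(y_true, y_pred) signature described above. The sketch below defines a hypothetical mean-absolute-error metric, `my_mae`; because it is not a built-in Evaluator metric, metric_mode has to be passed with it. The second helper, `reached_threshold`, is also hypothetical: it only illustrates how metric_mode decides the direction in which metric_threshold counts as "met", and is not the library's stopping code.

```python
def my_mae(y_true, y_pred):
    """Hypothetical custom metric: mean absolute error over 1-D inputs.

    Accepts plain sequences or 1-D numpy ndarrays and returns a Python float,
    matching the func(y_true, y_pred) -> float signature described above.
    """
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return float(sum(errors) / len(errors))


def reached_threshold(value, metric_threshold, metric_mode):
    """Hypothetical helper: has a metric value reached metric_threshold?

    With metric_mode "min" a lower value is better, so the threshold is met once
    the value is at or below it; with "max" the value must be at or above it.
    """
    if metric_mode == "min":
        return value <= metric_threshold
    if metric_mode == "max":
        return value >= metric_threshold
    raise ValueError('metric_mode must be "min" or "max"')


# A custom metric needs an explicit metric_mode, e.g. (not executed here):
# auto_est.fit(data=data, validation_data=validation_data,
#              metric=my_mae, metric_mode="min", metric_threshold=0.1)
```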

get_best_model()

Return the best model found by the AutoEstimator.

Returns

The best model instance.

get_best_config()

Return the best config found by the AutoEstimator.

Returns

A dictionary of the best hyperparameters.

orca.automl.hp

Sampling specs to be used in search space configuration.

zoo.orca.automl.hp.uniform(lower, upper)

Sample a float uniformly between lower and upper.

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

zoo.orca.automl.hp.quniform(lower, upper, q)

Sample a float uniformly between lower and upper, and round the result to the nearest value with granularity q. The upper bound is inclusive.

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

  • q – Granularity for increment.
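To make the rounding rule concrete, here is a minimal pure-Python sketch of the quantized-uniform behavior described above: draw uniformly from [lower, upper], then snap to the nearest multiple of q. It illustrates that reading of the description only; it is not the library's implementation (which is backed by Ray Tune), and `quniform_sketch` is a hypothetical name.

```python
import random


def quniform_sketch(lower, upper, q, rng=random):
    """Sample uniformly from [lower, upper], then round to the nearest multiple of q."""
    value = rng.uniform(lower, upper)
    return round(value / q) * q


# With q=0.25 on [0, 1], only the grid points 0, 0.25, 0.5, 0.75 and 1.0 can appear,
# and the upper bound itself is reachable.
rng = random.Random(0)
samples = [quniform_sketch(0.0, 1.0, 0.25, rng) for _ in range(1000)]
```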

zoo.orca.automl.hp.loguniform(lower, upper, base=10)

Sample a float between lower and upper such that log_{base} of the sample is uniformly distributed between log_{base}(lower) and log_{base}(upper).

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

  • base – Log base for distribution. Defaults to 10.
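Log-uniform sampling is useful for parameters that span several orders of magnitude, such as a learning rate. The following pure-Python sketch samples the exponent uniformly and raises base to it, so each decade receives roughly the same share of draws. It is an illustration of the distribution described above, not the library's code; `loguniform_sketch` is a hypothetical name.

```python
import math
import random


def loguniform_sketch(lower, upper, base=10, rng=random):
    """Sample x in [lower, upper] so that log_base(x) is uniformly distributed."""
    low_exp = math.log(lower, base)
    high_exp = math.log(upper, base)
    return base ** rng.uniform(low_exp, high_exp)


rng = random.Random(42)
samples = [loguniform_sketch(1e-4, 1e-1, rng=rng) for _ in range(10000)]
# Three decades (1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1): each should get about a third.
first_decade = sum(1 for s in samples if s < 1e-3)
```

A plain uniform draw on the same range would put about 99% of samples above 1e-3, which is why the log scale is preferred here.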

zoo.orca.automl.hp.qloguniform(lower, upper, q, base=10)

Sample a float between lower and upper such that log_{base} of the sample is uniformly distributed between log_{base}(lower) and log_{base}(upper), and round the result to the nearest value with granularity q. The upper bound is inclusive.

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

  • q – Granularity for increment.

  • base – Log base for distribution. Defaults to 10.

zoo.orca.automl.hp.randn(mean=0.0, std=1.0)

Sample a float from a normal distribution.

Parameters
  • mean – Mean of the normal distribution. Defaults to 0.0.

  • std – Standard deviation of the normal distribution. Defaults to 1.0.

zoo.orca.automl.hp.qrandn(mean, std, q)

Sample a float from a normal distribution and round the result to the nearest value with granularity q.

Parameters
  • mean – Mean of the normal distribution.

  • std – Standard deviation of the normal distribution.

  • q – Granularity for increment.

zoo.orca.automl.hp.randint(lower, upper)

Uniformly sample an integer between lower and upper (both inclusive).

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

zoo.orca.automl.hp.qrandint(lower, upper, q=1)

Uniformly sample an integer between lower and upper (both inclusive), and round the result to the nearest value with granularity q.

Parameters
  • lower – Lower bound of the sampling range.

  • upper – Upper bound of the sampling range.

  • q – Integer granularity for increment.
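The sketch below illustrates the inclusive integer sampling with granularity q, for example for batch sizes that must be multiples of 16. It is a simplified, hypothetical helper (`qrandint_sketch`), not the library's implementation, and it assumes lower and upper are themselves multiples of q so that rounding can never leave the range.

```python
import random


def qrandint_sketch(lower, upper, q=1, rng=random):
    """Sample an integer in [lower, upper] (both inclusive), rounded to a multiple of q.

    Simplifying assumption: lower and upper are multiples of q, so rounding
    can never move the result outside the range.
    """
    if lower % q or upper % q:
        raise ValueError("this sketch assumes lower and upper are multiples of q")
    value = rng.randint(lower, upper)  # random.randint includes both endpoints
    return round(value / q) * q


# Hypothetical example: batch sizes that are multiples of 16 between 16 and 128.
rng = random.Random(1)
batch_sizes = {qrandint_sketch(16, 128, q=16, rng=rng) for _ in range(2000)}
```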

zoo.orca.automl.hp.choice(categories)

Uniformly sample an element from a list.

Parameters

categories – A list to be sampled.

zoo.orca.automl.hp.choice_n(categories, min_items, max_items)

Sample a subset from a list.

Parameters
  • categories – A list to be sampled.

  • min_items – Minimum number of items to be sampled.

  • max_items – Maximum number of items to be sampled.
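A subset sampler of this kind can be sketched in pure Python by first drawing a subset size and then drawing that many distinct items. Two details are assumptions of this hypothetical `choice_n_sketch`, not confirmed library behavior: max_items is treated as inclusive, and items are drawn without replacement. The feature names are illustrative only.

```python
import random


def choice_n_sketch(categories, min_items, max_items, rng=random):
    """Return a random subset of categories whose size lies in [min_items, max_items].

    Items are drawn without replacement, so no category appears twice.
    """
    size = rng.randint(min_items, max_items)  # inclusive upper bound (an assumption)
    return rng.sample(categories, size)


rng = random.Random(7)
features = ["hour", "weekday", "month", "is_holiday"]
subset = choice_n_sketch(features, 1, 3, rng=rng)
```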

zoo.orca.automl.hp.sample_from(func)

Sample from a function.

Parameters

func – The function to be sampled.

zoo.orca.automl.hp.grid_search(values)

Specify a grid search over a list.

Parameters

values – A list to be grid searched.
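As stated for n_sampling in fit, a grid in the search space is repeated n_sampling times. The sketch below counts the resulting trials, assuming several grid searches are combined as a Cartesian product (the way Ray Tune expands multiple grid_search entries). `expand_trials` and the parameter names are hypothetical.

```python
import itertools


def expand_trials(grid_params, n_sampling):
    """List the grid configurations of all trials, repeating the grid n_sampling times.

    grid_params maps a parameter name to the list passed to a grid search.
    """
    names = sorted(grid_params)
    grid = [dict(zip(names, combo))
            for combo in itertools.product(*(grid_params[n] for n in names))]
    return [config for _ in range(n_sampling) for config in grid]


# 2 hidden sizes x 3 dropout rates = 6 grid points, repeated twice -> 12 trials.
trials = expand_trials({"hidden_size": [32, 64], "dropout": [0.1, 0.2, 0.3]}, n_sampling=2)
```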

orca.automl.metrics

Evaluate unscaled metrics between the true values (y_true) and the predicted values (y_pred).

zoo.orca.automl.metrics.sMAPE(y_true, y_pred, multioutput='raw_values')

Calculate Symmetric mean absolute percentage error (sMAPE).

\[\text{sMAPE} = \frac{100\%}{n} \sum_{t=1}^n \frac{|y_t-\hat{y_t}|}{|y_t|+|\hat{y_t}|}\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.
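For a single target (1-D input), the sMAPE formula above translates directly into the pure-Python sketch below. `smape` is a hypothetical helper that ignores multioutput and does not guard against zero denominators (both values 0). Because |y_t - ŷ_t| ≤ |y_t| + |ŷ_t| by the triangle inequality, each term is at most 1, so with this definition the result lies between 0% and 100%.

```python
def smape(y_true, y_pred):
    """sMAPE for a single target, following the formula above (result in percent)."""
    terms = [abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


# Predictions with the opposite sign hit the 100% upper bound.
worst = smape([1.0, -1.0], [-1.0, 1.0])
```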

zoo.orca.automl.metrics.MPE(y_true, y_pred, multioutput='raw_values')

Calculate mean percentage error (MPE).

\[\text{MPE} = \frac{100\%}{n}\sum_{t=1}^n \frac{y_t-\hat{y_t}}{y_t}\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A floating point value (the best value is 0.0; it can be negative because errors keep their sign), or an array of floating point values, one for each individual target.
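Unlike MAPE, MPE keeps the sign of each error, so positive and negative errors can cancel and over-forecasting yields a negative value. The hypothetical single-target helper below follows the MPE formula above and shows this on a small example.

```python
def mpe(y_true, y_pred):
    """Mean percentage error for a single target (signed, in percent)."""
    terms = [(t - p) / t for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


# Over-forecasting every point by 10% gives an MPE of about -10%.
over = mpe([100.0, 200.0], [110.0, 220.0])
# A +10% and a -10% error cancel out to about 0%.
mixed = mpe([100.0, 100.0], [110.0, 90.0])
```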

zoo.orca.automl.metrics.MAPE(y_true, y_pred, multioutput='raw_values')

Calculate mean absolute percentage error (MAPE).

\[\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^n |\frac{y_t-\hat{y_t}}{y_t}|\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.MDAPE(y_true, y_pred, multioutput='raw_values')

Calculate Median Absolute Percentage Error (MDAPE).

\[\text{MDAPE} = 100\%\ median(|\frac{y_1-\hat{y_1}}{y_1}|, \ldots, |\frac{y_n-\hat{y_n}}{y_n}|)\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.
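Taking the median instead of the mean makes MDAPE robust to a few very large percentage errors. The hypothetical single-target helper below follows the MDAPE formula above; in the example, one prediction is off by 900%, yet the median stays at 10%, whereas the mean of the same three errors would be about 307%.

```python
import statistics


def mdape(y_true, y_pred):
    """Median absolute percentage error for a single target (in percent)."""
    return 100.0 * statistics.median(abs((t - p) / t) for t, p in zip(y_true, y_pred))


robust = mdape([100.0, 100.0, 100.0], [110.0, 90.0, 1000.0])
```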

zoo.orca.automl.metrics.sMDAPE(y_true, y_pred, multioutput='raw_values')

Calculate Symmetric Median Absolute Percentage Error (sMDAPE).

\[\text{sMDAPE} = 100\%\ median(\frac{|y_1-\hat{y_1}|}{|y_1|+|\hat{y_1}|}, \ldots, \frac{|y_n-\hat{y_n}|}{|y_n|+|\hat{y_n}|})\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.ME(y_true, y_pred, multioutput='raw_values')

Calculate Mean Error (ME).

\[\text{ME} = \frac{1}{n}\sum_{t=1}^n (y_t-\hat{y_t})\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.MSPE(y_true, y_pred, multioutput='raw_values')

Calculate mean squared percentage error (MSPE).

\[\text{MSPE} = \frac{100\%}{n}\sum_{t=1}^n \left(\frac{y_t-\hat{y_t}}{y_t}\right)^2\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.MSLE(y_true, y_pred, multioutput='raw_values')

Calculate the mean squared log error (MSLE).

\[\text{MSLE} = \frac{1}{n}\sum_{t=1}^n (log_e(1+y_t)-log_e(1+\hat{y_t}))^2\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.
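Because MSLE compares log_e(1 + y), it penalizes the ratio between true and predicted values rather than their absolute difference. The hypothetical single-target helper below follows the MSLE formula above using math.log1p (which computes log_e(1 + x)); predicting 99 for 9 costs the same as predicting 999 for 99.

```python
import math


def msle(y_true, y_pred):
    """Mean squared log error for a single target; inputs must be greater than -1."""
    terms = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


small_scale = msle([9.0], [99.0])    # (ln 10 - ln 100)^2
large_scale = msle([99.0], [999.0])  # (ln 100 - ln 1000)^2, the same value
```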

zoo.orca.automl.metrics.R2(y_true, y_pred, multioutput='raw_values')

Calculate the R² score (coefficient of determination).

\[R^2 = 1-\frac{\sum_{t=1}^n (y_t-\hat{y_t})^2}{\sum_{t=1}^n (y_t-\bar{y})^2}\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A floating point value (the best value is 1.0; it is negative when the predictions fit worse than always predicting the mean of y_true), or an array of floating point values, one for each individual target.
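The R² formula above compares the residual sum of squares with the spread of y_true around its mean. The hypothetical single-target helper below follows it directly (it assumes y_true is not constant, otherwise the denominator is 0) and shows the three reference points: 1.0 for perfect predictions, 0.0 for always predicting the mean, and a negative value for predictions worse than that.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


perfect = r2([1, 2, 3], [1, 2, 3])
mean_only = r2([1, 2, 3], [2, 2, 2])
reversed_order = r2([1, 2, 3], [3, 2, 1])  # 1 - 8/2
```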

zoo.orca.automl.metrics.MAE(y_true, y_pred, multioutput='raw_values')

Calculate the mean absolute error (MAE).

\[\text{MAE} = \frac{1}{n}\sum_{t=1}^n |y_t-\hat{y_t}|\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.RMSE(y_true, y_pred, multioutput='raw_values')

Calculate the root mean squared error (RMSE).

\[\text{RMSE} = \sqrt{(\frac{1}{n}\sum_{t=1}^n (y_t-\hat{y_t})^2)}\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.

zoo.orca.automl.metrics.MSE(y_true, y_pred, multioutput='uniform_average')

Calculate the mean squared error (MSE).

\[\text{MSE} = \frac{1}{n}\sum_{t=1}^n (y_t-\hat{y_t})^2\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 0.0), or an array of floating point values, one for each individual target.
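The multioutput parameter decides whether one value per target column or a single averaged value is returned. The pure-Python sketch below illustrates the two options for MSE on 2-D data of shape (n_samples, n_targets), assuming “uniform_average” means an unweighted mean over the per-target values (the same convention scikit-learn uses). `mse` here is a hypothetical helper, not the library function.

```python
def mse(y_true, y_pred, multioutput="uniform_average"):
    """MSE over 2-D data given as a list of rows (n_samples x n_targets).

    "raw_values" returns one MSE per target column; "uniform_average"
    returns the plain mean of those per-target values.
    """
    n_samples = len(y_true)
    n_targets = len(y_true[0])
    per_target = [
        sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_samples)) / n_samples
        for j in range(n_targets)
    ]
    if multioutput == "raw_values":
        return per_target
    if multioutput == "uniform_average":
        return sum(per_target) / n_targets
    raise ValueError("multioutput must be 'raw_values' or 'uniform_average'")


y_true = [[1.0, 10.0], [2.0, 20.0]]
y_pred = [[1.0, 13.0], [4.0, 20.0]]
per_column = mse(y_true, y_pred, multioutput="raw_values")  # one value per target
averaged = mse(y_true, y_pred)                              # mean of the per-target values
```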

zoo.orca.automl.metrics.Accuracy(y_true, y_pred, multioutput=None)

Calculate the accuracy score (Accuracy).

\[\text{Accuracy} = \frac{1}{n}\sum_{t=1}^n 1(y_t=\hat{y_t})\]
Parameters
  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

Returns

Float or ndarray of floats. A non-negative floating point value (the best value is 1.0), or an array of floating point values, one for each individual target.
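The accuracy formula above is the fraction of positions where the prediction equals the true value. The hypothetical single-target helper below implements it; since it compares values exactly, probability outputs would have to be converted to class labels before using a sketch like this.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])  # 3 of 4 labels match
```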

class zoo.orca.automl.metrics.Evaluator

Bases: object

Evaluate metrics for y_true and y_pred.

static evaluate(metric, y_true, y_pred, multioutput='raw_values')

Evaluate a specific metric for y_true and y_pred.

Parameters
  • metric – String in [‘me’, ‘mae’, ‘mse’, ‘rmse’, ‘msle’, ‘r2’ , ‘mpe’, ‘mape’, ‘mspe’, ‘smape’, ‘mdape’, ‘smdape’, ‘accuracy’]

  • y_true – Array-like of shape = (n_samples, *). Ground truth (correct) target values.

  • y_pred – Array-like of shape = (n_samples, *). Estimated target values.

  • multioutput – String in [‘raw_values’, ‘uniform_average’]

Returns

Float or ndarray of floats. A floating point value, or an array of floating point values, one for each individual target.
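Conceptually, Evaluator.evaluate selects a metric function by its name string and applies it to y_true and y_pred. The sketch below mimics that dispatch with a small, hypothetical name-to-function table covering only three metrics on 1-D inputs; it is an illustration of the idea, not the library's implementation, and it ignores multioutput.

```python
import math


def _mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def _mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def _rmse(y_true, y_pred):
    return math.sqrt(_mse(y_true, y_pred))


# Hypothetical table mapping a metric name to the function that computes it.
METRICS = {"mae": _mae, "mse": _mse, "rmse": _rmse}


def evaluate_sketch(metric, y_true, y_pred):
    """Look up a metric by name and apply it to 1-D inputs."""
    try:
        func = METRICS[metric]
    except KeyError:
        raise ValueError(f"unsupported metric: {metric!r}") from None
    return func(y_true, y_pred)


# With the real library the equivalent call would be, e.g.:
# Evaluator.evaluate("mse", y_true, y_pred, multioutput="uniform_average")
```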