AutoML API

orca.automl.auto_estimator

A general estimator that supports automatic model tuning. It lets users fit a model and search for its best hyperparameters.

class zoo.orca.automl.auto_estimator.AutoEstimator(model_builder, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, remote_dir=None, name=None)

Bases: object

Example

>>> import torch.nn as nn
>>> from zoo.orca.automl.auto_estimator import AutoEstimator
>>> auto_est = AutoEstimator.from_torch(model_creator=model_creator,
...                                     optimizer=get_optimizer,
...                                     loss=nn.BCELoss(),
...                                     logs_dir="/tmp/zoo_automl_logs",
...                                     resources_per_trial={"cpu": 2},
...                                     name="test_fit")
>>> auto_est.fit(data=data,
...              validation_data=validation_data,
...              search_space=create_linear_search_space(),
...              n_sampling=4,
...              epochs=1,
...              metric="accuracy")
>>> best_model = auto_est.get_best_model()

static from_torch(*, model_creator, optimizer, loss, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name=None)

Create an AutoEstimator for PyTorch.

Parameters

model_creator – PyTorch model creator function.
optimizer – PyTorch optimizer creator function or PyTorch optimizer name (string). If you pass an optimizer name, specify the learning rate search space under the key “lr” or LR_NAME (from zoo.orca.automl.pytorch_utils import LR_NAME). If no learning rate search space is specified, the default learning rate of 1e-3 is used for all estimators.
loss – PyTorch loss instance, PyTorch loss creator function, or PyTorch loss name (string).
logs_dir – Local directory to save logs and results. Defaults to “/tmp/auto_estimator_logs”.
resources_per_trial – Dict. Resources for each trial, e.g. {“cpu”: 2}.
name – Name of the auto estimator.

Returns

an AutoEstimator object.

static from_keras(*, model_creator, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name=None)

Create an AutoEstimator for TensorFlow Keras.

Parameters

model_creator – TensorFlow Keras model creator function.
logs_dir – Local directory to save logs and results. Defaults to “/tmp/auto_estimator_logs”.
resources_per_trial – Dict. Resources for each trial, e.g. {“cpu”: 2}.
name – Name of the auto estimator.

Returns

an AutoEstimator object.

fit(data, epochs=1, validation_data=None, metric=None, metric_mode=None, metric_threshold=None, n_sampling=1, search_space=None, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)

Automatically fit the model and search for the best hyperparameters.

Parameters

data – Training data. If the AutoEstimator is created with from_torch, data can be a tuple of ndarrays or a function that takes a config dictionary as its parameter and returns a PyTorch DataLoader. If the AutoEstimator is created with from_keras, data can be a tuple of ndarrays. If data is a tuple of ndarrays, it should be in the form (x, y), where x is the training input data and y is the training target data.
epochs – Maximum number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial stops when it either reaches metric_threshold or has been trained for the given number of epochs.
validation_data – Validation data. Its type should be the same as that of data.
metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You have to specify metric_mode if you use a customized metric function. You don’t have to specify metric_mode if you use a built-in metric in zoo.automl.common.metrics.Evaluator.
metric_threshold – A trial is terminated when the metric threshold is met.
n_sampling – Number of times to sample from the search_space. Defaults to 1. If hp.grid_search is in search_space, the grid is repeated n_sampling times. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_space – A dict describing the search space.
search_alg – str. Name of a searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str. Name of a scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
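As an illustration of the func(y_true, y_pred) contract described for metric, here is a minimal sketch of a customized metric. The name rmse is hypothetical, and the function assumes 1-D inputs of equal length; it iterates element by element, so it accepts plain sequences as well as 1-D ndarrays. Because a smaller error is better, such a metric would be paired with metric_mode="min".

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length 1-D sequences."""
    squared = [(float(t) - float(p)) ** 2 for t, p in zip(y_true, y_pred)]
    # Return a plain float, as the metric contract requires.
    return math.sqrt(sum(squared) / len(squared))


score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(score)

# Hypothetical usage (auto_est, data, validation_data and search_space
# are assumed to be defined elsewhere):
# auto_est.fit(data=data, validation_data=validation_data,
#              search_space=search_space, metric=rmse, metric_mode="min")
```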

get_best_model()

Return the best model found by the AutoEstimator.

Returns: the best model instance

get_best_config()

Return the best config found by the AutoEstimator.

Returns: A dictionary of the best hyperparameters

orca.automl.hp

Sampling specs to be used in search space configuration.

zoo.orca.automl.hp.uniform(lower, upper)

Sample a float uniformly between lower and upper.

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.

zoo.orca.automl.hp.quniform(lower, upper, q)

Sample a float uniformly between lower and upper. Round the result to the nearest value with granularity q; upper is included.

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Granularity for increment.
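The rounding step can be sketched in plain Python as follows. This is an illustration of the described behaviour only, not the library's implementation; quniform_sketch is a hypothetical name, and the sketch assumes lower and upper are multiples of q so that rounding never leaves the range.

```python
import random


def quniform_sketch(lower, upper, q, rng=random):
    """Draw uniformly from [lower, upper], then snap to a multiple of q."""
    x = rng.uniform(lower, upper)
    # round(x / q) picks the nearest grid index; multiplying by q
    # maps it back to the sampling range.
    return round(x / q) * q


rng = random.Random(0)  # seeded only to make the sketch reproducible
samples = [quniform_sketch(0, 10, 2, rng) for _ in range(1000)]
print(sorted(set(samples)))
```

Every sample lands on the grid 0, 2, ..., 10, and the upper bound itself can be returned.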

zoo.orca.automl.hp.loguniform(lower, upper, base=10)

Sample a float between lower and upper such that its logarithm is uniformly distributed between log_{base}(lower) and log_{base}(upper).

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
base – Log base for distribution. Defaults to 10.
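To make the log-uniform behaviour concrete, the sketch below draws the exponent uniformly and then raises base to it. It is a plain-Python illustration under that reading of the description, not the library's code, and loguniform_sketch is a hypothetical name.

```python
import math
import random


def loguniform_sketch(lower, upper, base=10, rng=random):
    """Sample so that log_base(value) is uniform on the log of the bounds."""
    exponent = rng.uniform(math.log(lower, base), math.log(upper, base))
    return base ** exponent


rng = random.Random(0)  # seeded only to make the sketch reproducible
samples = [loguniform_sketch(1e-4, 1e-2, rng=rng) for _ in range(10000)]

# Roughly half of the samples fall below the geometric midpoint 1e-3,
# whereas a plain uniform draw would put only about 9% of them there.
below = sum(s < 1e-3 for s in samples) / len(samples)
print(round(below, 2))
```

This is why a log-uniform spec is a common choice for parameters such as the learning rate, where each order of magnitude should be explored equally.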

zoo.orca.automl.hp.qloguniform(lower, upper, q, base=10)

Sample a float between lower and upper such that its logarithm is uniformly distributed between log_{base}(lower) and log_{base}(upper). Round the result to the nearest value with granularity q; upper is included.

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Granularity for increment.
base – Log base for distribution. Defaults to 10.

zoo.orca.automl.hp.randn(mean=0.0, std=1.0)

Sample a float from a normal distribution.

Parameters

mean – Mean of the normal distribution. Defaults to 0.0.
std – Standard deviation of the normal distribution. Defaults to 1.0.

zoo.orca.automl.hp.qrandn(mean, std, q)

Sample a float from a normal distribution. Round the result to the nearest value with granularity q.

Parameters

mean – Mean of the normal distribution.
std – Standard deviation of the normal distribution.
q – Granularity for increment.

zoo.orca.automl.hp.randint(lower, upper)

Uniformly sample an integer between lower and upper (both inclusive).

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.

zoo.orca.automl.hp.qrandint(lower, upper, q=1)

Uniformly sample an integer between lower and upper (both inclusive). Round the result to the nearest value with granularity q.

Parameters

lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Integer granularity for increment.

zoo.orca.automl.hp.choice(categories)

Uniformly sample from a list.

Parameters: categories – A list to be sampled.

zoo.orca.automl.hp.choice_n(categories, min_items, max_items)

Sample a subset from a list.

Parameters

categories – A list to be sampled.
min_items – Minimum number of items to be sampled.
max_items – Maximum number of items to be sampled.
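One plausible reading of subset sampling is sketched below in plain Python. The description does not say how the subset size is chosen, so the sketch assumes it is drawn uniformly between min_items and max_items and that items are picked without replacement; choice_n_sketch and the feature names are hypothetical.

```python
import random


def choice_n_sketch(categories, min_items, max_items, rng=random):
    """Pick a subset whose size lies between min_items and max_items."""
    k = rng.randint(min_items, max_items)  # both bounds inclusive
    # sample() draws k distinct items, so the subset has no duplicates.
    return rng.sample(categories, k)


rng = random.Random(0)  # seeded only to make the sketch reproducible
features = ["hour", "weekday", "month", "is_weekend"]
subsets = [choice_n_sketch(features, 1, 3, rng) for _ in range(200)]
print(subsets[0])
```

A spec of this kind is useful for searching over which optional features to include in a model.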

zoo.orca.automl.hp.sample_from(func)

Sample from a function.

Parameters: func – The function to be sampled.

zoo.orca.automl.hp.grid_search(values)

Specify a grid search over a list.

Parameters: values – A list to be grid searched.
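The interaction between grid search and the n_sampling argument of fit (the grid is repeated n_sampling times) can be sketched as follows. This is a toy expansion in plain Python, not the library's trial generator: expand_trials is a hypothetical helper, grid entries are represented as {"grid_search": values} dicts, and random entries as callables that take a random.Random instance.

```python
import itertools
import random


def expand_trials(search_space, n_sampling, rng):
    """Expand a toy search space into a list of trial configs."""
    grid_keys = [k for k, v in search_space.items()
                 if isinstance(v, dict) and "grid_search" in v]
    grid_values = [search_space[k]["grid_search"] for k in grid_keys]
    trials = []
    for _ in range(n_sampling):
        # Each sampling round walks the full Cartesian product of the grids.
        for combo in itertools.product(*grid_values):
            config = dict(zip(grid_keys, combo))
            for key, spec in search_space.items():
                if key in grid_keys:
                    continue
                # Callables are re-sampled per trial; constants pass through.
                config[key] = spec(rng) if callable(spec) else spec
            trials.append(config)
    return trials


space = {
    "hidden_size": {"grid_search": [32, 64]},
    "dropout": {"grid_search": [0.1, 0.2, 0.3]},
    "lr": lambda rng: 10 ** rng.uniform(-4, -2),
    "batch_size": 32,
}
trials = expand_trials(space, n_sampling=2, rng=random.Random(0))
print(len(trials))  # 2 rounds x (2 x 3) grid points = 12 trials
```

With no grid entries in the space, the same helper yields exactly n_sampling trials.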