AutoML API
orca.automl.auto_estimator
A general estimator that supports automatic model tuning. It allows users to fit a model and search for its best hyperparameters.
- class zoo.orca.automl.auto_estimator.AutoEstimator(model_builder, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, remote_dir=None, name=None)
Bases: object
Example
>>> auto_est = AutoEstimator.from_torch(model_creator=model_creator,
...                                     optimizer=get_optimizer,
...                                     loss=nn.BCELoss(),
...                                     logs_dir="/tmp/zoo_automl_logs",
...                                     resources_per_trial={"cpu": 2},
...                                     name="test_fit")
>>> auto_est.fit(data=data,
...              validation_data=validation_data,
...              search_space=create_linear_search_space(),
...              n_sampling=4,
...              epochs=1,
...              metric="accuracy")
>>> best_model = auto_est.get_best_model()
- static from_torch(*, model_creator, optimizer, loss, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name=None)
Create an AutoEstimator for torch.
- Parameters
model_creator – PyTorch model creator function.
optimizer – PyTorch optimizer creator function or PyTorch optimizer name (string). If you pass an optimizer name, specify the learning rate search space under the key “lr” or LR_NAME (from zoo.orca.automl.pytorch_utils import LR_NAME). If no learning rate search space is specified, the default learning rate of 1e-3 is used for all estimators.
loss – PyTorch loss instance, PyTorch loss creator function, or PyTorch loss name (string).
logs_dir – Local directory to save logs and results. Defaults to “/tmp/auto_estimator_logs”.
resources_per_trial – Dict of resources for each trial, e.g. {“cpu”: 2}.
name – Name of the auto estimator.
- Returns
an AutoEstimator object.
- static from_keras(*, model_creator, logs_dir='/tmp/auto_estimator_logs', resources_per_trial=None, name=None)
Create an AutoEstimator for TensorFlow Keras.
- Parameters
model_creator – TensorFlow Keras model creator function.
logs_dir – Local directory to save logs and results. Defaults to “/tmp/auto_estimator_logs”.
resources_per_trial – Dict of resources for each trial, e.g. {“cpu”: 2}.
name – Name of the auto estimator.
- Returns
an AutoEstimator object.
- fit(data, epochs=1, validation_data=None, metric=None, metric_mode=None, metric_threshold=None, n_sampling=1, search_space=None, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – train data. If the AutoEstimator is created with from_torch, data can be a tuple of ndarrays or a function that takes a config dictionary as parameter and returns a PyTorch DataLoader. If the AutoEstimator is created with from_keras, data can be a tuple of ndarrays. If data is a tuple of ndarrays, it should be in the form of (x, y), where x is training input data and y is training target data.
epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.
validation_data – Validation data. Validation data type should be the same as data.
metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You have to specify metric_mode if you use a customized metric function. You don’t have to specify metric_mode if you use a built-in metric in zoo.automl.common.metrics.Evaluator.
metric_threshold – Threshold on the metric. A trial is terminated once its metric reaches this threshold.
n_sampling – Number of times to sample from the search_space. Defaults to 1. If hp.grid_search is in search_space, the grid will be repeated n_sampling times. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_space – Dict describing the search space.
search_alg – String. All searchers provided by Ray Tune are supported (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the searcher algorithm besides search_space, metric and searcher mode.
scheduler – String. All schedulers provided by Ray Tune are supported.
scheduler_params – parameters for scheduler
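A customized metric function of the form described for the metric parameter might look like the following minimal sketch. The name mean_absolute_percentage_error is hypothetical and not part of the library. The function uses plain Python iteration, so it accepts 1-D numpy ndarrays as well as lists; it assumes equal-length 1-D inputs and no zeros in y_true.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """Hypothetical custom metric: mean absolute percentage error.

    Follows the func(y_true, y_pred) -> float signature described above.
    Assumes 1-D inputs of equal length and no zero values in y_true.
    """
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return float(sum(errors) / len(errors))


# Lower values of this metric are better, so it would be passed together
# with metric_mode="min", for example:
#   auto_est.fit(data=data, validation_data=validation_data,
#                search_space=search_space,
#                metric=mean_absolute_percentage_error, metric_mode="min")
```

Because a customized function carries no information about whether larger or smaller values are better, metric_mode has to be given explicitly in this case.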
orca.automl.hp
Sampling specs to be used in search space configuration.
- zoo.orca.automl.hp.uniform(lower, upper)
Sample a float uniformly between lower and upper.
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
- zoo.orca.automl.hp.quniform(lower, upper, q)
Sample a float uniformly between lower and upper. Round the result to the nearest value with granularity q; upper is included.
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Granularity for increment.
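The quantized sampling described above can be illustrated with the standard library alone. This is a sketch of the idea, not the library's implementation: quniform_sketch is a hypothetical name, and the example assumes lower and upper are themselves multiples of q (otherwise rounding could land outside the range).

```python
import random


def quniform_sketch(lower, upper, q, rng=random):
    """Illustrative stand-in for quantized uniform sampling (not library code)."""
    value = rng.uniform(lower, upper)  # float drawn uniformly from [lower, upper]
    return round(value / q) * q        # snap to the nearest multiple of q


# Draw 100 samples, each from an independently seeded generator.
samples = [quniform_sketch(0.0, 1.0, 0.25, random.Random(seed)) for seed in range(100)]
```

With q=0.25 on the range 0.0 to 1.0, every sample is one of 0.0, 0.25, 0.5, 0.75 or 1.0, so the upper bound itself can be returned.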
- zoo.orca.automl.hp.loguniform(lower, upper, base=10)
Sample a float between lower and upper such that its logarithm is uniformly distributed between log_{base}(lower) and log_{base}(upper).
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
base – Log base for distribution. Defaults to 10.
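To make the log-uniform description concrete, the sketch below samples the exponent uniformly and then exponentiates. It uses only the standard library; loguniform_sketch is a hypothetical name and the code is an illustration of the distribution, not the library's implementation.

```python
import math
import random


def loguniform_sketch(lower, upper, base=10, rng=random):
    """Illustrative stand-in: sample the exponent uniformly, then exponentiate."""
    log_lower = math.log(lower, base)
    log_upper = math.log(upper, base)
    exponent = rng.uniform(log_lower, log_upper)
    return base ** exponent


rng = random.Random(0)
samples = [loguniform_sketch(1e-4, 1e-1, rng=rng) for _ in range(1000)]
```

Each decade of the range (1e-4 to 1e-3, 1e-3 to 1e-2, 1e-2 to 1e-1) receives roughly a third of the samples, which is why this kind of spec suits parameters such as learning rates that span several orders of magnitude.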
- zoo.orca.automl.hp.qloguniform(lower, upper, q, base=10)
Sample a float between lower and upper such that its logarithm is uniformly distributed between log_{base}(lower) and log_{base}(upper). Round the result to the nearest value with granularity q; upper is included.
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Granularity for increment.
base – Log base for distribution. Defaults to 10.
- zoo.orca.automl.hp.randn(mean=0.0, std=1.0)
Sample a float from a normal distribution.
- Parameters
mean – Mean of the normal distribution. Defaults to 0.0.
std – Standard deviation of the normal distribution. Defaults to 1.0.
- zoo.orca.automl.hp.qrandn(mean, std, q)
Sample a float from a normal distribution. Round the result to the nearest value with granularity q.
- Parameters
mean – Mean of the normal distribution.
std – Standard deviation of the normal distribution.
q – Granularity for increment.
- zoo.orca.automl.hp.randint(lower, upper)
Uniformly sample an integer between lower and upper (both inclusive).
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
- zoo.orca.automl.hp.qrandint(lower, upper, q=1)
Uniformly sample an integer between lower and upper (both inclusive). Round the result to the nearest value with granularity q.
- Parameters
lower – Lower bound of the sampling range.
upper – Upper bound of the sampling range.
q – Integer granularity for increment.
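The quantized integer sampling above can be sketched as follows. qrandint_sketch is a hypothetical name, the code is an illustration rather than the library's implementation, and the tie-breaking of Python's round (half to even) is an assumption that may differ from the library's behaviour.

```python
import random


def qrandint_sketch(lower, upper, q=1, rng=random):
    """Illustrative stand-in for quantized integer sampling (not library code)."""
    value = rng.randint(lower, upper)  # random.randint includes both endpoints
    return int(round(value / q)) * q   # snap to the nearest multiple of q


rng = random.Random(0)
samples = [qrandint_sketch(0, 100, q=10, rng=rng) for _ in range(200)]
```

With q=10 on the range 0 to 100, every sample is a multiple of 10 and still an int, which is convenient for parameters such as layer widths or batch sizes.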
- zoo.orca.automl.hp.choice(categories)
Uniformly sample from a list.
- Parameters
categories – A list to be sampled.
- zoo.orca.automl.hp.choice_n(categories, min_items, max_items)
Sample a subset from a list.
- Parameters
categories – A list to be sampled.
min_items – Minimum number of items to be sampled.
max_items – Maximum number of items to be sampled.
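Subset sampling of this kind can be sketched with the standard library. The sketch rests on assumptions the description above does not state: that both min_items and max_items are inclusive, and that items are drawn without replacement. choice_n_sketch is a hypothetical name, not the library's implementation.

```python
import random


def choice_n_sketch(categories, min_items, max_items, rng=random):
    """Illustrative stand-in for subset sampling (not library code)."""
    # Assumption: both bounds on the subset size are inclusive.
    n_items = rng.randint(min_items, max_items)
    # Assumption: items are drawn without replacement, so no duplicates.
    return rng.sample(categories, n_items)


features = ["year", "month", "day", "hour"]
subsets = [choice_n_sketch(features, 1, 3, random.Random(seed)) for seed in range(50)]
```

A spec like this is useful for feature selection, where each trial trains on a different subset of candidate features.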