Use AutoXGBoost to auto-tune XGBoost parameters
In this guide we describe how to use Orca AutoXGBoost for automated XGBoost tuning.
Orca AutoXGBoost enables distributed, automated hyper-parameter tuning for XGBoost. It includes AutoXGBRegressor and AutoXGBClassifier, which correspond to the scikit-learn XGBRegressor and XGBClassifier respectively. See the XGBoost scikit-learn API documentation for more details.
Step 0: Prepare Environment
Conda is needed to prepare the Python environment for running this example. Please refer to the install guide for more details.
conda create -n zoo python=3.7 # zoo is conda environment name, you can use any name you like.
conda activate zoo
pip install analytics-zoo[ray]
pip install torch==1.7.1 torchvision==0.8.2
Step 1: Init Orca Context
from zoo.orca import init_orca_context, stop_orca_context

cluster_mode = "local"  # set to "local", "k8s" or "yarn"
if cluster_mode == "local":
    init_orca_context(cores=6, memory="2g", init_ray_on_spark=True)  # run in local mode
elif cluster_mode == "k8s":
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True)  # run on K8s cluster
elif cluster_mode == "yarn":
    init_orca_context(
        cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g", init_ray_on_spark=True,
        driver_memory="10g", driver_cores=1)  # run on Hadoop YARN cluster
This is the only place where you need to specify local or distributed mode. View Orca Context for more details.
Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir when running on a Hadoop YARN cluster. View the Hadoop User Guide for more details.
Step 2: Define Search Space
You should define a dictionary as your hyper-parameter search space. The keys are the names of the XGBRegressor hyper-parameters you want to search, and the values specify how each hyper-parameter should be sampled. See automl.hp for more details.
from zoo.orca.automl import hp
search_space = {
"n_estimators": hp.grid_search([50, 100, 200]),
"max_depth": hp.choice([2, 4, 6]),
}
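Besides grid_search and choice, hp provides other sampling functions. The sketch below is a hedged illustration assuming hp exposes the usual Ray Tune-style primitives (randint, loguniform, uniform); check the automl.hp documentation for the exact names and signatures.

# A hedged sketch of other sampling options (assumption: hp mirrors the
# common Ray Tune-style sampling API; verify against the automl.hp docs).
search_space = {
    "n_estimators": hp.randint(50, 200),         # random integer in the given range
    "max_depth": hp.choice([2, 4, 6]),           # pick one value per trial
    "learning_rate": hp.loguniform(1e-4, 1e-1),  # sample on a log scale
    "subsample": hp.uniform(0.8, 1.0),           # uniform float
}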
Step 3: Automatically Fit and Search with Orca AutoXGBoost
First, create an AutoXGBRegressor.
from zoo.orca.automl.xgboost import AutoXGBRegressor

auto_xgb_reg = AutoXGBRegressor(cpus_per_trial=2,
                                name="auto_xgb_regressor",
                                min_child_weight=3,
                                random_state=2)
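The additional keyword arguments here (min_child_weight and random_state) are, as we understand the API, passed through to the underlying XGBoost model as fixed parameters; they stay constant across trials and are not searched.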
Next, use the AutoXGBRegressor to fit and search for the best hyper-parameter set.
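The fit data is passed as (X, y) tuples of numpy arrays. If you do not already have a dataset at hand, here is a minimal, purely illustrative preparation sketch using scikit-learn's synthetic regression data; any regression dataset split into train and validation sets works the same way.

# Hypothetical data preparation for illustration only.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)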
auto_xgb_reg.fit(data=(X_train, y_train),
                 validation_data=(X_test, y_test),
                 search_space=search_space,
                 n_sampling=2,
                 metric="rmse")
Step 4: Get the Best Model and Hyper-Parameters
You can get the best learned model and the best hyper-parameter set for further deployment. The best model is an sklearn XGBRegressor instance.
best_model = auto_xgb_reg.get_best_model()
best_config = auto_xgb_reg.get_best_config()
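Since the best model is a plain sklearn XGBRegressor, you can use it directly for prediction. A minimal sketch, assuming the X_test and y_test arrays from the data preparation above:

import numpy as np

# Predict with the best model and compute RMSE on the held-out data.
y_pred = best_model.predict(X_test)
rmse = np.sqrt(np.mean((y_pred - y_test) ** 2))
print("Best config:", best_config)
print("Test RMSE:", rmse)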
Note: You should call stop_orca_context() when your application finishes.
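For example, at the end of your script:

# Release the resources held by the Orca context.
stop_orca_context()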