【Hyperopt】Trying DART in XGBoost

The environment is as follows.

macOS Sierra
Python 2.7.13
xgboost==0.6a2
hyperopt==0.1

DART

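DART: Dropouts meet Multiple Additive Regression Trees [Rashmi and Gilad-Bachrach, 2015] is a regularization technique for MART (Multiple Additive Regression Trees). As the ensemble grows, MART suffers from over-specialization, a kind of over-fitting in which the model becomes tuned to a few particular instances; DART mitigates this by borrowing the idea of dropout from deep neural networks.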

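According to the description of over-specialization, trees added in later iterations tend to affect the predictions of only a few instances and contribute only negligibly to the remaining ones. This appears to be a well-known problem in boosting.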
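To make the idea concrete, here is a minimal sketch of the DART loop as described in the paper, using one-feature regression stumps as the weak learners. The helpers `fit_stump` and `dart_fit` are hypothetical; the sketch follows the paper's normalization (the new tree is weighted 1/(k+1) and the k dropped trees are scaled by k/(k+1)) without a learning rate, and it is not XGBoost's implementation.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree on a 1-D feature x to the residuals r."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds keep both leaves non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def dart_fit(x, y, n_rounds=50, rate_drop=0.1, skip_drop=0.5, seed=0):
    """Boost stumps with DART-style dropout and return a predict function."""
    rng = np.random.RandomState(seed)
    trees, weights = [], []
    for _ in range(n_rounds):
        dropped = set()
        # With probability skip_drop this round behaves like plain MART (no dropout).
        if trees and rng.rand() >= skip_drop:
            dropped = set(np.flatnonzero(rng.rand(len(trees)) < rate_drop))
        # Fit the new tree to the residual of the ensemble WITHOUT the dropped trees.
        pred = np.zeros(len(y))
        for i, (tree, w) in enumerate(zip(trees, weights)):
            if i not in dropped:
                pred += w * tree(x)
        new_tree = fit_stump(x, y - pred)
        # Paper's normalization: dropped trees scaled by k/(k+1), new tree by 1/(k+1).
        k = len(dropped)
        for i in dropped:
            weights[i] *= k / (k + 1.0)
        trees.append(new_tree)
        weights.append(1.0 / (k + 1.0))

    def predict(z):
        out = np.zeros(len(z))
        for tree, w in zip(trees, weights):
            out += w * tree(z)
        return out

    return predict
```

With `skip_drop=1.0` no tree is ever dropped, and the loop reduces to ordinary MART without shrinkage.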

DART in XGBoost

As the XGBoost tutorial shows, using DART is simple: set the booster parameter to dart and specify rate_drop and skip_drop, each in the range [0.0, 1.0].
Since dart inherits from gbtree, it also accepts gbtree parameters such as eta, gamma, and max_depth.

param = {'booster': 'dart',
         'max_depth': 5,
         'learning_rate': 0.1,
         'objective': 'binary:logistic',
         'silent': True,
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
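As a rough mental model of what these parameters do in a single boosting round, the sketch below follows the description in the XGBoost DART tutorial for `sample_type='uniform'` and `normalize_type='tree'`: with probability `skip_drop` the round skips dropout and behaves like gbtree; otherwise each existing tree is dropped with probability `rate_drop`, the new tree gets weight 1 / (k + learning_rate), and the k dropped trees are scaled by k / (k + learning_rate). The function names are hypothetical, this is not XGBoost's code, and options such as `one_drop` are not modeled.

```python
import random


def select_dropped(n_trees, rate_drop, skip_drop, rng):
    """Indices of trees dropped in this round (sample_type='uniform')."""
    if n_trees == 0 or rng.random() < skip_drop:
        return []  # dropout skipped: the round behaves like gbtree
    return [i for i in range(n_trees) if rng.random() < rate_drop]


def normalize_tree(weights, dropped, learning_rate):
    """Return the updated tree weights with the new tree's weight appended
    (normalize_type='tree')."""
    k = len(dropped)
    if k == 0:
        return list(weights) + [1.0]
    dropped_set = set(dropped)
    factor = k / (k + learning_rate)
    new_weights = [w * factor if i in dropped_set else w for i, w in enumerate(weights)]
    return new_weights + [1.0 / (k + learning_rate)]


rng = random.Random(0)
weights = [1.0, 1.0, 1.0]
dropped = select_dropped(len(weights), rate_drop=0.1, skip_drop=0.5, rng=rng)
weights = normalize_tree(weights, dropped, learning_rate=0.1)
```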

Because dropout introduces randomness, a small early-stopping window (early_stopping_rounds) can reportedly make early stopping unstable.
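Early stopping halts training once the evaluation metric has not improved for `early_stopping_rounds` consecutive rounds, so on a noisy validation curve a small window can stop at a temporary bump. The stdlib sketch below illustrates this; the helper `best_iteration` and the score sequence are made up for illustration and are not XGBoost's implementation.

```python
def best_iteration(scores, patience):
    """Index of the best (lowest) score, stopping after `patience` rounds
    without improvement."""
    best_i = 0
    for i, s in enumerate(scores):
        if s < scores[best_i]:
            best_i = i
        elif i - best_i >= patience:
            break  # no improvement for `patience` rounds: stop here
    return best_i


# A made-up validation RMSE curve with a temporary bump at rounds 2-3.
scores = [10.0, 9.0, 9.5, 9.6, 8.0, 7.0]
```

With `patience=2` the loop stops at round 3 and keeps round 1, while `patience=3` survives the bump and reaches the better score at round 5.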

Parameter tuning with Hyperopt

hyperopt is a hyperparameter optimization library that can run in a distributed, asynchronous fashion; it implements Random Search and Tree of Parzen Estimators (TPE).
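The simpler of the two algorithms, Random Search, samples each candidate independently from the search space and keeps the best one. The stdlib sketch below shows that idea; `random_search` and the toy objective are hypothetical stand-ins for `fmin(..., algo=rand.suggest)`, not hyperopt's implementation. TPE differs in that it uses the results of past trials to model which regions of the space look promising.

```python
import random


def random_search(objective, space, max_evals, seed=0):
    """Minimize objective(params) by sampling params independently from space."""
    rng = random.Random(seed)
    best_params, best_loss = None, float('inf')
    for _ in range(max_evals):
        # each space entry is a function that draws one value from rng
        params = {name: draw(rng) for name, draw in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Toy problem: minimize (x - 3)^2 with x drawn uniformly from [0, 10].
space = {'x': lambda rng: rng.uniform(0.0, 10.0)}
best, loss = random_search(lambda p: (p['x'] - 3.0) ** 2, space, max_evals=2000)
```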

Each value in space is written as hp.quniform('parameter name', minimum, maximum, step). fmin then minimizes the objective function passed as fn over the ranges defined by space.
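According to the hyperopt documentation, `hp.quniform(label, low, high, q)` returns a value like `round(uniform(low, high) / q) * q`, so even `max_depth` comes back as a float such as `9.0`; this is why the objective in the final code casts it with `int()`. The stdlib sketch below mimics that behavior; `quniform_sample` is a hypothetical stand-in, not part of hyperopt's API.

```python
import random


def quniform_sample(low, high, q, rng):
    """Mimic hp.quniform: round(uniform(low, high) / q) * q, returned as a float."""
    return float(round(rng.uniform(low, high) / q)) * q


rng = random.Random(42)
depths = [quniform_sample(6, 13, 1, rng) for _ in range(1000)]
# XGBoost expects an integer max_depth, so cast before training.
max_depth = int(depths[0])
```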

from hyperopt import fmin, tpe, hp

# objective() is the loss function defined in the full code further below
def optimize(trials):
    space = {
         'learning_rate': hp.quniform('learning_rate', 0.1, 1, 0.05),
         'objective': 'reg:linear',
         'booster': 'dart',
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': hp.quniform('rate_drop', 0.1, 0.8, 0.1),
         'skip_drop': hp.quniform('skip_drop', 0.1, 0.8, 0.1),
         'max_depth' : hp.quniform('max_depth', 6, 13, 1),
         'silent': True,
    }

    # minimize the objective over the space
    best_params = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        trials=trials,
        max_evals=10
    )

    return best_params

Trying out DART and Hyperopt

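The dataset is Boston house-prices, which contains housing prices for 506 districts of Boston in the United States.
It is a regression problem: the target is the median house price, predicted from explanatory variables thought to affect prices, such as the age of the housing, accessibility to highways, the property-tax rate, and the crime rate.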

I tuned the hyperparameters of the DART model with Hyperopt.

The final code is below. The MSE with the obtained parameters was 23.31.

# -*- coding: utf-8 -*-

from __future__ import print_function  # Python 2.7: print(a, b) would otherwise print a tuple

import numpy as np
import xgboost as xgb
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn import datasets, metrics, cross_validation

def objective(params):
    # hp.quniform returns floats, but XGBoost expects an integer max_depth
    # https://github.com/dmlc/xgboost/issues/1034
    params['max_depth'] = int(params['max_depth'])

    # KFold rather than StratifiedKFold: the target is a continuous house price,
    # so there are no classes to stratify on
    skf = cross_validation.KFold(
        len(train_y), # Number of samples to split in K folds
        n_folds=5, # Number of folds. Must be at least 2.
        shuffle=True, # Whether to shuffle the data before splitting into folds.
        random_state=30 # pseudo-random number generator state used for shuffling
    )

    boost_rounds = []
    score = []

    for train, test in skf:
        _train_x, _test_x, _train_y, _test_y = \
            train_x[train], train_x[test], train_y[train], train_y[test]

        train_xd = xgb.DMatrix(_train_x, label=_train_y)
        test_xd = xgb.DMatrix(_test_x, label=_test_y)
        watchlist = [(train_xd, 'train'),(test_xd, 'eval')]

        model = xgb.train(
            params,
            train_xd,
            num_boost_round=100,
            evals=watchlist,
            early_stopping_rounds=30
        )

        boost_rounds.append(model.best_iteration)
        # best_score is the validation RMSE (the default eval_metric for reg:linear)
        score.append(model.best_score)

    print('average of best iteration:', np.average(boost_rounds))
    return {'loss': np.average(score), 'status': STATUS_OK}

def optimize(trials):
    space = {
         'learning_rate': hp.quniform('learning_rate', 0.1, 1, 0.05),
         'objective': 'reg:linear',
         'booster': 'dart',
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': hp.quniform('rate_drop', 0.1, 0.8, 0.1),
         'skip_drop': hp.quniform('skip_drop', 0.1, 0.8, 0.1),
         'max_depth' : hp.quniform('max_depth', 6, 13, 1),
         'silent': True,
    }

    # minimize the objective over the space
    best_params = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        trials=trials,
        max_evals=10
    )

    return best_params

if __name__ == '__main__':
    np.random.seed(131)

    boston = datasets.load_boston()
    train_x = boston.data[0:400,:]
    train_y = boston.target[0:400]
    test_x = boston.data[401:501,:]
    test_y = boston.target[401:501]

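    trials = Trials()
    best_params = optimize(trials)
    # fmin returns only the values drawn from hp.* expressions, so add the fixed
    # parameters back; otherwise xgb.train falls back to the default gbtree booster
    best_params.update({
        'objective': 'reg:linear',
        'booster': 'dart',
        'sample_type': 'uniform',
        'normalize_type': 'tree',
        'silent': True,
    })
    best_params['max_depth'] = int(best_params['max_depth'])
    print(best_params)
    print(objective(best_params))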

    train_xd = xgb.DMatrix(np.array(train_x), label=train_y)
    bst = xgb.train(best_params, train_xd, num_boost_round=100)

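    # With DART, predict() performs dropout unless ntree_limit is set (see the XGBoost
    # DART tutorial), so use all 100 trees for a deterministic prediction
    pred_y = bst.predict(xgb.DMatrix(test_x), ntree_limit=100)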
    mse = metrics.mean_squared_error(test_y, pred_y)
    print(mse)

    # the ./model directory must already exist, otherwise save_model fails
    bst.save_model('./model/housing-hyperopt.model')
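The `objective` above scores each candidate with 5-fold cross-validation: it trains on four folds, records the best validation score on the fifth (RMSE, XGBoost's default metric for `reg:linear`), and averages over the folds. The stdlib sketch below shows that structure with a trivial mean predictor in place of XGBoost; `kfold_indices`, `rmse`, and `cv_score` are hypothetical helpers, not scikit-learn or XGBoost APIs.

```python
import math
import random


def kfold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and yield n_folds (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # sizes differ by at most 1
    for k in range(n_folds):
        train = [j for f in range(n_folds) if f != k for j in folds[f]]
        yield train, folds[k]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def cv_score(y, n_folds=5):
    """Average validation RMSE of a predictor that outputs the training mean."""
    scores = []
    for train, test in kfold_indices(len(y), n_folds):
        mean = sum(y[j] for j in train) / float(len(train))
        scores.append(rmse([y[j] for j in test], [mean] * len(test)))
    return sum(scores) / len(scores)
```

Note that the tuned loss is an RMSE, while the final evaluation in `__main__` reports the MSE, which is on the squared scale.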
