Model details
Among the parameters listed in the Python API Reference — xgboost 0.6 documentation, we tune the most influential ones with a grid search. For the original XGBoost paper, see the KDD 2016 paper below; it has become a strong baseline that everyone uses in recent competitions.
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges.
Code
from xgboost import XGBRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Single-step pipeline wrapping the XGBoost regressor.
pipeline = Pipeline([
    ('clf', XGBRegressor()),
])

# Grid over the most influential hyperparameters.
params = dict(
    clf__n_estimators=(10, 20, 30),
    clf__learning_rate=(.1, .2, .3),
    clf__max_depth=(2, 3, 4, 5),
    clf__min_child_weight=(.5, .75, 1.0),
)

# Exhaustive search with cross-validation, then predict with the best model found.
grid_search = GridSearchCV(pipeline, param_grid=params).fit(train_X, train_y)
predictions = grid_search.predict(test_X)
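After fitting, GridSearchCV exposes the winning parameter combination and its cross-validation score. A minimal sketch, continuing from the grid_search object fitted above (variable names assumed from the code):

# Best hyperparameter combination found by the grid search,
# and its mean cross-validated score (R^2 by default for regressors).
print(grid_search.best_params_)
print(grid_search.best_score_)

# With the default refit=True, grid_search.predict() uses the estimator
# refitted on the full training data with the best parameters, so the
# predictions above already come from the best model.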
Reference slides
www.slideshare.net