I'm creating a model using the Optuna LightGBM integration. My training set has some categorical features, and I pass those features to the model using the lgb.Dataset class. Here is the code I'm using (note: X_train, X_val, y_train and y_val are all pandas DataFrames).
import lightgbm as lgb
from sklearn.model_selection import train_test_split

grid = {
    'boosting': 'gbdt',
    'metric': ['huber', 'rmse', 'mape'],
    'verbose': 1
}

X_train, X_val, y_train, y_val = train_test_split(X, y)

# Columns whose names start with 'cat' are the categorical features
cat_features = [col for col in X_train if col.startswith('cat')]

dval = lgb.Dataset(X_val, label=y_val, categorical_feature=cat_features)
dtrain = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)

model = lgb.train(
    grid,
    dtrain,
    valid_sets=[dval],
    early_stopping_rounds=100)
Every time lgb.train is called, I get the following user warning:
UserWarning: categorical_column in param dict is overridden.
I believe that LightGBM is not treating my categorical features the way it should. Does anyone know how to fix this issue? Am I using the parameter correctly?
Answer
If you pass those columns by name (not by index), also add the feature_name parameter, as the documentation states. With that, your dval and dtrain would be initialized as follows:
# feature_name should name every column of the data, not only the categorical ones
dval = lgb.Dataset(X_val, label=y_val, feature_name=list(X_val.columns), categorical_feature=cat_features)
dtrain = lgb.Dataset(X_train, label=y_train, feature_name=list(X_train.columns), categorical_feature=cat_features)
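Since the answer distinguishes between passing names and passing indexes, here is a minimal sketch of the index route. categorical_feature also accepts a list of integer column positions, so the names can be translated up front. The helper categorical_indices and the column names below are hypothetical and only for illustration; in the question's setup the feature list would presumably be list(X_train.columns).

```python
def categorical_indices(feature_names, categorical_names):
    """Return the sorted positions of categorical_names within feature_names."""
    positions = {name: i for i, name in enumerate(feature_names)}
    missing = [name for name in categorical_names if name not in positions]
    if missing:
        raise ValueError(f"unknown categorical columns: {missing}")
    return sorted(positions[name] for name in categorical_names)


# Hypothetical column names, standing in for list(X_train.columns).
feature_names = ["cat_city", "price", "cat_brand", "weight"]
cat_features = [col for col in feature_names if col.startswith("cat")]
cat_idx = categorical_indices(feature_names, cat_features)
print(cat_idx)  # [0, 2]

# Assumed usage with the question's data (not executed here):
# dtrain = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_idx)
```

Raising an error on an unknown name makes a typo in a column name fail early, before the Dataset is built.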