Optuna LightGBM integration gives a categorical features warning

I'm creating a model using the Optuna LightGBM integration. My training set has some categorical features, and I pass those features to the model using the lgb.Dataset class. Here is the code I'm using (note: X_train, X_val, y_train, and y_val are all pandas DataFrames).

import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Training parameters (X and y are assumed to be defined earlier)
grid = {
    'boosting': 'gbdt',
    'metric': ['huber', 'rmse', 'mape'],
    'verbose': 1
}

X_train, X_val, y_train, y_val = train_test_split(X, y)

# Treat every column whose name starts with 'cat' as categorical
cat_features = [col for col in X_train if col.startswith('cat')]

dval = lgb.Dataset(X_val, label=y_val, categorical_feature=cat_features)
dtrain = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)

model = lgb.train(
    grid,
    dtrain,
    valid_sets=[dval],
    early_stopping_rounds=100)

Every time lgb.train is called, I get the following user warning:

 UserWarning: categorical_column in param dict is overridden.

I believe that LightGBM is not treating my categorical features the way it should. Does anyone know how to fix this issue? Am I using the parameter correctly?

Answer

If you pass column names (rather than indices) in categorical_feature, also pass the feature_name parameter, as the documentation states. Note that feature_name has to list every column of the data, not only the categorical ones, so it is built from the DataFrame columns here.

With that, dval and dtrain would be initialized as follows:

dval = lgb.Dataset(X_val, label=y_val, feature_name=list(X_val.columns), categorical_feature=cat_features)
dtrain = lgb.Dataset(X_train, label=y_train, feature_name=list(X_train.columns), categorical_feature=cat_features)
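
As a minimal sketch of how the two name lists relate, assuming feature_name should cover every column and categorical_feature should be a subset of it, the helper below builds both from a list of column names. The name build_dataset_kwargs and the example column names are hypothetical and not part of LightGBM or the question.

```python
def build_dataset_kwargs(columns, prefix="cat"):
    """Build name-based keyword arguments for lgb.Dataset.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    # feature_name: one entry per column, in column order
    feature_name = [str(col) for col in columns]
    # categorical_feature: the subset picked out by the naming convention
    categorical_feature = [name for name in feature_name if name.startswith(prefix)]
    if not categorical_feature:
        raise ValueError(f"no column name starts with {prefix!r}")
    return {"feature_name": feature_name, "categorical_feature": categorical_feature}


# Example with made-up column names
kwargs = build_dataset_kwargs(["cat_city", "age", "cat_plan", "income"])
print(kwargs["categorical_feature"])  # ['cat_city', 'cat_plan']
```

With pandas inputs, the result could then be passed along as lgb.Dataset(X_train, label=y_train, **build_dataset_kwargs(X_train.columns)), which keeps the two lists consistent for both the training and validation sets.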

Answer