I’m trying to get the top 10 most informative (best) features for an SVM classifier with an RBF kernel. As I’m a beginner in programming, I tried some code that I found online. Unfortunately, none of it works. I always get the error: ValueError: coef_ is only available when using a linear kernel
This is the last code I tested:
scaler = StandardScaler(with_mean=False)
enc = LabelEncoder()
y = enc.fit_transform(labels)
vec = DictVectorizer()
feat_sel = SelectKBest(mutual_info_classif, k=200)

# Pipeline for SVM classifier
clf = SVC()
pipe = Pipeline([('vectorizer', vec),
                 ('scaler', StandardScaler(with_mean=False)),
                 ('mutual_info', feat_sel),
                 ('svc', clf)])

y_pred = model_selection.cross_val_predict(pipe, instances, y, cv=10)

# Now fit the pipeline using your data
pipe.fit(instances, y)

def show_most_informative_features(vec, clf, n=10):
    feature_names = vec.get_feature_names()
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        return ('\t%.4f\t%-15s\t\t%.4f\t%-15s' % (coef_1, fn_1, coef_2, fn_2))

print(show_most_informative_features(vec, clf))
Does anyone know a way to get the top 10 features from a classifier with an RBF kernel? Or is there another way to visualize the best features?
Answer
I am not sure if what you are asking is possible for an RBF kernel in a similar way to the example you show (which, as your error suggests, only works with a linear kernel).
However, you could always try feature ablation: remove each feature one at a time, retrain, and test how that affects performance. The 10 features whose removal hurts performance the most are your “top 10 features”.
Obviously, this is only practical if (1) you have relatively few features and/or (2) training and testing your model does not take a long time.
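For example, here is a minimal sketch of such an ablation loop, assuming `instances` is your list of feature dicts and `labels` your label list from the snippet above; the helper `make_pipe` and the variable names are just illustrative:

from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

y = LabelEncoder().fit_transform(labels)

def make_pipe():
    # Fresh pipeline per run so no state leaks between ablations
    return Pipeline([('vectorizer', DictVectorizer()),
                     ('scaler', StandardScaler(with_mean=False)),
                     ('svc', SVC())])

# Baseline cross-validated accuracy with all features present
baseline = cross_val_score(make_pipe(), instances, y, cv=10).mean()

# Collect every feature name that occurs in the data
feature_names = sorted({f for inst in instances for f in inst})

# Drop one feature at a time and re-score the model
ablation_scores = {}
for name in feature_names:
    ablated = [{f: v for f, v in inst.items() if f != name}
               for inst in instances]
    ablation_scores[name] = cross_val_score(make_pipe(), ablated, y, cv=10).mean()

# The features whose removal lowers accuracy the most are the most informative
top10 = sorted(ablation_scores, key=ablation_scores.get)[:10]
for name in top10:
    print('%-20s %.4f (baseline %.4f)' % (name, ablation_scores[name], baseline))

Note that this retrains the model once per feature (plus the baseline), so with k features and 10-fold cross-validation you pay for roughly 10 * (k + 1) fits, which is why the caveat about few features and fast training matters.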