I am working on a binary classification problem and trying to explain my model using the SHAP framework. I am using the logistic regression algorithm, and I would like to explain this model using both KernelExplainer and LinearExplainer. So, I tried the code below from SO here:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

import shap
from shap import KernelExplainer, Explanation
from shap.plots import waterfall

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = LogisticRegression().fit(X, y)
background = shap.maskers.Independent(X, max_samples=100)
explainer = KernelExplainer(model, background)
sv = explainer(X.iloc[[idx]])  # pass the row of interest as a df
exp = Explanation(
    sv.values[:, :, 1],  # class to explain
    sv.base_values[:, 1],
    data=X.iloc[[idx]].values,
    feature_names=X.columns,
)
waterfall(exp[0])
This threw the following error:

AssertionError: Unknown type passed as data object: <class 'shap.maskers._tabular.Independent'>

How can I explain a logistic regression model using SHAP KernelExplainer and SHAP LinearExplainer?
Answer
Calculation-wise the following will do:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

from shap import LinearExplainer, KernelExplainer, Explanation
from shap.plots import waterfall
from shap.maskers import Independent

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = LogisticRegression().fit(X, y)

explainer = KernelExplainer(model.predict, X)
sv = explainer.shap_values(X.loc[[idx]])  # pass the row of interest as a df

exp = Explanation(sv, explainer.expected_value, data=X.loc[[idx]].values, feature_names=X.columns)
waterfall(exp[0])
Note: KernelExplainer doesn't support maskers; it takes the prediction function and the background data directly. In this case loc and iloc return the same row, since the DataFrame has a default integer index.
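If you still want to cap the size of the background set (the role max_samples plays for the Independent masker), you can subsample the data yourself before handing it to KernelExplainer. A minimal sketch, assuming the shap.sample helper available in your SHAP version:

import shap

# subsample the background to 100 rows instead of using a masker
background_sample = shap.sample(X, 100)
explainer = KernelExplainer(model.predict, background_sample)
sv = explainer.shap_values(X.loc[[idx]])

LinearExplainer, on the other hand, does accept a masker: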
background = Independent(X, max_samples=100)
explainer = LinearExplainer(model, background)
sv = explainer(X.loc[[idx]])  # pass the row of interest by index
waterfall(sv[0])
Note that here LinearExplainer's result can be passed to waterfall "as is": calling the explainer returns an Explanation object, so there is no need to build one manually.
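As a quick sanity check (a sketch, not part of the original answer): for a sklearn LogisticRegression, LinearExplainer explains the raw margin (log-odds), so the SHAP values plus the base value should add up to decision_function for the explained row:

import numpy as np

# assumes sv comes from the LinearExplainer call above and explains the log-odds output
reconstructed = sv.values.sum(axis=1) + sv.base_values
print(np.allclose(reconstructed, model.decision_function(X.loc[[idx]])))

This also shows why these numbers differ from the KernelExplainer result above, which explains the output of model.predict, i.e. the hard class label.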