I am trying to apply resampling to my dataset, which has imbalanced classes. This is what I have done so far:
```python
from sklearn.utils import resample

y = df.Label

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Text'].replace(np.NaN, ""))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, stratify=y)

# concatenate our training data back together
X = pd.concat([X_train, y_train], axis=1)

# separate minority and majority classes
not_df = X[X.Label==0]
df = X[X.Label==1]

# upsample minority
df_upsampled = resample(df,
                        replace=True,
                        n_samples=len(not_df),
                        random_state=27)

# combine majority and upsampled minority
upsampled = pd.concat([not_df, df_upsampled])
```
Unfortunately, this step fails: X = pd.concat([X_train, y_train], axis=1). The traceback is:
```
/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    279         verify_integrity=verify_integrity,
    280         copy=copy,
--> 281         sort=sort,
    282     )
    283

/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    355                 "only Series and DataFrame objs are valid".format(typ=type(obj))
    356             )
--> 357             raise TypeError(msg)
    358
    359         # consolidate

TypeError: cannot concatenate object of type '<class 'scipy.sparse.csr.csr_matrix'>'; only Series and DataFrame objs are valid
```
You can think of the Text column as:

```
Text
Have a non-programming question?
More helpful links
I am trying to apply
```
I hope you can help me handle this.
Answer
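X_train is a SciPy sparse matrix, and pd.concat accepts only Series and DataFrame objects, so you have to convert X_train to a DataFrame before calling concat. Convert it to a dense array first, and reuse the index of y_train so that the rows line up after the shuffled split:

```python
X = pd.concat([pd.DataFrame(X_train.toarray(), index=y_train.index), y_train], axis=1)
```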
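For reference, here is a minimal end-to-end sketch of the conversion followed by the upsampling from the question. The toy df is made up for illustration and stands in for the real data. The sketch assumes the vocabulary is small enough for a dense array to fit in memory, and the names train, majority and minority are not from the original post.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical toy data standing in for the real df (Text and Label columns).
df = pd.DataFrame({
    "Text": ["spam offer now", "hello there", "meeting at noon", "lunch today",
             "free prize now", "see you soon", "call me later", "project update",
             "notes from class", "win money fast"],
    "Label": [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

y = df["Label"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["Text"].fillna(""))  # SciPy sparse matrix

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=27)

# Densify the sparse matrix and reuse y_train's index so that concat
# aligns each feature row with its own label.
train = pd.concat(
    [pd.DataFrame(X_train.toarray(), index=y_train.index), y_train], axis=1)

# Separate majority (Label 0) and minority (Label 1) classes.
majority = train[train["Label"] == 0]
minority = train[train["Label"] == 1]

# Upsample the minority class with replacement to match the majority size.
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=27)

# Combine majority and upsampled minority.
upsampled = pd.concat([majority, minority_upsampled])
print(upsampled["Label"].value_counts())
```

Only the training split is resampled here, so the test set keeps the original class distribution. If the dense array is too large, pd.DataFrame.sparse.from_spmatrix(X_train, index=y_train.index) (pandas 0.25 or later) may be a lighter alternative, though it was not tried against the original data.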