I’ve been trying to map a column from my df into 4 categories (binning), but the column contains mixed values, both int and str. It looks something like this:
df['data_column'] = ['22', '8', '11', 'Text', '17', 'Text', '6']
The categories I’ve been trying to change them to:
- 1 to 10: superb
- 10 to 20: awesome
- 20 to 30: great
- 'Text': text
This is how I’ve been trying to solve it:
my_criteria = [df['data_column'][df['data_column'] != 'Text'].astype('int64').between(1, 10),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(10, 20),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(20, 30),
               df['data_column'][df['data_column'] == 'Text']]

my_values = ['superb', 'awesome', 'great', 'text']

df['data_column'] = np.select(my_criteria, my_values, 0)
But I get this error: ValueError: shape mismatch: objects cannot be broadcast to a single shape.
How can I fix this? Any help is welcomed. The desired output:
df['data_column'] = ['great', 'superb', 'awesome', 'text', 'awesome', 'text', 'superb']
Thank you in advance!
Answer
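All values in your condlist for np.select must be the same length. Yours are not: the first three are computed on the 5 rows that are not 'Text', while the last one selects only the 2 'Text' rows (and holds strings rather than booleans).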
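If you want to keep np.select, a minimal sketch of the equal-length idea could look like the following. It assumes the sample data from the question; the `nums` helper and the `'other'` default label are illustrative names that are not in the original post. Each condition is computed on the full column, so every mask has one entry per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data_column': ['22', '8', '11', 'Text', '17', 'Text', '6']})

# Convert once; non-numeric entries such as 'Text' become NaN.
# `nums` is a helper name introduced for this sketch.
nums = pd.to_numeric(df['data_column'], errors='coerce')

# Every condition is a boolean Series with one entry per row of df,
# so all four have the same length. NaN never satisfies between().
my_criteria = [
    nums.between(1, 10),
    nums.between(10, 20),
    nums.between(20, 30),
    df['data_column'] == 'Text',
]
my_values = ['superb', 'awesome', 'great', 'text']

# np.select uses the first condition that matches. between() includes both
# endpoints, so 10 would map to 'superb' and 20 to 'awesome'.
# 'other' is an assumed fallback label; a string keeps the result all-string.
result = np.select(my_criteria, my_values, default='other')

df['data_column'] = result
```

With the sample data this should reproduce the desired output from the question. Rows that match none of the conditions (for example a number above 30) would get the fallback label.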
You can use pd.to_numeric with errors='coerce' to force values to convert to numeric. Then, use pd.cut to create your bins. Convert back to strings from categorical, and replace 'nan' entries with 'text'.
Given:
  data_column
0          22
1           8
2          11
3        Text
4          17
5        Text
6           6
Doing:
# Non-numeric entries such as 'Text' become NaN.
df.data_column = pd.to_numeric(df.data_column, errors='coerce')

# Bin the numbers, then turn the unbinned (NaN) rows into 'text'.
df.data_column = (pd.cut(df.data_column, [1, 10, 20, 30], labels=['superb', 'awesome', 'great'])
                    .astype(str)
                    .replace('nan', 'text'))
Output:
  data_column
0       great
1      superb
2     awesome
3        text
4     awesome
5        text
6      superb
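One thing to watch with pd.cut: by default its bins are open on the left and closed on the right, so with edges [1, 10, 20, 30] a value of exactly 1 lands in no bin and would end up labelled 'text'. The sketch below compares the default with include_lowest=True on a few boundary values. The boundary values and variable names are made up for illustration, and missing bins are filled through the categorical itself instead of matching the string 'nan', which is just one way to do it.

```python
import pandas as pd

# Hypothetical boundary values, not taken from the question's data.
s = pd.Series(['1', '10', '20', '30', 'Text'])
nums = pd.to_numeric(s, errors='coerce')

edges = [1, 10, 20, 30]
labels = ['superb', 'awesome', 'great']

# Default bins: (1, 10], (10, 20], (20, 30]. The value 1 is in none of them.
default_bins = pd.cut(nums, edges, labels=labels)

# include_lowest=True closes the first bin on the left, so 1 gets a label.
closed_bins = pd.cut(nums, edges, labels=labels, include_lowest=True)

def fill_text(binned):
    # Register 'text' as a category, then use it for every unbinned row.
    return binned.cat.add_categories('text').fillna('text').tolist()

binned_default = fill_text(default_bins)
binned_closed = fill_text(closed_bins)
```

Here `binned_default` should start with 'text' for the value 1, while `binned_closed` should start with 'superb'. In both cases 10 falls in the first bin and 20 in the second, because the right edge of each bin is included.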