I’ve been trying to map a column from my df into 4 categories (binning), but the column contains mixed values, int and str. It looks something like this:
df['data_column'] = ['22', '8', '11', 'Text', '17', 'Text', '6']
The categories I’ve been trying to change them to:
- 1 to 10: superb
- 10 to 20: awesome
- 20 to 30: great
- 'Text': text
This is how I’ve been trying to solve it:
my_criteria = [
    df['data_column'][df['data_column'] != 'Text'].astype('int64').between(1, 10),
    df['data_column'][df['data_column'] != 'Text'].astype('int64').between(10, 20),
    df['data_column'][df['data_column'] != 'Text'].astype('int64').between(20, 30),
    df['data_column'][df['data_column'] == 'Text'],
]
my_values = ['superb', 'awesome', 'great', 'text']
df['data_column'] = np.select(my_criteria, my_values, 0)
But, I get this error: ValueError: shape mismatch: objects cannot be broadcast to a single shape.
How can I fix this? Any help is welcomed. The desired output:
df['data_column'] = ['great', 'superb', 'awesome', 'text', 'awesome', 'text', 'superb']
Thank you in advance!
Answer
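All values in your condlist for np.select must be the same length. Yours are not: the first three conditions are built from the filtered column (5 rows), and the fourth is the 2 filtered 'Text' values rather than a boolean mask.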
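If you would rather keep np.select, one way to satisfy the same-length requirement is to build every condition as a boolean mask over the full column. The following is a minimal sketch, not the only way to do it; the names numeric, conditions, labels and the 'other' default are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data_column": ["22", "8", "11", "Text", "17", "Text", "6"]})

# Convert once; entries that cannot be parsed (such as 'Text') become NaN.
numeric = pd.to_numeric(df["data_column"], errors="coerce")

# Every condition is a boolean Series with the same length as df.
# between() is inclusive on both ends and is False for NaN.
# np.select uses the first matching condition, so 10 maps to 'superb'
# and 20 maps to 'awesome'.
conditions = [
    numeric.between(1, 10),
    numeric.between(10, 20),
    numeric.between(20, 30),
    df["data_column"].eq("Text"),
]
labels = ["superb", "awesome", "great", "text"]

# 'other' is an assumed fallback for values that match no condition.
result = np.select(conditions, labels, default="other")
df["data_column"] = result
print(df)
```

Because the masks are computed on the whole column, np.select can broadcast them against each other and the shape mismatch should no longer occur.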
You can use pd.to_numeric with errors='coerce' to force values to convert to numeric; anything that cannot be parsed, such as 'Text', becomes NaN. Then, use pd.cut to create your bins. Convert back to strings from categorical, and replace 'nan' entries with 'text'.
Given:
  data_column
0          22
1           8
2          11
3        Text
4          17
5        Text
6           6
Doing:
df.data_column = pd.to_numeric(df.data_column, errors='coerce')
df.data_column = (pd.cut(df.data_column, [1, 10, 20, 30], labels=['superb', 'awesome', 'great'])
                    .astype(str)
                    .replace('nan', 'text'))
Output:
  data_column
0       great
1      superb
2     awesome
3        text
4     awesome
5        text
6      superb
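One detail worth checking: by default pd.cut builds right-closed bins, so with edges [1, 10, 20, 30] a value of exactly 1 falls outside the first bin and would end up as 'text'. The sketch below is a variant that passes include_lowest=True and fills missing values through the categorical accessor instead of going through the string 'nan'. The extra '1' row is added here purely for illustration and is not part of the original data.

```python
import pandas as pd

# The trailing '1' is an illustrative addition to show the lower-edge behavior.
df = pd.DataFrame({"data_column": ["22", "8", "11", "Text", "17", "Text", "6", "1"]})

# Non-numeric entries become NaN.
numeric = pd.to_numeric(df["data_column"], errors="coerce")

# include_lowest=True makes the first bin include its left edge (1).
binned = pd.cut(
    numeric,
    bins=[1, 10, 20, 30],
    labels=["superb", "awesome", "great"],
    include_lowest=True,
)

# Register 'text' as a valid category, then use it for every missing value.
result = binned.cat.add_categories(["text"]).fillna("text")
df["data_column"] = result
print(df)
```

Note that any number outside the 1 to 30 range also becomes NaN after pd.cut, so in this sketch it would be labeled 'text' as well; handle such values separately if that is not what you want.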