
Using np.select to change mixed data types (int and str) in a Pandas column

I’ve been trying to map a column of my df into 4 categories (binning), but the column contains mixed values, ints and strings. It looks something like this:

df['data_column'] = ['22', '8', '11', 'Text', '17', 'Text', '6']

The categories I’ve been trying to map them to:

- 1 to 10: superb
- 10 to 20: awesome
- 20 to 30: great
- 'Text': text

This is how I’ve been trying to solve it:

my_criteria = [df['data_column'][df['data_column'] != 'Text'].astype('int64').between(1, 10),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(10, 20),
               df['data_column'][df['data_column'] != 'Text'].astype('int64').between(20, 30),
               df['data_column'][df['data_column'] == 'Text']]

my_values = ['superb', 'awesome', 'great', 'text']

df['data_column'] = np.select(my_criteria, my_values, 0)

But I get this error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape

How can I fix this? Any help is welcome. The desired output:

df['data_column'] = ['great', 'superb', 'awesome', 'text', 'awesome', 'text', 'superb']

Thank you in advance!


Answer

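Every array in the condlist you pass to np.select must have the same length (more precisely, they must broadcast to a single shape). Yours do not: filtering with df['data_column'] != 'Text' leaves 5 rows in each of the first three conditions, while the last one keeps only the 2 'Text' rows (and holds strings rather than booleans).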
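If you would rather keep np.select, the fix is to build every condition on the full column, so that each mask has len(df) rows. A minimal sketch, assuming the sample data from the question; it reuses the pd.to_numeric(..., errors='coerce') conversion this answer relies on, and the 'other' default is a placeholder of my own choosing (a string default avoids mixing the original 0 with string choices).

```python
import numpy as np
import pandas as pd

# Sample data from the question; dtype=object mimics a column of strings
df = pd.DataFrame({'data_column': pd.Series(['22', '8', '11', 'Text', '17', 'Text', '6'],
                                            dtype=object)})

# Convert once, on the full column, so every mask below has len(df) rows.
# errors='coerce' turns 'Text' into NaN instead of raising.
nums = pd.to_numeric(df['data_column'], errors='coerce')

# np.select uses the first condition that is True, so boundary values such as
# 10 and 20 land in the lower bin. NaN fails every between() check.
my_criteria = [
    nums.between(1, 10),
    nums.between(10, 20),
    nums.between(20, 30),
    df['data_column'] == 'Text',   # a boolean mask, not the filtered values
]
my_values = ['superb', 'awesome', 'great', 'text']

# 'other' is a hypothetical placeholder default for anything outside 1..30
df['data_column'] = np.select(my_criteria, my_values, default='other')
print(df['data_column'].tolist())
```

Because the first matching condition wins, the overlapping edges in the question's ranges (10 and 20 appear in two ranges each) resolve to the lower category.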


You can use pd.to_numeric with errors='coerce' to convert the values to numbers; anything that cannot be parsed, such as 'Text', becomes NaN instead of raising an error.
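To see what the conversion step produces on its own, here is a small sketch on the question's sample values (stored as an object Series to mimic the original column):

```python
import pandas as pd

# dtype=object mimics the question's column of strings
s = pd.Series(['22', '8', '11', 'Text', '17', 'Text', '6'], dtype=object)

# errors='coerce' turns anything unparseable into NaN instead of raising,
# so the result is a float64 column with NaN where 'Text' was
nums = pd.to_numeric(s, errors='coerce')
print(nums.tolist())
```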

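Then use pd.cut to create your bins. Its intervals are closed on the right, so the edges [1, 10, 20, 30] give (1, 10], (10, 20] and (20, 30]. Finally, convert the categorical result back to strings and replace the 'nan' entries (the former 'Text' rows, plus any number that falls outside the bins) with 'text'.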

Given:

  data_column
0          22
1           8
2          11
3        Text
4          17
5        Text
6           6

Doing:

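import pandas as pd

# 'Text' cannot be parsed as a number, so errors='coerce' turns it into NaN
df['data_column'] = pd.to_numeric(df['data_column'], errors='coerce')

# Right-closed bins (1, 10], (10, 20], (20, 30]; include_lowest=True also puts exactly 1 in 'superb'
df['data_column'] = (pd.cut(df['data_column'], [1, 10, 20, 30], include_lowest=True,
                            labels=['superb', 'awesome', 'great'])
                     .astype(str)              # categorical -> str, missing values become 'nan'
                     .replace('nan', 'text'))  # the former 'Text' rows (and out-of-range numbers)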

Output:

  data_column
0       great
1      superb
2     awesome
3        text
4     awesome
5        text
6      superb
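The 'nan' string trick depends on how astype(str) renders missing values. A variant that stays categorical is to register 'text' as an extra category and fill the missing entries with it. A minimal sketch on the question's sample data; the add_categories/fillna step is a swapped-in alternative, not the method shown above.

```python
import pandas as pd

# Sample data from the question, converted the same way as above
nums = pd.to_numeric(pd.Series(['22', '8', '11', 'Text', '17', 'Text', '6'], dtype=object),
                     errors='coerce')

# Right-closed bins (1, 10], (10, 20], (20, 30]; include_lowest=True keeps 1 in the first bin
binned = pd.cut(nums, [1, 10, 20, 30], labels=['superb', 'awesome', 'great'],
                include_lowest=True)

# Stay categorical: register 'text' as a category, then fill the NaN rows with it
result = binned.cat.add_categories('text').fillna('text')
print(result.tolist())
```

As with the string version, any number outside 1 to 30 also ends up as 'text' here.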
User contributions licensed under: CC BY-SA