
python qcut method to bin scores

I want to bin the scores in df['SCORE'] into 4 bins in a new column, df['Remark'], as the code below does (the result is the rightmost column of the table below).

However, qcut distributes the scores evenly across the bins, one quarter of the records per bin (the 4 passed in the code below):

df['Remark'] = pd.qcut(df['SCORE'],4,labels = ['Bad','Fair','Good','Excellent'])

What I actually need is for a Remark of 'Bad' to occur only when either df['banned'] or df['charged'] is 1 (true).

Is it possible to automatically assign a remark of 'Bad' to any user whose banned or charged field is 1, and then divide the remaining users (neither banned nor charged) into three bins with pd.qcut(df['SCORE'], 3)?

User    banned  charged score   **remark**
Sam      1         0    0        Bad
Rob      0         0    23       Fair
Tom      0         0    54       Good 
Kim      0         1    65       Bad
Nik      0         0    99       Excellent
Leo      1         1    3        Bad
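
To make the problem concrete, here is a minimal sketch that applies the plain four-way cut to the sample rows above. The column names follow the table (lowercase score), and the DataFrame is rebuilt by hand purely for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "User": ["Sam", "Rob", "Tom", "Kim", "Nik", "Leo"],
    "banned": [1, 0, 0, 0, 0, 1],
    "charged": [0, 0, 0, 1, 0, 1],
    "score": [0, 23, 54, 65, 99, 3],
})

# Plain four-way quantile cut: it looks only at score,
# so the banned and charged columns have no influence on the label.
df["remark"] = pd.qcut(df["score"], 4,
                       labels=["Bad", "Fair", "Good", "Excellent"])

print(df)
```

With this data Kim, who is charged, has one of the higher scores and so is not labelled Bad by the four-way cut, which is exactly the behaviour the question wants to avoid.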


Answer

Apply the three-way cut to the “good” data:

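# True for users who are neither banned nor charged
not_bad_mask = (df['banned'] == 0) & (df['charged'] == 0)
# Bin only those rows; all other rows get NaN in the new column
df['remark'] = pd.qcut(df.loc[not_bad_mask, 'score'], 3,
                       labels = ['Fair', 'Good', 'Excellent'])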

Then add another category to the category list:

# add_categories returns a new Series; the inplace argument was removed in pandas 2.0
df['remark'] = df['remark'].cat.add_categories(['Bad'])

And fill in the gaps:

# Assign the result back rather than calling fillna(inplace=True) on a column selection
df['remark'] = df['remark'].fillna('Bad')
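
Putting the three steps together on the sample data from the question, a minimal sketch might look like the following. It assumes a recent pandas (2.x), where the categorical methods return a new Series, so each result is assigned back to the column. The final reordering step is an optional addition that is not part of the answer above; it only matters if the remarks will be compared or sorted as ordered categories.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "User": ["Sam", "Rob", "Tom", "Kim", "Nik", "Leo"],
    "banned": [1, 0, 0, 0, 0, 1],
    "charged": [0, 0, 0, 1, 0, 1],
    "score": [0, 23, 54, 65, 99, 3],
})

# Step 1: three-way quantile cut on the users who are neither banned nor charged.
# Rows outside the mask are left as NaN when the result is aligned on the index.
not_bad_mask = (df["banned"] == 0) & (df["charged"] == 0)
df["remark"] = pd.qcut(df.loc[not_bad_mask, "score"], 3,
                       labels=["Fair", "Good", "Excellent"])

# Step 2: register 'Bad' as a valid category (it is appended last).
df["remark"] = df["remark"].cat.add_categories(["Bad"])

# Step 3: fill the NaN rows (banned or charged users) with 'Bad'.
df["remark"] = df["remark"].fillna("Bad")

# Optional (an assumption, not in the original answer): put 'Bad' first
# so the ordered categories run Bad < Fair < Good < Excellent.
df["remark"] = df["remark"].cat.reorder_categories(
    ["Bad", "Fair", "Good", "Excellent"], ordered=True
)

print(df)
```

On the six sample rows this reproduces the remark column from the question's table: Sam, Kim and Leo are Bad, and Rob, Tom and Nik are Fair, Good and Excellent respectively.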
User contributions licensed under: CC BY-SA