I want to bin the scores in df['score'] into 4 bins in a new column, df['remark'], as done by the code below (rightmost column in the table below).
Using qcut, the scores are distributed evenly across four quartile bins (the 4 in the code below):
df['remark'] = pd.qcut(df['score'], 4, labels=['Bad', 'Fair', 'Good', 'Excellent'])
However, the only way a remark of 'Bad' should occur is if either df['banned'] or df['charged'] is 1 (true).
Is it possible to automatically assign a remark of 'Bad' to any user whose banned or charged field is 1, and then divide the remaining users (neither banned nor charged) into three bins with pd.qcut(df['score'], 3)?
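| User | banned | charged | score | **remark** |
|------|--------|---------|-------|------------|
| Sam  | 1      | 0       | 0     | Bad        |
| Rob  | 0      | 0       | 23    | Fair       |
| Tom  | 0      | 0       | 54    | Good       |
| Kim  | 0      | 1       | 65    | Bad        |
| Nik  | 0      | 0       | 99    | Excellent  |
| Leo  | 1      | 1       | 3     | Bad        |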
Answer
Apply the three-way cut to the “good” data:
not_bad_mask = (df['banned'] == 0) & (df['charged'] == 0)
df['remark'] = pd.qcut(df.loc[not_bad_mask, 'score'], 3, labels=['Fair', 'Good', 'Excellent'])
Then add another category to the category list:
df['remark'] = df['remark'].cat.add_categories(['Bad'])
And fill in the gaps:
df['remark'] = df['remark'].fillna('Bad')
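
For reference, here is a minimal end-to-end sketch of the three steps, run against the sample table from the question. The DataFrame construction is my own reconstruction of that table, and I assign each result back to the column rather than relying on inplace=True, on the assumption that you are on a recent pandas version:

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'User': ['Sam', 'Rob', 'Tom', 'Kim', 'Nik', 'Leo'],
    'banned': [1, 0, 0, 0, 0, 1],
    'charged': [0, 0, 0, 1, 0, 1],
    'score': [0, 23, 54, 65, 99, 3],
})

# Rows that are neither banned nor charged.
not_bad_mask = (df['banned'] == 0) & (df['charged'] == 0)

# Three-way quantile cut on the remaining scores only. Rows outside the
# mask are absent from the result, so they become NaN on assignment.
df['remark'] = pd.qcut(
    df.loc[not_bad_mask, 'score'], 3, labels=['Fair', 'Good', 'Excellent']
)

# 'Bad' must be a known category before it can be used as a fill value.
df['remark'] = df['remark'].cat.add_categories(['Bad'])

# Fill the banned/charged rows.
df['remark'] = df['remark'].fillna('Bad')

print(df)
```

Note that add_categories appends 'Bad' after 'Excellent' in the category order. If you later sort or compare on this column, you may want to reorder the categories so that 'Bad' comes first.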