I probably did not explain my issue well in the headline, so let me clarify it here.
I want to categorise the values from one column into a new column.
The first ten lines in my data set are this:
index,id,june,july,difference,score
0,117600,1799.0,0.0,-1.0,0.0
1,35117,707.0,1345.0,0.9024045261669024,100.0
2,95970,660.0,99.0,-0.85,100.0
3,125450,639.0,747.0,0.16901408450704225,100.0
4,32910,527.0,1395.0,1.6470588235294117,100.0
5,68409,466.0,549.0,0.1781115879828326,100.0
6,30059,464.0,831.0,0.790948275862069,100.0
7,60108,347.0,740.0,1.1325648414985592,100.0
8,28749,314.0,616.0,0.9617834394904459,100.0
9,60112,300.0,496.0,0.6533333333333333,100.0
10,57643,294.0,536.0,0.8231292517006803,100.0
And the code I use is this:
df2['score'] = np.where(df2['score'] > 0.25, 55, df['score'])
df2['score'] = np.where(df2['score'] > 0.5, 65, df['score'])
df2['score'] = np.where(df2['score'] > 0.8, 85, df['score'])
df2['score'] = np.where(df2['score'] > 1, 100, df['score'])
df2['score'] = np.where(df2['score'] == -1, 0, df['score'])
df2['score'] = np.where(df2['score'] < -0.9, 5, df['score'])
df2['score'] = np.where(df2['score'] < -0.5, 25, df['score'])
df2['score'] = np.where(df2['score'] < -0.25, 30, df['score'])
df2
I think this could be done more easily with a user-defined function, but I got stuck on that. There are many issues with this code and I can’t figure out how to fix them. Why does it treat -0.8 as a value higher than 1? If I run only the lines for the negative values, it works, so why is that?
If anyone can give me a hand that would be fantastic.
Answer
If you want to split values in a column into several ranges, you can use pandas.cut(). It transforms each value into the range that the value belongs to.
More information here: https://pandas.pydata.org/docs/reference/api/pandas.cut.html
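As for why the original code misbehaves: each np.where line appears to test the score column instead of the difference column, and it falls back to df['score'], so every line overwrites the result of the previous one. Below is a minimal sketch of how pandas.cut() could replace the whole chain, using the thresholds from the question. Two things here are assumptions and not part of the question: the score of 50 for differences between -0.25 and 0.25 (the question does not say what that band should get), and the handling of exact boundary values, since pandas.cut() closes each bin on the right by default, which differs slightly from the strict < comparisons on the negative side.

```python
import numpy as np
import pandas as pd

# A few rows taken from the sample data in the question.
df2 = pd.DataFrame({
    "id": [117600, 35117, 95970, 125450, 32910],
    "difference": [-1.0, 0.9024045261669024, -0.85,
                   0.16901408450704225, 1.6470588235294117],
})

# Bin edges taken from the thresholds in the question.
# Each bin is open on the left and closed on the right, e.g. (0.8, 1].
bins = [-np.inf, -1, -0.9, -0.5, -0.25, 0.25, 0.5, 0.8, 1, np.inf]

# One label per bin. The 50 for the middle band (-0.25, 0.25] is an
# assumed placeholder; the question does not define a score for it.
labels = [0, 5, 25, 30, 50, 55, 65, 85, 100]

# Categorise the difference column, then convert the categories to integers.
df2["score"] = pd.cut(df2["difference"], bins=bins, labels=labels).astype(int)

print(df2)
```

With this approach the thresholds live in one list, so adding or moving a band means editing bins and labels instead of adding another np.where line.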