Pandas DataFrame: How do I create numerical values out of numerical values from another column?

I have probably not explained my issue right in the headline, so let’s try to clarify it here.

I want to categorise the values from one column into a new one.

These are the first rows of my data set:

index,id,june,july,difference,score
0,117600,1799.0,0.0,-1.0,0.0
1,35117,707.0,1345.0,0.9024045261669024,100.0
2,95970,660.0,99.0,-0.85,100.0
3,125450,639.0,747.0,0.16901408450704225,100.0
4,32910,527.0,1395.0,1.6470588235294117,100.0
5,68409,466.0,549.0,0.1781115879828326,100.0
6,30059,464.0,831.0,0.790948275862069,100.0
7,60108,347.0,740.0,1.1325648414985592,100.0
8,28749,314.0,616.0,0.9617834394904459,100.0
9,60112,300.0,496.0,0.6533333333333333,100.0
10,57643,294.0,536.0,0.8231292517006803,100.0

And this is the code I use:

df2['score'] = np.where(df2['score'] > 0.25, 55, df['score'])
df2['score'] = np.where(df2['score'] > 0.5, 65, df['score'])
df2['score'] = np.where(df2['score'] > 0.8, 85, df['score'])
df2['score'] = np.where(df2['score'] > 1, 100, df['score'])

df2['score'] = np.where(df2['score'] == -1, 0, df['score'])
df2['score'] = np.where(df2['score'] < -0.9, 5, df['score'])
df2['score'] = np.where(df2['score'] < -0.5, 25, df['score'])
df2['score'] = np.where(df2['score'] < -0.25, 30, df['score'])

df2
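To see what the four positive-threshold lines above actually do, here is a minimal sketch that replays them on rows 0 to 2 of the sample. It assumes `df2['score']` and `df['score']` start out equal, so the third `np.where` argument simply means "keep the current value". Two things show up. First, the conditions test `score`, not `difference`, so row 2 (difference -0.85, score 100) is judged by its score of 100. Second, each call re-reads the column the previous call just wrote, so a replacement value (55, 65, 85) satisfies the next, larger threshold and anything above 0.25 climbs all the way to 100.

```python
import numpy as np
import pandas as pd

# Rows 0 to 2 of the sample. Only 'score' is read by the conditions below;
# 'difference' is kept here just to show that it is never consulted.
rows = pd.DataFrame({"difference": [-1.0, 0.9024045261669024, -0.85],
                     "score": [0.0, 100.0, 100.0]})

# Replay the four positive-threshold lines in the question's order.
# Assumption: df2['score'] and df['score'] start out equal, so the fallback
# argument is treated as "keep the current value".
step = rows["score"].to_numpy()
step = np.where(step > 0.25, 55, step)  # 100 -> 55; 0 stays 0
step = np.where(step > 0.5, 65, step)   # 55 -> 65
step = np.where(step > 0.8, 85, step)   # 65 -> 85
step = np.where(step > 1, 100, step)    # 85 -> 100

rows["after_chain"] = step
print(rows)  # row 2 (difference -0.85) ends at 100 because its score was 100
```

The negative-threshold lines do not cascade in the same way: their replacement values (0, 5, 25, 30) are never below the next threshold they are tested against, so a row that has been replaced once is not matched again.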

I think this could be done more easily with a user-defined function, but I got stuck on that. There are several issues with this code and I can’t figure out how to fix them. Why does it treat -0.8 as a value higher than 1? If I run only the lines for the negative values, it works, so why is that?
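Since a user-defined function is mentioned, here is a minimal sketch of one, assuming the thresholds are meant to be tested against `difference` (the code above tests `score` instead). The name `to_score` is hypothetical. Every branch reads only the row's original `difference`, never a score written by an earlier rule, so the cascading seen with the chained `np.where` calls cannot happen. The question does not say what values between -0.25 and 0.25 should get; like the `np.where` fallback, this sketch keeps the row's existing score for them.

```python
import io

import pandas as pd

# The sample rows from the question.
CSV = """index,id,june,july,difference,score
0,117600,1799.0,0.0,-1.0,0.0
1,35117,707.0,1345.0,0.9024045261669024,100.0
2,95970,660.0,99.0,-0.85,100.0
3,125450,639.0,747.0,0.16901408450704225,100.0
4,32910,527.0,1395.0,1.6470588235294117,100.0
5,68409,466.0,549.0,0.1781115879828326,100.0
6,30059,464.0,831.0,0.790948275862069,100.0
7,60108,347.0,740.0,1.1325648414985592,100.0
8,28749,314.0,616.0,0.9617834394904459,100.0
9,60112,300.0,496.0,0.6533333333333333,100.0
10,57643,294.0,536.0,0.8231292517006803,100.0
"""
df2 = pd.read_csv(io.StringIO(CSV), index_col="index")


def to_score(difference, current_score):
    """Hypothetical helper: map one 'difference' value to a score.

    Uses the question's thresholds with the same strict < and > comparisons.
    The first matching branch wins, and no branch looks at a score written
    by another branch. Values in [-0.25, 0.25] have no rule in the question,
    so they keep current_score (an assumption).
    """
    if difference == -1:
        return 0.0
    if difference < -0.9:
        return 5.0
    if difference < -0.5:
        return 25.0
    if difference < -0.25:
        return 30.0
    if difference > 1:
        return 100.0
    if difference > 0.8:
        return 85.0
    if difference > 0.5:
        return 65.0
    if difference > 0.25:
        return 55.0
    return current_score


# Apply the function row by row (axis=1 passes each row as a Series).
df2["score"] = df2.apply(lambda row: to_score(row["difference"], row["score"]), axis=1)
print(df2[["difference", "score"]])
```

`apply` with `axis=1` calls Python once per row, which is fine for a few thousand rows but slower than a vectorised approach on large frames.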

If anyone can give me a hand, that would be fantastic.

Answer

If you want to split the values of a column into several ranges, you can use pandas.cut(). It assigns each value to the range (bin) it falls into.

More information here: https://pandas.pydata.org/docs/reference/api/pandas.cut.html
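A minimal sketch of `pandas.cut()` applied to this data, assuming the bins should be built from the question's thresholds on the `difference` column. With `labels=False`, `pd.cut` returns the 0-based bin number for each value, which is then mapped to a score with a dictionary. The question gives no score for the (-0.25, 0.25] bin, so this sketch leaves it out of the mapping and those rows keep their existing score (an assumption).

```python
import io

import numpy as np
import pandas as pd

# The sample rows from the question.
CSV = """index,id,june,july,difference,score
0,117600,1799.0,0.0,-1.0,0.0
1,35117,707.0,1345.0,0.9024045261669024,100.0
2,95970,660.0,99.0,-0.85,100.0
3,125450,639.0,747.0,0.16901408450704225,100.0
4,32910,527.0,1395.0,1.6470588235294117,100.0
5,68409,466.0,549.0,0.1781115879828326,100.0
6,30059,464.0,831.0,0.790948275862069,100.0
7,60108,347.0,740.0,1.1325648414985592,100.0
8,28749,314.0,616.0,0.9617834394904459,100.0
9,60112,300.0,496.0,0.6533333333333333,100.0
10,57643,294.0,536.0,0.8231292517006803,100.0
"""
df2 = pd.read_csv(io.StringIO(CSV), index_col="index")

# Bin edges from the question's thresholds. pd.cut bins are right-closed by
# default, so each interval is (left, right]; -1 falls into (-inf, -1].
bins = [-np.inf, -1, -0.9, -0.5, -0.25, 0.25, 0.5, 0.8, 1, np.inf]

# labels=False: return the 0-based bin number instead of an Interval.
bin_number = pd.cut(df2["difference"], bins=bins, labels=False)

# Score per bin number. Bin 4, (-0.25, 0.25], is deliberately missing, so
# map() yields NaN there and fillna() puts the existing score back.
score_for_bin = {0: 0, 1: 5, 2: 25, 3: 30, 5: 55, 6: 65, 7: 85, 8: 100}

df2["score"] = bin_number.map(score_for_bin).fillna(df2["score"])
print(df2[["difference", "score"]])
```

One boundary difference to be aware of: because the intervals are right-closed, a value of exactly -0.9, -0.5 or -0.25 lands in the bin to its left, whereas the question's strict `<` tests would put it in the bin to its right. Passing `right=False` would fix those edges but break the positive ones (and the -1 rule), so check whether exact boundary values occur in your data.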
