I am trying to add values to a column based on a couple of conditions. Here is the code example:

    import pandas as pd

    df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
    df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat': ['p', 'n', 'p', 'n', 'p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})

    for index, _ in df1.iterrows():
        if df1.loc[index, 'Val'] >= 0:
            df1.loc[index, 'Val'] = df1.loc[index, 'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index, 'Type']) & (df2['Cat'] == 'p'), 'Val'])
        else:
            df1.loc[index, 'Val'] = df1.loc[index, 'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index, 'Type']) & (df2['Cat'] == 'n'), 'Val'])

For each value in the ‘Val’ column of df1, I want to add values from df2, based on the type and whether the original value was positive or negative.
The expected output for this example would be alternating 50 and -50 in df1. The above code does the job, but it is too slow to be usable on a large data set. Is there a better way to do this?
Answer
Try adding a Cat column to df1, merge, then sum the Val columns across axis 1, then drop the extra columns:

    import numpy as np

    df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
    df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
    df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
    # Pass axis by keyword: the positional form drop(labels, 1) was removed in pandas 2.0.
    df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], axis=1)

Output:

      Type  Val
    0    A   50
    1    A  -50
    2    A   50
    3    A  -50
    4    B   50
    5    B  -50
    6    C   50
    7    C  -50

Add new column with np.where

    df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')

      Type  Val Cat
    0    A   20   p
    1    A  -10   n
    2    A   20   p
    3    A  -10   n
    4    B   30   p
    5    B  -20   n
    6    C   40   p
    7    C  -30   n

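One detail worth checking in this step is how a value of exactly zero is labelled. The sketch below uses made-up sample values (not from the question) and suggests that `lt(0)` sends zero to 'p', which lines up with the `>= 0` branch of the original loop.

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a zero, to probe the boundary case.
vals = pd.Series([20, -10, 0])

# lt(0) is True only for strictly negative values, so zero is labelled 'p',
# the same branch the original loop takes for `Val >= 0`.
cat = np.where(vals.lt(0), 'n', 'p')
print(cat.tolist())
```

If zero should instead count as negative, swapping `lt(0)` for `le(0)` would be the natural change.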
merge on Type and Cat:

    df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')

      Type  Val_x Cat  Val_y
    0    A     20   p     30
    1    A    -10   n    -40
    2    A     20   p     30
    3    A    -10   n    -40
    4    B     30   p     20
    5    B    -20   n    -30
    6    C     40   p     10
    7    C    -30   n    -20

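In the example every (Type, Cat) pair in df1 has a partner in df2. As a rough illustration of what a left merge does when that is not the case, the sketch below uses two small made-up frames (`left` and `right` are hypothetical names) and the `indicator=True` option of `merge` to flag unmatched rows.

```python
import pandas as pd

# Hypothetical frames: Type 'D' has no counterpart on the right side.
left = pd.DataFrame({'Type': ['A', 'D'], 'Cat': ['p', 'p'], 'Val': [20, 5]})
right = pd.DataFrame({'Type': ['A'], 'Cat': ['p'], 'Val': [30]})

# how='left' keeps every row of `left`; indicator=True adds a `_merge` column
# saying whether each row was found in both frames or only in the left one.
merged = left.merge(right, on=['Type', 'Cat'], how='left', indicator=True)
print(merged)
```

Rows marked `left_only` end up with NaN in `Val_y`, which is worth checking for before summing if missing lookups are not expected.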
sum Val columns:

    df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)

      Type  Val_x Cat  Val_y  Val
    0    A     20   p     30   50
    1    A    -10   n    -40  -50
    2    A     20   p     30   50
    3    A    -10   n    -40  -50
    4    B     30   p     20   50
    5    B    -20   n    -30  -50
    6    C     40   p     10   50
    7    C    -30   n    -20  -50

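Summing across axis 1 rather than using `+` matters mainly when the merge leaves gaps. The following sketch, built on a small made-up frame, compares the two: `sum` skips NaN by default, while plain addition propagates it.

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame in which the second row found no match (NaN in Val_y).
merged = pd.DataFrame({'Val_x': [20, 5], 'Val_y': [30.0, np.nan]})

# sum(axis=1) uses skipna=True by default, so the unmatched row keeps its own value.
total_sum = merged[['Val_x', 'Val_y']].sum(axis=1)

# Element-wise addition propagates NaN, so the unmatched row becomes NaN.
total_add = merged['Val_x'] + merged['Val_y']

print(total_sum.tolist())
print(total_add.tolist())
```

Which behaviour is preferable depends on whether an unmatched row should be left unchanged or surfaced as missing.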
drop extra columns:

    df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], axis=1)

      Type  Val
    0    A   50
    1    A  -50
    2    A   50
    3    A  -50
    4    B   50
    5    B  -50
    6    C   50
    7    C  -50
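
For reuse, the four steps could be wrapped in one function. This is only a sketch: the name `add_signed_offsets` is hypothetical, and the `validate='many_to_one'` argument is an addition not present in the answer, intended to raise an error if df2 ever contains duplicate (Type, Cat) pairs that would otherwise silently multiply rows.

```python
import numpy as np
import pandas as pd


def add_signed_offsets(values, offsets):
    """Add the matching 'p'/'n' offset to each Val (hypothetical helper)."""
    out = values.copy()  # leave the caller's frame untouched
    # Label each row by sign; zero counts as positive.
    out['Cat'] = np.where(out['Val'].lt(0), 'n', 'p')
    # validate='many_to_one' checks that (Type, Cat) is unique in `offsets`.
    out = out.merge(offsets, on=['Type', 'Cat'], how='left',
                    validate='many_to_one')
    out['Val'] = out[['Val_x', 'Val_y']].sum(axis=1)
    return out.drop(columns=['Cat', 'Val_x', 'Val_y'])


df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

result = add_signed_offsets(df1, df2)
print(result)
```

Working on a copy keeps df1 unchanged, so the function can be called repeatedly without the helper columns piling up.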