Skip to content
Advertisement

pandas: Create new column by comparing DataFrame rows with columns of another DataFrame

Assume I have df1:

df1= pd.DataFrame({'alligator_apple': range(1, 11),
                  'barbadine': range(11, 21),
                  'capulin_cherry': range(21, 31)})

   alligator_apple  barbadine  capulin_cherry
0                1         11              21
1                2         12              22
2                3         13              23
3                4         14              24
4                5         15              25
5                6         16              26
6                7         17              27
7                8         18              28
8                9         19              29
9               10         20              30

And a df2:

df2= pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                  'barbadine': [3, 19, 25, 12],
                  'capulin_cherry': [1, 9, 15, 27]})

   alligator_apple  barbadine  capulin_cherry
0                6          3               1
1                7         19               9
2               15         25              15
3                5         12              27

I’m looking for a way to create a new column in df2 that gets number of rows based on a condition where all columns in df1 has values greater than their counterparts in df2 for each row. For example:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1       4
1                7         19               9       1
2               15         25              15       0
3                5         12              27       3

To elaborate, at row 0 of df2, df1.alligator_apple has 4 rows which values are higher than df2.alligator_apple with the value of 6. df1.barbadine has 10 rows which values are higher than df2.barbadine with value of 3, while similarly df1.capulin_cherry has 10 rows.

Finally, apply an ‘and’ condition to all aforementioned conditions to get the number ‘4’ of df2.greater of first row. Repeat for the rest of rows in df2.

Is there a simple way to do this?

Advertisement

Answer

I believe this does what you want:

df2['greater'] = df2.apply(
    lambda row: 
    (df1['alligator_apple'] > row['alligator_apple']) & 
    (df1['barbadine'] > row['barbadine']) & 
    (df1['capulin_cherry'] > row['capulin_cherry']), 
    axis=1,
).sum(axis=1)

print(df2)

output:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1        4
1                7         19               9        1
2               15         25              15        0
3                5         12              27        3

Edit: if you want to generalize and apply this logic for a given column set, we can use functools.reduce together with operator.and_:

import functools
import operator

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

df2['greater'] = df2.apply(
    lambda row: functools.reduce(
        operator.and_, 
        (df1[column] > row[column] for column in columns),
    ), 
    axis=1,
).sum(axis=1)
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement