Assume I have df1
:
df1= pd.DataFrame({'alligator_apple': range(1, 11), 'barbadine': range(11, 21), 'capulin_cherry': range(21, 31)}) alligator_apple barbadine capulin_cherry 0 1 11 21 1 2 12 22 2 3 13 23 3 4 14 24 4 5 15 25 5 6 16 26 6 7 17 27 7 8 18 28 8 9 19 29 9 10 20 30
And a df2
:
df2= pd.DataFrame({'alligator_apple': [6, 7, 15, 5], 'barbadine': [3, 19, 25, 12], 'capulin_cherry': [1, 9, 15, 27]}) alligator_apple barbadine capulin_cherry 0 6 3 1 1 7 19 9 2 15 25 15 3 5 12 27
I’m looking for a way to create a new column in df2
that gets number of rows based on a condition where all columns in df1
has values greater than their counterparts in df2
for each row. For example:
alligator_apple barbadine capulin_cherry greater 0 6 3 1 4 1 7 19 9 1 2 15 25 15 0 3 5 12 27 3
To elaborate, at row 0 of df2
, df1.alligator_apple
has 4 rows which values are higher than df2.alligator_apple
with the value of 6. df1.barbadine
has 10 rows which values are higher than df2.barbadine
with value of 3, while similarly df1.capulin_cherry
has 10 rows.
Finally, apply an ‘and’ condition to all aforementioned conditions to get the number ‘4’ of df2.greater
of first row. Repeat for the rest of rows in df2
.
Is there a simple way to do this?
Advertisement
Answer
I believe this does what you want:
df2['greater'] = df2.apply( lambda row: (df1['alligator_apple'] > row['alligator_apple']) & (df1['barbadine'] > row['barbadine']) & (df1['capulin_cherry'] > row['capulin_cherry']), axis=1, ).sum(axis=1) print(df2)
output:
alligator_apple barbadine capulin_cherry greater 0 6 3 1 4 1 7 19 9 1 2 15 25 15 0 3 5 12 27 3
Edit: if you want to generalize and apply this logic for a given column set, we can use functools.reduce
together with operator.and_
:
import functools import operator columns = ['alligator_apple', 'barbadine', 'capulin_cherry'] df2['greater'] = df2.apply( lambda row: functools.reduce( operator.and_, (df1[column] > row[column] for column in columns), ), axis=1, ).sum(axis=1)