I’d like to group a data frame by its ‘Fruit’ column and calculate the percentage of each fruit that is ‘Good’.
See below for my initial dataframe:

import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

Dataframe

    Fruit Condition
0   Apple      Good
1   Apple       Bad
2  Banana      Good
See below for my desired output data frame:

    Fruit Percentage
0   Apple        50%
1  Banana       100%
Note: Because there is 1 “Good” Apple and 1 “Bad” Apple, the percentage of Good Apples is 50%.
See below for my attempt, which overwrites all the columns:

groupedDF = df.groupby('Fruit')
groupedDF.apply(lambda x: x[(x['Condition'] == 'Good')].count()/x.count())
See below for the resulting table, which seems to calculate the percentage, but within the existing columns instead of a new column:

        Fruit  Condition
Fruit
Apple     0.5        0.5
Banana    1.0        1.0
Answer
We can compare Condition with eq and take advantage of the fact that True is 1 and False is 0 when processed as numbers, then take the groupby mean over Fruit:
new_df = (
    df['Condition'].eq('Good').groupby(df['Fruit']).mean().reset_index()
)
new_df:

    Fruit  Condition
0   Apple        0.5
1  Banana        1.0
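To make the True/False-as-numbers point concrete, here is a minimal sketch on the same data. The variable name is_good is only for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

# eq('Good') yields a boolean Series: True where Condition is 'Good'
is_good = df['Condition'].eq('Good')

# Summing booleans counts the True values (each True counts as 1)
good_count = is_good.sum()

# The mean of booleans is therefore the fraction of True values,
# here over all rows rather than per fruit
good_share = is_good.mean()

print(is_good.tolist())
print(good_count, good_share)
```

Grouping that boolean Series by df['Fruit'] before taking the mean gives the same fraction per fruit instead of over the whole frame.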
We can further map to a format string and rename to get the output into the desired form shown above:
new_df = (
    df['Condition'].eq('Good')
    .groupby(df['Fruit']).mean()
    .map('{:.0%}'.format)  # Change to Percent Format
    .rename('Percentage')  # Rename Column to Percentage
    .reset_index()  # Restore RangeIndex and make Fruit a Column
)
new_df:

    Fruit Percentage
0   Apple        50%
1  Banana       100%
*Naturally, further manipulations can be done as well.
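As one possible variation (an alternative sketch, not the method used in the answer above), pd.crosstab with normalize='index' computes the share of every Condition per Fruit in one step. The names shares and alt_df are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

# Rows are fruits, columns are conditions; normalize='index' makes each row sum to 1
shares = pd.crosstab(df['Fruit'], df['Condition'], normalize='index')

alt_df = (
    shares['Good']             # keep only the share of 'Good' per fruit
    .map('{:.0%}'.format)      # format as a percent string
    .rename('Percentage')      # name the resulting column
    .reset_index()             # make Fruit a regular column again
)

print(alt_df)
```

This keeps the shares of the other conditions available in shares if they are needed later, at the cost of building the full table first.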