How to group by a column and calculate a new field with Python pandas?

I’d like to group a data frame by its ‘Fruit’ column and calculate the percentage of each fruit that is ‘Good’.

See below for my initial dataframe

import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple','Apple','Banana'], 'Condition': ['Good','Bad','Good']})

Dataframe

    Fruit   Condition
0   Apple   Good
1   Apple   Bad
2   Banana  Good

See below for my desired output data frame

    Fruit   Percentage
0   Apple   50%
1   Banana  100%

Note: Because there is 1 “Good” Apple and 1 “Bad” Apple, the percentage of Good Apples is 50%.

See below for my attempt, which overwrites all the columns:

groupedDF = df.groupby('Fruit')
groupedDF.apply(lambda x: x[(x['Condition'] == 'Good')].count()/x.count())

See below for resulting table, which seems to calculate percentage but within existing columns instead of new column:

        Fruit Condition
Fruit       
Apple   0.5 0.5
Banana  1.0 1.0

Answer

We can compare Condition to 'Good' with eq, which yields a boolean Series. Because True counts as 1 and False as 0 in numeric operations, the mean of that Series within each Fruit group is the fraction of 'Good' rows:

new_df = (
    df['Condition'].eq('Good').groupby(df['Fruit']).mean().reset_index()
)

new_df:

    Fruit  Condition
0   Apple        0.5
1  Banana        1.0
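
To make the boolean-mean idea concrete, here is a minimal sketch that spells out the intermediate steps on the same sample data. The names is_good, good_counts, group_sizes and good_share are illustrative only and do not appear in the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Banana'],
    'Condition': ['Good', 'Bad', 'Good'],
})

# Boolean Series: True where the row's Condition is 'Good'
is_good = df['Condition'].eq('Good')

# Summing booleans counts the True values in each Fruit group
good_counts = is_good.groupby(df['Fruit']).sum()

# Number of rows in each Fruit group
group_sizes = is_good.groupby(df['Fruit']).size()

# Count of 'Good' rows divided by group size, i.e. the group mean
good_share = good_counts / group_sizes
```

Dividing the count of True values by the group size is what mean() computes in one step, which is why the shorter form above gives the same fractions.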

We can then map each value through a format string and rename the Series to match the desired output:

new_df = (
    df['Condition'].eq('Good')
        .groupby(df['Fruit']).mean()
        .map('{:.0%}'.format)  # Change to Percent Format
        .rename('Percentage')  # Rename Column to Percentage
        .reset_index()  # Restore RangeIndex and make Fruit a Column
)

new_df:

    Fruit Percentage
0   Apple        50%
1  Banana       100%

*Naturally, further manipulations can be done as well.
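
As one possible further manipulation, the sketch below keeps the raw counts and the numeric fraction next to the formatted string, using named aggregation. This is an alternative to the approach above, not part of it, and the column names is_good, Good, Total and Share are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Banana'],
    'Condition': ['Good', 'Bad', 'Good'],
})

summary = (
    df.assign(is_good=df['Condition'].eq('Good'))  # helper boolean column
      .groupby('Fruit')
      .agg(
          Good=('is_good', 'sum'),     # number of 'Good' rows per fruit
          Total=('is_good', 'count'),  # number of rows per fruit
          Share=('is_good', 'mean'),   # fraction of 'Good' rows per fruit
      )
      .reset_index()  # make Fruit a regular column again
)

# Add the display string while keeping Share numeric for later calculations
summary['Percentage'] = summary['Share'].map('{:.0%}'.format)
```

Keeping Share as a float can be useful because the formatted Percentage column holds strings, which cannot be sorted or compared numerically.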

User contributions licensed under: CC BY-SA