How to obtain dataframe from grouped element after using apply

Question

Let's say this is the dataframe: Then the goal is to produce this: The total Val1 is Y as long as one of the instances is Y. My code looks like this: This works, except that cumulative has dtype object and I can only access Val1; that is, I cannot access First Name or Last Name (although when I run print(cu…

Accepted Answer

Another way is to use groupby.agg, where max returns "Y" if it exists in the group (because "Y" > "N" when comparing strings) and count gives the number of rows in the group:

    out = (df.groupby(['First Name', 'Last Name'], sort=False, as_index=False)
             .agg(Val1=('Val1', 'max'), Total=('Val1', 'count')))

Output:

      First Name Last Name Val1  Total
    0     George   Clooney    Y      3
    1     George   Freeman    N      2
    2     Claire     Stark    Y      2

You can also pass in a lambda that selects based on whatever criteria you want. For example, the following aggregates "Val1" by comparing the number of "Y"s with the number of "N"s in each group (if there are more "Y"s, select "Y", else "N"):

    out = (df.groupby(['First Name', 'Last Name'], sort=False, as_index=False)
             .agg(Val1=('Val1', lambda x: 'Y' if x.eq('Y').sum() > x.eq('N').sum() else 'N'),
                  Total=('Val1', 'count')))
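For anyone who wants to try both aggregations end to end, here is a minimal sketch. The input dataframe is not shown in the question, so the sample rows below are an assumption, chosen only so that the group sizes match the output in the answer; the names `keys`, `any_y` and `majority_y` are also just illustrative.

```python
import pandas as pd

# Hypothetical sample data: the question's original dataframe is not shown,
# so these rows are made up to give groups of size 3, 2 and 2.
df = pd.DataFrame({
    "First Name": ["George", "George", "George", "George", "George", "Claire", "Claire"],
    "Last Name": ["Clooney", "Clooney", "Clooney", "Freeman", "Freeman", "Stark", "Stark"],
    "Val1": ["N", "Y", "N", "N", "N", "Y", "N"],
})

keys = ["First Name", "Last Name"]

# "Y" if any row in the group is "Y": max works because "Y" > "N" as strings.
any_y = (
    df.groupby(keys, sort=False, as_index=False)
      .agg(Val1=("Val1", "max"), Total=("Val1", "count"))
)

# "Y" only if the "Y" rows strictly outnumber the "N" rows in the group.
majority_y = (
    df.groupby(keys, sort=False, as_index=False)
      .agg(
          Val1=("Val1", lambda x: "Y" if x.eq("Y").sum() > x.eq("N").sum() else "N"),
          Total=("Val1", "count"),
      )
)

print(any_y)
print(majority_y)
```

Because as_index=False is passed, the grouping keys come back as ordinary columns, so First Name and Last Name can be accessed on the result like any other column. With this sample data the two rules disagree: the max rule marks Clooney and Stark as "Y", while the majority rule marks every group as "N", since no group has more "Y" rows than "N" rows.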
