GroupBy Pandas with ratio

Question

I am working on a dataset which looks something like this: I am trying to do two things: find the length of the longest sequence of each type, and find the ratio of A/B and B/A for those sequences for each ID. Ratio attribute explanation: calculate the total amount in the longest sequence for each ID (say length n). If the sequence is that

Accepted Answer

Here's an attempt:

    import pandas as pd

    def ratios(df):
        # Use a 0..n-1 index so that idx +/- 1 addresses the neighbouring row
        df = df.reset_index(drop=True)
        # Number the consecutive runs of equal Type
        groups = (df.Type != df.Type.shift(1)).cumsum()
        result = {}
        for t in ('A', 'B'):
            if t in df.Type.values:
                # Run number of the longest run of type t (last one on ties)
                max_num = groups[df.Type.eq(t)].mode().iat[-1]
                max_group = df[groups.eq(max_num)]
                result[f'Longest_Sequence_{t}'] = len(max_group)
                amounts = max_group.Amount.sum()
                idx = max_group.index
                ratio = None
                if t == 'A':
                    # Needs a row after the run and a non-zero denominator
                    if idx[-1] != df.index[-1] and amounts != 0:
                        ratio = df.Amount.at[idx[-1] + 1] / amounts
                elif t == 'B':
                    # Needs a row before the run and a non-zero denominator
                    if idx[0] != df.index[0]:
                        denom = df.Amount.at[idx[0] - 1]
                        if denom != 0:
                            ratio = amounts / denom
                result[f'Ratio_{t}'] = ratio
            else:
                result[f'Longest_Sequence_{t}'] = 0
                result[f'Ratio_{t}'] = None
        return pd.DataFrame([result])

    df = df.groupby('ID').apply(ratios).reset_index(level=1, drop=True)

The result for the dataframe

       ID  Amount Type
    0   1      50    A
    1   1    1000    A
    2   1     500    B
    3   1     200    B
    4   2    1000    A
    5   2     500    B

is

        Longest_Sequence_A  Ratio_A  Longest_Sequence_B  Ratio_B
    ID
    1                    2  0.47619                   2      0.7
    2                    1  0.50000                   1      0.5

The naming and ordering of the columns is a bit different, but this shouldn't matter.

Some explanations (I'm using the whole dataframe as the sample):

This

    groups = (df.Type != df.Type.shift(1)).cumsum()

identifies the sequences:

    0    1
    1    1
    2    2
    3    2
    4    3
    5    4
    Name: Type, dtype: int64

For groups[df.Type.eq('A')], which is

    0    1
    1    1
    4    3
    Name: Type, dtype: int64

.mode() identifies the number of the 'A'-sequence of maximal length (if there are several maximal sequences of equal length, the .iat[-1] selects the last one):

    0    1
    dtype: int64

Now, with max_num == 1, this

    max_group = df[groups.eq(max_num)]

selects the respective group, keeping the index from df (this last point is important for the rest):

       ID  Amount Type
    0   1      50    A
    1   1    1000    A

The rest tries to follow your calculation instructions while taking care of the edge cases. Using an index idx relative to df makes it possible to step back and forth in df to select the other values needed for the ratios. (At the beginning of the function the index is reset to the standard integer index, just to make sure, because I want to be able to use +/- on it.)
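The sequence-numbering step and the tie-breaking behaviour of .mode().iat[-1] can be checked in isolation. The following is only a minimal sketch, assuming pandas is installed; the helper names label_runs and longest_run_id are hypothetical and are not part of the answer's code, and the second example series is made up purely to show a tie.

```python
import pandas as pd


def label_runs(types: pd.Series) -> pd.Series:
    """Number consecutive runs of equal values 1, 2, 3, ... (hypothetical helper)."""
    # A run starts wherever the value differs from the previous row;
    # the cumulative sum of those start flags yields the run number.
    return (types != types.shift(1)).cumsum()


def longest_run_id(types: pd.Series, t: str) -> int:
    """Run number of the longest run of value t (hypothetical helper).

    Assumes t occurs at least once; the answer guards this case with
    `if t in df.Type.values`.
    """
    runs = label_runs(types)
    # mode() returns the most frequent run number(s) in ascending order,
    # so iat[-1] picks the latest run when several share the maximal length.
    return int(runs[types.eq(t)].mode().iat[-1])


# Sample data copied from the answer.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "Amount": [50, 1000, 500, 200, 1000, 500],
    "Type": ["A", "A", "B", "B", "A", "B"],
})

run_id = label_runs(df["Type"])
print(run_id.tolist())                    # [1, 1, 2, 2, 3, 4]
print(longest_run_id(df["Type"], "A"))    # 1

# Made-up series with two 'A' runs of equal length (run numbers 1 and 3).
tied = pd.Series(list("AABAA"))
print(longest_run_id(tied, "A"))          # 3, the later of the tied runs
```

Because the run numbers grow from top to bottom, "last" here means the run that appears latest in the frame. If the first of several tied runs is wanted instead, .iat[0] would be the corresponding choice.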

Answer