Skip to content
Advertisement

Collapsing rows

I have the following table below:

I would like to collapse Code_1 and Code_2 columns based on ID and Date. Based on what I have found online, I have tried the below snippet of code but it does not seem to be working.

df= df.groupby(['ID','Date']).agg(''.join)

DF:

ID Date Count_Code1 Count_Code2 Code_1 Code_2
A1 2022-02-02 90 0 AAAA NaN
A1 2022-02-02 0 50 NaN BBBB
A1 2022-03-14 34 0 AAAA NaN
C1 2022-04-20 0 13 NaN BBBB
C1 2022-04-20 15 0 AAAA NaN

Desired output:

ID Date Count_Code1 Count_Code2 Code_1 Code_2
A1 2022-02-02 90 50 AAAA BBBB
A1 2022-03-14 34 0 AAAA NaN
C1 2022-04-20 15 13 AAAA BBBB

Advertisement

Answer

# groupby and take the group's max
df.groupby(['ID','Date'], as_index=False).max() 
    ID  Date       Code_1   Code_2
0   A1  2022-02-02  AAAA    BBBB
1   A1  2022-03-14  AAAA    0
2   C1  2022-04-20  AAAA    BBBB

Alternate Solution

# replace 0 with null, and ffill, bfill null values on grouped rows
# groupby drops the grouped on keys, so, concat them back
# finally drop the duplicates

pd.concat([df.iloc[:,:2], 
         df.replace(0, np.nan).groupby(['ID', 'Date'], as_index=False ).ffill().bfill()],
         axis=1).drop_duplicates()

ID  Date    Count_Code1     Count_Code2     Code_1  Code_2
0   A1  2022-02-02  90.0    50.0    AAAA    BBBB
2   A1  2022-03-14  34.0    13.0    AAAA    BBBB
3   C1  2022-04-20  15.0    13.0    AAAA    BBBB
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement