I have dataframe, where ‘A’ 1 – client, B – admin I need to merge messages in row with 1 sequentially and merge lines 2 – admin response sequentially across the dataframe.
df1 = pd.DataFrame({'A' : ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'j', 'de', 'be'],
                       'B' : [1, 1, 2, 1, 1, 1, 2, 2, 1, 2]})
df1
    A   B
A   B
0   a   1
1   b   1
2   c   2
3   d   1
4   e   1
5   f   1
6   h   2
7   j   2
8   de  1
9   be  2
I need to get in the end this dataframe:
df2 = pd.DataFrame({'A' : ['a, b', 'd, e, f', 'de'],
                       'B' : ['c', 'h, j', 'be' ]})
Out:
     A       B
0   a,b      c
1   d,e,f   h,j
2   de       be
I do not know how to do this
Advertisement
Answer
Create groups by consecutive values in B – trick compare shifted values with cumulative sum and aggregate first and join. Create helper column for posible pivoting in next step by DataFrame.pivot:
Solution working if exist pairs 1,2 in sequentially order with duplicates.
df = (df1.groupby(df1['B'].ne(df1['B'].shift()).cumsum())
          .agg(B = ('B','first'), A= ('A', ','.join))
          .assign(C = lambda x: x['B'].eq(1).cumsum()))
print (df)
   B      A  C
B             
1  1    a,b  1
2  2      c  1
3  1  d,e,f  2
4  2    h,j  2
5  1     de  3
6  2     be  3
df = (df.pivot('C','B','A')
        .rename(columns={1:'A',2:'B'})
        .reset_index(drop=True).rename_axis(None, axis=1))
print (df)
       A    B
0    a,b    c
1  d,e,f  h,j
2     de   be
