Skip to content
Advertisement

Join two columns of sequentially values

I have dataframe, where ‘A’ 1 – client, B – admin I need to merge messages in row with 1 sequentially and merge lines 2 – admin response sequentially across the dataframe.

df1 = pd.DataFrame({'A' : ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'j', 'de', 'be'],
                       'B' : [1, 1, 2, 1, 1, 1, 2, 2, 1, 2]})

df1

    A   B

A   B
0   a   1
1   b   1
2   c   2
3   d   1
4   e   1
5   f   1
6   h   2
7   j   2
8   de  1
9   be  2


I need to get in the end this dataframe:

df2 = pd.DataFrame({'A' : ['a, b', 'd, e, f', 'de'],
                       'B' : ['c', 'h, j', 'be' ]})

Out:

     A       B
0   a,b      c
1   d,e,f   h,j
2   de       be

I do not know how to do this

Advertisement

Answer

Create groups by consecutive values in B – trick compare shifted values with cumulative sum and aggregate first and join. Create helper column for posible pivoting in next step by DataFrame.pivot:

Solution working if exist pairs 1,2 in sequentially order with duplicates.

df = (df1.groupby(df1['B'].ne(df1['B'].shift()).cumsum())
          .agg(B = ('B','first'), A= ('A', ','.join))
          .assign(C = lambda x: x['B'].eq(1).cumsum()))

print (df)
   B      A  C
B             
1  1    a,b  1
2  2      c  1
3  1  d,e,f  2
4  2    h,j  2
5  1     de  3
6  2     be  3

df = (df.pivot('C','B','A')
        .rename(columns={1:'A',2:'B'})
        .reset_index(drop=True).rename_axis(None, axis=1))
print (df)
       A    B
0    a,b    c
1  d,e,f  h,j
2     de   be
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement