I have dataframe, where ‘A’ 1 – client, B – admin I need to merge messages in row with 1 sequentially and merge lines 2 – admin response sequentially across the dataframe.
df1 = pd.DataFrame({'A' : ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'j', 'de', 'be'], 'B' : [1, 1, 2, 1, 1, 1, 2, 2, 1, 2]}) df1 A B A B 0 a 1 1 b 1 2 c 2 3 d 1 4 e 1 5 f 1 6 h 2 7 j 2 8 de 1 9 be 2 I need to get in the end this dataframe: df2 = pd.DataFrame({'A' : ['a, b', 'd, e, f', 'de'], 'B' : ['c', 'h, j', 'be' ]}) Out: A B 0 a,b c 1 d,e,f h,j 2 de be
I do not know how to do this
Advertisement
Answer
Create groups by consecutive values in B
– trick compare shifted values with cumulative sum and aggregate first
and join. Create helper column for posible pivoting in next step by DataFrame.pivot
:
Solution working if exist pairs 1,2
in sequentially order with duplicates.
df = (df1.groupby(df1['B'].ne(df1['B'].shift()).cumsum()) .agg(B = ('B','first'), A= ('A', ','.join)) .assign(C = lambda x: x['B'].eq(1).cumsum())) print (df) B A C B 1 1 a,b 1 2 2 c 1 3 1 d,e,f 2 4 2 h,j 2 5 1 de 3 6 2 be 3 df = (df.pivot('C','B','A') .rename(columns={1:'A',2:'B'}) .reset_index(drop=True).rename_axis(None, axis=1)) print (df) A B 0 a,b c 1 d,e,f h,j 2 de be