Skip to content
Advertisement

How add value in second row into first row?

I would like to add a new columns from a values of ‘Pr’ in second rows for each value same id and date.

Input a:

ID Date order Date restock Pr Infos
778005 2022-04-07 11:34:46.0 NaN 87.0;113001.0;00 a
778005 2022-04-07 11:34:46.0 NaN 87.0;113159.0;FC at
7001 2021-12-10 13:50:46.0 2021-12-13 00:00:00.0 87.0;271007.0;BV b
7001 2021-12-10 13:50:46.0 2021-12-13 00:00:00.0 87.0;286005.0;BV bt
778005 2022-05-24 12:22:56.0 NaN 87.0;113001.0;00 cc
778005 2022-05-24 12:22:56.0 NaN 87.0;113159.0;FC cct
778005 2022-05-23 10:49:34.0 2022-05-29 00:00:00.0 87.0;113001.0;00 d
778005 2022-05-23 10:49:34.0 2022-05-29 00:00:00.0 87.0;113159.0;FC dt
7001 2022-01-06 08:48:01.0 2022-01-13 00:00:00.0 87.0;271007.0;BV rr
7001 2022-01-06 08:48:01.0 2022-01-13 00:00:00.0 87.0;286005.0;BV rrt

Output:

ID Date order Date restock Pr_1 Infos Pr_2
778005 2022-04-07 11:34:46.0 NaN 87.0;113001.0;00 a 87.0;113159.0;FC
7001 2021-12-10 13:50:46.0 2021-12-13 00:00:00.0 87.0;271007.0;BV b 87.0;286005.0;BV
778005 2022-05-24 12:22:56.0 NaN 87.0;113001.0;00 cc 87.0;113159.0;FC
778005 2022-05-23 10:49:34.0 2022-05-29 00:00:00.0 87.0;113001.0;00 d 87.0;113159.0;FC
7001 2022-01-06 08:48:01.0 2022-01-13 00:00:00.0 87.0;271007.0;BV rr 87.0;286005.0;BV

I wrote this code but ValueError: Index contains duplicate entries, cannot reshape

out = (a.pivot('ID', 'Date restock', 'Pr')
       .add_prefix('Pr_').fillna(0)
       .assign(type= a.groupby([['ID', 'Date order', 'Date restock']])['Pr'].first())
       .reset_index()
       .rename_axis(columns=[None]))
out

How can I fix it? Thank you.

Advertisement

Answer

here is one way to do it

taking cumcount to identify the duplicate rows, then concatenating the duplicate row as a column with the original df, backfilling the null values and then keeping only the first row

df['cc'] = df.groupby(['Date order']).cumcount()
df2=pd.concat([df, 
           df[df['cc'] == 1][[ 'Pr']]],  axis=1).fillna(method='bfill')
df2 = df2[df['cc'] == 0].drop(columns='cc')
df2
    ID      Date order                       Date restock   Pr               Infos  Pr
0   778005  2022-04-07 11:34:46.0   NaN                     87.0;113001.0;00    a   87.0;113159.0;FC
2   7001    2021-12-10 13:50:46.0   2021-12-13 00:00:00.0   87.0;271007.0;BV    b   87.0;286005.0;BV
4   778005  2022-05-24 12:22:56.0   NaN                     87.0;113001.0;00    cc  87.0;113159.0;FC
6   778005  2022-05-23 10:49:34.0   2022-05-29 00:00:00.0   87.0;113001.0;00    d   87.0;113159.0;FC
8   7001    2022-01-06 08:48:01.0   2022-01-13 00:00:00.0   87.0;271007.0;BV    rr  87.0;286005.0;BV
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement