I would like to add a new columns from a values of ‘Pr’ in second rows for each value same id and date.
Input a:
ID | Date order | Date restock | Pr | Infos |
---|---|---|---|---|
778005 | 2022-04-07 11:34:46.0 | NaN | 87.0;113001.0;00 | a |
778005 | 2022-04-07 11:34:46.0 | NaN | 87.0;113159.0;FC | at |
7001 | 2021-12-10 13:50:46.0 | 2021-12-13 00:00:00.0 | 87.0;271007.0;BV | b |
7001 | 2021-12-10 13:50:46.0 | 2021-12-13 00:00:00.0 | 87.0;286005.0;BV | bt |
778005 | 2022-05-24 12:22:56.0 | NaN | 87.0;113001.0;00 | cc |
778005 | 2022-05-24 12:22:56.0 | NaN | 87.0;113159.0;FC | cct |
778005 | 2022-05-23 10:49:34.0 | 2022-05-29 00:00:00.0 | 87.0;113001.0;00 | d |
778005 | 2022-05-23 10:49:34.0 | 2022-05-29 00:00:00.0 | 87.0;113159.0;FC | dt |
7001 | 2022-01-06 08:48:01.0 | 2022-01-13 00:00:00.0 | 87.0;271007.0;BV | rr |
7001 | 2022-01-06 08:48:01.0 | 2022-01-13 00:00:00.0 | 87.0;286005.0;BV | rrt |
Output:
ID | Date order | Date restock | Pr_1 | Infos | Pr_2 |
---|---|---|---|---|---|
778005 | 2022-04-07 11:34:46.0 | NaN | 87.0;113001.0;00 | a | 87.0;113159.0;FC |
7001 | 2021-12-10 13:50:46.0 | 2021-12-13 00:00:00.0 | 87.0;271007.0;BV | b | 87.0;286005.0;BV |
778005 | 2022-05-24 12:22:56.0 | NaN | 87.0;113001.0;00 | cc | 87.0;113159.0;FC |
778005 | 2022-05-23 10:49:34.0 | 2022-05-29 00:00:00.0 | 87.0;113001.0;00 | d | 87.0;113159.0;FC |
7001 | 2022-01-06 08:48:01.0 | 2022-01-13 00:00:00.0 | 87.0;271007.0;BV | rr | 87.0;286005.0;BV |
I wrote this code but ValueError: Index contains duplicate entries, cannot reshape
out = (a.pivot('ID', 'Date restock', 'Pr') .add_prefix('Pr_').fillna(0) .assign(type= a.groupby([['ID', 'Date order', 'Date restock']])['Pr'].first()) .reset_index() .rename_axis(columns=[None])) out
How can I fix it? Thank you.
Advertisement
Answer
here is one way to do it
taking cumcount to identify the duplicate rows, then concatenating the duplicate row as a column with the original df, backfilling the null values and then keeping only the first row
df['cc'] = df.groupby(['Date order']).cumcount() df2=pd.concat([df, df[df['cc'] == 1][[ 'Pr']]], axis=1).fillna(method='bfill') df2 = df2[df['cc'] == 0].drop(columns='cc') df2
ID Date order Date restock Pr Infos Pr 0 778005 2022-04-07 11:34:46.0 NaN 87.0;113001.0;00 a 87.0;113159.0;FC 2 7001 2021-12-10 13:50:46.0 2021-12-13 00:00:00.0 87.0;271007.0;BV b 87.0;286005.0;BV 4 778005 2022-05-24 12:22:56.0 NaN 87.0;113001.0;00 cc 87.0;113159.0;FC 6 778005 2022-05-23 10:49:34.0 2022-05-29 00:00:00.0 87.0;113001.0;00 d 87.0;113159.0;FC 8 7001 2022-01-06 08:48:01.0 2022-01-13 00:00:00.0 87.0;271007.0;BV rr 87.0;286005.0;BV