I have a DataFrame that contains a column with dates which I’d like to use as my DataFrame’s index. The dates in that column are not necessarily unique – sometimes there might be duplicates. I wish to append duplicates as new columns. Dates that are unique can just contain NaN (or whatever) for the newly appended columns.
To clarify I’ll provide an example:
import pandas as pd data = [ {'Date':'01-01-2020','A':0,'B':1}, {'Date':'01-01-2020','A':2,'B':3}, {'Date':'02-01-2020','A':4,'B':5}, {'Date':'02-01-2020','A':6,'B':7}, {'Date':'03-01-2020','A':8,'B':9}, ] df = pd.DataFrame(data)
This will yield:
Date A B 0 01-01-2020 0 1 1 01-01-2020 2 3 2 02-01-2020 4 5 3 02-01-2020 6 7 4 03-01-2020 8 9
What I want:
A B C D Date 01-01-2020 0 1 2 3 02-01-2020 4 5 6 7 03-01-2020 8 9 NaN NaN
The naming of the newly appended columns can be arbitrary. I don’t even know whether appending would be the right way to go about it. Maybe it’s easier to create a new DataFrame from scratch.
Advertisement
Answer
Use DataFrame.set_index
with DataFrame.stack
for unpivot data and then pivoting by GroupBy.cumcount
and Series.unstack
:
df1 = df.set_index('Date').stack().reset_index(name='val') df = df1.set_index(['Date', df1.groupby('Date').cumcount()])['val'].unstack() print (df) 0 1 2 3 Date 01-01-2020 0.0 1.0 2.0 3.0 02-01-2020 4.0 5.0 6.0 7.0 03-01-2020 8.0 9.0 NaN NaN
If order of output values is not important:
df1 = df.melt('Date') df = df1.set_index(['Date', df1.groupby('Date').cumcount()])['value'].unstack() print (df) 0 1 2 3 Date 01-01-2020 0.0 2.0 1.0 3.0 02-01-2020 4.0 6.0 5.0 7.0 03-01-2020 8.0 9.0 NaN NaN
Another idea is use lambda function for reshape:
df = (df.set_index('Date') .groupby(level=0) .apply(lambda x: pd.Series(x.to_numpy().ravel())) .unstack()) print (df) 0 1 2 3 Date 01-01-2020 0.0 1.0 2.0 3.0 02-01-2020 4.0 5.0 6.0 7.0 03-01-2020 8.0 9.0 NaN NaN