I have a dataframe like this :
dte | res | year |
---|---|---|
1995-01-01 | 65.3 | 1995 |
1995-01-02 | 65.5 | 1995 |
… | … | … |
2019-01-03 | 55.2 | 2019 |
2019-01-04 | 52.2 | 2019 |
and I’m trying to create another file in this format :
| year-1995 | res | year-1996 | res | year-1997 | res | year-1998 | res | |----------- |------|------------|------|------------|------|------------|------| | 1995-01-01 | 65.3 | 1996-01-01 | 70.5 | 1997-01-01 | 70.5 | 1998-01-01 | 70.5 | | 1995-01-02 | 65.5 | 1996-01-02 | 68.3 | 1997-01-02 | 68.3 | 1998-01-02 | 68.3 | | 1995-01-03 | 61.8 | 1996-01-03 | 65.2 | 1997-01-03 | 65.2 | 1998-01-03 | 65.2 | | 1995-01-04 | 55.2 | 1996-01-04 | 60.4 | 1997-01-04 | 60.4 | 1998-01-04 | 60.4 | | 1995-01-05 | 52.2 | 1996-01-05 | 55.8 | 1997-01-05 | 55.8 | 1998-01-05 | 55.8 |
basically I want every year in a different column.
Here is what I already did :
import pandas as pd import numpy as np import openpyxl as op import datetime as dt def reader(file) : return pd.read_csv(file,sep=";") vierzon=reader("input/vierzon.csv") output="output/writer.xlsx" vierzon['dte']=pd.to_datetime(vierzon['dte']) vierzon['date']=vierzon['dte'].dt.date vierzon['year']=pd.DatetimeIndex(vierzon['date']).year del vierzon['date'] currentYear=vierzon.iloc[0]['year'] myDict={} count=0 for row in vierzon.itertuples(): if row[3]!=currentYear and count==0 : df=pd.DataFrame(myDict,index=[row.dte]) df=df.transpose() myDict={row.dte:row.res} currentYear=row[3] if row[3]!=currentYear and count!=0 : df2=pd.DataFrame(myDict,index=[row.dte]) df2=df2.transpose() df=df.append(df2) myDict={row.dte:row.res} currentYear=row[3] myDict[row.dte]=row.res print(df) df.to_excel(output,sheet_name='vierzon')
when I write in my loop df=pd.DataFrame(myDict,index=[row.dte])
and df2=pd.DataFrame(myDict,index=[row.dte])
it doesn’t work at all.
so i tried : df=pd.DataFrame(myDict,index=[output])
and df2=pd.DataFrame(myDict,index=[output])
but my years are still on the same column as my first table.
I already checked, df and df2 seems fine, so maybe my issue is in my .append ?
I know it’s about the index, but i have no idea how to fix it, i don’t understand well how it works. Can you help me ?
thank you.
Advertisement
Answer
set a custom index then use unstack
– we can then flatten the subsequent multi index.
#df['dte'] = pd.to_datetime(df['dte'],errors='coerce') df1 = df.set_index([df.groupby('year').cumcount(),'year']).unstack(1) df1 = df1.sort_index(1,level=1) df1.columns = [f"{x}-{y}" for x,y in df1.columns]
dte-1995 res-1995 dte-2019 res-2019 0 1995-02-01 65.3 2019-05-01 55.2 1 1995-03-01 65.5 2019-06-01 52.2