I have a dataframe like this :
dte | res | year |
---|---|---|
1995-01-01 | 65.3 | 1995 |
1995-01-02 | 65.5 | 1995 |
… | … | … |
2019-01-03 | 55.2 | 2019 |
2019-01-04 | 52.2 | 2019 |
and I’m trying to create another file in this format :
JavaScript
x
8
1
| year-1995 | res | year-1996 | res | year-1997 | res | year-1998 | res |
2
|----------- |------|------------|------|------------|------|------------|------|
3
| 1995-01-01 | 65.3 | 1996-01-01 | 70.5 | 1997-01-01 | 70.5 | 1998-01-01 | 70.5 |
4
| 1995-01-02 | 65.5 | 1996-01-02 | 68.3 | 1997-01-02 | 68.3 | 1998-01-02 | 68.3 |
5
| 1995-01-03 | 61.8 | 1996-01-03 | 65.2 | 1997-01-03 | 65.2 | 1998-01-03 | 65.2 |
6
| 1995-01-04 | 55.2 | 1996-01-04 | 60.4 | 1997-01-04 | 60.4 | 1998-01-04 | 60.4 |
7
| 1995-01-05 | 52.2 | 1996-01-05 | 55.8 | 1997-01-05 | 55.8 | 1998-01-05 | 55.8 |
8
basically I want every year in a different column.
Here is what I already did :
JavaScript
1
43
43
1
import pandas as pd
2
import numpy as np
3
import openpyxl as op
4
import datetime as dt
5
6
def reader(file) :
7
return pd.read_csv(file,sep=";")
8
9
vierzon=reader("input/vierzon.csv")
10
output="output/writer.xlsx"
11
12
vierzon['dte']=pd.to_datetime(vierzon['dte'])
13
vierzon['date']=vierzon['dte'].dt.date
14
vierzon['year']=pd.DatetimeIndex(vierzon['date']).year
15
del vierzon['date']
16
17
currentYear=vierzon.iloc[0]['year']
18
myDict={}
19
count=0
20
21
for row in vierzon.itertuples():
22
23
if row[3]!=currentYear and count==0 :
24
25
df=pd.DataFrame(myDict,index=[row.dte])
26
df=df.transpose()
27
myDict={row.dte:row.res}
28
currentYear=row[3]
29
30
if row[3]!=currentYear and count!=0 :
31
32
df2=pd.DataFrame(myDict,index=[row.dte])
33
df2=df2.transpose()
34
df=df.append(df2)
35
myDict={row.dte:row.res}
36
currentYear=row[3]
37
38
myDict[row.dte]=row.res
39
40
print(df)
41
42
df.to_excel(output,sheet_name='vierzon')
43
when I write in my loop df=pd.DataFrame(myDict,index=[row.dte])
and df2=pd.DataFrame(myDict,index=[row.dte])
it doesn’t work at all.
so i tried : df=pd.DataFrame(myDict,index=[output])
and df2=pd.DataFrame(myDict,index=[output])
but my years are still on the same column as my first table.
I already checked, df and df2 seems fine, so maybe my issue is in my .append ?
I know it’s about the index, but i have no idea how to fix it, i don’t understand well how it works. Can you help me ?
thank you.
Advertisement
Answer
set a custom index then use unstack
– we can then flatten the subsequent multi index.
JavaScript
1
6
1
#df['dte'] = pd.to_datetime(df['dte'],errors='coerce')
2
df1 = df.set_index([df.groupby('year').cumcount(),'year']).unstack(1)
3
df1 = df1.sort_index(1,level=1)
4
5
df1.columns = [f"{x}-{y}" for x,y in df1.columns]
6
JavaScript
1
4
1
dte-1995 res-1995 dte-2019 res-2019
2
0 1995-02-01 65.3 2019-05-01 55.2
3
1 1995-03-01 65.5 2019-06-01 52.2
4