Skip to content
Advertisement

set an index while merging two dataframe

I have a dataframe like this :

dte res year
1995-01-01 65.3 1995
1995-01-02 65.5 1995
2019-01-03 55.2 2019
2019-01-04 52.2 2019

and I’m trying to create another file in this format :

| year-1995  |  res |  year-1996 |  res |  year-1997 |  res | year-1998  | res  |
|----------- |------|------------|------|------------|------|------------|------| 
| 1995-01-01 | 65.3 | 1996-01-01 | 70.5 | 1997-01-01 | 70.5 | 1998-01-01 | 70.5 |    
| 1995-01-02 | 65.5 | 1996-01-02 | 68.3 | 1997-01-02 | 68.3 | 1998-01-02 | 68.3 |     
| 1995-01-03 | 61.8 | 1996-01-03 | 65.2 | 1997-01-03 | 65.2 | 1998-01-03 | 65.2 |       
| 1995-01-04 | 55.2 | 1996-01-04 | 60.4 | 1997-01-04 | 60.4 | 1998-01-04 | 60.4 |        
| 1995-01-05 | 52.2 | 1996-01-05 | 55.8 | 1997-01-05 | 55.8 | 1998-01-05 | 55.8 |          

basically I want every year in a different column.

Here is what I already did :

import pandas as pd 
import numpy as np 
import openpyxl as op 
import datetime as dt 

def reader(file) : 
    return pd.read_csv(file,sep=";")

vierzon=reader("input/vierzon.csv") 
output="output/writer.xlsx"

vierzon['dte']=pd.to_datetime(vierzon['dte'])
vierzon['date']=vierzon['dte'].dt.date
vierzon['year']=pd.DatetimeIndex(vierzon['date']).year
del vierzon['date']

currentYear=vierzon.iloc[0]['year']
myDict={}
count=0

for row in vierzon.itertuples(): 

    if row[3]!=currentYear and count==0 :

        df=pd.DataFrame(myDict,index=[row.dte])
        df=df.transpose()    
        myDict={row.dte:row.res}
        currentYear=row[3]

    if row[3]!=currentYear and count!=0 :   

        df2=pd.DataFrame(myDict,index=[row.dte])
        df2=df2.transpose()
        df=df.append(df2)        
        myDict={row.dte:row.res}
        currentYear=row[3]

    myDict[row.dte]=row.res

print(df)

df.to_excel(output,sheet_name='vierzon')

when I write in my loop df=pd.DataFrame(myDict,index=[row.dte]) and df2=pd.DataFrame(myDict,index=[row.dte]) it doesn’t work at all. so i tried : df=pd.DataFrame(myDict,index=[output]) and df2=pd.DataFrame(myDict,index=[output]) but my years are still on the same column as my first table.

I already checked, df and df2 seems fine, so maybe my issue is in my .append ?

I know it’s about the index, but i have no idea how to fix it, i don’t understand well how it works. Can you help me ?

thank you.

Advertisement

Answer

set a custom index then use unstack – we can then flatten the subsequent multi index.

#df['dte'] = pd.to_datetime(df['dte'],errors='coerce')
df1 = df.set_index([df.groupby('year').cumcount(),'year']).unstack(1)
df1 = df1.sort_index(1,level=1)

df1.columns = [f"{x}-{y}" for x,y in df1.columns]

    dte-1995 res-1995   dte-2019 res-2019
0 1995-02-01     65.3 2019-05-01     55.2
1 1995-03-01     65.5 2019-06-01     52.2
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement