I have this excel file that I would like to “normalize” with the below code by dropping the A and B columns and then dropping the Rows 1,2,3,5 and making the row 4 as header.
df_weekly = pd.read_excel('Run_report.xlsm', sheet_name= "weekly") df_weekly = df_weekly.drop(df_weekly.columns[[0,1]], axis=1) df_weekly = df_weekly.drop(df_weekly.index[[0,1,2,4]]) print(df_weekly.head(8))
But I keep getting this weird df that I dont understand the reason why
Unnamed: 2 Unnamed: 3 ... Unnamed: 53 Unnamed: 54 3 NaN NaN ... NaN NaN 5 EU01 10 ... 5400537629432 NaN 6 EU01 10 ... 5400599118677 NaN 7 EU01 10 ... 5400599027382 NaN 8 EU01 10 ... 5400599028730 NaN 9 EU01 10 ... 5400599028761 NaN 10 EU01 10 ... 5400599613554 NaN 11 EU01 10 ... 5400599028945
Also:
top= df_weekly.columns.values.tolist() print(top) ['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',....]
Advertisement
Answer
You must explicitely set the column names:
df_weekly = pd.read_excel('Run_report.xlsm', sheet_name= "weekly", header=None) df_weekly = df_weekly.drop(df_weekly.columns[[0,1]], axis=1) df_weekly.columns = df_weekly.loc[3].rename(None) df_weekly = df_weekly.drop(range(5)) print(df_weekly.head(8))