Skip to content
Advertisement

Importing multiple excel files with similar name, pivoting each excel file and then appending the results into a single file

My problem statement is as above. Below is my progress so far

  1. I want to extract multiple excel files from the same location namely

Test1 Test2 Test3…(I am using glob to do this) (DONE) 2. I want to iterate through the folder and find files starting with a string(DONE) 3. I then formed an empty dataframe. I want to then pivot the 1st file dataframe based on the date(as columns), go to the next file (do the same), and then append my results to a dataframe.

My problem right now is that I am appending all results to the pivot that I created using my first file.

Can someone please help.

import pandas as pd
import numpy as np
import glob
glob.glob("C:/Users/Tom/Desktop/DC")
all_data = pd.DataFrame()
for f in glob.glob("C:/Users/Tom/Desktop/DC/Test?.xlsx"):
    df = pd.read_excel(f)
    pivot = pd.pivot_table(df, index='DC Desc', columns='Est Wk End Date', values=['Shipped/Ordered Units',aggfunc='sum')
    all_data = all_data.append(pivot,ignore_index=True)
all_data.to_excel("outputappended2.xlsx")

Edit.

Thanks so much for your response. This helps a lot. Can you also tell me how before concatenating the next pivot, I can add a new line so that I can differentiate between the results and also sort by date.

Eg. I am getting the following result

DC Desc Apr 24,21 Dec 1,2020 Feb 6,2021 a 5000
b 2000 4000 c 1000

and I am looking for

DC Desc Dec 1,2020 Apr 24,21 Feb 6,2021 a 5000
b 2000 4000

c 1000 Lookingfor This was I can tell what information I am getting from the other files and also sort the columns. Any help is appreciated.

Advertisement

Answer

Your best alternative should be to use pd.concat. A simple approach that I like it is to create a processing function and then concatenate all the dataframes. Something like this:

import pandas as pd
import glob

def pivot_your_data(f):
    df = pd.read_excel(f)
    return pd.pivot_table(df, index='DC Desc', columns='Est Wk End Date', values='Shipped/Ordered Units', aggfunc='sum')

all_data = pd.concat([pivot_your_data(f) for f in glob.glob("C:/Users/Tom/Desktop/DC/Test*.xlsx")])

Then you could drop index or do more data processing, but the main point is to use pd.concat

Advertisement