My problem statement is as above. Below is my progress so far
- I want to extract multiple excel files from the same location namely
Test1 Test2 Test3…(I am using glob to do this) (DONE) 2. I want to iterate through the folder and find files starting with a string(DONE) 3. I then formed an empty dataframe. I want to then pivot the 1st file dataframe based on the date(as columns), go to the next file (do the same), and then append my results to a dataframe.
My problem right now is that I am appending all results to the pivot that I created using my first file.
Can someone please help.
import pandas as pd import numpy as np import glob glob.glob("C:/Users/Tom/Desktop/DC") all_data = pd.DataFrame() for f in glob.glob("C:/Users/Tom/Desktop/DC/Test?.xlsx"): df = pd.read_excel(f) pivot = pd.pivot_table(df, index='DC Desc', columns='Est Wk End Date', values=['Shipped/Ordered Units',aggfunc='sum') all_data = all_data.append(pivot,ignore_index=True) all_data.to_excel("outputappended2.xlsx")
Edit.
Thanks so much for your response. This helps a lot. Can you also tell me how before concatenating the next pivot, I can add a new line so that I can differentiate between the results and also sort by date.
Eg. I am getting the following result
DC Desc Apr 24,21 Dec 1,2020 Feb 6,2021
a 5000
b 2000 4000
c 1000
and I am looking for
DC Desc Dec 1,2020 Apr 24,21 Feb 6,2021
a 5000
b 2000 4000
c 1000 Lookingfor This was I can tell what information I am getting from the other files and also sort the columns. Any help is appreciated.
Advertisement
Answer
Your best alternative should be to use pd.concat
. A simple approach that I like it is to create a processing
function and then concatenate all the dataframes. Something like this:
import pandas as pd import glob def pivot_your_data(f): df = pd.read_excel(f) return pd.pivot_table(df, index='DC Desc', columns='Est Wk End Date', values='Shipped/Ordered Units', aggfunc='sum') all_data = pd.concat([pivot_your_data(f) for f in glob.glob("C:/Users/Tom/Desktop/DC/Test*.xlsx")])
Then you could drop index or do more data processing, but the main point is to use pd.concat