I’m trying to merge multiple file into one excel file using openpyxl in python
I know there is a way using panda, but my files have a problem there have been always 2 empty rows in the beginning of the excel file
So to avoid that I’m using openpyxl with the old way Just open all files and copy the specific rows and columns to a new one
The first step I find out how to do it by just copy the specific row’s and column of the new xlsx file
but I didn’t find a way to add the next file (only the value not the header) under the first one
this my code
So far it just copy the first file (the header and the value) But I didn’t find out how to add the next file (only the value) under the first one
import openpyxl as xl from openpyxl import Workbook import os def find_xlsx_files(): # the current path dir_path = os.path.dirname(os.path.abspath(__file__)) # list to store files res = [] # Iterate directory for file in os.listdir(dir_path): # check only xlsx files if file.endswith('.xlsx'): res.append(file) return (res) wb1 = xl.load_workbook (find_xlsx_files()[0]) ws1 = wb1.worksheets [0] # open target Excel file wb2 = Workbook() ws = wb2.active ws.title = "Changed Sheet" wb2.save(filename = 'sample_book.xlsx') ws2 = wb2.active # calculate the total rows and # columns in the Excel source file mr = ws1.max_row mc = ws1.max_column # copy cell values from source # Excel file to target Excel file for i in range ( 3 , mr + 1 ): for j in range ( 2 , mc + 1 ): # read cell value from Excel source file c = ws1.cell (row = i, column = j) # writing the read value to the target Excel file ws2.cell (row = i, column = j) .value = c.value # save target Excel file wb2.save ( str ('sample_book.xlsx'))
Advertisement
Answer
What you are doing is creating a list of the excel files in the default directory then just opening the first file ‘[0]’ in the list with the line;
wb1 = xl.load_workbook (find_xlsx_files()[0])
This will never attempt to access any other excel file in the list.
Having the list generation in the load book command isn’t good, you don’t want to be generating the list of available excel files each time you process a file. Calling of the function find_xlsx_files() should be done once.
The easiest fix to your code is to get your list of excel files and then iterate that list for processing.
excel_files = find_xlsx_files() for xl_file in excel_files: wb1 = xl.load_workbook(xl_file) ...
Also it should not be necessary to save the book until you have finished writing all data.
The function can be simplified using glob instead if you prefer.
import glob import os from openpyxl import Workbook, load_workbook dir_path = os.path.dirname(os.path.abspath(__file__)) excel_files = glob.glob(dir_path + "/[!~]*.xlsx") for xl_file in excel_files: wb1 = load_workbook(xl_file) ws1 = wb1.worksheets[0] # open target Excel file wb2 = Workbook() ws = wb2.active ws.title = "Changed Sheet" # wb2.save(filename='sample_book.xlsx') ws2 = wb2.active # calculate the total rows and # columns in the Excel source file mr = ws1.max_row mc = ws1.max_column # copy cell values from source # Excel file to target Excel file for i in range(3, mr+1): for j in range(2, mc+1): # read cell value from Excel source file c = ws1.cell(row=i, column=j) # writing the read value to the target Excel file ws2.cell(row=i, column=j).value = c.value # save target Excel file wb2.save(str('sample_book.xlsx'))
This is also assuming there is only one sheet in each excel file you want to process since you’re only opening the first sheet.
ws1 = wb1.worksheets[0]