How do I go about incrementally adding data using openpyxl?

Question

I have a folder full of PDFs which I have parsed using Apache Tika, and I have a template excel file which I use to gather specific information from those PDFs and store using openpyxl. The issue I am having is looping through using openpyxl rows. For example, if there is just one PDF in folder, the values go…

Accepted Answer

Edit: I can&#8217;t test this, but maybe it will work. Instead of looping through a range equal to pdfCounter, I&#8217;m looping over the files in input_file so I have access to the current file in the current iteration. The &#8216;value&#8217; variables should be updated with info based on the current iteration&#8217;s file and then written to worksheet.input_file = glob.glob(os.path.join(input_path, '*.pdf'))def get_values(file):    do_stuff()    return value, value2, value3, value4, value5, value6for x,file in enumerate(input_file):    C = UPPER[2]    D = UPPER[3]    F = UPPER[5]    first_cell = f'{C}{3 + (20 * x)}'    second_cell = f'{C}{4 + (20 * x)}'    third_cell = f'{F}{3 + (20 * x)}'    fourth_cell = f'{C}{13 + (20 * x)}'    fifth_cell = f'{C}{15 + (20 * x)}'    sixth_cell = f'{C}{17 + (20 * x)}'    value, value2, value3, value4, value5, value6 = get_values(file)    ws[first_cell] = value    ws[second_cell] = value2    ws[third_cell] = value3    ws[fourth_cell] = value4    ws[fifth_cell] = value5    ws[sixth_cell] = value6output_file = os.path.join(output_dir, "report.xlsm")wb.save(output_file)Original:from openpyxl import load_workbookimport openpyxlfrom string import ascii_uppercase as UPPER# file paths# results from pdfCounterpdfCounter = 5# extract your text# load your workbookwb = load_workbook()ws = wb['Sheet1']# do stufffor x in range(pdfCounter):    C = UPPER[2]    D = UPPER[3]    F = UPPER[5]    first_cell = f'{C}{3 + (20 * x)}'    second_cell = f'{C}{4 + (20 * x)}'    third_cell = f'{F}{3 + (20 * x)}'    fourth_cell = f'{C}{13 + (20 * x)}'    fifth_cell = f'{C}{15 + (20 * x)}'    sixth_cell = f'{C}{17 + (20 * x)}'    ws[first_cell] = value    ws[second_cell] = value2    ws[third_cell] = value3    ws[fourth_cell] = value4    ws[fifth_cell] = value5    ws[sixth_cell] = value6You can use f-strings to make the excel cell identifier easily: make a variable for the column letter component, a variable for the row number component, then combine in the f-string as shown below. For the row number component, you can add the product of the for-loop index and 20 to the starting row number to achieve a +20 increment per iteration.In order for this to work, you would need to do the value, value2, etc. calculations in each iteration of the for-loop. Something like this:def get_values():    return value, value2, value3, value4, value5, value6value, value2, value3, value4, value5, value6 = get_values()Code output with pdfCounter = 5 for the target cells is:C3 C4 F3 C13 C15 C17C23 C24 F23 C33 C35 C37C43 C44 F43 C53 C55 C57C63 C64 F63 C73 C75 C77C83 C84 F83 C93 C95 C97

Advertisement

Answer