How do I go about incrementally adding data using openpyxl?

I have a folder full of PDFs that I parse with Apache Tika, and a template Excel file into which I write specific information from those PDFs using openpyxl.

The issue I am having is looping through the rows with openpyxl.

For example, if there is just one PDF in the folder, the values go in:

C3, C4, F3, C13, C15, C17

but if there is more than one PDF, the row index is incremented by 20 for each additional PDF and everything is stored in the same Excel file, so with 2 PDFs the second one's info goes in C23, C24, F23 and so on.

I have a pdfCounter that counts the number of PDFs in the folder, and I am trying to figure out a way to increment the index based on that, or whether there is a better way to do this.

I just don’t understand how to loop based on the number of PDFs in the folder and increment the index by 20 each time, so that it doesn’t overwrite the same cells as it does right now.

Answer

Edit: I can’t test this, but maybe it will work. Instead of looping through a range equal to pdfCounter, I’m looping over the files in input_file so I have access to the current file in the current iteration. The ‘value’ variables should be updated with info based on the current iteration’s file and then written to worksheet.
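
A minimal sketch of that idea, under stated assumptions: `write_block`, `fill_template`, `extract_values`, and the `FIELD_CELLS` mapping with its field names are all hypothetical, and only the cell layout comes from the question. Each file's position in the loop sets the row offset, so no separate counter is needed.

```python
# Hypothetical layout: field name -> (column letter, row in the first block).
# The cells come from the question; the field names are placeholders.
FIELD_CELLS = {
    "field_a": ("C", 3),
    "field_b": ("C", 4),
    "field_c": ("F", 3),
    "field_d": ("C", 13),
    "field_e": ("C", 15),
    "field_f": ("C", 17),
}
ROWS_PER_PDF = 20  # each additional PDF is written 20 rows lower


def write_block(ws, index, values):
    """Write one PDF's values into the block belonging to position `index`."""
    for field, (column, base_row) in FIELD_CELLS.items():
        ws[f"{column}{base_row + index * ROWS_PER_PDF}"] = values[field]


def fill_template(pdf_paths, template_path, output_path, extract_values):
    """Load the template, write one block per PDF, and save the result.

    `extract_values` is an assumed callable that takes a PDF path and
    returns a dict keyed like FIELD_CELLS (for example, built from Tika output).
    """
    from openpyxl import load_workbook  # imported lazily; only needed here

    wb = load_workbook(template_path)
    ws = wb.active
    for index, path in enumerate(pdf_paths):
        write_block(ws, index, extract_values(path))
    wb.save(output_path)
```

Because `write_block` only relies on `ws[address] = value`, it can be tried against a plain dict before pointing it at a real worksheet.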

Original:

You can use f-strings to build the Excel cell identifier easily: make one variable for the column letter and one for the row number, then combine them in an f-string. For the row number, add the product of the for-loop index and 20 to the starting row number to get a +20 increment per iteration.

In order for this to work, you would need to do the value, value2, etc. calculations in each iteration of the for-loop. Something like this:
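
As a sketch of the f-string approach, assuming the cell layout from the question: `START_CELLS`, `ROW_STEP`, and `target_cells` are illustrative names, and the per-file value calculations are left as a comment.

```python
# (column letter, starting row) pairs for the first PDF, taken from the question.
START_CELLS = [("C", 3), ("C", 4), ("F", 3), ("C", 13), ("C", 15), ("C", 17)]
ROW_STEP = 20  # rows between one PDF's block and the next


def target_cells(i):
    """Return the cell identifiers for the i-th PDF (0-based loop index)."""
    cells = []
    for column, start_row in START_CELLS:
        row = start_row + i * ROW_STEP   # +20 per iteration
        cells.append(f"{column}{row}")   # e.g. "C" and 23 -> "C23"
    return cells


pdfCounter = 2  # stand-in for the number of PDFs in the folder
for i in range(pdfCounter):
    # value, value2, ... would be recomputed here for the i-th PDF
    print(target_cells(i))
```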

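With pdfCounter = 5 and the +20 offset, the target cells run from C3, C4, F3, C13, C15, C17 for the first PDF, through C23, C24, F23, C33, C35, C37 for the second, up to C83, C84, F83, C93, C95, C97 for the fifth.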
User contributions licensed under: CC BY-SA