I am trying to use Python to rename a PDF file using part of its content. Here is the situation.
The PDF file is a commercial invoice that contains the labels “Commercial Invoice” and “Department”. I want to rename the file using the values that follow those two labels, such as “353624 HR”.
Here is what I have so far:
    from StringIO import StringIO
    import pyPdf
    import os

    # a function here
    def getPDFContent(path):
        content = ""
        num_pages = 10
        p = file(path, "rb")
        pdf = pyPdf.PdfFileReader(p)
        for i in range(0, num_pages):
            content += pdf.getPage(i).extractText() + "\n"
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content

    # name of the source PDF file
    PDF_name = '222'

    # picking texts from the PDF file
    pdfContent = StringIO(getPDFContent("C:\\" + PDF_name + ".pdf").encode("ascii", "ignore"))
    for line in pdfContent:
        aaa = line.find(' Commercial Invoice ')
        CIN = line[aaa + 28: aaa + 38]
        bbb = line.find('Department')
        Dpt = line[bbb + 20: bbb + 26]
        final_name = str(CIN + " " + Dpt)
        print final_name

    f = open("C:\\" + PDF_name + ".pdf")
    f.close()
    os.rename("C:\\" + PDF_name + ".pdf", "C:\\" + final_name + ".pdf")
It works up to the point where the extracted text is printed (print final_name), but the last step, renaming the file, fails with “WindowsError: [Error 32] The process cannot access the file because it is being used by another process”.
What went wrong here? It seems the file was not closed properly at some point?
Answer
In getPDFContent(path), the file opened with p = file(path, "rb") is never closed, so Windows still treats it as in use when os.rename runs. Once the content has been copied, you need to close the file:
    p.close()
Put this just after the for loop, but still inside the function.
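The same idea can be sketched with a with block, which closes the handle automatically even if extraction raises. This is only an illustration under stated assumptions: read_content, rename_from_content, and demo are hypothetical names, and the extract callable is a stand-in for the pyPdf page loop from the question, so the sketch runs without any PDF library.

```python
import os
import tempfile


def read_content(path, extract):
    """Open path, run extract on the open handle, and close it before returning."""
    # The with block closes the handle on exit, so nothing keeps the file locked.
    with open(path, "rb") as handle:
        return extract(handle)


def rename_from_content(path, extract, build_name):
    """Rename path to '<build_name(content)>.pdf' in the same folder."""
    content = read_content(path, extract)  # the handle is already closed here
    new_path = os.path.join(os.path.dirname(path), build_name(content) + ".pdf")
    os.rename(path, new_path)
    return new_path


def demo():
    """Create a throwaway file and rename it from its (fake) content."""
    folder = tempfile.mkdtemp()
    source = os.path.join(folder, "222.pdf")
    with open(source, "wb") as handle:
        # Placeholder bytes, not a real PDF; real code would parse the PDF text.
        handle.write(b"353624 HR")
    new_path = rename_from_content(
        source,
        extract=lambda handle: handle.read().decode("ascii"),  # stand-in extractor
        build_name=lambda text: text.strip(),
    )
    return os.path.basename(new_path)
```

In the original script, extract would wrap the PdfFileReader calls and build_name would do the slicing around “Commercial Invoice” and “Department”. The point is only that the rename happens after the with block has released the file.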