In pypdf, len(reader.pages) gives me the total number of pages of a PDF file.
How can I get this using PDFMiner?
Answer
I hate to just leave a code snippet, so for context: the current pdfminer.six repo is where you might be able to learn a little more about the resolve1 function.
As you’re working with PDFMiner, you might print some values and come across PDFObjRef objects. These are indirect references, and you can use resolve1 to expand them into the objects they point to (usually a dictionary).
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import resolve1

# Keep the file open while reading: PDFMiner loads objects lazily.
with open('some_file.pdf', 'rb') as file:
    parser = PDFParser(file)
    document = PDFDocument(parser)
    # The catalog's 'Pages' entry is a PDFObjRef; resolve1 expands it to a dict.
    # This will give you the count of pages
    print(resolve1(document.catalog['Pages'])['Count'])