Convert PDF page to image with pyPDF2 and BytesIO

Question

I have a function that gets a page from a PDF file via PyPDF2 and should convert the first page to a PNG (or JPG) with Pillow (PIL fork). That results in an error: OSError: cannot identify image file <_io.BytesIO object at 0x0000023440F3A8E0>. I found some threads with a similar issue (PIL open() method …

Accepted Answer

Per the documentation:

write(stream)
Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

So the object pdf_bytes contains a PDF file, not an image file. The reason code like the above sometimes works is that the PDF file sometimes contains just a JPEG file as its content. If your PDF is just a normal PDF file, you can't simply read the bytes and parse them as an image.

For a more robust implementation, refer to: https://stackoverflow.com/a/34116472/334999
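One way to confirm that the buffer holds a PDF rather than an image is to look at its first few bytes before handing it to Pillow. The following is only a minimal sketch using the standard library: the helper name sniff_format and the SIGNATURES table are made up for illustration and are not part of PyPDF2 or Pillow, and the sample pdf_bytes content is a stand-in for whatever write() actually produced.

```python
import io

# Well-known leading "magic" bytes for a few file formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(stream):
    """Guess the format of a seekable binary stream from its first bytes.

    Hypothetical helper: returns 'pdf', 'png', 'jpeg', or None, and
    restores the stream position afterwards.
    """
    pos = stream.tell()
    stream.seek(0)
    head = stream.read(8)  # the longest signature above is 8 bytes
    stream.seek(pos)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


# Stand-in for the buffer filled by the writer's write(stream) call;
# the content here is assumed, not taken from a real PDF.
pdf_bytes = io.BytesIO(b"%PDF-1.4\n...")
print(sniff_format(pdf_bytes))  # pdf
```

If this reports "pdf", the bytes need to go through a PDF renderer or an image extractor first; passing them straight to Pillow's Image.open is what produces the "cannot identify image file" error from the question.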

