I want to open a PDF in my Python program. So far that works:
existing_pdf = PdfFileReader(file(path_to_pdf, "rb"))
Right now I open the PDF from my local disk, but I want to fetch it from the internet instead. Note that I don’t want to save the downloaded file as-is; once I have fetched it, I will manipulate it and then save the result.
I think I need BytesIO + urllib2, but I cannot figure it out. Can somebody help me?
So let’s say I want to create the variable existing_pdf from the content at http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf, but I don’t want to download that file to disk first and then open it. I want to download it ‘in memory’ and create the variable existing_pdf, which I can later modify in my program.
EDIT:
response = urllib2.urlopen("URL")
pdf_file = BytesIO(response.read())
existing_pdf = PdfFileReader(pdf_file)
It simply hangs inside PdfFileReader(pdf_file) and never finishes:
  ....
    existing_pdf = PdfFileReader(pdf_file)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 705, in read
    line = self.readNextEndLine(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 870, in readNextEndLine
    line = x + line
Answer
Did you try the requests package?
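import requests
from StringIO import StringIO
from pyPdf import PdfFileReader  # the library shown in the question's traceback
# URL is the address of the PDF to fetch
r = requests.get(URL)
pdf_file = StringIO(r.content)  # wrap the downloaded bytes in a file-like object
existing_pdf = PdfFileReader(pdf_file)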
This worked for me:
import os
import urllib2
from io import BytesIO
URL = "http://tug.ctan.org/tex-archive/macros/latex/contrib/logpap/example.pdf"
response = urllib2.urlopen(URL)
p = BytesIO(response.read())
p.seek(0, os.SEEK_END)
print p.tell()  # 79577
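For anyone on Python 3, where urllib2 no longer exists, the same idea might look roughly like the sketch below. It uses urllib.request.urlopen in place of urllib2.urlopen. The helper names fetch_to_memory and looks_like_pdf and the 30-second timeout are my own inventions for illustration, not part of any library. The header check is only a rough sanity test: if the server returned an error page instead of a PDF, that could be one possible reason for a parser to misbehave, though I have not confirmed it is the cause of the hang in the question.

```python
from io import BytesIO
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def fetch_to_memory(url, timeout=30):
    """Download url and return its body as a seekable in-memory buffer.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    with urlopen(url, timeout=timeout) as response:
        return BytesIO(response.read())


def looks_like_pdf(buffer):
    """Return True if the buffer starts with the %PDF- signature.

    Hypothetical helper; the read position is restored afterwards,
    so the buffer can still be handed to a PDF reader unchanged.
    """
    position = buffer.tell()
    buffer.seek(0)
    header = buffer.read(5)
    buffer.seek(position)
    return header == b"%PDF-"
```

Usage would be something like pdf_file = fetch_to_memory(URL), then checking looks_like_pdf(pdf_file) before passing pdf_file to whichever PDF reader you use. Nothing is written to disk at any point.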