Tag: pdf

PDFminer get font size from headers per each page (iteration)

I am quite new to Python and PDFMiner, which is a bit complex for me. What I am trying to achieve is to extract the title of each page from a PDF file or slide deck. My approach is to get a list of the text lines and their font sizes per page, then pick the line with the largest font size as the slide heading
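A minimal sketch of the approach described in this excerpt, assuming the pdfminer.six package is installed. The function names (`pick_heading`, `lines_with_sizes`, `headings_per_page`) are hypothetical and are not taken from the original question. The pdfminer imports sit inside the functions so that the pure selection logic can be tried on its own.

```python
def pick_heading(lines):
    """Return the text of the line with the largest font size, or None.

    `lines` is an iterable of (text, font_size) pairs; blank lines are ignored.
    """
    cleaned = [(text.strip(), size) for text, size in lines if text.strip()]
    if not cleaned:
        return None
    return max(cleaned, key=lambda pair: pair[1])[0]


def lines_with_sizes(layout):
    """Yield (text, largest character size) for each text line of one page layout."""
    # Assumption: pdfminer.six is installed; imported lazily on first use.
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for element in layout:
        if not isinstance(element, LTTextContainer):
            continue
        for line in element:
            if not isinstance(line, LTTextLine):
                continue
            sizes = [ch.size for ch in line if isinstance(ch, LTChar)]
            if sizes:
                yield line.get_text(), max(sizes)


def headings_per_page(pdf_path):
    """Return one guessed heading per page (None for pages without text)."""
    from pdfminer.high_level import extract_pages

    return [pick_heading(lines_with_sizes(page)) for page in extract_pages(pdf_path)]
```

Picking the largest font is only a heuristic: a slide with a large decorative number or logo text may win over the real title.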

Loop through folder and subfolders and merge pdf

pdf pypdf python

I tried to create a script that loops through a parent folder and its subfolders and merges all of the PDFs into one. Below is the code I wrote so far, but I don’t know how to combine the pieces into one script. Reference: Merge PDF files The first function is to loop through all of the subfolders under the parent folder and get
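One way the two steps could be combined, as a sketch rather than the asker's actual code: collect every PDF under the parent folder first, then append each to a single writer. It assumes a recent pypdf release in which `PdfWriter.append` is available; `find_pdfs` and `merge_pdfs` are hypothetical names.

```python
from pathlib import Path


def find_pdfs(parent):
    """Return all PDF files under `parent` (recursively), in sorted order."""
    return sorted(
        p for p in Path(parent).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_pdfs(parent, output_path):
    """Merge every PDF found under `parent` into one file at `output_path`."""
    # Assumption: pypdf is installed; imported lazily on first use.
    from pypdf import PdfWriter

    output_path = Path(output_path).resolve()
    writer = PdfWriter()
    for pdf in find_pdfs(parent):
        # Skip the output file itself if it lives inside the parent folder.
        if pdf.resolve() == output_path:
            continue
        writer.append(str(pdf))
    with open(output_path, "wb") as fh:
        writer.write(fh)
```

Sorting the paths keeps the merge order predictable between runs.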

Prompt user to choose the name and location when saving a pdf (python)

pdf python tkinter user-interface

How can I change my code so that I can save my final PDF (MergedFiles.pdf) with a name and in a location chosen by the user? I would like a popup (maybe tkinter?) that lets the user choose the name and location for the saved PDF file. Answer You can do this with
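The answer is cut off in this excerpt, so the following is only a sketch of one possible approach: tkinter's standard `filedialog.asksaveasfilename` dialog. The helper names `with_pdf_suffix` and `ask_save_path` are hypothetical.

```python
from pathlib import Path


def with_pdf_suffix(path):
    """Return `path` as a Path that ends in .pdf (case-insensitive check)."""
    path = Path(path)
    if path.suffix.lower() == ".pdf":
        return path
    return path.with_name(path.name + ".pdf")


def ask_save_path(default_name="MergedFiles.pdf"):
    """Open a save dialog and return the chosen Path, or None if cancelled."""
    # tkinter is imported lazily so the helper above works without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window, show only the dialog
    chosen = filedialog.asksaveasfilename(
        title="Save merged PDF as",
        initialfile=default_name,
        defaultextension=".pdf",
        filetypes=[("PDF files", "*.pdf")],
    )
    root.destroy()
    return with_pdf_suffix(chosen) if chosen else None
```

The returned path can then be passed to whatever call writes the merged file in place of the hard-coded name.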

Python: password protect PDFs with random passwords and save file name-password

pdf python

I am pretty new to Python. What I am looking for is a way to bulk-protect a series of PDF files within a folder, each file with a unique, randomly generated password. These file name and password combinations should then be saved somewhere (potentially a CSV file). I am currently using code that protects all the files within the folder with the same password
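A rough sketch of the per-file password idea, not the asker's code. It uses the standard library's `secrets` and `csv` modules for the passwords and the log, and assumes pypdf for the encryption step. All function names and the CSV column names are hypothetical.

```python
import csv
import secrets
import string
from pathlib import Path

ALPHABET = string.ascii_letters + string.digits


def random_password(length=16):
    """Generate a random alphanumeric password of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def assign_passwords(pdf_paths, length=16):
    """Map each file name to its own freshly generated password."""
    return {Path(p).name: random_password(length) for p in pdf_paths}


def save_passwords(mapping, csv_path):
    """Write the file name / password pairs to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name", "password"])
        writer.writerows(sorted(mapping.items()))


def protect_folder(folder, out_folder, csv_path):
    """Encrypt every PDF in `folder` into `out_folder`, one password per file."""
    # Assumption: pypdf is installed; imported lazily on first use.
    from pypdf import PdfReader, PdfWriter

    folder, out_folder = Path(folder), Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    mapping = assign_passwords(sorted(folder.glob("*.pdf")))
    for name, password in mapping.items():
        writer = PdfWriter()
        for page in PdfReader(folder / name).pages:
            writer.add_page(page)
        writer.encrypt(password)
        with open(out_folder / name, "wb") as fh:
            writer.write(fh)
    save_passwords(mapping, csv_path)
```

Note that the CSV holds the passwords in plain text, so it needs to be stored at least as carefully as the PDFs themselves.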

Python’s pdfreader library for PDF extraction won’t iterate through pages

pdf python

I want to extract text from a PDF file with a Python library called pdfreader. I followed the instructions here: https://pdfreader.readthedocs.io/en/latest/tutorial.html#how-to-browse-document-pages This is my code: The code does not give me any errors, but the problem is that it does not iterate over pages. The variable total_page_num gives me the number of pages (more than 1), but when I go into the for loop it

PDF reading, returning empty rows

nlp pdf python

I have a function to read a PDF, shown below. It works fine on a normal PDF file (like a book), where I am able to extract the text easily, but when I tried it at work on “meeting minutes” I got only empty lines like the ones below. I am very sorry that I cannot share the original PDF; however, here is a picture

Python – Scraping a PDF file from a URL

pdf python web-scraping

I want to scrape PDF files from this site https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_pilote_7b.pdf I tried this code for that but it doesn’t work. Can anybody tell me why, please? Answer Your URL is pointing to a reader, https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf; remove the ‘Reader.php?var=’ part to get the actual PDF
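The fix suggested in the answer can be sketched with the standard library alone: read the `var` query parameter from the reader URL and resolve it against the site root. `direct_pdf_url` and `download_pdf` are hypothetical names, and the download step is untested against this particular site.

```python
from urllib.parse import parse_qs, urljoin, urlsplit
from urllib.request import urlopen


def direct_pdf_url(reader_url):
    """Turn a Reader.php?var=<path> URL into a direct link to <path>.

    URLs without a `var` query parameter are returned unchanged.
    """
    query = parse_qs(urlsplit(reader_url).query)
    target = query.get("var", [None])[0]
    if target is None:
        return reader_url
    # Resolving the relative path against the reader URL drops "Reader.php?var=".
    return urljoin(reader_url, target)


def download_pdf(reader_url, output_path):
    """Fetch the PDF behind a reader URL and save it to `output_path`."""
    with urlopen(direct_pdf_url(reader_url)) as response:
        data = response.read()
    with open(output_path, "wb") as fh:
        fh.write(data)
```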

How to convert docx to pdf on Mac OS with Python?

converters docx pdf python python-3.x

I’ve looked at several SO posts and other web pages, but I haven’t found anything that works. The script I wrote opens a docx, changes some words, and then saves it in a certain folder as a docx. However, I want it to save the file as a PDF, and I don’t know how to do that. This is an example of the code
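The excerpt does not show an answer, so this is only one possible route, under the assumption that LibreOffice is installed in its default macOS location: call its headless converter from Python once the docx has been saved. The path constant and both function names are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: default install location of LibreOffice on macOS.
MAC_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def build_convert_command(docx_path, out_dir, soffice=MAC_SOFFICE):
    """Build the LibreOffice command line that converts one docx to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def docx_to_pdf(docx_path, out_dir, soffice=MAC_SOFFICE):
    """Run the conversion and return the expected path of the PDF."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

LibreOffice's rendering can differ slightly from Word's, so layout-sensitive documents are worth checking by eye after conversion.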

Use PyPDF2 to detect non-embedded fonts in PDF file generated by Google Docs

fonts google-docs pdf pypdf python

I was hoping someone could help me write a Python function to detect any fonts in the file that are not embedded in it. I’ve attempted to use the script linked here, and it can detect the document’s fonts, but it does not detect fonts which are embedded. I’ve pasted the script below for convenience: For example, I’ve downloaded
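A sketch of one way to run the check, not the script referenced in the question. In the PDF format an embedded font carries a `/FontFile`, `/FontFile2`, or `/FontFile3` entry in its font descriptor, and composite (Type0) fonts keep that descriptor on their descendant fonts. The code assumes pypdf (the successor of PyPDF2); the function names are hypothetical, and fonts used only inside form XObjects are not visited.

```python
EMBED_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def _resolve(obj):
    """Follow an indirect reference if the object has one; otherwise return it."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def font_is_embedded(font):
    """Return True if the font dictionary carries an embedded font program."""
    font = _resolve(font)
    if "/Subtype" in font and font["/Subtype"] == "/Type3":
        return True  # Type3 glyphs are defined inside the PDF itself
    if "/DescendantFonts" in font:
        descendants = _resolve(font["/DescendantFonts"])
        return all(font_is_embedded(d) for d in descendants)
    if "/FontDescriptor" not in font:
        return False  # e.g. a standard font referenced by name only
    descriptor = _resolve(font["/FontDescriptor"])
    return any(key in descriptor for key in EMBED_KEYS)


def non_embedded_fonts(pdf_path):
    """Return the base names of page-level fonts that are not embedded."""
    # Assumption: pypdf is installed; imported lazily on first use.
    from pypdf import PdfReader

    names = set()
    for page in PdfReader(pdf_path).pages:
        resources = _resolve(page["/Resources"]) if "/Resources" in page else {}
        fonts = _resolve(resources["/Font"]) if "/Font" in resources else {}
        for font_ref in fonts.values():
            font = _resolve(font_ref)
            if not font_is_embedded(font):
                names.add(str(font["/BaseFont"]) if "/BaseFont" in font else "<unnamed>")
    return names
```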

How to attach multiple files in PDF?

attachment pdf pypdf python python-3.x

I have a list of objects: List = ['Doc1.xlsx', 'Doc2.csv', 'Doc3.pdf'] and a list of their names: List1 = ['Doc1_name.xlsx', 'Doc2_name.csv', 'Doc3_name.pdf']. I need to attach them to an existing PDF. I tried the following code, which works only if I have one attachment. Now I am trying to iterate over the attachments to attach all of them, but in the Final.pdf will be
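A sketch of the iteration described here, under the assumption that a recent pypdf release is used in which `PdfWriter.add_attachment` may be called more than once on the same writer. The asker's original code is not shown, so this is not a correction of it. `pair_attachments` and `attach_all` are hypothetical names. The point of the structure is to build one writer, add every attachment to it, and write the output a single time after the loop.

```python
from pathlib import Path


def pair_attachments(paths, names):
    """Pair each file path with its display name, checking the lengths match."""
    if len(paths) != len(names):
        raise ValueError("paths and names must have the same length")
    return list(zip(paths, names))


def attach_all(source_pdf, output_pdf, paths, names):
    """Copy `source_pdf` and embed every file in `paths` under its name."""
    # Assumption: pypdf is installed; imported lazily on first use.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for page in PdfReader(source_pdf).pages:
        writer.add_page(page)
    for path, name in pair_attachments(paths, names):
        writer.add_attachment(name, Path(path).read_bytes())
    # Write once, after all attachments were added to the same writer.
    with open(output_pdf, "wb") as fh:
        writer.write(fh)
```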