I am quite new to Python and PDFMiner, which is a bit complex for me. What I am trying to achieve is to extract the title of each page from a PDF file or slide deck. My approach is to get a list of the text lines and their font sizes per page, then pick the line with the largest font size as the slide heading
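The approach described above can be sketched as follows. This is a minimal sketch, not the asker's code: it assumes pdfminer.six is installed, and the helper names `pick_title` and `lines_with_sizes` are hypothetical. The font size of a line is taken as the largest character size found in it.

```python
def pick_title(lines):
    """Return the text of the line with the largest font size, or None.

    `lines` is a list of (text, font_size) tuples for one page.
    """
    if not lines:
        return None
    text, _size = max(lines, key=lambda item: item[1])
    return text.strip()


def lines_with_sizes(pdf_path):
    """Yield one list of (text, font_size) tuples per page.

    Assumes pdfminer.six; imported lazily so pick_title works without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_layout in extract_pages(pdf_path):
        page_lines = []
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                # Collect the size of every character on this line.
                sizes = [ch.size for ch in line if isinstance(ch, LTChar)]
                if sizes:
                    page_lines.append((line.get_text(), max(sizes)))
        yield page_lines
```

With a hypothetical file name, the titles would be collected with `[pick_title(page) for page in lines_with_sizes("slides.pdf")]`. Picking the largest font is a heuristic; a slide whose body text is larger than its heading will be misread.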
Tag: pdf
Loop through a folder and its subfolders and merge PDFs
I tried to create a script to loop through a parent folder and its subfolders and merge all of the PDFs into one. Below is the code I wrote so far, but I don’t know how to combine the functions into one script. Reference: Merge PDF files The first function is to loop through all of the subfolders under the parent folder and get
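One way to combine the two steps (walk the folder tree, then merge) is sketched below. It is an illustration under assumptions rather than the asker's script: the function names are hypothetical, and the merge step assumes a recent pypdf release in which `PdfWriter.append` accepts a file path.

```python
from pathlib import Path


def find_pdfs(parent):
    """Return every .pdf file under parent, including subfolders, sorted by path."""
    return sorted(
        p for p in Path(parent).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_pdfs(parent, output_path):
    """Merge all PDFs found under parent into a single file at output_path.

    Assumes pypdf is installed; imported lazily so find_pdfs works without it.
    """
    from pypdf import PdfWriter

    writer = PdfWriter()
    for pdf in find_pdfs(parent):
        writer.append(str(pdf))  # appends every page of this file
    with open(output_path, "wb") as fh:
        writer.write(fh)
```

Writing the output outside the parent folder avoids picking up an earlier merged file on the next run.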
Prompt user to choose the name and location when saving a pdf (python)
How can I change my code so that I can save my final PDF (MergedFiles.pdf) with a name and in a location chosen by the user? I would like to have a popup (maybe tkinter?) that gives the user the option of choosing the name and location to save the PDF file. Answer: You can do this with
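Following the question's own guess of tkinter, a save dialog could look like the sketch below. The answer text is cut off, so this is not necessarily what it proposed; the helper names are hypothetical, and `filedialog.asksaveasfilename` is the standard tkinter call used.

```python
from pathlib import Path


def with_pdf_suffix(path):
    """Return path as a string, adding a .pdf extension if it is missing."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        return str(p)
    return str(p.with_name(p.name + ".pdf"))


def ask_save_path(default_name="MergedFiles.pdf"):
    """Open a native save dialog; return the chosen path, or None if cancelled."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window; only the dialog is shown
    chosen = filedialog.asksaveasfilename(
        title="Save merged PDF as",
        initialfile=default_name,
        defaultextension=".pdf",
        filetypes=[("PDF files", "*.pdf")],
    )
    root.destroy()
    # The dialog returns an empty value when the user cancels.
    return with_pdf_suffix(chosen) if chosen else None
```

The returned path can then be passed to whatever call writes the merged file.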
Python: password protect PDFs with random passwords and save file name-password
I am pretty new to Python. What I am looking for is a way to bulk-protect a series of PDF files within a folder, each file with a unique, randomly generated password. These file name and password combinations should then be saved somewhere (potentially a CSV file). I am currently using code that protects all the files within the folder with the same password
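A minimal sketch of the per-file password idea is shown below. The function names, the output folder, and the CSV column names are assumptions for illustration. Passwords come from the standard `secrets` module; the default encryption step assumes pypdf is installed and can be swapped for another callable.

```python
import csv
import secrets
import string
from pathlib import Path

ALPHABET = string.ascii_letters + string.digits


def random_password(length=16):
    """Return a random alphanumeric password."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def encrypt_with_pypdf(src, dst, password):
    """Write an encrypted copy of src to dst. Assumes pypdf is installed."""
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for page in PdfReader(str(src)).pages:
        writer.add_page(page)
    writer.encrypt(password)
    with open(dst, "wb") as fh:
        writer.write(fh)


def protect_folder(folder, out_dir, csv_path, encrypt=encrypt_with_pypdf):
    """Encrypt each PDF in folder with its own password and log them to a CSV."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        password = random_password()  # a fresh password for every file
        encrypt(pdf, out_dir / pdf.name, password)
        rows.append((pdf.name, password))
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name", "password"])
        writer.writerows(rows)
    return rows
```

Since the CSV holds every password in plain text, it needs to be stored at least as carefully as the PDFs themselves.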
Python’s pdfreader library for PDF extraction won’t iterate through pages
I want to extract text from a PDF file with Python’s pdfreader library. I followed the instructions here: https://pdfreader.readthedocs.io/en/latest/tutorial.html#how-to-browse-document-pages This is my code: The code does not give me any errors, but the problem is that it does not iterate over the pages. The variable total_page_num returns the number of pages (more than 1), but when I go into the for loop it
PDF reading, returning empty rows
I have a function to read a PDF, shown below. It works fine on a normal PDF file (such as a book), where I can extract the text easily, but when I tried it at work on “meeting minutes” I got only empty lines, as shown below. I am very sorry that I cannot share the original PDF; however, here is a picture
Python – Scraping a PDF file from a URL
I want to scrape PDF files from this site: https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_pilote_7b.pdf I tried this code for that, but it doesn’t work. Can anybody tell me why, please? Answer: Your URL points to a reader page, https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf; remove the ‘Reader.php?var=’ part to get the actual PDF
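The fix in the answer, dropping the reader prefix so the URL points at the file itself, can be sketched with the standard library. The function names are hypothetical, and the sketch assumes the value of the `var` query parameter is a path relative to the site root, which is what the answer implies.

```python
from urllib.parse import parse_qs, urljoin, urlsplit
from urllib.request import urlopen


def direct_pdf_url(reader_url):
    """Turn a Reader.php?var=<path> URL into a direct link to <path>."""
    parts = urlsplit(reader_url)
    values = parse_qs(parts.query).get("var")
    if not values:
        return reader_url  # no reader parameter, nothing to rewrite
    return urljoin(f"{parts.scheme}://{parts.netloc}/", values[0])


def download(url, dest):
    """Save the resource at url to the local file dest."""
    with urlopen(url) as response, open(dest, "wb") as fh:
        fh.write(response.read())
```

Calling `download(direct_pdf_url(url), "book.pdf")` would then fetch the file, where `book.pdf` is a placeholder name.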
How to convert docx to pdf on Mac OS with Python?
I’ve looked at several SO posts and other web pages, but I haven’t found anything that works. The script I wrote opens a docx, changes some words and then saves it in a certain folder as a docx. However, I want it to save the file as a PDF, but I don’t know how to. This is an example of the code
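One possible route, not taken from the question, is to save the docx as before and then hand it to LibreOffice's command-line converter. The sketch below assumes LibreOffice is installed at its default macOS location; the constant `SOFFICE` and both function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Default LibreOffice location on macOS; an assumption, adjust if yours differs.
SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def build_convert_command(docx_path, out_dir, soffice=SOFFICE):
    """Return the soffice command line that converts docx_path to PDF in out_dir."""
    return [
        soffice, "--headless", "--convert-to", "pdf",
        "--outdir", str(out_dir), str(docx_path),
    ]


def docx_to_pdf(docx_path, out_dir, soffice=SOFFICE):
    """Run the conversion and return the expected path of the resulting PDF."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

This keeps the existing word-replacement script unchanged and adds the conversion as a final step.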
Use PyPDF2 to detect non-embedded fonts in PDF file generated by Google Docs
I was hoping someone could help me write a Python function to detect any fonts in the file which are not embedded in the file. I’ve attempted to use the script linked here, and it can detect the document’s fonts, but it does not detect fonts which are embedded. I’ve pasted the script below for convenience: For example, I’ve downloaded
How to attach multiple files to a PDF?
I have a list of objects: List = ['Doc1.xlsx', 'Doc2.csv', 'Doc3.pdf'] and a list of their names: List1 = ['Doc1_name.xlsx', 'Doc2_name.csv', 'Doc3_name.pdf']. I need to attach them to an existing PDF. I tried the following code, which works only if I have one attachment. Now I am trying to iterate over the attachments to attach all of them, but in the Final.pdf will be
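Iterating over both lists in step can be sketched as below. This is not the asker's code: it assumes a recent pypdf release, whose `PdfWriter.add_attachment(filename, data)` keeps earlier attachments when called repeatedly, and the function names are hypothetical.

```python
from pathlib import Path


def attach_all(writer, paths, names):
    """Attach each file in paths to writer under the matching entry in names."""
    if len(paths) != len(names):
        raise ValueError("paths and names must have the same length")
    for path, name in zip(paths, names):
        # One call per file; the file content is passed as bytes.
        writer.add_attachment(name, Path(path).read_bytes())


def attach_to_pdf(src_pdf, dst_pdf, paths, names):
    """Copy src_pdf to dst_pdf with all files attached. Assumes pypdf."""
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for page in PdfReader(src_pdf).pages:
        writer.add_page(page)
    attach_all(writer, paths, names)
    with open(dst_pdf, "wb") as fh:
        writer.write(fh)
```

The key point is that a single writer object receives every attachment before the file is written once at the end.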