Python Web Scraping – How to Skip Over Missing Entries?

I am working on a project that involves analyzing the text of political emails from this website: https://politicalemails.org/. I am attempting to scrape all the emails using BeautifulSoup and pandas. I have a working chunk right here:

The above results in pulling the data I want. However, I want to loop through larger chunks of the emails in this archive. Just checking out either one of the following links:

results in a ‘404 Not Found’ error. How can I build “skip” logic that checks whether there is anything to scrape from a page and, if not, moves on to the next iteration? If I use the commented-out chunk of code with email_pages = 50, I get an error that reads:

How should I approach editing my for loop to account for this behavior?

Answer

I’d advise using a switch-case (Python’s match statement, available from Python 3.10) on the response’s status code for situations like these.

If your Python version does not support match statements (anything before 3.10), you could do just the same with an if-else clause.

continue tells the loop to move straight to the next iteration, skipping the rest of the loop body, since the request retrieved nothing to parse.
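
As a rough illustration of the if-else variant with continue, here is a sketch that works on any Python 3 version. The names scrape_pages, fetch and parse are hypothetical: fetch is assumed to return an object with status_code and text attributes, and parse stands in for whatever BeautifulSoup extraction the loop body performs.

```python
def scrape_pages(urls, fetch, parse):
    """Parse every page that exists and skip the ones that return 404."""
    rows = []
    for url in urls:
        response = fetch(url)
        if response.status_code == 404:
            # Missing entry: skip the parsing code below for this URL.
            continue
        else:
            # Only reached when the page was actually retrieved.
            rows.append(parse(response.text))
    return rows
```

The returned list could then be handed to pandas.DataFrame, assuming parse yields one dictionary per email.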

User contributions licensed under: CC BY-SA