Python Web Scraping – handling page 404 errors

Question

I am scraping a website with Python, Selenium, and the Chrome headless driver, which involves executing a loop. However, sometimes the page does not exist for certain customer IDs. I have no control over this, and the code stops with a page not found (404) error. How do I ignore this though and just mo…

Accepted Answer

Check what text the page shows when a 404 occurs (for example in the h1 tag of the body), then test for that text in an if clause and run the scraping logic only when it is absent.

    import json
    from bs4 import BeautifulSoup

    # driver is the existing Selenium Chrome headless driver
    CustId = 2000
    while CustId <= 3000:
        urlg = f'https://mywebsite.com/customerRest/show/?id={CustId}'
        driver.get(urlg)
        soup = BeautifulSoup(driver.page_source, "lxml")
        if "Page not found" not in soup.find("body").text:
            dict_from_json = json.loads(soup.find("body").text)
            # logic for webscraping is here......
        CustId = CustId + 1

Or

    CustId = 2000
    while CustId <= 3000:
        urlg = f'https://mywebsite.com/customerRest/show/?id={CustId}'
        driver.get(urlg)
        soup = BeautifulSoup(driver.page_source, "lxml")
        if "404" not in soup.find("body").text:
            dict_from_json = json.loads(soup.find("body").text)
            # logic for webscraping is here......
        CustId = CustId + 1
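One caveat with matching on text: a check for "404" could also match a valid page whose data happens to contain those digits (customer ID 2404, for instance). A possible variation, sketched below, is to attempt the JSON parse and skip the ID when it fails. This assumes valid pages return a JSON object and error pages do not; the helper names `parse_json_body` and `scrape_customers` are hypothetical, and `driver` is whatever Selenium driver you already have.

```python
import json


def parse_json_body(body_text):
    """Return the decoded JSON object, or None if the text is not one."""
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        return None  # e.g. an HTML/plain-text "Page not found" page
    # Assumption: a real customer record is a JSON object (a dict)
    return data if isinstance(data, dict) else None


def scrape_customers(driver, first_id, last_id):
    """Visit each customer page and keep only those with parseable JSON."""
    records = {}
    for cust_id in range(first_id, last_id + 1):
        driver.get(f"https://mywebsite.com/customerRest/show/?id={cust_id}")
        # "tag name" is the locator string behind Selenium's By.TAG_NAME
        body_text = driver.find_element("tag name", "body").text
        data = parse_json_body(body_text)
        if data is None:
            continue  # missing customer: skip it and move on
        records[cust_id] = data  # scraping logic would go here
    return records
```

Calling `scrape_customers(driver, 2000, 3000)` would cover the same ID range as the loops above. Using `for ... in range(...)` also removes the risk of forgetting to increment the counter on a skipped page.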

Answer