I am new to Python and web scraping, but for the past two weeks I have been periodically scraping one website and successfully downloading images from it. I use different proxies and sometimes change them. Starting yesterday, however, all my proxies suddenly stopped working with a timeout error. I have tried a whole list of them and they all fail. Could this be some kind of site protection against scraping? If so, is there a way to get around it?
import requests
from bs4 import BeautifulSoup

header = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
proxies = {
    "http": "http://188.114.99.153",
    "https": "http://180.94.69.66:8080"
}
url = 'https://parovoz.com/newgallery/index.php?&LNG=RU&NO_ICONS=0&CATEG=-1&HOWMANY=192'

html = requests.get(url, headers=header, proxies=proxies, timeout=10).text
soup = BeautifulSoup(html, 'lxml')
Error message:
ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001536A8E7190>, 'Connection to 180.94.69.66 timed out. (connect timeout=10)'))
Answer
This will GET the URL and retry up to 3 times on ConnectTimeoutError. The backoff factor adds a growing delay between attempts, which helps avoid failing again if the site enforces a periodic request quota.
Take a look at urllib3.util.retry.Retry; it has many options to simplify retries.
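For instance, a stricter policy could also retry on read timeouts and on throttling or server-error status codes. This is only a sketch: the parameter names below (allowed_methods, status_forcelist) exist in urllib3 1.26+, and the values are illustrative, not tuned for this site.

from urllib3.util.retry import Retry

retry = Retry(
    total=5,                                     # overall cap across all error types
    connect=3,                                   # retries for connection errors (your ConnectTimeoutError)
    read=2,                                      # retries for read timeouts
    backoff_factor=0.5,                          # delay between attempts grows exponentially
    status_forcelist=[429, 500, 502, 503, 504],  # also retry on these HTTP status codes
    allowed_methods=["GET"],                     # only retry idempotent GET requests
)

Applied to the scraper from the question: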
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from bs4 import BeautifulSoup

header = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'https://parovoz.com/newgallery/index.php?&LNG=RU&NO_ICONS=0&CATEG=-1&HOWMANY=192'

# Retry connection errors up to 3 times, sleeping longer before each attempt,
# and apply the policy to both HTTP and HTTPS requests made through the session.
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

html = session.get(url, headers=header).text
soup = BeautifulSoup(html, 'lxml')
print(soup)
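Since every proxy in your list is timing out at once, it is also worth checking whether the proxies themselves are still alive, independently of the target site. A minimal sketch, assuming httpbin.org as a neutral echo service (any URL you control would work too; the ports below are illustrative):

import requests

proxy_list = [
    "http://188.114.99.153:80",
    "http://180.94.69.66:8080",
]

for proxy in proxy_list:
    proxies = {"http": proxy, "https": proxy}
    try:
        # httpbin.org/ip echoes the IP address the request came from
        r = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
        print(proxy, "OK:", r.json())
    except requests.RequestException as exc:
        print(proxy, "failed:", exc)

If the proxies respond here but still time out against parovoz.com, the site is likely blocking them; if they fail everywhere, the proxies are simply dead and you need fresh ones.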