
Python requests not redirecting

I’m trying to scrape word definitions, but I can’t get Python to follow the redirect to the correct page. For example, I’m trying to get the definition of the word ‘agenesia’. When you load https://www.lexico.com/definition/agenesia in a browser, the page that loads is https://www.lexico.com/definition/agenesis. In Python, however, the page doesn’t redirect and I get a 200 status code.

import requests

URL = 'https://www.lexico.com/definition/agenesia'
# allow_redirects=True tells requests to follow any 3xx response
page = requests.head(URL, allow_redirects=True)

This is how I’m currently retrieving the page content. I’ve also tried using requests.get, but that doesn’t work either.

EDIT: To be clear, I know I could change the word to ‘agenesis’ in the URL to get the correct page, but I’m scraping a list of words and would rather follow the redirect automatically than look up each redirect by hand in a browser first.
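For a list of words, one way to follow redirects automatically is to request each word’s URL and read back where requests ended up. The sketch below assumes, as the answer below argues, that a browser-style Accept header is what makes the site redirect; the header here is a trimmed version of the one in that answer, and the names resolve_word and resolve_words are hypothetical. requests follows 3xx redirects for GET by default, exposing the final address as response.url and any intermediate redirect responses in response.history.

```python
import requests

# Base URL from the question; each word is appended per request.
BASE_URL = "https://www.lexico.com/definition/"

# Trimmed browser-style Accept header (assumption: this is what triggers
# the site's redirect, based on the answer below, not on documentation).
HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def resolve_word(session, word):
    """Fetch one word's page; return (final URL, whether a redirect was followed)."""
    response = session.get(BASE_URL + word, headers=HEADERS, timeout=10)
    response.raise_for_status()
    # response.history holds the redirect responses requests followed;
    # response.url is the URL of the page that was finally returned.
    return response.url, bool(response.history)


def resolve_words(words):
    """Resolve every word in the list, reusing one session's connection pool."""
    with requests.Session() as session:
        return {word: resolve_word(session, word) for word in words}
```

Usage would look like resolve_words(["agenesia", "agenesis"]), which maps each word to its final URL and a redirect flag; the final URL can then be passed to the parsing code.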

EDIT 2: I realised it might be easier to check solutions against the rest of my code. So far this works with agenesis but not agenesia:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, 'html.parser')

print(soup.find("span", {"class": "ind"}).get_text(), '\n')
print(soup.find("span", {"class": "pos"}).get_text())
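When a page lacks the expected spans (presumably the case for the non-redirected agenesia page), soup.find returns None and calling .get_text() on it raises AttributeError. The same happens with requests.head, because a HEAD response carries no body, so page.content is empty; parsing needs requests.get. A minimal sketch that returns None instead of crashing, so unresolved words can be logged and skipped (first_text and parse_definition are hypothetical helper names):

```python
from bs4 import BeautifulSoup


def first_text(soup, class_name):
    """Return the stripped text of the first <span> with this class, or None if absent."""
    span = soup.find("span", {"class": class_name})
    return span.get_text(strip=True) if span is not None else None


def parse_definition(html):
    """Extract the definition ('ind') and part of speech ('pos') from a page."""
    soup = BeautifulSoup(html, "html.parser")
    return first_text(soup, "ind"), first_text(soup, "pos")
```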


Answer

The other answers suggested so far don’t make your request redirect. The cause is that you aren’t sending the right request header: with a browser-style Accept header, the server redirects agenesia to agenesis. Try the code below:

import requests
from bs4 import BeautifulSoup

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}

page = requests.get('https://www.lexico.com/definition/agenesia', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

print(page.url)
print(soup.find("span", {"class": "ind"}).get_text(), '\n')
print(soup.find("span", {"class": "pos"}).get_text())
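To see how a request header alone can decide whether a redirect happens, here is a self-contained sketch against a local server. FakeDictionaryHandler is hypothetical: it mimics the behaviour described above by issuing a 302 for /definition/agenesia only when the Accept header mentions text/html, and is not Lexico’s actual logic. requests sends Accept: */* by default, so without the extra header the server answers 200 directly and response.history stays empty, as in the question.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# The browser-style Accept header from the answer above.
BROWSER_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,"
              "image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
}


class FakeDictionaryHandler(BaseHTTPRequestHandler):
    """Hypothetical server: redirects agenesia -> agenesis only for HTML-accepting clients."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if self.path == "/definition/agenesia" and "text/html" in accept:
            self.send_response(302)
            self.send_header("Location", "/definition/agenesis")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = f"<html><body>{self.path}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request log lines


def fetch(base, headers=None):
    """GET the misspelled word; return (final path, status codes of followed redirects)."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy env vars so localhost is hit directly
        response = session.get(base + "/definition/agenesia", headers=headers, timeout=5)
    return response.url[len(base):], [r.status_code for r in response.history]


def run_demo():
    """Start the fake server, fetch with default and browser-style headers, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), FakeDictionaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch(base), fetch(base, BROWSER_HEADERS)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    default_result, browser_result = run_demo()
    print(default_result)  # ('/definition/agenesia', [])
    print(browser_result)  # ('/definition/agenesis', [302])
```

Checking response.history (or comparing response.url with the requested URL) is a simple way to tell, per word, whether the site redirected.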

This prints:

https://www.lexico.com/definition/agenesis?s=t
Failure of development, or incomplete development, of a part of the body. 

noun
User contributions licensed under: CC BY-SA