I am trying to write a simple Python scraper in order to save all the reviews of a specific place on TripAdvisor.
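The specific link I am using as an example is the following: https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html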
Here is the code I am using, which is supposed to print the page's HTML:

```python
from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
print(soup)
```
If I run this code in the console, it hangs on requests.get(url) for a long time without any output. With another URL (for example url = "https://stackoverflow.com/"), the HTML is displayed correctly right away. Why doesn't TripAdvisor work, and how can I obtain its HTML?
Answer
Adding a User-Agent header should solve your issue as a first step, because some sites serve different content depending on it or use it for bot/automation detection. Open DevTools in your browser and copy the user agent from one of your requests (Network tab, request headers):

```python
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
```
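Because the original symptom was a request that hangs for a long time, it can also help to pass a timeout so the call fails fast instead of blocking. The following is a minimal sketch, assuming you reuse a requests.Session for several pages; the helper names make_session and fetch_html and the 10-second default are illustrative choices, not part of the original answer.

```python
import requests

# Minimal browser-like User-Agent, as in the answer; pasting the full string
# copied from your browser's DevTools may work better on some sites.
HEADERS = {"User-Agent": "Mozilla/5.0"}
# Hypothetical default: give up after 10 seconds instead of hanging
DEFAULT_TIMEOUT = 10


def make_session(headers=HEADERS):
    """Hypothetical helper: a Session that sends these headers on every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch_html(session, url, timeout=DEFAULT_TIMEOUT):
    """Hypothetical helper: download a page, raising instead of blocking forever."""
    # A single number is used as both the connect and the read timeout
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx responses (e.g. a block page) into exceptions
    response.raise_for_status()
    return response.text
```

Usage would look like html = fetch_html(make_session(), url); if the server does not respond in time, requests raises requests.exceptions.Timeout, which you can catch and retry or report.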
Example
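```python
from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
# Browser-like User-Agent so the request is not stalled by bot detection
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

data = []
# The selectors below depend on TripAdvisor's markup and break if it changes
for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        # Rating is stored in the aria-label of an SVG, e.g. "5.0 of 5 bubbles"
        'rating': e.select_one('svg[aria-label]')['aria-label'],
        # Link to the individual review page (/ShowUserReviews-...)
        'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
        # Review text: the div two siblings after the div containing that link
        'content': e.select_one('div:has(>a[tabindex="0"]) + div + div').text
    })

print(data)
```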
Output
```
[{'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out 😕Read more"},
 {'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r618358203-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': 'Beautiful site with great replica’s of the original cave, excellent exposition, poor film as an introduction however!The most urgent issue: long waiting because you need a slot to enter. This could be done 1000% better and in every decent museum it is done better! Staff probably civil servants with no great desire to make you enjoy the visit. Building urgently needs a revamp, no exposure at all!Read more'},
 ...]
```
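The code above scrapes a single page, while the question asks to save all the reviews. A minimal sketch of the remaining two steps follows: generating the URLs of further review pages, and writing the collected dicts to CSV. It assumes that the review offset appears in the URL as "-or<N>-" right after "-Reviews" (as in the example URL's "-Reviews-or50-"), that the first page has no such segment, and that each page holds 10 reviews; all three are guesses to verify against the site, and the helper names are hypothetical.

```python
import csv
import re

# Assumptions: offset encoded as "-or<N>-" after "-Reviews"; first page has no
# offset segment; 10 reviews per page. Verify these against the live site.
PAGE_SIZE = 10


def review_page_url(url, offset):
    """Hypothetical helper: rewrite a review URL to start at the given offset."""
    # Drop an existing "-or<N>" segment, if present
    base = re.sub(r"-Reviews-or\d+-", "-Reviews-", url)
    if offset == 0:
        return base
    return base.replace("-Reviews-", f"-Reviews-or{offset}-", 1)


def review_page_urls(url, n_pages, page_size=PAGE_SIZE):
    """Hypothetical helper: URLs for the first n_pages pages of reviews."""
    return [review_page_url(url, i * page_size) for i in range(n_pages)]


def write_reviews_csv(reviews, f):
    """Write review dicts (keys as produced by the scraper) as CSV to file f."""
    writer = csv.DictWriter(f, fieldnames=["rating", "profilUrl", "content"])
    writer.writeheader()
    writer.writerows(reviews)
```

In practice you would run the scraping loop for each URL from review_page_urls(url, n), extend one list with the results, stop once a page yields no review cards, and then save with open("reviews.csv", "w", newline="", encoding="utf-8") as f: write_reviews_csv(data, f). Pausing between requests is also kinder to the site and less likely to trigger its bot detection.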