I am trying to scrape data from all 37 pages of this website.
The website I am scraping doesn't let me move to the next page by changing the URL in the address bar.
This is the HTML for the Next button:
<a href="javascript:void('Next')" class="next"> <svg viewBox="0 0 36 36" data-use="/cms/svg/site/icon_caret_right.36.svg"> (path tag and data) </svg> </a>
I know that this can be done with Selenium, but is there any way to do this with BeautifulSoup?
Is there any way to scrape data from the next page?
Answer
You can reach each page using requests alone, without Selenium. The site loads its results through a POST request, and the paging field in the form data (PhysicianSearch$FTR01$PagingID) selects which page of results comes back:
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.stfrancismedicalcenter.com/find-a-provider/'

for page in range(1, 38):
    print(f'\t\tPage: {page}')

    # Form fields posted by the site's search; only PagingID changes per page.
    payload = {
        '_m_': 'FindAPhysician',
        'PhysicianSearch$HDR0$PhysicianName': '',
        'PhysicianSearch$HDR0$SpecialtyIDs': '',
        'PhysicianSearch$HDR0$Distance': '5',
        'PhysicianSearch$HDR0$ZipCodeSearch': '',
        'PhysicianSearch$HDR0$Keywords': '',
        'PhysicianSearch$HDR0$LanguageIDs': '',
        'PhysicianSearch$HDR0$Gender': '',
        'PhysicianSearch$HDR0$InsuranceIDs': '',
        'PhysicianSearch$HDR0$AffiliationIDs': '',
        'PhysicianSearch$HDR0$NewPatientsOnly': '',
        'PhysicianSearch$HDR0$InNetwork': '',
        'PhysicianSearch$HDR0$HasPhoto': '',
        'PhysicianSearch$FTR01$PagingID': str(page),
    }

    response = requests.post(url, data=payload)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each provider is an <li> whose class attribute starts with "half item-".
    items = soup.find_all('li', {'class': re.compile("^half item-")})
    for item in items:
        # The first two <span> tags inside div.info hold the name and specialty.
        itemName = item.find('div', {'class': 'info'}).find_all('span')[0].text
        itemType = item.find('div', {'class': 'info'}).find_all('span')[1].text
        phone = item.find('li', {'class': 'inline-svg phone'}).text.strip()
        # Remove the tab characters left over from the page's indentation.
        address = item.find('address').text.strip().replace('\t', '')

        print(f'\n{itemName}\n{itemType}\n{phone}\n{address}\n')
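The loop above hardcodes 37 pages, which will go stale if the provider list grows or shrinks. One possible alternative is to keep requesting pages until one comes back with no items. The sketch below shows only that stopping logic. `iter_pages`, `fetch_items`, and `max_pages` are hypothetical names, not part of the original answer, and it assumes the site returns a page with no matching `<li>` elements once you go past the last page, which I have not verified.

```python
from typing import Callable, Iterator, List, Tuple


def iter_pages(
    fetch_items: Callable[[int], List],
    max_pages: int = 100,
) -> Iterator[Tuple[int, List]]:
    """Yield (page_number, items) until a page is empty or max_pages is reached.

    fetch_items is a hypothetical callable: given a page number, it should
    POST the payload for that page and return the list of parsed items.
    max_pages is a safety cap in case the site never returns an empty page.
    """
    for page in range(1, max_pages + 1):
        items = fetch_items(page)
        if not items:
            # Assumption: an empty result means we are past the last page.
            break
        yield page, items


if __name__ == '__main__':
    # Stand-in for the real request, so the sketch runs without network access.
    fake_site = {1: ['Dr. A', 'Dr. B'], 2: ['Dr. C']}
    for page, items in iter_pages(lambda p: fake_site.get(p, [])):
        print(page, items)
```

In real use, `fetch_items` would wrap the `requests.post` call and the `soup.find_all` lookup from the answer and return `items`. The cap matters because some sites repeat the last page instead of returning an empty one.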