So here’s my problem. I wrote a program that gets all of the information I want from the first page that I load. But when I click the nextPage button, it runs a script that loads the next batch of products without actually navigating to another page.
So when the next loop iteration runs, I just get the same content as the first one, even though the content shown in the browser I’m driving is different.
This is the code I run:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

driver.get("https://www.my-website.com/search/results-34y1i")
soup = BeautifulSoup(driver.page_source, 'html.parser')
time.sleep(2)

# /////////// code to find total number of pages

currentPage = 0
button_NextPage = driver.find_element(By.ID, 'nextButton')
while currentPage != totalPages:
    # ///////// code to find the products
    currentPage += 1
    button_NextPage = driver.find_element(By.ID, 'nextButton')
    button_NextPage.click()
    time.sleep(5)
Is there any way for me to scrape exactly what’s loaded on my browser?
Answer
The issue seems to be that you’re only ever fetching page 1, as shown in this line:
driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid")
As you can see, the URL contains a query parameter called page that determines which page of HTML you are fetching. So every time your loop moves on to a new page, you’ll have to fetch the new HTML content with the driver by changing the page query parameter. For example, in your loop it would be something like this:
driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = currentPage))
After you fetch the new HTML, you’ll be able to access the new elements that are present on the different pages, as you require.
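Putting that together, here is a minimal sketch of such a loop. The helper names page_url and scrape_pages are hypothetical, not part of Selenium, and the sketch assumes the site really does select the page through a page query parameter, as in the URL above. The important detail is that driver.page_source is read again after every load, because a soup built once before the loop keeps holding the HTML of page 1.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(url, page):
    """Return `url` with its `page` query parameter set to `page`."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["page"] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def scrape_pages(driver, first_url, total_pages, parse):
    """Load pages 1..total_pages and parse each page's freshly loaded HTML.

    `driver` is any object with .get(url) and .page_source, such as a
    Selenium WebDriver. `parse` is a caller-supplied function (an assumption
    of this sketch) that turns an HTML string into whatever you want to keep.
    """
    results = []
    for page in range(1, total_pages + 1):
        driver.get(page_url(first_url, page))
        # Re-read page_source after each load so the parser sees the
        # current page rather than the HTML captured for page 1.
        results.append(parse(driver.page_source))
    return results
```

With Selenium and BeautifulSoup this might be called as scrape_pages(driver, url, totalPages, lambda html: BeautifulSoup(html, 'html.parser')). If the products are rendered by a script after the page loads, you will probably still need a wait (like the time.sleep calls in the question) before page_source is read.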