How can I get the HTML of a website as seen in a browser?

Question

A website loads part of its content after the page is opened. When I use libraries such as requests and urllib3, I cannot get the part that is loaded later. How can I get the HTML of this website as seen in the browser? I can't open a browser using Selenium and get the HTML because this process should not

Accepted Answer

You can use Selenium to load the page the way a browser does and wait for the additional HTML elements to appear. BeautifulSoup on its own only parses HTML it is given and does not run JavaScript, but it is useful for parsing the HTML once Selenium has retrieved it. I would suggest using Selenium since it provides the WebDriverWait class, which can help you scrape elements that load later. Chrome can also run in headless mode, so no browser window is opened.

This is my simple example:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Replace with the URL of the website you want
    url = "https://www.example.com"

    # Run Chrome headless so no browser window is opened
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")

    # Create a new instance of the Chrome webdriver with those options
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(url)

        # Wait up to 10 seconds for the additional HTML elements to load
        wait = WebDriverWait(driver, 10)
        wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@class, 'lazy-load')]")))

        # Get the HTML as rendered by the browser
        html = driver.page_source
        print(html)
    finally:
        # Shut down the browser even if the wait times out
        driver.quit()

In the example above, I'm using an explicit wait of up to 10 seconds for a specific condition to occur. More specifically, I'm waiting until elements whose class contains 'lazy-load' are located by XPath (replace this with a selector that matches the late-loaded part of your site), and then I retrieve the HTML.

Finally, I would recommend checking out both BeautifulSoup and Selenium, since both have tremendous capabilities for scraping websites and automating web-based tasks.
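Once `driver.page_source` has been retrieved, it can help to confirm that the late-loaded part is actually in it. The following is only a minimal sketch of that check, using the standard library's `html.parser` in place of BeautifulSoup so it runs without extra installs. The helper names and both HTML strings are hypothetical stand-ins: `static_html` for what a plain HTTP request might return, and `rendered_html` for what `driver.page_source` might return after the wait.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collects the names of tags whose class attribute contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []  # tag names of matching elements, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        class_value = dict(attrs).get("class") or ""
        if self.needle in class_value:
            self.matches.append(tag)


def find_tags_with_class(html, needle="lazy-load"):
    """Return tag names whose class contains `needle` (substring match,
    similar in spirit to the XPath contains(@class, ...) test)."""
    parser = ClassMatcher(needle)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical sample data, not taken from any real site
static_html = "<html><body><div id='app'></div></body></html>"
rendered_html = (
    "<html><body><div id='app'>"
    "<img class='lazy-load thumb' src='a.png'>"
    "<p class='lazy-load'>Loaded later</p>"
    "</div></body></html>"
)

print(find_tags_with_class(static_html))    # []
print(find_tags_with_class(rendered_html))  # ['img', 'p']
```

In real use you would pass `driver.page_source` instead of the sample strings. An empty result would suggest the wait condition or the class name needs adjusting for the site in question.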

Answer