Extracting carbon offset projects from a website using Beautiful Soup and getting nothing

Question

I'm trying to extract data from this website (https://alliedoffsets.com/#/profile/2). It has many such projects, and I want to get the values of Estimated Average Wholesale Price and Estimated Annual Emission Reduction. When I try to print the page code using Beautiful Soup, it is not giving thos…

Accepted Answer

The page is rendered by JavaScript, so the elements are not present in the HTML that BeautifulSoup receives and cannot be extracted directly. Selenium can load the page in a browser, wait for it to render, and then search for elements by ID, class, XPath, etc.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    import re

    url = 'https://alliedoffsets.com/#/profile/1'
    s = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=s)

    # web driver goes to page
    driver.get(url)

    # use WebDriverWait to wait until page is rendered
    # find Estimated Average Wholesale Price
    elt = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'direct-price-panel'))
    )
    # extract just the price from the text
    print(re.sub(r'.*(\$\S+).*', r'\1', elt.text))

    # find Estimated Annual Emission Reduction
    elt = driver.find_element(By.XPATH, "//*[strong[contains(., 'Estimated Annual Emission Reduction')]]")
    print(elt.text.split(":")[1])

    # close the browser when done
    driver.quit()

Output:

    $5.06
     11603 tCO2
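The text-parsing step in the answer can be tried on its own, without a browser. The sketch below is only an illustration: the helper names extract_price and extract_value_after_colon are hypothetical, and the sample strings are assumed stand-ins for elt.text, not text copied from the site.

```python
import re


def extract_price(panel_text):
    """Return the first dollar amount (e.g. '$5.06') in the text, or None."""
    match = re.search(r'\$\S+', panel_text)
    return match.group(0) if match else None


def extract_value_after_colon(label_text):
    """Return whatever follows the first colon, with whitespace stripped."""
    return label_text.split(":", 1)[1].strip()


# Hypothetical sample strings standing in for elt.text (assumed format)
price_text = "Estimated Average Wholesale Price: $5.06 per tCO2"
reduction_text = "Estimated Annual Emission Reduction: 11603 tCO2"

print(extract_price(price_text))                  # $5.06
print(extract_value_after_colon(reduction_text))  # 11603 tCO2
```

Using re.search instead of re.sub returns None when no price is present, rather than echoing the whole unmatched text back, which may be easier to handle when looping over many project pages.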


Answer