
Extracting carbon offset projects from a website using Beautiful Soup returns nothing

I'm trying to extract data from this website ('https://alliedoffsets.com/#/profile/2'). It has many such projects, and I want to get the values of Estimated Average Wholesale Price and Estimated Annual Emission Reduction. When I print the parsed page using Beautiful Soup, those tags are missing and I get empty values. I know it could be something basic, but I'm stuck. Maybe the data is populated on the page by JavaScript, but I cannot figure out how to get at it.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url='https://alliedoffsets.com/#/profile/1'
r=requests.get(url)
url=r.content
soup = BeautifulSoup(url,'html.parser')

tab=soup.find("thead",{"class":"sr-only"})
print(tab)


Answer

The web page is rendered by JavaScript, so the HTML elements cannot be extracted directly using BeautifulSoup: requests only downloads the initial HTML, before any script has run. Selenium can be used to load the page in a real browser, wait for it to render, and then search for elements by ID, class, XPath, etc.
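There is a second reason the requests call in the question cannot reach a specific profile: everything after the `#` in the URL is a fragment, which the browser keeps to itself and does not send to the server. A minimal sketch using only the standard library shows how the URL splits:

```python
from urllib.parse import urlsplit

url = 'https://alliedoffsets.com/#/profile/1'
parts = urlsplit(url)

# What the server is actually asked for: the host and the path.
print(parts.netloc)    # alliedoffsets.com
print(parts.path)      # /

# The fragment stays on the client; the page's JavaScript reads it
# to decide which profile to load.
print(parts.fragment)  # /profile/1
```

So a plain HTTP request for any profile URL fetches the same root page, and the profile data only appears after a browser runs the page's JavaScript. The Selenium script below works around this by driving a real browser.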

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import re

url = 'https://alliedoffsets.com/#/profile/1'

s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)

# web driver goes to page
driver.get(url)

# use WebDriverWait to wait until page is rendered

# find Estimated Average Wholesale Price
elt = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'direct-price-panel'))
    )
# extract just the price from the text
print(re.sub(r'.*(\$\S+).*', r'\1', elt.text))

# find Estimated Annual Emission Reduction
elt = driver.find_element(By.XPATH, "//*[strong[contains(., 'Estimated Annual Emission Reduction')]]")
print(elt.text.split(":")[1])

# close the browser when done
driver.quit()

Output:

 $5.06
 11603 tCO2
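Since the site has many such projects, it may help to pull the two text-parsing steps into small helpers that can be checked without a browser. The following is only a sketch: the function names `parse_price` and `parse_reduction` are hypothetical, and the sample strings are assumptions modeled on the output above, not text copied from the site.

```python
import re

def parse_price(panel_text):
    """Return the first dollar amount found in the text, or None."""
    match = re.search(r'\$\S+', panel_text)
    return match.group(0) if match else None

def parse_reduction(line_text):
    """Return the text after the first colon, stripped, or None."""
    _, sep, value = line_text.partition(':')
    return value.strip() if sep else None

# Hypothetical sample strings; the real page text may differ.
print(parse_price('Estimated Average Wholesale Price $5.06'))
print(parse_reduction('Estimated Annual Emission Reduction: 11603 tCO2'))
```

In the Selenium script, `elt.text` would be passed to these helpers in place of the inline `re.sub` and `split` calls. Returning `None` when nothing matches makes it easier to notice a profile where a field is missing instead of failing with an `IndexError`.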
User contributions licensed under: CC BY-SA