I am trying to scrape the Trulia estimate for a given address. However, some addresses do not have a Trulia estimate, so I want to first look for the text ‘Trulia Estimate’ and, only if it is found, extract the value. At the moment I cannot figure out how to find the ‘Trulia Estimate’ text on the page.
Here is the code I have so far:
    from selenium import webdriver
    from selenium.webdriver.remote import webelement
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    import pandas as pd
    import time
    from bs4 import BeautifulSoup
    import os
    from datetime import datetime
    from selenium.webdriver import ActionChains

    # Raw string so the backslashes in the Windows path are not read as escapes
    driver = webdriver.Firefox(executable_path=r'C:\Users\Downloads\geckodriver-v0.24.0-win64\geckodriver.exe')

    def get_trulia_estimate(address):
        driver.get('https://www.trulia.com/')
        print(address)

        # Type the address into the homepage search box
        element = (By.ID, 'homepageSearchBoxTextInput')
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).send_keys(address)

        # Submit the search
        search_button = (By.CSS_SELECTOR, "button[data-auto-test-id='searchButton']")
        WebDriverWait(driver, 50).until(EC.element_to_be_clickable(search_button)).click()
        time.sleep(3)

        # Parse the resulting page (attrs must be a dict, not a set)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        results = soup.find('div', {'class': 'Text__TextBase-sc-1cait9d-0 OmRik'})
        print(results)

    get_trulia_estimate('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
Any suggestions are greatly appreciated.
Answer
Version using BeautifulSoup:
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.trulia.com/json/search/location/?query={}&searchType=for_sale'
    search_string = '693 Bluebird Canyon Drive, Laguna Beach, CA 92651'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'}

    d = requests.get(url.format(search_string), headers=headers).json()
    property_url = 'https://www.trulia.com' + d['url']

    soup = BeautifulSoup(requests.get(property_url, headers=headers).text, 'lxml')
    print(soup.select_one('h3:has(+div span:contains("Trulia Estimate"))').text)
Prints:
$1,735,031
The CSS selector h3:has(+div span:contains("Trulia Estimate")) finds an <h3> whose immediately following sibling is a <div> that contains a <span> with the text “Trulia Estimate”.
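Since some addresses have no Trulia Estimate, select_one can return None, and calling .text on it would then raise an AttributeError. Here is a minimal sketch of guarding against that. The two HTML snippets are made up and only mimic the structure the selector relies on, and extract_trulia_estimate is a hypothetical helper name. It also assumes a recent Soup Sieve (2.1 or later), where :-soup-contains is the current spelling of :contains; on older versions, use :contains as in the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the structure the selector relies on;
# the real Trulia page is larger and may change at any time.
WITH_ESTIMATE = """
<div>
  <h3>$1,735,031</h3>
  <div><span>Trulia Estimate</span></div>
</div>
"""

WITHOUT_ESTIMATE = """
<div>
  <h3>$1,500,000</h3>
  <div><span>List Price</span></div>
</div>
"""

# <h3> whose next sibling is a <div> containing a <span> with the label text.
SELECTOR = 'h3:has(+ div span:-soup-contains("Trulia Estimate"))'


def extract_trulia_estimate(html):
    """Return the estimate text, or None when the label is not on the page."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(SELECTOR)
    # select_one returns None when nothing matches, so check before reading text
    return tag.get_text(strip=True) if tag is not None else None


print(extract_trulia_estimate(WITH_ESTIMATE))
print(extract_trulia_estimate(WITHOUT_ESTIMATE))
```

In the answer's script, the same check would go around the final print: store the result of soup.select_one(...) in a variable and read .text only if it is not None.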
Further reading: