I am trying to scrape the Trulia Estimate for a given address. Some addresses do not have a Trulia Estimate, so I want to first look for the text "Trulia Estimate" and, only if it is found, read the value next to it. At the moment I cannot figure out how to locate the "Trulia Estimate" text on the page.
Here is the code I have so far:
from selenium import webdriver
from selenium.webdriver.remote import webelement
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from bs4 import BeautifulSoup
import os
from datetime import datetime
from selenium.webdriver import ActionChains

# Raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Firefox(executable_path=r'C:\Users\Downloads\geckodriver-v0.24.0-win64\geckodriver.exe')

def get_trulia_estimate(address):
    driver.get('https://www.trulia.com/')
    print(address)
    # Wait for the home page search box, click it, and type the address
    element = (By.ID, 'homepageSearchBoxTextInput')
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).send_keys(address)
    # Submit the search
    search_button = (By.CSS_SELECTOR, "button[data-auto-test-id='searchButton']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(search_button)).click()
    time.sleep(3)
    # Parse the rendered page and look for the element by its class
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    results = soup.find('div', {'class': 'Text__TextBase-sc-1cait9d-0 OmRik'})
    print(results)

get_trulia_estimate('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
Any suggestions are greatly appreciated.
Answer
Version using BeautifulSoup:
import requests
from bs4 import BeautifulSoup

# JSON search endpoint that resolves an address to a property page URL
url = 'https://www.trulia.com/json/search/location/?query={}&searchType=for_sale'
search_string = '693 Bluebird Canyon Drive, Laguna Beach, CA 92651'

# Send a browser-like User-Agent with every request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'}

d = requests.get(url.format(search_string), headers=headers).json()
property_url = 'https://www.trulia.com' + d['url']

# Fetch the property page and print the <h3> that sits next to the "Trulia Estimate" label
soup = BeautifulSoup(requests.get(property_url, headers=headers).text, 'lxml')
print(soup.select_one('h3:has(+div span:contains("Trulia Estimate"))').text)
Prints:
$1,735,031
The CSS selector h3:has(+div span:contains("Trulia Estimate")) finds the <h3> whose immediately following sibling is a <div> that contains a <span> with the text "Trulia Estimate".
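Since some addresses have no Trulia Estimate, select_one() can return None, and calling .text on it would then raise an AttributeError. The following is a minimal sketch of a guard for that case, not part of the original answer. The helper name extract_trulia_estimate and the two HTML fragments are hypothetical stand-ins for real Trulia pages, whose markup may differ. It also assumes a soupsieve version (2.1 or later) that accepts :-soup-contains, the newer spelling of :contains.

```python
from bs4 import BeautifulSoup

# Same selector as in the answer, with ":-soup-contains" in place of ":contains"
# (assumes soupsieve 2.1+, where ":contains" is deprecated).
SELECTOR = 'h3:has(+ div span:-soup-contains("Trulia Estimate"))'


def extract_trulia_estimate(html):
    """Return the estimate text, or None if the page has no Trulia Estimate.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(SELECTOR)
    # select_one() returns None when nothing matches, so check before reading text
    return tag.get_text(strip=True) if tag is not None else None


# Made-up fragments that only mimic the h3 + div > span layout the selector expects
WITH_ESTIMATE = """
<div>
  <h3>$1,735,031</h3>
  <div><span>Trulia Estimate</span></div>
</div>
"""

WITHOUT_ESTIMATE = """
<div>
  <h3>$1,500,000</h3>
  <div><span>Listing price</span></div>
</div>
"""

print(extract_trulia_estimate(WITH_ESTIMATE))
print(extract_trulia_estimate(WITHOUT_ESTIMATE))
```

In the answer's script, the same check would go around the final print, so that an address without an estimate yields None instead of an exception.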
Further reading: