I am new to web scraping and I am facing a problem. The appending part seems to append only the first row of the table I want to scrape. I am sure I am missing something. Any ideas? Thanks in advance! The code snippet is the following:
    driver = visit_main_page()
    contents = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]')
    tables = contents[0].find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table')
    data = {"Date": [], "Time": [], "Place": [], "Latitude": [], "Longitude": [], "Fatalities": [], "Magnitude": []}

    for i in tables:
        try:
            dates = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[1]')
            times = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[2]')
            places = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[3]')
            lat = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[4]')
            long = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[5]')
            fat = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[6]')
            magn = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[7]')
        except NoSuchElementException:
            print('No such content!')
            pass
        time.sleep(1)

        for d in dates:
            data['Date'].append(d.text)
        for t in times:
            data['Time'].append(t.text)
        for p in places:
            data['Place'].append(p.text)
        for la in lat:
            data['Latitude'].append(la.text)
        for lo in long:
            data['Longitude'].append(lo.text)
        for f in fat:
            data['Fatalities'].append(f.text)
        for m in magn:
            data['Magnitude'].append(m.text)
Answer
UPD
You are using the wrong locators.
Every XPath you use starts with //*[@id="mw-content-text"]/div[1]/table[2], which points to one specific table, and the tr[1] step that follows it selects only that table's first row.
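To see why an indexed step narrows the match, here is a small sketch that uses Python's standard-library ElementTree instead of Selenium. The markup is a made-up miniature of a table, not the real page, and ElementTree supports only a subset of XPath, so treat it as an illustration of the indexing behaviour rather than of the actual scrape.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the page: one table with three data rows.
HTML = """
<div>
  <table class="wikitable">
    <tbody>
      <tr><td>2020-01-01</td><td>10:00</td></tr>
      <tr><td>2020-02-02</td><td>11:00</td></tr>
      <tr><td>2020-03-03</td><td>12:00</td></tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(HTML)

# tr[1] keeps only the first row, so a single cell comes back.
first_row_only = [td.text for td in root.findall(".//table/tbody/tr[1]/td[1]")]

# Without an index on tr, the first cell of every row is matched.
all_rows = [td.text for td in root.findall(".//table/tbody/tr/td[1]")]

print(first_row_only)  # ['2020-01-01']
print(all_rows)        # ['2020-01-01', '2020-02-02', '2020-03-03']
```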
To collect the data you are looking for, try this:
    dates = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[1]")
    times = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[2]")
    places = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[3]")
    lat = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[4]")
    long = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[5]")
    fat = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[6]")
    magn = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[7]")
This is the main problem. The code after that looks correct.
With this approach you do not need to fetch contents and tables at all.
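As a possible variation, the rows can be walked one at a time so that the seven lists stay aligned even if some row is missing a cell. This is only a sketch under assumptions: `collect_rows` and `COLUMNS` are names made up here, the column order is taken from the `data` dictionary in the question, and the locator is passed in the Selenium 4 style `find_elements("xpath", ...)` (the string `"xpath"` is the value of `By.XPATH`), since the `find_elements_by_xpath` helpers were removed in recent Selenium 4 releases.

```python
# Column order assumed from the question's `data` dictionary.
COLUMNS = ["Date", "Time", "Place", "Latitude", "Longitude", "Fatalities", "Magnitude"]

ROWS_XPATH = "//table[contains(@class,'wikitable')]//tbody//tr"


def collect_rows(driver):
    """Collect the table cells row by row into a dict of column lists."""
    data = {name: [] for name in COLUMNS}
    # "xpath" is the locator strategy string behind selenium's By.XPATH.
    for row in driver.find_elements("xpath", ROWS_XPATH):
        cells = row.find_elements("xpath", "./td")
        # Skip header rows (only <th> cells) and rows with too few cells.
        if len(cells) < len(COLUMNS):
            continue
        for name, cell in zip(COLUMNS, cells):
            data[name].append(cell.text)
    return data
```

With a live driver this would be called as `data = collect_rows(driver)` after the page has loaded. Because the function only relies on `find_elements` and `.text`, it can also be exercised with a small stand-in object in place of a real browser.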