I have the following HTML and want to extract the years and the names under each year, but nothing I have tried works:
<div class="Year"> <span class="date">2019</span> </div>
<div class="cl2"> <span class="name">name1</span> </div>
<div class="cl2"> <span class="name">name2</span> </div>
<div class="cl2"> <span class="name">name3</span> </div>
<div class="cl2"> <span class="name">name4</span> </div>
<div class="Year"> <span class="date">2020</span> </div>
<div class="cl2"> <span class="name">name5</span> </div>
<div class="cl2"> <span class="name">name6</span> </div>
What I want to get is each year followed by its names:
2019
name1
name2
name3
name4
2020
name5
name6
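I tried the following, using XPath:
from selenium.webdriver.common.by import By

# XPath attribute matching is case-sensitive, so the class must be 'Year', not 'year'
years = driver.find_elements(By.XPATH, "//div[@class='Year']")
for year in years:
    print(year.find_element(By.XPATH, ".//span[@class='date']").text)

# the name spans sit inside div.cl2, not div.name
names = driver.find_elements(By.XPATH, "//div[@class='cl2']")
for name in names:
    print(name.find_element(By.XPATH, ".//span[@class='name']").text)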
Instead, I got all the years first and then all the names:
2019
2020
name1
name2
name3
name4
name5
name6
Answer
You can get them using XPath and the preceding axis, taking for each name the closest date that appears before it:
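from selenium.webdriver.common.by import By

names = dict()
for e in driver.find_elements(By.CLASS_NAME, 'name'):
    name = e.text
    # (...)[last()] picks the last matching date in document order,
    # i.e. the year heading nearest above this name
    year = e.find_element(By.XPATH, "(./preceding::span[@class='date'])[last()]").text
    names[name] = year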
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
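The code above builds a name to year dictionary, while the question asks for each year followed by its names. A small regrouping step can bridge the two. This is a minimal sketch: the helper group_by_year is a hypothetical name, and the sample dictionary stands in for the one Selenium would return. Since Python 3.7, dicts keep insertion order, so names stay in page order.

```python
def group_by_year(names):
    """Invert a {name: year} dict into {year: [names...]}, keeping page order."""
    grouped = {}
    for name, year in names.items():
        # setdefault creates the list the first time a year is seen
        grouped.setdefault(year, []).append(name)
    return grouped


# Sample data standing in for the dictionary scraped with Selenium
names = {'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019',
         'name5': '2020', 'name6': '2020'}

# Print each year, then the names that belong to it
for year, members in group_by_year(names).items():
    print(year)
    for name in members:
        print(name)
```

This prints the years and names one per line in the order requested in the question.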
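Alternatively, you can fetch the date and name elements together with a single CSS selector (the matches come back in document order) and tell them apart by their class: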
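from selenium.webdriver.common.by import By

names = dict()
year = None
for e in driver.find_elements(By.CSS_SELECTOR, '.date, .name'):
    # split() so that a class like 'username' does not match 'name'
    classes = e.get_attribute('class').split()
    if 'date' in classes:
        year = e.text  # remember the most recent year
    elif 'name' in classes:
        names[e.text] = year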
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
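To try the logic of the second approach without a browser, here is a sketch using Python's standard-library html.parser on the markup from the question (copied into a hypothetical HTML constant). It is not Selenium; it only mimics the idea of walking date and name elements in document order while remembering the last year seen.

```python
from html.parser import HTMLParser

HTML = """
<div class="Year"> <span class="date">2019</span> </div>
<div class="cl2"> <span class="name">name1</span> </div>
<div class="cl2"> <span class="name">name2</span> </div>
<div class="cl2"> <span class="name">name3</span> </div>
<div class="cl2"> <span class="name">name4</span> </div>
<div class="Year"> <span class="date">2020</span> </div>
<div class="cl2"> <span class="name">name5</span> </div>
<div class="cl2"> <span class="name">name6</span> </div>
"""


class YearNameParser(HTMLParser):
    """Walk the markup in document order, remembering the most recent year."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # 'date' or 'name' while inside such a span
        self.year = None           # most recently seen year
        self.names = {}            # name -> year, in page order

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            # attribute values can be None for bare attributes, hence "or ''"
            classes = (dict(attrs).get('class') or '').split()
            if 'date' in classes:
                self.current_class = 'date'
            elif 'name' in classes:
                self.current_class = 'name'
            else:
                self.current_class = None

    def handle_endtag(self, tag):
        if tag == 'span':
            self.current_class = None

    def handle_data(self, data):
        text = data.strip()
        # skip whitespace between tags and text outside the spans we care about
        if not text or self.current_class is None:
            return
        if self.current_class == 'date':
            self.year = text
        else:
            self.names[text] = self.year


parser = YearNameParser()
parser.feed(HTML)
print(parser.names)
```

Running it should print the same dictionary as the Selenium version above.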