I have the following HTML and want to extract each year together with the names that follow it. Nothing I have tried so far has worked:
<div class="Year">
<span class="date">2019</span>
</div>
<div class="cl2">
<span class="name">name1</span>
</div>
<div class="cl2">
<span class="name">name2</span>
</div>
<div class="cl2">
<span class="name">name3</span>
</div>
<div class="cl2">
<span class="name">name4</span>
</div>
<div class="Year">
<span class="date">2020</span>
</div>
<div class="cl2">
<span class="name">name5</span>
</div>
<div class="cl2">
<span class="name">name6</span>
</div>
What I want to get is:
2019 name1 name2 name3 name4 2020 name5 name6
I tried the following, using XPath:
# XPath attribute matching is case-sensitive, so the class must be 'Year'
years = driver.find_elements_by_xpath("//div[@class='Year']")
for year in years:
    print(year.find_element_by_xpath(".//span[@class='date']").text)
# The name spans sit inside div.cl2, not div.name
names = driver.find_elements_by_xpath("//div[@class='cl2']")
for name in names:
    print(name.find_element_by_xpath(".//span[@class='name']").text)
I got:
2019
2020
name1
name2
name3
name4
name5
name6
Answer
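You can look up the year for each name with XPath's preceding axis, which selects the closest date span that appears before the name in the document: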
names = dict()
for e in driver.find_elements_by_class_name('name'):
    name = e.text
    # [last()] picks the nearest preceding date span in document order
    year = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    names[name] = year
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
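Alternatively, you can select all date and name elements in one query, which returns them in document order, and keep track of the current year while iterating: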
names = dict()
year = None
for e in driver.find_elements_by_css_selector('.date, .name'):
    if 'name' in e.get_attribute('class'):
        names[e.text] = year
    if 'date' in e.get_attribute('class'):
        year = e.text
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
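If you want the output grouped the way the question asks (each year followed by its names), the same "remember the current year" idea can collect names into a list per year. The sketch below is only an illustration and does not use Selenium: it assumes the HTML from the question is available as a string and parses it with Python's standard html.parser module. The class name YearNameParser and the variables HTML and groups are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: the markup from the question, held in a plain string.
HTML = """
<div class="Year"><span class="date">2019</span></div>
<div class="cl2"><span class="name">name1</span></div>
<div class="cl2"><span class="name">name2</span></div>
<div class="cl2"><span class="name">name3</span></div>
<div class="cl2"><span class="name">name4</span></div>
<div class="Year"><span class="date">2020</span></div>
<div class="cl2"><span class="name">name5</span></div>
<div class="cl2"><span class="name">name6</span></div>
"""


class YearNameParser(HTMLParser):
    """Hypothetical helper: files each name under the most recently seen year."""

    def __init__(self):
        super().__init__()
        self.groups = {}           # year -> list of names, in document order
        self._current_year = None  # last date span seen so far
        self._span_class = None    # class of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._span_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self._span_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._span_class == "date":
            # A new year starts a new group.
            self._current_year = text
            self.groups.setdefault(text, [])
        elif self._span_class == "name" and self._current_year is not None:
            # Names belong to whichever year came last.
            self.groups[self._current_year].append(text)


parser = YearNameParser()
parser.feed(HTML)
groups = parser.groups

for year, names in groups.items():
    print(year, *names)
```

With Selenium you could apply the same grouping inside the loop over '.date, .name' shown above, appending to a list for the current year instead of storing one year per name. That also avoids losing entries if the same name appears under two different years, which a name-to-year dict cannot represent.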