I have the following HTML and want to extract the years and the names. Nothing I have tried so far has worked:
    <div class="Year">
        <span class="date">2019</span>
    </div>

    <div class="cl2">
        <span class="name">name1</span>
    </div>
    <div class="cl2">
        <span class="name">name2</span>
    </div>
    <div class="cl2">
        <span class="name">name3</span>
    </div>
    <div class="cl2">
        <span class="name">name4</span>
    </div>

    <div class="Year">
        <span class="date">2020</span>
    </div>

    <div class="cl2">
        <span class="name">name5</span>
    </div>
    <div class="cl2">
        <span class="name">name6</span>
    </div>
What I want to get is:
2019
name1
name2
name3
name4
2020
name5
name6
I tried the following, using XPath:
    from selenium.webdriver.common.by import By

    # `driver` is an already-initialised Selenium WebDriver with the page loaded.
    years = driver.find_elements(By.XPATH, "//div[@class='Year']")
    for year in years:
        print(year.find_element(By.XPATH, ".//span[@class='date']").text)

    names = driver.find_elements(By.XPATH, "//div[@class='cl2']")
    for name in names:
        print(name.find_element(By.XPATH, ".//span[@class='name']").text)
I got:
2019
2020
name1
name2
name3
name4
name5
name6
Answer
You can get them using XPath and the preceding axis, looking up the nearest date that comes before each name:
    from selenium.webdriver.common.by import By

    names = {}
    for e in driver.find_elements(By.CLASS_NAME, 'name'):
        name = e.text
        # Nearest date span that precedes this element in document order
        year = e.find_element(By.XPATH, "(./preceding::span[@class='date'])[last()]").text
        names[name] = year
    print(names)
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
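The dictionary above maps each name to its year, whereas the question asks for each year followed by its names. A minimal sketch of that regrouping, assuming Python 3.7+ (where dicts keep insertion order); the helper name `lines_by_year` is made up for illustration, and the input literal is copied from the output above:

```python
def lines_by_year(name_to_year):
    """Turn {name: year} into a flat list: each year followed by its names."""
    grouped = {}  # year -> list of names, in first-seen order
    for name, year in name_to_year.items():
        grouped.setdefault(year, []).append(name)

    lines = []
    for year, names_for_year in grouped.items():
        lines.append(year)
        lines.extend(names_for_year)
    return lines


names = {'name1': '2019', 'name2': '2019', 'name3': '2019',
         'name4': '2019', 'name5': '2020', 'name6': '2020'}
print("\n".join(lines_by_year(names)))
```

Note that a name-to-year dictionary keeps only one entry per name, so a name that appears under two different years would be listed once.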
Alternatively, select the date and name elements with a single query, which returns them in document order, and tell them apart by class:
    from selenium.webdriver.common.by import By

    names = {}
    year = None
    # One query for both classes; the most recent date seen is the current year
    for e in driver.find_elements(By.CSS_SELECTOR, '.date, .name'):
        css_class = e.get_attribute('class')
        if 'name' in css_class:
            names[e.text] = year
        if 'date' in css_class:
            year = e.text
    print(names)
{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
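If the same name can occur under more than one year, a year-to-names mapping may be a safer shape than name-to-year. The sketch below shows the same single-pass idea with that shape. To keep it runnable without a browser, the Selenium elements are replaced by plain `(class, text)` tuples; with a live page they would presumably be built from `e.get_attribute('class')` and `e.text`. The function name `group_by_year` and the sample `items` list are assumptions made for illustration.

```python
def group_by_year(items):
    """Group name texts under the most recent date text seen before them."""
    grouped = {}   # year -> list of names, in document order
    year = None    # names seen before any date end up under None
    for css_class, text in items:
        tokens = css_class.split()  # match whole class tokens, not substrings
        if 'date' in tokens:
            year = text
            grouped.setdefault(year, [])
        elif 'name' in tokens:
            grouped.setdefault(year, []).append(text)
    return grouped


# Stand-in for the elements matched by '.date, .name', in document order
items = [('date', '2019'), ('name', 'name1'), ('name', 'name2'),
         ('name', 'name3'), ('name', 'name4'),
         ('date', '2020'), ('name', 'name5'), ('name', 'name6')]

for year, year_names in group_by_year(items).items():
    print(year)
    for n in year_names:
        print(n)
```

Splitting the class attribute into tokens avoids a false match on classes such as `username`, which a plain substring test on `'name'` would accept.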