I know similar questions have been asked before, but none of the solutions I adapted gave the desired result. Suppose a bs4 soup contains many elements like the one below:
```html
<a class="employee background-white text-center col-xs-6 col-sm-4 col-md-3" data-cid="74" href="extract_this_link">
  <div class="image" style="background-image: url(xxx.jpg) !important">
    <div class="overlay flex center">
      <div class="background">
      </div>
    </div>
  </div>
  <div class="bubble-description">
    <p>
      <b>
        content1
      </b>
      <br/>
      content2
    </p>
  </div>
</a>
<a class="hidden" href="link1">
</a>
<a class="hidden" href="link2">
</a>
<a class="hidden" href="link3">
</a>
```
How can I extract the link on the very first line (href="extract_this_link") from every such element in the soup and store them all in a list?
Any help is greatly appreciated!
Answer
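Use `select` rather than `select_one`: `select_one` returns only the first match, and iterating over a single tag walks its children (including text nodes), which do not have an `href`. `select` returns every matching `<a>`, and you can then read each tag's `href` attribute:

```python
# 'a.employee' matches every <a> whose class list includes "employee",
# so the class="hidden" links are skipped
goal = [a['href'] for a in soup.select('a.employee')]
```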
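For comparison, the same filtering can be done with only the standard library. This swaps in `html.parser.HTMLParser` instead of bs4 and is a minimal sketch, assuming the wanted links are always on `<a>` tags whose `class` attribute contains the token `employee`. The names `EmployeeLinkParser` and `extract_employee_links`, and the second employee anchor in the sample (`data-cid="75"`, `href="another_link"`), are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class EmployeeLinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags with class 'employee'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Split the class attribute on whitespace, like a CSS class selector does
        classes = (attributes.get("class") or "").split()
        if "employee" in classes and attributes.get("href") is not None:
            self.links.append(attributes["href"])


def extract_employee_links(html):
    """Return the href of every employee <a> tag, in document order."""
    parser = EmployeeLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Sample markup; the second employee anchor is a hypothetical extra entry
sample = """
<a class="employee background-white text-center col-xs-6 col-sm-4 col-md-3" data-cid="74" href="extract_this_link">
  <div class="bubble-description"><p><b>content1</b><br/>content2</p></div>
</a>
<a class="hidden" href="link1"></a>
<a class="employee col-xs-6" data-cid="75" href="another_link"></a>
"""

print(extract_employee_links(sample))
```

This prints `['extract_this_link', 'another_link']`; the `class="hidden"` anchor is skipped because its class list does not contain `employee`. With bs4 already in use, the one-line `select` answer is simpler.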