I know similar questions have been asked before, but none of the solutions I tried to adapt gave the desired result. Suppose a bs4 soup contains many elements like the one below:
<a class="employee background-white text-center col-xs-6 col-sm-4 col-md-3" data-cid="74" href="extract_this_link">
  <div class="image" style="background-image: url(xxx.jpg) !important">
    <div class="overlay flex center">
      <div class="background"> </div>
    </div>
  </div>
  <div class="bubble-description">
    <p> <b> content1 </b> <br/> content2 </p>
  </div>
</a>
<a class="hidden" href="link1"> </a>
<a class="hidden" href="link2"> </a>
<a class="hidden" href="link3"> </a>
How can I extract the link in the very first line (href="extract_this_link") for all elements in the soup and store them in a list?
Any help is greatly appreciated!
Answer
# select() returns every matching tag; select_one() returns only the first one
goal = [x['href'] for x in soup.select('.employee')]
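To check the approach end to end, here is a minimal, self-contained sketch. The HTML is a shortened version of the snippet from the question; the second `employee` link (`another_link`) and the `employee` tag without an `href` are made up for illustration, and the `[href]` filter is an optional addition that skips tags lacking the attribute instead of raising a `KeyError`.

```python
from bs4 import BeautifulSoup

# Shortened sample markup; "another_link" and the href-less tag are hypothetical.
html = """
<a class="employee background-white text-center" data-cid="74" href="extract_this_link">
  <div class="bubble-description"><p><b>content1</b><br/>content2</p></div>
</a>
<a class="hidden" href="link1"></a>
<a class="employee background-white text-center" data-cid="75" href="another_link"></a>
<a class="employee"></a>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only <a> tags that have the "employee" class and an href attribute.
links = [a["href"] for a in soup.select("a.employee[href]")]
print(links)  # ['extract_this_link', 'another_link']
```

The `hidden` links are ignored because they do not carry the `employee` class.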