Python bs4: select specific links in soup

I know similar questions have been asked before, but adapting those solutions did not yield the desired result. Suppose a bs4 soup contains many elements like the one below:

       <a class="employee background-white text-center col-xs-6 col-sm-4 col-md-3" data-cid="74" href="extract_this_link">
        <div class="image" style="background-image: url(xxx.jpg) !important">
         <div class="overlay flex center">
          <div class="background">
          </div>
         </div>
        </div>
        <div class="bubble-description">
         <p>
          <b>
           content1
          </b>
          <br/>
          content2
         </p>
        </div>
       </a>
       <a class="hidden" href="link1">
       </a>
       <a class="hidden" href="link2">
       </a>
       <a class="hidden" href="link3">
       </a>

How can I extract the link in the very first line (href="extract_this_link") from each such element in the soup and store the links in a list?

Any help is greatly appreciated!

Answer

goal = [x['href'] for x in soup.select('a.employee')]
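
For reference, here is a minimal self-contained sketch of the selection, assuming a trimmed-down copy of the markup from the question. The second employee element (data-cid="75", href="another_link") is made up for illustration so that the result has more than one entry. It relies on select(), which returns a list of all tags matching a CSS selector, and shows a find_all() variant that should give the same result.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup based on the question; the second "employee"
# anchor (data-cid="75", href="another_link") is a hypothetical addition.
html = """
<a class="employee background-white text-center" data-cid="74" href="extract_this_link">
  <div class="bubble-description"><p><b>content1</b><br/>content2</p></div>
</a>
<a class="hidden" href="link1"></a>
<a class="hidden" href="link2"></a>
<a class="employee background-white text-center" data-cid="75" href="another_link">
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every tag matching the CSS selector;
# [href] skips any matching anchor that has no href attribute.
goal = [a["href"] for a in soup.select("a.employee[href]")]

# Same idea with find_all(); class_ matches one class among several.
goal_alt = [a["href"] for a in soup.find_all("a", class_="employee", href=True)]

print(goal)
```

Note that select_one() returns only the first matching tag rather than a list. Iterating over that single tag walks its children (text nodes and nested tags), none of which carry the href, so select() or find_all() is the call to use when collecting links from all matching elements.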