
How to scrape only a single href from a div class?

I would like to extract the href of the first <a> inside this <div>:

<div class="tocDeliverFormatsLinks"><a href="/doi/abs/10.1080/03066150.2021.1956473">Abstract</a> | <a
   class="ref nowrap full" href="/doi/full/10.1080/03066150.2021.1956473">Full Text</a> | <a
   class="ref nowrap references" href="/doi/ref/10.1080/03066150.2021.1956473">References</a> | <a
   class="ref nowrap nocolwiz" target="_blank" title="Opens new window"
   href="/doi/pdf/10.1080/03066150.2021.1956473">PDF (2239 KB)</a> | <a class="ref nowrap epub"
   href="/doi/epub/10.1080/03066150.2021.1956473" target="_blank">EPUB</a> | <a
   class="rightslink" target="_blank" title="Opens new window">Permissions</a>&nbsp;</div>
<a href="/doi/abs/10.1080/03066150.2021.1956473">

I'm using BeautifulSoup and I'm also scraping other content from the same page. With the following code, abstract comes back as None:

for article_entry in article_list_items:
    title_article = article_entry.find('span', class_='hlFld-Title').text
    author = article_entry.find('span', class_='articleEntryAuthorsLinks').text
    abstract = article_entry.find('a', class_='tocDeliverFormatsLinks')
    print(author, title_article, abstract)

Saturnino M. Borras Jr., Ian Scoones, Amita Baviskar, Marc Edelman, Nancy Lee Peluso & Wendy Wolford Climate change and agrarian struggles: an invitation to contribute to a JPS Forum None

Is there a way to get only the first href, using something similar to 'a'[:1]?



You can select a list of elements and slice it, or use select_one with a CSS selector to pick a single element, as follows:

html_doc = '''<div class="tocDeliverFormatsLinks"><a href="/doi/abs/10.1080/03066150.2021.1956473">Abstract</a> | <a
   class="ref nowrap full" href="/doi/full/10.1080/03066150.2021.1956473">Full Text</a> | <a
   class="ref nowrap references" href="/doi/ref/10.1080/03066150.2021.1956473">References</a> | <a
   class="ref nowrap nocolwiz" target="_blank" title="Opens new window"
   href="/doi/pdf/10.1080/03066150.2021.1956473">PDF (2239 KB)</a> | <a class="ref nowrap epub"
   href="/doi/epub/10.1080/03066150.2021.1956473" target="_blank">EPUB</a> | <a
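   class="rightslink" target="_blank" title="Opens new window">Permissions</a>&nbsp;</div>'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')

# select_one returns the first element matching the CSS selector:
# the first <a> nested inside the div with class tocDeliverFormatsLinks.
href = soup.select_one('div.tocDeliverFormatsLinks a').get('href')
print(href)  # /doi/abs/10.1080/03066150.2021.1956473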
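
The list-and-slice route mentioned above could look roughly like this. It is only a sketch: the HTML is a shortened stand-in for the div in the question, and the names links, first_links and first_href are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the div in the question (assumption: same structure).
html_doc = '''<div class="tocDeliverFormatsLinks"><a href="/doi/abs/10.1080/03066150.2021.1956473">Abstract</a> | <a
   class="ref nowrap full" href="/doi/full/10.1080/03066150.2021.1956473">Full Text</a></div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# select() returns a list of every matching <a>, in document order.
links = soup.select('div.tocDeliverFormatsLinks a')

# Slicing keeps a list (empty if nothing matched), so it never raises IndexError.
first_links = links[:1]
first_href = first_links[0].get('href') if first_links else None
print(first_href)
```

Slicing is mostly useful when you want the first few links; for exactly one element, select_one is shorter.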

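As for why the original loop printed None: find('a', class_='tocDeliverFormatsLinks') looks for an <a> that itself carries that class, but the class is on the surrounding <div>. Applied to the loop from the question, the fix might look like the sketch below. The "articleEntry" wrapper and the trimmed HTML are assumptions made only so the example runs on its own; the inner class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down page: the "articleEntry" wrapper is an assumption,
# the inner class names come from the question.
html_doc = '''
<div class="articleEntry">
  <span class="hlFld-Title">Climate change and agrarian struggles</span>
  <span class="articleEntryAuthorsLinks">Saturnino M. Borras Jr.</span>
  <div class="tocDeliverFormatsLinks"><a href="/doi/abs/10.1080/03066150.2021.1956473">Abstract</a> | <a
     class="ref nowrap full" href="/doi/full/10.1080/03066150.2021.1956473">Full Text</a></div>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
article_list_items = soup.select('div.articleEntry')

results = []
for article_entry in article_list_items:
    title_article = article_entry.find('span', class_='hlFld-Title').text
    author = article_entry.find('span', class_='articleEntryAuthorsLinks').text
    # The class sits on the <div>, so match the div first and then its first <a>.
    link = article_entry.select_one('div.tocDeliverFormatsLinks a')
    # Guard against entries without the links div instead of calling .get on None.
    abstract = link.get('href') if link is not None else None
    results.append((author, title_article, abstract))
    print(author, title_article, abstract)
```

The None check matters if some entries on the real page have no links div, since select_one returns None when nothing matches.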
