
How to access a specific p tag while using BeautifulSoup

Hello everyone, I'm having trouble using BeautifulSoup: I can't access the information I want. Here is my code:

import nltk
import requests
from bs4 import BeautifulSoup
import time
from datetime import date, datetime, timedelta

url = 'https://www.richter-helm.eu/index.php?id=26'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

results_date = soup.find_all(class_='csc-textpic-text')[1]

print(results_date)

The output of this code is:

<div class="csc-textpic-text"><p class="bodytext">2021</p>
<p class="bodytext">10. March 2021</p>
<p class="bodytext">The NDR reports on the important role of Richter-Helm BioLogics as a Germany-based manufacturer of a DNA vaccine to support the Pharma industry and the fight against COVID-19. <a class="external-link-new-window" href="https://www.ndr.de/nachrichten/schleswig-holstein/Corona-In-Bovenau-wird-an-einem-neuen-Impfstoff-gearbeitet,bovenau124.html" target="_blank" title="Opens internal link in current window">See and read more</a> about our contribution to the fight against Corona. </p>
<p class="bodytext"> </p>
<p class="bodytext">15. February 2021</p>

What I want is the second 'p', which contains '10. March 2021', but I don't know how to access it. I tried print(results_date.find('p')) instead of print(results_date), but that returns only the first 'p', i.e. <p class="bodytext">2021</p>, which is still not what I want. Any help would be much appreciated.

Answer

You can use the :nth-of-type(n) CSS Selector.

To use a CSS selector, call the .select() method (or .select_one() to get only the first match) instead of .find_all().

In your example, select the element with id="c51", and then the second p tag inside it.

import requests
from bs4 import BeautifulSoup

URL = "https://www.richter-helm.eu/index.php?id=26"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Select the element with `id="c51"`, then the second `p` tag inside it.
print(soup.select_one("#c51 p:nth-of-type(2)").text)

Output:

10. March 2021
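
If you want to try the selector without hitting the live site, here is a minimal offline sketch. The SAMPLE_HTML string is a hypothetical snippet modeled on the output shown in the question (the real page's markup may differ), and the names by_selector and by_index are only illustrative. It also shows an alternative that stays close to the original attempt: calling find_all("p") on the block and indexing into the resulting list.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the output shown in the question;
# the structure of the real page is an assumption and may differ.
SAMPLE_HTML = """
<div id="c51">
  <div class="csc-textpic-text">
    <p class="bodytext">2021</p>
    <p class="bodytext">10. March 2021</p>
    <p class="bodytext">15. February 2021</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: inside the element with id="c51", pick the <p> that is
# the second <p> among its siblings.
by_selector = soup.select_one("#c51 p:nth-of-type(2)").get_text(strip=True)

# Alternative: collect every <p> in the block and take index 1
# (lists are zero-based, so index 1 is the second paragraph).
block = soup.find(class_="csc-textpic-text")
by_index = block.find_all("p")[1].get_text(strip=True)

print(by_selector)
print(by_index)
```

With this sample markup both print statements give 10. March 2021. The find_all("p")[1] form is handy when you already hold the parent tag, as in the question's results_date variable; the selector form is shorter when you can address the element by id.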
