I am using Beautiful Soup and requests to print the full text of the article on this website:
https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture
This is my code:
import requests
from bs4 import BeautifulSoup

url = requests.get("https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture")
html = url.text
page = BeautifulSoup(html, 'html.parser')
match = page.find_all('div', 'parbase cn_text')
page_list = [[k.get_text() for k in i.find_all('p')] for i in match]

for i in page_list[:-2]:
    for k in i:
        print(k + '\n')
My code runs without any error, but it doesn't show any text in the output. Please help me find my error.
Answer
What happens?
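You call find_all() for div tags with the class string 'parbase cn_text', which does not exist in the page's HTML, so match is empty. Because match is empty, page_list is empty too, and the loops never reach print().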
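To see why nothing is printed and no error is raised, here is a minimal sketch. It assumes a small hand-written HTML string (sample_html is a hypothetical stand-in, not the real page's markup) and runs the same find_all() call against it:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's HTML is not reproduced here.
sample_html = """
<article class="article main-content">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</article>
"""

page = BeautifulSoup(sample_html, "html.parser")

# No div carries this class string, so find_all() returns an empty result
# instead of raising an exception.
match = page.find_all("div", "parbase cn_text")

# Iterating over an empty result yields an empty list as well.
page_list = [[p.get_text() for p in div.find_all("p")] for div in match]

print(len(match))
print(page_list)
```

An empty result is not an error in Beautiful Soup, which is why the script finishes silently.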
How to fix that?
Use a pattern that matches the page's actual markup. I used a CSS selector with select() to avoid an additional loop:
select('article.article.main-content p')
The list comprehension then looks like:
[p.get_text() for p in page.select('article.article.main-content p')]
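As a rough illustration of what that selector matches, the sketch below runs it against a small hand-written HTML string. The markup in sample_html is an assumption made for the example, not the real page; it only mirrors the article element with the classes article and main-content that the selector expects:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup for illustration only.
sample_html = """
<article class="article main-content">
  <div><p>First paragraph.</p></div>
  <p>Second paragraph.</p>
</article>
<footer><p>Not part of the article.</p></footer>
"""

page = BeautifulSoup(sample_html, "html.parser")

# 'article.article.main-content p' selects every <p> nested at any depth
# inside an <article> that has both classes, so one flat loop is enough.
paragraphs = [p.get_text() for p in page.select("article.article.main-content p")]

print(paragraphs)
```

The paragraph inside the footer is skipped because it is not a descendant of the matching article element.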
Working example
import requests
from bs4 import BeautifulSoup

url = requests.get("https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture")
html = url.text
page = BeautifulSoup(html, 'html.parser')

# Print each article paragraph on its own line
print(*[p.get_text() for p in page.select('article.article.main-content p')], sep='\n')
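Since a selector that matches nothing fails silently, it may help to make that case visible. The following is only a sketch under assumptions: article_text is a hypothetical helper name, and sample_html is stand-in markup rather than the real page, so the example runs without a network request:

```python
from bs4 import BeautifulSoup


def article_text(html: str, selector: str = "article.article.main-content p") -> str:
    """Join the text of all elements matching selector (hypothetical helper).

    Raises ValueError when the selector matches nothing, so a wrong
    selector does not pass unnoticed.
    """
    page = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(strip=True) for p in page.select(selector)]
    if not paragraphs:
        raise ValueError(f"selector {selector!r} matched no elements")
    return "\n".join(paragraphs)


# Hypothetical stand-in markup for illustration only.
sample_html = """
<article class="article main-content">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</article>
"""

text = article_text(sample_html)
print(text)

# A selector built from the classes in the question finds nothing here,
# and the helper reports that instead of printing nothing.
try:
    article_text(sample_html, "div.parbase.cn_text p")
except ValueError as exc:
    print(exc)
```

With a real page, the html argument would be the text of the requests response, as in the working example above.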