From this webpage I need to select all <b> </b> tags with BeautifulSoup4.
import requests
from bs4 import BeautifulSoup

url = "http://lib.ru/GrepSearch?Search=%E3%E5%F0%EE%E9+%ED%E0%F8%E5%E3%EE+%E2%F0%E5%EC%E5%ED%E8"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
author = soup.select('b')
print(author)
I have tried using find_all() and select(), but neither returns all of the <b> tags in the resulting list.
Answer
Several different parsers can be used to parse an HTML document; the most commonly used one is 'html.parser'. I have used lxml here, which can parse both XML and HTML. This code should give you the raw output you asked for (author and book name). You still have to process it to get your desired output.
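import requests
from bs4 import BeautifulSoup

# Fetch the search results page (named "response" so the requests module is not shadowed)
response = requests.get('http://lib.ru/GrepSearch?Search=history')
src = response.content

# Parse the raw bytes with the lxml parser
soup = BeautifulSoup(src, 'lxml')

# Print the text of every <b> tag
b_tags = soup.find_all('b')
for b in b_tags:
    print(b.text)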
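Because different parsers can build different trees from the same markup, it may help to compare how many <b> tags each installed parser finds. The following is only a minimal sketch: the helper name b_texts and the SAMPLE_HTML fragment are made up for illustration and are not taken from the lib.ru page.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, deliberately sloppy markup (unclosed <li> tags); not from lib.ru.
SAMPLE_HTML = (
    "<html><body>"
    "<li><b>Author One</b> <a href='#'><b>Book One</b></a>"
    "<li><b>Author Two</b>"
    "</body></html>"
)


def b_texts(markup, parser):
    """Return the stripped text of every <b> tag, or None if the parser is not installed."""
    try:
        soup = BeautifulSoup(markup, parser)
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser library is missing.
        return None
    return [b.get_text(strip=True) for b in soup.find_all("b")]


# Compare the parsers that Beautiful Soup can use, skipping any that are not installed.
for parser in ("html.parser", "lxml", "html5lib"):
    texts = b_texts(SAMPLE_HTML, parser)
    if texts is None:
        print(f"{parser}: not installed")
    else:
        print(f"{parser}: {len(texts)} <b> tags -> {texts}")
```

If the counts differ on the real page, that would suggest the parser choice is what hides some of the tags; if they match, the cause is probably elsewhere.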
The requests script above prints output like: