import bs4
import requests
import re

# Scrape the latest-news listing: for each article, print its title/date/author,
# then follow the title link and try to scrape the full article body.
# (Original author note: scraping the link from the title, then opening that
# link and trying to scrape the whole article — "very new to this".)
LISTING_URL = 'https://www.the961.com/latest-news/lebanon-news/'

r = requests.get(LISTING_URL).text
soup = bs4.BeautifulSoup(r, 'lxml')

for article in soup.find_all('article'):
    title = article.h3.text
    print(title)

    # Date/author bylines are optional on the listing page, hence the guards.
    date = article.find('span', class_='byline-part date')
    if date:
        print('Date:', date.text)

    author = article.find('span', class_="byline-part author")
    if author:
        print('Author:', author.text)

    # Follow the article's title link to its own page and parse it.
    link = article.find('h3', class_='title').a['href']
    link_r = requests.get(link).text
    soup_link = bs4.BeautifulSoup(link_r, 'lxml')

    # Renamed from `article` to avoid shadowing the outer loop variable.
    # NOTE(review): on some article pages the <p> tags are not nested under
    # an <article> tag, so find('p') is None there and this prints "None".
    for article_page in soup_link.find_all('article'):
        paragraph = article_page.find('p')
        print(paragraph)
        print()
Advertisement
Answer
On some pages the <p> tags are not nested under an <article> tag, and therefore find('p') returns None.
Instead, to scrape all the paragraphs (and <li> tags if they exist), use the following CSS selector: .entry-content > p, .entry-content li
To use a CSS selector, call the .select() method instead of .find_all().
In your code example:
import bs4
import requests

# Scrape the latest-news listing: print each article's title/date/author,
# then follow the title link and print the full article body paragraph by
# paragraph, using a CSS selector so paragraphs are found even on pages
# where they are not nested under an <article> tag.
r = requests.get("https://www.the961.com/latest-news/lebanon-news/").text
soup = bs4.BeautifulSoup(r, "lxml")

for article in soup.find_all("article"):
    title = article.h3.text
    print(title)

    # Date/author bylines are optional on the listing page, hence the guards.
    date = article.find("span", class_="byline-part date")
    if date:
        print("Date:", date.text)

    author = article.find("span", class_="byline-part author")
    if author:
        # "\n" adds a blank line after the byline. The extracted snippet had
        # a bare "n" here — almost certainly a backslash lost in formatting.
        print("Author:", author.text, "\n")

    # Follow the article's title link to its own page and parse it.
    link = article.find("h3", class_="title").a["href"]
    link_r = requests.get(link).text
    soup_link = bs4.BeautifulSoup(link_r, "lxml")

    # Select all `p` tags (and `li`) under the class `entry-content`.
    for page in soup_link.select(".entry-content > p, .entry-content li"):
        print(page.get_text(strip=True))
        print("-" * 80)
        print()