
Struggling with BeautifulSoup and tags

I hate to trouble anyone with this, but I’ve been on this issue for days.

Basically, I want to scrape the Psychological Torture Methods from this web page: https://en.m.wikipedia.org/wiki/List_of_methods_of_torture

This is the exact information I would like to acquire:

Ego-Fragmentation

Learned Helplessness

Chinese water torture

Welcome parade (torture)


And below is my code:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

html_soup = BeautifulSoup(page.content, 'html.parser')
type(html_soup)


print (html_soup.find("div", class_="mw-parser-output").find_all(text=True, recursive=False) )

I’m sure there is an easy fix to this that I can’t see. Once you look at the site’s HTML, you’ll probably find the answer.

Best wishes, truly.

Have a Beautiful day!

HomeMadeMusic.


Answer

Try this. Your expected output is inside the section element with the class mf-section-1.

from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

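html_soup = BeautifulSoup(page.content, 'html.parser')
# Optional: dump the parsed HTML to inspect the page structure.
print(html_soup.prettify())

# The wanted links sit inside <section class="mf-section-1">; collect the text of each <a>.
print([x.text for x in html_soup.find("section", class_="mf-section-1").find_all('a')])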
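
For anyone who wants to try the idea without hitting the network, here is a minimal offline sketch. The SAMPLE_HTML string is a made-up, heavily simplified imitation of the mobile page layout, and section_link_texts is a hypothetical helper name; the real Wikipedia markup is larger and may change. It also shows why the code in the question prints nothing useful: with recursive=False, find_all only looks at the direct children of the div, and those text nodes are just whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup imitating the mobile page layout.
# The real page is larger and its class names may change over time.
SAMPLE_HTML = """
<div class="mw-parser-output">
  <section class="mf-section-0"><p>Intro with a <a href="/wiki/Torture">link</a>.</p></section>
  <h2>Psychological torture methods</h2>
  <section class="mf-section-1">
    <ul>
      <li><a href="/wiki/Ego-Fragmentation">Ego-Fragmentation</a></li>
      <li><a href="/wiki/Learned_helplessness">Learned Helplessness</a></li>
    </ul>
  </section>
</div>
"""


def section_link_texts(html, section_class):
    """Return the text of every <a> inside the first <section> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_=section_class)
    if section is None:
        # The section is missing (for example, the page layout changed).
        return []
    return [a.get_text(strip=True) for a in section.find_all("a")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The approach from the question: only direct text children of the div,
# which here are nothing but whitespace between the nested tags.
direct_text = soup.find("div", class_="mw-parser-output").find_all(
    string=True, recursive=False
)
print(all(s.strip() == "" for s in direct_text))  # True

# Searching inside the section finds the link texts.
methods = section_link_texts(SAMPLE_HTML, "mf-section-1")
print(methods)  # ['Ego-Fragmentation', 'Learned Helplessness']
```

Returning an empty list when the section is not found avoids the AttributeError you would otherwise get from calling find_all on None.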
User contributions licensed under: CC BY-SA