Struggling with BeautifulSoup and tags

I hate to trouble anyone with this, but I’ve been stuck on this issue for days.

Basically, I want to scrape the list of psychological torture methods from this page: https://en.m.wikipedia.org/wiki/List_of_methods_of_torture

This is the exact information I would like to acquire:

Ego-Fragmentation

Learned Helplessness

Chinese water torture

Welcome parade (torture)

And below is my code:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

html_soup = BeautifulSoup(page.content, 'html.parser')

print(html_soup.find("div", class_="mw-parser-output").find_all(text=True, recursive=False))
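
A quick way to see why this prints almost nothing: `recursive=False` makes `find_all` look only at the div's direct children, and the direct text children of `mw-parser-output` are mostly whitespace. The list entries sit several levels down inside `<ul>`, `<li>` and `<a>` tags. The sketch below reproduces this on a tiny, made-up HTML fragment (not the real page) and uses `string=True`, the current name for the `text=True` argument.

```python
from bs4 import BeautifulSoup

# A made-up fragment shaped like a Wikipedia article body (not the real page)
html = '<div class="mw-parser-output">\n<ul><li><a href="#">Item</a></li></ul>\n</div>'
div = BeautifulSoup(html, "html.parser").find("div", class_="mw-parser-output")

# recursive=False: only text nodes that are *direct* children of the div
direct = [str(s) for s in div.find_all(string=True, recursive=False)]

# Default (recursive) search: descend into the nested tags and read the links
nested = [a.get_text() for a in div.find_all("a")]

print(direct)  # ['\n', '\n']
print(nested)  # ['Item']
```

Only the newline text nodes between the tags are direct children, so the link text never shows up unless the search is allowed to recurse.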

I’m sure there is an easy fix that I can’t see. Once you look at the site’s HTML, you’ll probably spot the answer.

Best wishes, truly.

Have a Beautiful day!

HomeMadeMusic.

Answer

Try this. The output you expect is inside the <section class="mf-section-1"> element of the mobile page:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

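html_soup = BeautifulSoup(page.content, 'html.parser')

# Dump the parsed HTML to see how the page is structured
print(html_soup.prettify())

# The mobile site wraps each top-level section in <section class="mf-section-N">;
# collect the text of every link inside the first one
print([x.text for x in html_soup.find("section", class_="mf-section-1").find_all('a')])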
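
Indexing by `mf-section-1` depends on how the mobile site happens to number its sections, and `find_all('a')` also picks up every other link in that section. A sketch of an alternative, assuming the section heading carries the id `Psychological_torture` (check this in the prettified output, since Wikipedia's markup changes over time): locate the heading by id, then walk forward through the document and take the first link of each `<li>` until the next `<h2>`. `links_in_section` is a hypothetical helper name, and `SAMPLE` is a made-up fragment imitating that layout, not the live page.

```python
from bs4 import BeautifulSoup


def links_in_section(html, section_id):
    """Return the first link text of each <li> between the <h2> containing
    section_id and the next <h2>. (Hypothetical helper, not part of bs4.)"""
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find(id=section_id)
    if anchor is None:
        return []
    # The id may sit on the <h2> itself or on a <span> inside it
    heading = anchor if anchor.name == "h2" else anchor.find_parent("h2")
    if heading is None:
        return []

    names = []
    # find_all_next() walks every following tag in document order
    for tag in heading.find_all_next():
        if tag.name == "h2":  # reached the next top-level section
            break
        if tag.name == "li":
            link = tag.find("a")
            if link is not None:
                names.append(link.get_text(strip=True))
    return names


# Made-up fragment imitating the article layout (not the live page)
SAMPLE = """
<h2><span id="Physical_torture">Physical torture</span></h2>
<ul><li><a href="/wiki/Example">Example</a></li></ul>
<h2><span id="Psychological_torture">Psychological torture</span></h2>
<ul>
  <li><a href="/wiki/A">Ego-Fragmentation</a></li>
  <li><a href="/wiki/B">Learned Helplessness</a></li>
</ul>
<h2><span id="See_also">See also</span></h2>
<ul><li><a href="/wiki/C">Unrelated</a></li></ul>
"""

print(links_in_section(SAMPLE, "Psychological_torture"))
# ['Ego-Fragmentation', 'Learned Helplessness']
```

To try it on the real page, pass page.content in place of SAMPLE. If it returns an empty list, the heading id or heading level probably differs from what this sketch assumes.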

Answer