I hate to trouble anyone with this, but I’ve been on this issue for days.
Basically, I want to scrape the Psychological Torture Methods from this web page: https://en.m.wikipedia.org/wiki/List_of_methods_of_torture
This is the exact information I would like to acquire:
Ego-Fragmentation
Learned Helplessness
Chinese water torture
Welcome parade (torture)
And below is my code:
from bs4 import BeautifulSoup
import requests
URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)
html_soup = BeautifulSoup(page.content, 'html.parser')
type(html_soup)
print(html_soup.find("div", class_="mw-parser-output").find_all(text=True, recursive=False))
I’m sure there is an easy fix to this that I can’t see. Once you look at the site’s HTML, you’ll probably find the answer.
Best wishes, truly.
Have a Beautiful day!
HomeMadeMusic.
Answer
Try this. Your expected output is inside the section element with class mf-section-1:
from bs4 import BeautifulSoup
import requests
URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)
html_soup = BeautifulSoup(page.content, 'html.parser')
# Optional: dump the parsed HTML to inspect the page structure.
print(html_soup.prettify())
# Collect the text of every link inside the section with class "mf-section-1".
print([x.text for x in html_soup.find("section", class_="mf-section-1").find_all('a')])
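If the page layout changes, find() returns None and the one-liner above raises an AttributeError. Here is a minimal sketch of the same idea wrapped in a function that guards against that. The function name link_texts_in_section and the SAMPLE_HTML string are made up for illustration, and the sample is hand-written markup, not the real Wikipedia page, so it can be tried without a network request.

```python
from bs4 import BeautifulSoup


def link_texts_in_section(html, section_class="mf-section-1"):
    """Return the text of every <a> inside the first <section> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_=section_class)
    if section is None:
        # Section not found (layout may have changed): return an empty list
        # instead of failing with AttributeError on None.
        return []
    return [a.get_text(strip=True) for a in section.find_all("a")]


# Hypothetical, hand-written HTML standing in for the real page.
SAMPLE_HTML = """
<div class="mw-parser-output">
  <section class="mf-section-0"><p>Intro <a href="/wiki/Torture">Torture</a></p></section>
  <section class="mf-section-1">
    <ul>
      <li><a href="/wiki/A">Ego-Fragmentation</a></li>
      <li><a href="/wiki/B">Learned Helplessness</a></li>
    </ul>
  </section>
</div>
"""

print(link_texts_in_section(SAMPLE_HTML))
# → ['Ego-Fragmentation', 'Learned Helplessness']
```

To use it on the live page, pass page.content from the requests call above in place of SAMPLE_HTML.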
