I hate to trouble anyone with this, but I’ve been on this issue for days.
Basically, I want to scrape the Psychological Torture Methods from this web page: https://en.m.wikipedia.org/wiki/List_of_methods_of_torture
This is the exact information I would like to acquire:
Ego-Fragmentation
Learned Helplessness
Chinese water torture
Welcome parade (torture)
And below is my code:
from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)
html_soup = BeautifulSoup(page.content, 'html.parser')
type(html_soup)
print(html_soup.find("div", class_="mw-parser-output").find_all(text=True, recursive=False))
I’m sure there is an easy fix to this that I can’t see. Once you look at the site’s HTML, you’ll probably find the answer.
Best wishes, truly.
Have a Beautiful day!
HomeMadeMusic.
Answer
Try this. Your expected output is under the section element with class mf-section-1:
from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)
html_soup = BeautifulSoup(page.content, 'html.parser')

# Dump the parsed HTML so you can inspect the page structure
print(html_soup.prettify())

# Collect the text of every link inside the first mobile section
print([x.text for x in html_soup.find("section", class_="mf-section-1").find_all('a')])
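As for why the original attempt came back without the names: with recursive=False, find_all only looks at the direct children of the div, and the list entries are nested deeper inside it. Here is a minimal offline sketch of the same idea so you can try it without hitting the network. The SAMPLE_HTML markup and the section_link_texts helper are my own assumptions for illustration, not copied from Wikipedia, and the live page may be structured differently or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the mobile page markup.
# This is an assumption for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="mw-parser-output">
  <h2>Psychological torture methods</h2>
  <section class="mf-section-1">
    <ul>
      <li><a href="/wiki/Ego-Fragmentation">Ego-Fragmentation</a></li>
      <li><a href="/wiki/Learned_helplessness">Learned Helplessness</a></li>
      <li><a href="/wiki/Chinese_water_torture">Chinese water torture</a></li>
      <li><a href="/wiki/Welcome_parade_(torture)">Welcome parade (torture)</a></li>
    </ul>
  </section>
  <h2>Physical torture methods</h2>
  <section class="mf-section-2">
    <ul><li><a href="/wiki/Example">Example entry</a></li></ul>
  </section>
</div>
"""


def section_link_texts(html, section_class):
    """Return the text of every <a> inside the first <section> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_=section_class)
    if section is None:
        # The class was not found; return an empty list instead of raising
        return []
    # find_all searches all descendants by default, so nested <li><a> are found
    return [a.get_text(strip=True) for a in section.find_all("a")]


titles = section_link_texts(SAMPLE_HTML, "mf-section-1")
print(titles)
```

To run it against the live page, pass page.content from requests.get(URL) instead of SAMPLE_HTML. The None check is there because find returns None when the class is missing, which would otherwise raise an AttributeError on the chained find_all call.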