I hate to trouble anyone with this, but I’ve been on this issue for days.
Basically, I want to scrape the Psychological Torture Methods from this web page: https://en.m.wikipedia.org/wiki/List_of_methods_of_torture
This is the exact information I would like to acquire:
Ego-Fragmentation
Learned Helplessness
Chinese water torture
Welcome parade (torture)
And below is my code:
from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

html_soup = BeautifulSoup(page.content, 'html.parser')
type(html_soup)

print(html_soup.find("div", class_="mw-parser-output").find_all(text=True, recursive=False))
I’m sure there is an easy fix to this that I can’t see. Once you look at the site’s HTML, you’ll probably find the answer.
Best wishes, truly.
Have a Beautiful day!
HomeMadeMusic.
Answer
Try this. The output you expect is inside the section element with class mf-section-1.
from bs4 import BeautifulSoup
import requests

URL = 'https://en.m.wikipedia.org/wiki/List_of_methods_of_torture'
page = requests.get(URL)

html_soup = BeautifulSoup(page.content, 'html.parser')
# Optional: dump the parsed HTML to inspect the page structure
print(html_soup.prettify())

# Collect the text of every link inside the section with class mf-section-1
print([x.text for x in html_soup.find("section", class_="mf-section-1").find_all('a')])
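For what it's worth, the original attempt likely printed little of use because find_all(text=True, recursive=False) only returns text nodes that are direct children of the div, while the names sit deeper, inside list items and links. Below is a rough sketch of the same section-based lookup that can be tried without hitting the network. SAMPLE_HTML and extract_link_texts are hypothetical names made up for illustration, and the sample markup is only an assumption about how the page is laid out; the live page may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="mw-parser-output">
  <section class="mf-section-0">
    <p>Intro with a <a href="/wiki/Torture">link</a>.</p>
  </section>
  <section class="mf-section-1">
    <ul>
      <li><a href="/wiki/Ego-Fragmentation">Ego-Fragmentation</a></li>
      <li><a href="/wiki/Learned_helplessness">Learned Helplessness</a></li>
      <li><a href="/wiki/Chinese_water_torture">Chinese water torture</a></li>
      <li><a href="/wiki/Welcome_parade_(torture)">Welcome parade (torture)</a></li>
    </ul>
  </section>
</div>
"""


def extract_link_texts(html, section_class):
    """Return the text of every <a> inside the first <section> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_=section_class)
    if section is None:
        # The class was not found, e.g. because the page layout changed.
        return []
    return [a.get_text(strip=True) for a in section.find_all("a")]


names = extract_link_texts(SAMPLE_HTML, "mf-section-1")
print(names)
```

To run it against the live page, pass requests.get(URL).text in place of SAMPLE_HTML. The None check is there so that a changed layout gives an empty list instead of an AttributeError.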