page scraping using beautiful soup, without links

I am using the following code to extract text from a web page:

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request

def tag_visible(element):
    # Skip text nodes whose parent is a non-visible tag
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # Skip HTML comments
    if isinstance(element, Comment):
        return False
    return True

def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.findAll(text=True)
    visible_texts = filter(tag_visible, texts)
    return u" ".join(t.strip() for t in visible_texts)

html = urllib.request.urlopen('https://ordoabchao.ca/volume-one/babylon').read()
text = text_from_html(html)

The problem is that when I look at text, I also get all the links from the buttons at the top of the page, which I don’t want. How can I modify the above code to exclude them?

I also get the footnotes, which I may want, but as separate text. Is there a way to separate the footnotes from the main text?

Thanks

Answer

If you want to extract the page text without the navigation links, you can select only the content paragraphs and use the get_text() method:

from bs4 import BeautifulSoup
import requests

url = 'https://ordoabchao.ca/volume-one/babylon'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')

# Only paragraphs inside the content blocks, so the navigation links are skipped
for p in soup.select('.sqs-block-content p'):
    print(p.get_text(strip=True))
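
You also asked about keeping the footnotes as separate text. On this page the footnotes appear to be ordinary paragraphs inside the same content blocks, so one rough approach is to split on whether a paragraph starts with a bracketed citation number. This is only a sketch, and the "[1]"-style prefix is an assumption you would need to check against the actual markup:

import re

main_text, footnotes = [], []
for p in soup.select('.sqs-block-content p'):
    txt = p.get_text(strip=True)
    if not txt:
        continue
    # Assumption: footnote paragraphs begin with a bracketed number like "[1]"
    if re.match(r'\[\d+\]', txt):
        footnotes.append(txt)
    else:
        main_text.append(txt)

print(' '.join(main_text))
print(' '.join(footnotes))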

To save the output as a text file, you can use a pandas DataFrame:

import pandas as pd

lst = []
for p in soup.select('.sqs-block-content p'):
    txt = p.get_text(strip=True)
    lst.append({'Text': txt})

# Write the collected paragraphs to a tab-separated text file
pd.DataFrame(lst).to_csv('out.txt', sep='\t', index=False)
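
If you do not need pandas, writing the paragraphs straight to a plain text file works as well; a minimal sketch:

with open('out.txt', 'w', encoding='utf-8') as f:
    for p in soup.select('.sqs-block-content p'):
        f.write(p.get_text(strip=True) + '\n')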