I have a feeling the information is somewhere on stack overflow, but I can’t find it :-/
I’m looking to get the text from this website: https://www.uniprot.org/uniprot/P28653.fasta but my code returns an empty list. All help is super appreciated!
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://www.uniprot.org/uniprot/P28653_PGS1_MOUSE.fasta')
soup = bs(r.content, 'html.parser')
lst = soup.find_all('pre')
print(lst)
returns
[]
Thanks!!
Answer
The response contains no HTML; it is plain text, so there is no pre tag for find_all to match. You can just print r.content directly and it will contain the text on the page (I prefer r.text, since it is a str rather than a bytes object). Remember that the HTML you see when you inspect a page with the developer tools in Chrome (or another browser) is not necessarily what requests receives: for a plain-text response, the browser typically wraps the text in a pre element of its own for display. Viewing the page source directly in your browser, or printing requests.get(url).text (or .content), usually gives a more accurate picture of what you are actually dealing with.
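As a rough illustration of the point above, here is a minimal sketch that skips BeautifulSoup and reads the body through r.text. The helper names fetch_text and split_fasta are made up for this example, and split_fasta assumes the response holds a single FASTA record (one header line starting with ">" followed by sequence lines). The URL is the one from the question; whether it still resolves is not checked here.

```python
def fetch_text(url):
    """Return the response body as a str (r.text) rather than bytes (r.content)."""
    # requests is third-party; it is imported here so that split_fasta
    # can be used on its own without requests installed.
    import requests

    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise on 4xx/5xx instead of handing back an error page
    return r.text


def split_fasta(text):
    """Split a single-record FASTA string into (header, sequence).

    Hypothetical helper: assumes exactly one record in the text.
    """
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty FASTA text")
    header = lines[0].lstrip(">")
    # Sequence lines are wrapped in the file; join them back together.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


if __name__ == "__main__":
    # URL taken from the question.
    text = fetch_text("https://www.uniprot.org/uniprot/P28653.fasta")
    header, sequence = split_fasta(text)
    print(header)
    print(len(sequence))
```

If all you need is the raw text, print(fetch_text(url)) is enough; the split is only there in case the header and the sequence are wanted separately.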