Python Beautiful Soup html.parser returns none



I have a feeling the answer is somewhere on Stack Overflow, but I can’t find it :-/

I’m trying to get the text from this page: https://www.uniprot.org/uniprot/P28653.fasta but my code returns an empty list instead of the text. All help is super appreciated!

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://www.uniprot.org/uniprot/P28653_PGS1_MOUSE.fasta')
soup = bs(r.content, 'html.parser')
lst = soup.find_all('pre')
print(lst)

returns

[]

Thanks!!

Answer

There is no HTML at that URL. The server responds with plain text (a FASTA record), so there is no <pre> tag for find_all to match, and it returns an empty list. You can just print r.content directly, though I prefer r.text because it is a str rather than a bytes object; either one contains the text of the page.

Remember that the HTML you see when you inspect a page with the developer tools in Chrome (or another browser) is not necessarily what requests receives. The inspector shows the document after the browser has processed it, and for a plain-text response browsers typically wrap the text in a <pre> element for display. Viewing the page source directly in your browser, or printing requests.get(url).text, usually gives a more accurate picture of what you are actually dealing with.
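As a rough sketch of working with the response as plain text instead of parsing it: the helper names split_fasta and fetch_fasta are made up for illustration, and the code assumes the page holds a single FASTA record (one header line starting with ">" followed by sequence lines).

```python
def split_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not look like a FASTA record")
    header = lines[0][1:]  # drop the leading ">"
    # The sequence is wrapped over several lines; join them back together.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def fetch_fasta(url):
    """Download a URL and return the body as a str, with no HTML parsing."""
    # Imported here so split_fasta can be used without requests installed.
    import requests

    r = requests.get(url, timeout=10)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    # The Content-Type header hints at whether an HTML parser is relevant at all.
    print(r.headers.get("Content-Type"))
    return r.text
```

With these, something like header, seq = split_fasta(fetch_fasta('https://www.uniprot.org/uniprot/P28653.fasta')) should give the description line and the sequence as two strings, with no BeautifulSoup involved.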



Source: stackoverflow