Whenever I curl this page, I get the entire webpage. However, when I use the urllib or even the mechanize library in Python, I get a 403 error. Any reason why?
Answer
Try this, which sends a browser-like User-Agent header with the request:
```python
import urllib2
from BeautifulSoup import BeautifulSoup

site = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"
# Replace urllib2's default User-Agent with a browser-like one
header = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request(site, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
print soup
```
Output:
```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
....
...
..
```
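The code above is Python 2 (`urllib2`, the `print` statement). Below is a rough Python 3 sketch of the same idea, assuming, as the fix implies, that the server rejects urllib's default User-Agent. By default urllib identifies itself as `Python-urllib/<version>`, while curl sends `curl/<version>`; a header set explicitly on the `Request` takes precedence over the default. The helper names `default_user_agent`, `build_request` and `fetch` are hypothetical, chosen only for illustration.

```python
import urllib.request

# Hypothetical constant; the value mirrors the answer's 'Mozilla/5.0'.
BROWSER_UA = "Mozilla/5.0"


def default_user_agent():
    """Return the User-Agent urllib sends when the request sets none."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as (name, value) pairs.
    return dict(opener.addheaders).get("User-agent")


def build_request(url, user_agent=BROWSER_UA):
    """Build a Request whose own User-Agent overrides urllib's default."""
    # Request stores header names via str.capitalize(), i.e. "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Download url and return the body as text (needs network access)."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    print(default_user_agent())  # something like Python-urllib/3.x
    req = build_request("http://www.economist.com/blogs/schumpeter/2014/04/alstom-block")
    print(req.get_header("User-agent"))
```

The string returned by `fetch` can then be parsed as in the answer; in Python 3 that is the `bs4` package (`from bs4 import BeautifulSoup`). Whether a given site accepts `Mozilla/5.0` alone is up to that site.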