Whenever I curl this, I get the entire webpage. However, when I use urllib or even the mechanize library in Python, I get a 403 error. Any reason why?
Answer
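Try setting a browser-like User-Agent header on the request. The site is probably rejecting the default user agent that Python's libraries send: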
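# Python 2 (urllib2, BeautifulSoup 3)
import urllib2
from BeautifulSoup import BeautifulSoup

site = "http://www.economist.com/blogs/schumpeter/2014/04/alstom-block"
# Browser-like User-Agent; the request without it was answered with a 403.
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=header)
page = urllib2.urlopen(req)
# Parse the response body and print the resulting document.
soup = BeautifulSoup(page)
print soup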
Output:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
...
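
On Python 3, urllib2 no longer exists; its functionality lives in urllib.request. The same idea (send a browser-like User-Agent) might look roughly like the sketch below. The helper names build_request and fetch, the DEFAULT_USER_AGENT constant, and the 10-second timeout are my own assumptions for illustration, not part of the answer above, and I can't promise the site still accepts this header.

```python
import urllib.error
import urllib.request

# Hypothetical names for illustration; not from the original answer.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch url and return the decoded body, or None on an HTTP error."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # A 403 here would suggest the server checks more than the User-Agent.
        print("HTTP error:", err.code)
        return None
```

Calling fetch("http://www.economist.com/blogs/schumpeter/2014/04/alstom-block") should return the page HTML as a string if the server accepts the header. If it returns None with a 403, the server is likely checking something else as well (other headers, cookies), and comparing against what curl sends with curl -v would be the next step.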