I want to search Google using BeautifulSoup and open the first link, but opening the link I extract gives an error. I think the reason is that Google does not return the exact URL of the website; it appends several parameters to it. How can I get the exact URL?
When I tried using the cite tag it worked, but it causes problems with long URLs.
The first link, which I get using soup.h3.a['href'][7:], is: 'http://www.wikipedia.com/wiki/White_holes&sa=U&ved=0ahUKEwi_oYLLm_rUAhWJNI8KHa5SClsQFggbMAI&usg=AFQjCNGN-vlBvbJ9OPrnq40d0_b8M0KFJQ'
Here is my code:
import requests
from bs4 import BeautifulSoup

# Fetch the Google results page for the query
r = requests.get('https://www.google.com/search?q=site:wikipedia.com+Black+hole&gbv=1&sei=YwHNVpHLOYiWmQHk3K24Cw')
soup = BeautifulSoup(r.text, "html.parser")
# Take the first result link and drop the first 7 characters of its href
print(soup.h3.a['href'][7:])
Answer
You could split the returned string on '&' and keep the first piece, which is the URL without the parameters Google appended:
url = soup.h3.a['href'][7:].split('&')
print(url[0])
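Splitting on '&' would also cut off a target URL that contains its own '&'. As a possible alternative, here is a minimal sketch that parses the href as a query string instead. It assumes the href has the form '/url?q=<target>&sa=...' (suggested by the [7:] slice in the question, since '/url?q=' is 7 characters long) and that the target sits in the q parameter. The helper name extract_target_url is hypothetical and not part of any library.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the 'q' query parameter of a redirect-style href, or None.

    Assumption: href looks like '/url?q=<target>&sa=U&ved=...'.
    """
    query = urlparse(href).query        # everything after the '?'
    values = parse_qs(query).get("q")   # parse_qs also decodes %XX escapes
    return values[0] if values else None


# Sample href rebuilt from the output shown in the question (assumed prefix '/url?q=')
href = ("/url?q=http://www.wikipedia.com/wiki/White_holes"
        "&sa=U&ved=0ahUKEwi_oYLLm_rUAhWJNI8KHa5SClsQFggbMAI"
        "&usg=AFQjCNGN-vlBvbJ9OPrnq40d0_b8M0KFJQ")
print(extract_target_url(href))
```

In the original script this would be called as extract_target_url(soup.h3.a['href']), without the [7:] slice. If the page layout differs and there is no q parameter, the helper returns None instead of a truncated string.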