
How to extract the URL of a webpage without knowing beforehand?

I’m trying to build an iterative web search that loads a Google search page only when it needs to, so I don’t know the URLs ahead of time. I am aware of Selenium's .current_url attribute, but it does not give me what I want.

else:
    if boolean == 'yes':
        self.append_csv('TP')
    elif boolean == 'no':
        driver.get('https://www.google.com/')
        search = driver.find_element_by_name('q')
        search.clear()
        search.send_keys('{}'.format(query[index]))
        search.send_keys(Keys.RETURN)
        print(driver.current_url)

When I call print(driver.current_url), I only get https://www.google.com/, but I want the full results URL, such as https://www.google.com/search?source=hp&ei=x3kDX8rULsm4tQaa-6jwCw&q=Sycamore+Elementary+School%2CSugar+Hill%2C30518&btnK=Google+Search
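One likely reason for this is timing: send_keys(Keys.RETURN) starts the navigation, but current_url can be read before the results page has loaded. A minimal sketch of an explicit wait that polls current_url until it contains /search; wait_for_url is a hypothetical helper name, and the /search fragment assumes results load under that path, as in the URL above.

```python
import time


def wait_for_url(driver, fragment, timeout=10.0, poll=0.25):
    """Return driver.current_url once it contains `fragment`.

    Hypothetical helper: `driver` is anything exposing a `current_url`
    attribute, e.g. a Selenium WebDriver. Raises TimeoutError if the
    fragment never shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = driver.current_url  # re-read on every poll
        if fragment in url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL was still {url!r} after {timeout}s")
        time.sleep(poll)
```

In the code above, the last line would become print(wait_for_url(driver, "/search")). Selenium also ships its own explicit wait for this: WebDriverWait(driver, 10).until(EC.url_contains("/search")), where EC is selenium.webdriver.support.expected_conditions.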

I need this full link so I can use it with BeautifulSoup4. The end goal is to extract all links from the Google search results.
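With Selenium, the rendered results page is already available as driver.page_source, so it can be handed to a parser without knowing the URL at all; with BeautifulSoup that would be BeautifulSoup(driver.page_source, "html.parser").find_all("a", href=True). The sketch below uses the standard library's html.parser instead, so it runs without extra installs. LinkCollector and extract_links are hypothetical names, and unwrapping /url?q=<target> links assumes Google wraps some result links that way; check the markup you actually receive.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchors without an href are not links
        url = urljoin(self.base_url, href)  # make relative links absolute
        parts = urlsplit(url)
        # Assumption: some results are wrapped as /url?q=<target>; unwrap them
        if parts.path == "/url":
            target = parse_qs(parts.query).get("q")
            if target:
                url = target[0]
        self.links.append(url)


def extract_links(html, base_url="https://www.google.com/"):
    """Return every link found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Usage with Selenium would be links = extract_links(driver.page_source), which avoids the second HTTP request that fetching the URL again with another client would make.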


Answer

Actually, there is no need to go to the Google home page to run a regular search. You can load the results page directly, like this:

from urllib.parse import quote_plus

def search(driver, text):
    driver.get("https://www.google.com/search?q=" + quote_plus(text))  # URL-encode the query
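The long URL in the question carries the search terms in its q parameter; source, ei and btnK appear to be added by the home-page form and are not needed to load results. A short sketch with urllib.parse that recovers q from such a URL and builds the direct form. query_from_url and search_url are illustrative names; extra keyword arguments are appended unchanged, and which ones Google honors is not covered here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_from_url(url):
    """Return the decoded value of the q parameter in a Google search URL."""
    # parse_qs turns '+' back into spaces and %2C back into commas
    return parse_qs(urlsplit(url).query)["q"][0]


def search_url(query, **params):
    """Build a results-page URL; extra keyword arguments become extra parameters."""
    # urlencode percent-encodes every value (quote_plus style)
    return "https://www.google.com/search?" + urlencode({"q": query, **params})


full = ("https://www.google.com/search?source=hp&ei=x3kDX8rULsm4tQaa-6jwCw"
        "&q=Sycamore+Elementary+School%2CSugar+Hill%2C30518&btnK=Google+Search")
q = query_from_url(full)
print(q)              # Sycamore Elementary School,Sugar Hill,30518
print(search_url(q))  # https://www.google.com/search?q=Sycamore+Elementary+School%2CSugar+Hill%2C30518
```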

But if you want to add several other parameters to your search, I advise you to look at the google package (pip install google), which provides the googlesearch module. It directly gives you the links of the first results of your search, like this:

>>> import googlesearch
>>> query = "A computer science portal"
>>> for j in googlesearch.search(query, tld="co.in", num=10, stop=10, pause=2):
...     print(j)
...
https://www.geeksforgeeks.org/page/4/
https://www.geeksforgeeks.org/
https://en.wikipedia.org/wiki/Portal:Computer_programming
https://en.wikiversity.org/wiki/Portal:Computer_Science
http://www.pearltrees.com/u/17097488-geeksforgeeks-computer-science
https://studentportal.gu.se/english/my-studies/cse
https://www.computerscienceonline.org/
https://portal.cs.nuim.ie/
https://www.quora.com/What-are-the-top-websites-computer-science-students-must-visit

If you do not want to use it directly, you can read the module's code, for example in the site-packages directory where pip installed it. The code is not very complicated, and the interesting part, which builds the Google search URLs, is less than 100 lines.
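As a rough illustration of that idea (not the module's actual code), the sketch below yields one results-page URL per batch of num results for a given tld until stop is reached, mirroring the tld, num and stop arguments in the session above. page_urls is a hypothetical name, and treating num as results per page and start as the result offset is an assumption about Google's query parameters.

```python
from urllib.parse import urlencode


def page_urls(query, tld="com", num=10, stop=30):
    """Yield one results-page URL per `num` results until `stop` is reached.

    Hypothetical helper; it is not the googlesearch module's code.
    """
    for start in range(0, stop, num):
        params = {"q": query, "num": num}
        if start:
            params["start"] = start  # assumed offset parameter for later pages
        yield f"https://www.google.{tld}/search?" + urlencode(params)


for url in page_urls("A computer science portal", tld="co.in", num=10, stop=20):
    print(url)
```

Fetching each URL and sleeping between requests, which is what the pause argument in the session above is for, is left to the caller.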

User contributions licensed under: CC BY-SA