
How to get the URL of a webpage without knowing it beforehand?

I’m trying to build an iterative web search that opens a Google search page ONLY when it needs to, so I don’t know the URLs ahead of time. I am aware of Selenium’s .current_url attribute, but it does not give me what I want.


When I do print(driver.current_url) I only get https://www.google.com/ but I want to extract a full URL like https://www.google.com/search?source=hp&ei=x3kDX8rULsm4tQaa-6jwCw&q=Sycamore+Elementary+School%2CSugar+Hill%2C30518&btnK=Google+Search

I need the full link so I can use it with BeautifulSoup4. The end goal is to extract all the links from a Google search.


Answer

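Actually, there is no need to go to the Google home page to run a regular search. You can go directly to the results page by putting the query in the q parameter of the search URL, for example https://www.google.com/search?q=Sycamore+Elementary+School%2CSugar+Hill%2C30518.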
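As a minimal sketch of that idea, the results URL can be built from the query text with the standard library. The helper name google_search_url is hypothetical, and only the q parameter is set; the other parameters in the question's URL (source, ei, btnK) are left out on the assumption that they are not needed for a plain search.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a Google results URL for `query` (hypothetical helper)."""
    # urlencode escapes the query: spaces become '+', commas become '%2C'.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = google_search_url("Sycamore Elementary School,Sugar Hill,30518")
print(url)
```

With Selenium, this URL could then be opened with driver.get(url) and the page's driver.page_source handed to BeautifulSoup, so driver.current_url is never needed.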


But if you want to add several other parameters to your search, I advise you to look at the google module (installed with pip install google and imported as googlesearch). It gives you the links of the first results of your search directly.
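A rough sketch of how the module might be used follows. It assumes the package exposes a search function in the googlesearch module that yields result URLs for a query; keyword arguments differ between versions of the package, so the result count is capped with itertools.islice instead. The first_results helper and its search_fn parameter are hypothetical and exist only so the function can be tried with a stub.

```python
from itertools import islice


def first_results(query, limit=5, search_fn=None):
    """Return up to `limit` result URLs for `query` (hypothetical helper)."""
    if search_fn is None:
        # Assumption: the installed package provides googlesearch.search.
        # Imported lazily so the helper also works with a stub search_fn.
        from googlesearch import search as search_fn
    # islice stops after `limit` items without version-specific keywords.
    return list(islice(search_fn(query), limit))


# Example call (needs the package and network access):
# for link in first_results("Sycamore Elementary School, Sugar Hill, 30518"):
#     print(link)
```

Each returned link is a plain string, so it can be fetched and parsed with BeautifulSoup in the usual way.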


If you do not want to use it directly, you can look at the module's code. As it is not on GitHub, you can read the code at the location where pip installed it. The code is not very complicated, and the interesting part, which shows how to produce Google search URLs, is no more than 100 lines.

User contributions licensed under: CC BY-SA