
How to parse Google Custom Search JavaScript output in Python?

I am trying to fetch some articles from the ACL website based on keywords given as input. The website uses the Google Custom Search API, and the output of the API is a JavaScript object.

How can I parse the returned object in Python and fetch the article name, URL, and abstract of each research paper from it?

The script I am using to fetch articles:

JavaScript

The output looks like this:

JavaScript

However, when I run the search in the browser, the response shown in the Network tab of Chrome is JSON.

How can I get the articles, along with their links, from the JavaScript object in Python?


Answer

response.text gives you a string. If you remove /*O_o*/\ngoogle.search.cse.api12760( from the beginning and ); from the end, you are left with plain JSON, which you can convert to a Python dictionary using json.loads(). You can then use [key] to get data from the dictionary.
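The unwrapping step described above can be sketched as follows. This is a minimal illustration, not the exact script from the question: the sample string is hypothetical and only shaped like the response, the helper name parse_jsonp is made up for this example, and the keys results, titleNoFormatting and url are assumptions about the payload. Slicing between the first ( and the last ) avoids hard-coding the callback name.

```python
import json


def parse_jsonp(text):
    """Strip a callback(...) wrapper and decode the JSON inside it.

    Hypothetical helper: assumes the payload sits between the first "("
    and the last ")" of the response text.
    """
    start = text.index("(") + 1   # first "(" opens the callback argument
    end = text.rindex(")")        # last ")" closes it, just before ";"
    return json.loads(text[start:end])


# Hypothetical sample shaped like the response text described above
sample = (
    '/*O_o*/\ngoogle.search.cse.api12760('
    '{"results": [{"titleNoFormatting": "Example paper", '
    '"url": "https://example.org/paper.pdf"}]}'
    ');'
)

data = parse_jsonp(sample)

# "results", "titleNoFormatting" and "url" are assumed key names
for item in data["results"]:
    print(item["titleNoFormatting"], item["url"])
```

With a real request you would pass response.text to the helper instead of the sample string.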


Minimal working example

JavaScript

Result:

JavaScript

BTW:

If you display item.keys(), you can see what else is available in each result:

JavaScript

Or you can use a for loop to display all keys and values:

JavaScript
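As a rough sketch of both inspection approaches, the snippet below lists the keys of one result and then prints every key with its value. The item dictionary here is a hypothetical stand-in; a real result will likely carry more keys and different values.

```python
# Hypothetical result item; real items may contain other keys
item = {
    "titleNoFormatting": "Example paper",
    "url": "https://example.org/paper.pdf",
    "richSnippet": {"metatags": {"author": "A. Author"}},
}

# Show which fields are available
keys = list(item.keys())
print(keys)

# Show every field together with its value
for key, value in item.items():
    print(f"{key}: {value}")
```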

Some of the values may be nested dictionaries, for example item['richSnippet']['metatags']['author'].
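Nested keys may be absent for some results, so chained [key] lookups can raise KeyError. One possible way to guard against that is a small helper like the one below. The name get_nested and the sample item are hypothetical, and the missing key abstract is used only to show the fallback.

```python
def get_nested(mapping, *keys, default=None):
    """Walk nested dictionaries, returning default if any key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical result item with one nested field
item = {"richSnippet": {"metatags": {"author": "A. Author"}}}

author = get_nested(item, "richSnippet", "metatags", "author")
abstract = get_nested(item, "richSnippet", "metatags", "abstract", default="")

print(author)
print(repr(abstract))
```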

User contributions licensed under: CC BY-SA