
Scrape the app name and "Visit website" URL from the Google Play Store

I have written the code below to scrape the app name and the "Visit website" URL from a Google Play Store page.

App name: get "ASOS" (line 1120 of the page source)

Visit website: get http://www.asos.com, the value of the q= parameter (line 1121 of the page source)

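import requests
from bs4 import BeautifulSoup

url = 'https://play.google.com/store/apps/details?id=com.asos.app'
r = requests.get(url)

# iter_lines() yields no line numbers, so count by hand
# (starting at 1 is assumed, to match the source line numbers quoted above)
count = 1
for line in r.iter_lines():
    if count == 1120:
        # Line 1120: the app name sits in a <span> inside an <a>
        soup = BeautifulSoup(line, 'html.parser')
        for row in soup.find_all('a'):
            u = row.find('span')
            if u is not None:
                print(u.string)
    elif count == 1121:
        # Line 1121: the "Visit website" link
        soup = BeautifulSoup(line, 'html.parser')
        for row in soup.find_all('a'):
            print(row.get('href'))
    count = count + 1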

I can't seem to paste the HTML here, so please feel free to edit the question to add it. Any help would be appreciated!


Answer

BeautifulSoup provides many functions that you should be taking advantage of, so there is no need to parse the page line by line.

For starters, your script can be cut down to the following:

import requests
from bs4 import BeautifulSoup

url = 'https://play.google.com/store/apps/details?id=com.asos.app'
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

for a in soup.find_all('a', {'class': 'dev-link'}):
    print("Found the URL:", a['href'])

BS4 parses the raw HTML content into a tree that you can search and iterate over. In this scenario, you want the href of every link with the class name dev-link. Doing so gives you the following output:

Found the URL: https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw
Found the URL: mailto:androiddev@asos.com
Found the URL: https://www.google.com/url?q=http://www.asos.com/infopages/pgeprivacy.aspx&sa=D&usg=AFQjCNH-hW1H0fYlsCjp4ERbVh29epqaXA

I'm sure you can tweak it a bit more to get the results you want. Refer to the BS4 documentation for more information: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
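One possible tweak: the links in the output above are wrapped in a Google redirect, while the question asks for the bare value after q=. A minimal sketch of unwrapping them with the standard library follows. The helper name unwrap_redirect is hypothetical, and the check for google.com/url is an assumption based only on the sample output shown here, not on any documented Play Store format.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(href):
    """Return the q= target of a Google redirect link, else href unchanged.

    Hypothetical helper: the redirect shape is inferred from the sample
    output in this answer.
    """
    parsed = urlparse(href)
    # Only unwrap links shaped like https://www.google.com/url?q=<target>&...
    if parsed.netloc.endswith("google.com") and parsed.path == "/url":
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    # mailto: links and direct links pass through untouched
    return href


hrefs = [
    "https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw",
    "mailto:androiddev@asos.com",
]
for href in hrefs:
    print(unwrap_redirect(href))
```

In the loop from the script above, you would call unwrap_redirect(a['href']) instead of printing a['href'] directly.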

User contributions licensed under: CC BY-SA