I wrote the code below to scrape the app name and the "Visit website" URL from a Google Play Store page. The values I want are:
ASOS: get "ASOS" (line 1120 of the page source)
Visit website: get http://www.asos.com from the q= parameter (line 1121 of the page source)
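import requests
from bs4 import BeautifulSoup

url = 'https://play.google.com/store/apps/details?id=com.asos.app'
r = requests.get(url)
final = []
count = 1  # line counter, assumed 1-based to match the page-source line numbers
for line in r.iter_lines():
    if count == 1120:
        # line expected to hold the app name
        soup = BeautifulSoup(line, "html.parser")
        for row in soup.findAll('a'):
            u = row.find('span')
            t = u.string
            print(t)
    elif count == 1121:
        # line expected to hold the "Visit website" link
        soup = BeautifulSoup(line, "html.parser")
        for row in soup.findAll('a'):
            u = row.get('href')
            print(u)
    count = count + 1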
I can't seem to paste the HTML here, so please feel free to edit the question to add it. Any help would be appreciated!
Answer
BeautifulSoup provides many functions that you should be taking advantage of, so there is no need to pick out lines of the response by number.
For starters, your script can be cut down to the following:
import requests
from bs4 import BeautifulSoup

url = 'https://play.google.com/store/apps/details?id=com.asos.app'
r = requests.get(url)
# Parse the whole response at once instead of line by line
soup = BeautifulSoup(r.content, "html.parser")
# Select every <a> tag whose class is "dev-link"
for a in soup.find_all('a', {'class': 'dev-link'}):
    print("Found the URL:", a['href'])
BS4 parses the raw HTML content into a tree, and find_all returns the matching tags so you can iterate over them. In this scenario, you want the href of each link with the class name dev-link. Doing so gets you the following output:
Found the URL: https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw
Found the URL: mailto:androiddev@asos.com
Found the URL: https://www.google.com/url?q=http://www.asos.com/infopages/pgeprivacy.aspx&sa=D&usg=AFQjCNH-hW1H0fYlsCjp4ERbVh29epqaXA
I'm sure you can tweak it a bit more to get the results you want. Please refer to the BS4 documentation for more information: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
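One possible tweak, since the question asks for the plain http://www.asos.com address: the hrefs in the output above are Google redirect links that carry the real target in their q= parameter. The sketch below pulls that value out with the standard library. The helper name unwrap_redirect is made up for illustration, and it assumes the redirect links keep the https://www.google.com/url?q=... shape seen in the output; any other link (such as the mailto: one) is returned unchanged.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(href):
    """Return the q= target of a Google redirect link, or href unchanged.

    Hypothetical helper: assumes redirect links look like
    https://www.google.com/url?q=<target>&... as in the output above.
    """
    parsed = urlparse(href)
    if parsed.netloc == "www.google.com" and parsed.path == "/url":
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


# Two of the hrefs from the output above
hrefs = [
    "https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw",
    "mailto:androiddev@asos.com",
]
for href in hrefs:
    print(unwrap_redirect(href))
```

In the loop from the answer, you would call unwrap_redirect(a['href']) instead of printing a['href'] directly.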