I wrote the code below to scrape the app name and the "Visit website" URL from a Google Play Store page.
App name: should return ASOS (line 1120 of the page source)
Visit website: should return http://www.asos.com, taken from the q= parameter (line 1121 of the page source)
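    import requests
    from bs4 import BeautifulSoup

    url = 'https://play.google.com/store/apps/details?id=com.asos.app'
    r = requests.get(url)
    final = []
    count = 1  # assumed: the line numbers above are 1-based
    for line in r.iter_lines():
        if count == 1120:
            # Line 1120: print the text of the <span> inside each link
            soup = BeautifulSoup(line, 'html.parser')
            for row in soup.findAll('a'):
                u = row.find('span')
                t = u.string
                print(t)
        elif count == 1121:
            # Line 1121: print the href of each link
            soup = BeautifulSoup(line, 'html.parser')
            for row in soup.findAll('a'):
                u = row.get('href')
                print(u)
        count = count + 1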
I can't seem to display the HTML here; please open the edit view of this post to see it. Any help would be appreciated!
Answer
BeautifulSoup provides many functions that you should take advantage of, instead of walking through the page source line by line.
For starters, your script can be cut down to the following:
    import requests
    from bs4 import BeautifulSoup

    url = 'https://play.google.com/store/apps/details?id=com.asos.app'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Find every link whose class is dev-link and print its target
    for a in soup.find_all('a', {'class': 'dev-link'}):
        print("Found the URL:", a['href'])
BS4 can parse the raw HTML content, and you can then search and iterate over the resulting parse tree instead of individual lines. In this scenario, you want the href of each link with the class name dev-link. Doing so gets you the following output:
    Found the URL: https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw
    Found the URL: mailto:androiddev@asos.com
    Found the URL: https://www.google.com/url?q=http://www.asos.com/infopages/pgeprivacy.aspx&sa=D&usg=AFQjCNH-hW1H0fYlsCjp4ERbVh29epqaXA
I'm sure you can tweak it a bit more to get the results you want, but please refer to the BS4 documentation for more information: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
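One possible tweak, since the question asks for the value of the q= parameter and not the full Google redirect link: the standard library's urllib.parse can pull it out of each href. This is only a sketch. The helper name extract_target is made up for illustration, and the sample hrefs are copied from the output above, not fetched live.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the value of the q= query parameter, or href unchanged if there is none."""
    query = parse_qs(urlparse(href).query)
    # parse_qs maps each parameter name to a list of values; take the first q
    return query.get("q", [href])[0]


# Sample hrefs taken from the output shown above (hypothetical stand-in for a live fetch)
hrefs = [
    "https://www.google.com/url?q=http://www.asos.com&sa=D&usg=AFQjCNGl4lHIgnhUR3y414Q8idAzJvASqw",
    "mailto:androiddev@asos.com",
    "https://www.google.com/url?q=http://www.asos.com/infopages/pgeprivacy.aspx&sa=D&usg=AFQjCNH-hW1H0fYlsCjp4ERbVh29epqaXA",
]

for href in hrefs:
    print(extract_target(href))
```

In the scraping loop, the same helper could be applied to a['href'] before printing. Links without a q= parameter, such as the mailto: address, are returned as they are.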