I am trying to run this code to scrape reviews from the Google Play Store, but I keep getting the following error:
DevTools listening on ws://127.0.0.1:53044/devtools/browser/9de3e58b-6384-4809-bf01-31d47a57879f
Traceback (most recent call last):
  File "c:/Users/Emil/Documents/Guatrain_Reviews/guatrain_reviews.py", line 20, in <module>
    Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"id-app-title"}
  (Session info: chrome=71.0.3578.98)
  (Driver info: chromedriver=2.46.628402 (536cd7adbad73a3783fdc2cab92ab2ba7ec361e1),platform=Windows NT 10.0.17134 x86_64)
I suspect it has something to do with the id-app-title in
Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
Could someone point out where I would find that ID for the app I am interested in, or help me identify where I am going wrong?
Thanks
EDIT
The final result I want should work like this: for whichever app URL I insert, it extracts the ratings and reviews.
Thanks
Answer
That code is from 2016, so my assumption is that Google has since changed the page structure, which is why there is no 'id-app-title' or anything else from the original code on the page any more.
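One way to test that assumption is to list the class names that actually occur in the page and check whether 'id-app-title' is among them. The following is only a sketch using the standard library; `ClassCollector`, `classes_in` and the `sample` markup are made-up names and data standing in for `driver.page_source`.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute may hold several names
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names used anywhere in the given HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    return collector.classes


# Hypothetical markup; in the real script this would be driver.page_source
sample = '<div class="title wrapper"><h1 class="app-name">My O2</h1></div>'
print("id-app-title" in classes_in(sample))  # False
print(sorted(classes_in(sample)))            # ['app-name', 'title', 'wrapper']
```

If the class you are looking for is not in that set, no locator based on it can succeed, and a different selector has to be found by inspecting the current page.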
There's still a lot of work to be done on this code, such as replacing time.sleep with Selenium's waits and, quite frankly, just making it more robust, as I initially only looked at this particular app's reviews (edit: see below). The HTML is really complex, with tons of nested div and span tags whose attributes and classes carry no specific meaning, so I had trouble pulling out each user review element.
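On the waits: Selenium has `driver.implicitly_wait(seconds)` as well as explicit waits via `WebDriverWait(driver, timeout).until(...)`. The idea behind an explicit wait, polling a condition until it holds or a timeout expires, can be sketched without a browser. `wait_until` and `button_ready` below are hypothetical names for illustration only, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Stand-in for "is the button on the page yet?": succeeds on the third check
attempts = []


def button_ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(button_ready, timeout=5, poll=0.01)
print(len(attempts))  # 3
```

Compared with a fixed time.sleep(2), this returns as soon as the condition holds and fails loudly when it never does.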
But essentially, I was able to open the page with the browser, have it keep scrolling down until it can click "Show More", and repeat that x times.
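The click-or-scroll idea can be sketched separately from Selenium by passing in the two actions as functions. This is a simplified illustration, not the code from this answer: `load_reviews`, `fake_click` and `fake_scroll` are hypothetical names, and in a real script `try_click` would wrap the "Show More" lookup and return False when the button is not found.

```python
def load_reviews(try_click, scroll, max_clicks, max_scrolls):
    """Click "Show More" whenever possible, otherwise scroll; stop at either limit."""
    clicks = scrolls = 0
    while clicks < max_clicks and scrolls < max_scrolls:
        if try_click():
            clicks += 1
        else:
            scroll()
            scrolls += 1
    return clicks, scrolls


# Fake page: the button becomes clickable after every second scroll
state = {"scrolled": 0}


def fake_click():
    if state["scrolled"] >= 2:
        state["scrolled"] = 0
        return True
    return False


def fake_scroll():
    state["scrolled"] += 1


print(load_reviews(fake_click, fake_scroll, max_clicks=3, max_scrolls=50))  # (3, 6)
```

The scroll limit is what keeps the loop from running forever once the page has no more reviews to load.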
Once it has done that, it iterates over the span tags. I figured out that every 10 span tags relate to a single user. However, if the app owner responds to a review, that offsets them by 2, so I had to account for that.
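To make the 10-span grouping and the offset of 2 concrete, here is a small sketch that works on plain strings instead of real span tags. It is a cursor-based variant, not the exact loop from the code in this answer, and it rests on assumptions: the rating label format ("Rated N stars ..."), the reply spans following directly after the review they belong to, and the helper names `looks_like_rating`, `split_reviews` and `fake_review`, which are all hypothetical.

```python
import re

BLOCK = 10  # spans per review, as observed above
REPLY = 2   # extra spans when the app owner replied


def looks_like_rating(text):
    # Assumed label format, e.g. "Rated 4 stars out of five stars"
    return re.match(r"Rated \d", text) is not None


def split_reviews(spans):
    """Walk the flat list of span texts and return one record per review."""
    records, pos = [], 0
    while pos + BLOCK <= len(spans):
        if looks_like_rating(spans[pos + 1]):
            block = spans[pos:pos + BLOCK]
            records.append({
                "name": block[0],
                "rating": "".join(filter(str.isdigit, block[1])),
                "date": block[2],
                "review": block[8],
            })
            pos += BLOCK
        else:
            pos += REPLY  # owner reply: skip its two spans and try again
    return records


def fake_review(name, stars, date, text):
    # name, rating and date at offsets 0-2, review text at offset 8, fillers elsewhere
    return [name, "Rated %d stars out of five stars" % stars, date,
            "", "", "", "", "", text, ""]


spans = (fake_review("Ann", 1, "6 January 2019", "Can't log in")
         + ["O2", "Sorry to hear that"]
         + fake_review("Bob", 5, "1 January 2019", "Very easy to use"))

for record in split_reviews(spans):
    print(record["name"], record["rating"], record["date"])
# Ann 1 6 January 2019
# Bob 5 1 January 2019
```

Advancing a single cursor by 10 or by 2 means a reply only shifts the position by its own length, so the review that follows it is not skipped.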
I'm fairly new to programming, so I apologize for the messy code and inefficiency. I'm sure an expert would be able to provide a better solution; however, this can hopefully get you started or give you something to play around with:
# load the webdriver from selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import bs4
import pandas as pd

# Change this number to get more or fewer reviews
# Current setting of x=100 yielded 11,312 reviews
x = 100
link = "https://play.google.com/store/apps/details?id=uk.co.o2.android.myo2&hl=en_GB"
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(link + '&showAllReviews=true')

# Click "Show More" when it can be found, otherwise scroll to the bottom of the page
num_clicks = 0
num_scrolls = 0
while num_clicks <= x and num_scrolls <= x*5:
    try:
        show_more = driver.find_element_by_xpath('//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div/content/span')
        show_more.click()
        num_clicks += 1
    except Exception:
        html = driver.find_element_by_tag_name('html')
        html.send_keys(Keys.END)
        num_scrolls += 1
        time.sleep(2)

# Parse the fully loaded page
soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
h2 = soup.find_all('h2')
results_df = pd.DataFrame()
for ele in h2:
    if ele.text == 'Reviews':
        c_wiz = ele.parent.parent.find_all('c-wiz')
        for sibling in c_wiz[0].next_siblings:
            try:
                # print(sibling)
                comment_shift = 0
                spans = sibling.find_all('span')
                for user_block in range(0, len(spans)):
                    # every 10 spans belong to one user
                    i = user_block * 10
                    name = spans[i+0+comment_shift].text
                    try:
                        rating = spans[i+1+comment_shift].div.next_element['aria-label']
                        rating = str(''.join(filter(str.isdigit, rating)))
                    except Exception:
                        # no rating here: an app owner reply offsets the spans by 2
                        comment_shift += 2
                        continue
                    date = spans[i+2+comment_shift].text
                    review = spans[i+8+comment_shift].text
                    print('Name: %s\nRating: %s\nDate: %s\nReview: %s\n' % (name, rating, date, review))
                    temp_df = pd.DataFrame([[date, rating, name, review]], columns=['Date', 'Rating', 'User', 'Review'])
                    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
                    results_df = pd.concat([results_df, temp_df])
            except Exception:
                # running past the end of the spans ends this sibling
                continue

results_df = results_df.reset_index(drop=True)
results_df.to_csv('C:/reviews.csv', index=False)
driver.close()
Output:
print (results_df)
                   Date  Review
    0   31 January 2019  Was broken for pay as you go customers. Has no
    1   2 February 2019  o2 just won't be happy until their customer se...
    2   1 February 2019  Excellent quality piece of kit
    3   6 February 2019  Gud 😁
    4  23 December 2018  Can't get into the app using correct log in de...
    5  16 December 2018  The update is rubbish. I can't use MyO2 anymor...
    6   6 December 2018  Stop logging me out with every update, they ad
    7  25 December 2018  cant use this app anymore. shame i use to use
    8  16 December 2018  Started receiving texts from 02 immediately af
    9   10 January 2019  havent been with the network long nor have i u
   10  22 December 2018  update has killed this app. why do I have to p
   11    9 January 2019  This app is now unusable for pay as you go cus
   12   26 January 2019  Wouldn't it be nice to find an app that the de...
   13  19 December 2018  wont let me log in now since the latest update
   14   13 January 2019  it was ok for a while wen u needed to put in y
   15    6 January 2019  from last update I can't login anymore. not ev...
   16   24 January 2019  I'm having 2 change review again coz I can't g
   17    5 January 2019  Changed my rating for this down from five to o
   18  22 December 2018  no longer works for me. shame as it was useful
   19   31 January 2019  total waste of time since update. not able to
   20   23 January 2019  Despite what the description states the curren
   21  24 December 2018  When it finally lets you log in it then says t
   22   17 January 2019  Update breaks it, can't log in, log in on webs...
   23    5 January 2019  02 what have you done to app cant log in chang
   24  30 November 2018  Simple easy to use and all info available of m
   25  30 November 2018  No longer works for pay and go customers so co
   26   8 December 2018  Will not log me in after downloading the lates
   27   15 January 2019  Unable to log on to the app since the update.
   28    1 January 2019  Very easy to use. Keeps me up to date.
   29   1 December 2018  Good app maybe it should be as colourful as th
  ...               ...  ...
11282  12 February 2017  Just re installed this a on my new device. Ha
11283  18 December 2016  Since updating this app on my Samsung S3 mini
11284   19 January 2017  Lately the app gives intermittent server error
11285   7 December 2016  New update
11286  12 December 2016  O2 needs to put right fast
11287  12 February 2017  Although unlimited minutes/texts I would still
11288  30 December 2016  Never works
11289    13 August 2017  I have a Samsung galaxy 7 and the o2 app is no
11290   6 December 2016  Doesn't work anymore
11291   4 December 2016  Since the last update this app does not work f
11292   3 December 2016  O2
11293   5 December 2016  Good app (when it opens)
11294   11 January 2017  Stopped working and when it does work
11295   1 December 2016  Nothing but a blue screen. Not happy.
11296   2 December 2016  Worst app ever
11297   18 January 2017  It's easier than trying to keep track of my ac...
11298  16 February 2017  The new update only shows blue screen before t
11299   15 January 2017  Mr Dimitrov
11300   8 February 2017  Code 4 error frequently
11301    4 January 2017  Won't work at all
11302   27 January 2017  O2 GURU , EXCELLENT, ESQISET , PHANOMAL, SE
11303  15 February 2017  Works well enough.
11304   1 December 2016  Great app keeps you up to.date
11305  28 December 2016  My 02
11306  16 December 2016  This is a "APPY APP""
11307  22 November 2016  Doesn't work for business account. Only shows ...
11308  25 November 2016  Doesn't work anymore
11309  11 November 2016  The ap won't open its just a blue screen I've
11310  24 November 2016  Doesn't work
11311  12 November 2016  My 02
[11312 rows x 4 columns]
Edit:
I tried it with a couple of different links:
link = "https://play.google.com/store/apps/details?id=com.outfit7.mytalkingtom2"
link = "https://play.google.com/store/apps/details?id=com.ingeniooz.hercule"
and it appeared to work:
Output:
print (results_df)
                  Date  Review
   0  February 5, 2019  after update it is not workin before it was ev
   1  February 4, 2019  no word to describe simply 😍
   2  February 6, 2019  I loved this game
   3  February 6, 2019  it is very funny game and very nice game also
   4  February 6, 2019  😎
   5  February 6, 2019  relaxing effect
   6  February 6, 2019  this is a cool game
   7  February 6, 2019  Good game
   8  February 6, 2019  Beast
   9  February 1, 2019  Love this game, it is so much better then the
  10  February 1, 2019  The recent updates are epic. The blender and d
  11  February 1, 2019  i like this funny game because tom is jumping
  12  February 2, 2019  tom 2 is a great game
  13  February 3, 2019  Very very nice game
  14  February 3, 2019  I like it very much
  15  February 5, 2019  Nice and superb game.
  16  February 2, 2019  Tom is a cutipie
  17  February 2, 2019  it is so cute
  18  February 2, 2019  tr ty0
  19  February 2, 2019  so good
  20  February 2, 2019  nice game
  21  February 1, 2019  Nice game
  22  February 3, 2019  i love this game
  23  February 6, 2019  l love this game as it is fun and enjoyable to
  24  February 2, 2019  love it
  25  February 5, 2019  it is so awesome 👍😍😊
  26  February 2, 2019  Amazing
  27  February 3, 2019  nice
  28  February 6, 2019  good
  29  January 30, 2019  Anish Biswa 3 to be a bit. I'm not a good idea...
 ...               ...  ...
1770  February 2, 2019  fun
1771  February 5, 2019  ect,
1772  February 6, 2019  tom. is so cute
1773  February 6, 2019  nice
1774   January 5, 2019  urguuhtr
1775  January 14, 2019  Very interesting game 👌😀😀
1776  January 10, 2019  It s very very very nice
1777  January 21, 2019  supab game😘😘😘😘
1778  January 16, 2019  it's too funny 🐹🐹🐹🐰🐰🐰
1779  January 20, 2019  wow Best game
1780  January 27, 2019  It's damn good
1781  January 28, 2019  this a good and supper game. very nice game. ,
1782  February 4, 2019  i love this game very very very much
1783   January 5, 2019  super
1784  January 12, 2019  It's fun Lol
1785  January 16, 2019  it ,s so good
1786  January 23, 2019  fun game for kids .loved it
1787  January 27, 2019  It's so nice
1788  February 1, 2019  Nice The Baby games i like 😀😀😀😀
1789  January 29, 2019  it's funny and it's fun to play
1790  January 10, 2019  best game so cute
1791  January 10, 2019  So Cute!
1792  January 24, 2019  i lv this game very nice game ..
1793  January 25, 2019  Its superb I love this game 😘
1794  January 27, 2019  It is best game ever played😀😀😀😁😁😁
1795  January 19, 2019  I love it!
1796  January 20, 2019  good game!
1797  January 16, 2019  i love this game 😍.
1798  January 25, 2019  It is a good game for kids ..
1799  January 31, 2019  my talking tom is fun😊😊😊
[1800 rows x 4 columns]
And
print (results_df)
                  Date  Review
 0    December 2, 2018  It's a very well-thought-out an all rounded ap...
 1     January 1, 2019  L'application est superbe et hyper complète! B...
 2    December 6, 2017  Great workout diary with statistics. Easy to u
 3       June 13, 2017  I love this app! I've tried so many others, bu...
 4      March 28, 2017  Works great at what it does. You can add exerc
 5      March 21, 2017  Great
 6    December 8, 2016  Has all I need to build & adjust my workouts
 7    October 23, 2016  Goodish
 8  September 23, 2016  Great app
 9       July 18, 2016  Excellent
10       March 9, 2016  great app.
11       July 10, 2015  Amazing and easy to use
12        June 5, 2015  I dreamt of this app, Hercule made it. Best ap
13      March 18, 2015  Really good, but
[14 rows x 4 columns]