When I try to scrape data from this Amazon link, I get `AttributeError: 'NoneType' object has no attribute 'text'`.
My code:
```python
import requests
from bs4 import BeautifulSoup as bs

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0',
           'Accept-Language': 'en-US,en;q=0.5'}

lap_site = requests.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent', headers=headers)
lap_soup = bs(lap_site.content, 'lxml')

content = lap_soup.find('div', class_='s-desktop-width-max s-desktop-content s-opposite-dir sg-row')
lap_detail_block = content.find_all('div', class_='a-section a-spacing-small a-spacing-top-small')

# Three separate lists: `lap_name = lap_price = lap_rating = []` would bind all three names to ONE shared list
lap_name, lap_price, lap_rating = [], [], []

for i in lap_detail_block:

    laptop_name = i.find('h2').a.span.text
    lap_name.append(laptop_name)

    laptop_rating = i.find('span', class_='a-icon-alt').text
    lap_rating.append(laptop_rating)

    laptop_price = i.find('span', class_='a-price-whole').text
    lap_price.append(laptop_price)

laptop_details = {
    'Laptop': lap_name,
    'Price': lap_price,
    'Rating': lap_rating}

print(laptop_details)
```
I think the `laptop_rating` variable already stores the content as a string even if we don't include `.text`. I'm thinking that might be the reason for the `NoneType` error, since we would be extracting text from text. Anyway, that's not the main issue. How do I extract the price or rating from that link?
Answer
At least in my tests, that page recognizes automated access and blocks it. You need to use something like `cloudscraper` to get past that. The following code returns the expected results (adapt it to your own circumstances):
```python
import cloudscraper
import pandas as pd
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()

r = scraper.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent')
soup = BeautifulSoup(r.content, 'html.parser')
# print(soup)
content = soup.find('div', class_='s-desktop-width-max s-desktop-content s-opposite-dir sg-row')
lap_detail_block = content.find_all('div', class_='a-section a-spacing-small a-spacing-top-small')

# Three separate lists (chained `a = b = c = []` would make all three names share one list)
lap_name, lap_price, lap_rating = [], [], []
for i in lap_detail_block:
    try:
        laptop_name = i.find('h2').a.span.text
        laptop_rating = i.find('span', class_='a-icon-alt').text
        laptop_price = i.find('span', class_='a-price-whole').text

        # Append only after all three fields were found, so the lists stay aligned
        lap_name.append(laptop_name)
        lap_rating.append(laptop_rating)
        lap_price.append(laptop_price)

        print(laptop_name, laptop_rating, laptop_price)
    except Exception as e:
        # find() returned None for a missing element, e.g. a product without a rating
        print(e)
    print('_____________')

laptop_details = {
    'Laptop': lap_name,
    'Price': lap_price,
    'Rating': lap_rating
}
# Optional: put the results in a table
df = pd.DataFrame(laptop_details)
```
This prints the following in the terminal:
```
HP 15s, 12th Gen Intel Core i5 8GB RAM/512GB SSD 15.6-inch(39.6 cm) FHD,Micro-Edge, Anti- Glare Display/Win 11/Intel Iris Xe Graphics/Dual Speakers/Alexa/Backlit KB/MSO/Fast Charge, 15s- fq5111TU 4.2 out of 5 stars 58,699
_____________
Acer Predator Helios 500 Gaming Laptop (11Th Gen Intel Core I9/17.3 Inches 4K Uhd Display/64Gb Ddr4 Ram/2Tb Ssd/1Tb HDD/RTX 3080 Graphics/Windows 10 Home/Per Key RGB Backlit Keyboard) Ph517-52 3.0 out of 5 stars 3,79,990
_____________
ASUS VivoBook 14 (2021), 14-inch (35.56 cm) HD, Intel Core i3-1005G1 10th Gen, Thin and Light Laptop (8GB/1TB HDD/Windows 11/Integrated Graphics/Grey/1.6 kg), X415JA-BV301W 3.8 out of 5 stars 27,990
_____________
[...]
```
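The scraped price and rating are still display strings, such as `58,699` (Indian digit grouping, as in `3,79,990`) and `4.2 out of 5 stars`. If you want to sort or compare them, here is a minimal sketch for converting them to numbers. It assumes the formats stay exactly as in the output above; `parse_price` and `parse_rating` are hypothetical helper names, not part of any library.

```python
def parse_price(price_text):
    """Turn a price like '3,79,990' into an int by dropping the grouping commas."""
    # rstrip('.') is a defensive assumption in case a trailing decimal point is scraped too
    return int(price_text.replace(',', '').strip().rstrip('.'))


def parse_rating(rating_text):
    """Turn '4.2 out of 5 stars' into 4.2 (assumes the number is the first word)."""
    return float(rating_text.split()[0])


print(parse_price('58,699'), parse_rating('4.2 out of 5 stars'))
print(parse_price('3,79,990'), parse_rating('3.0 out of 5 stars'))
```

Stripping the commas instead of splitting on them means the same helper works for both the `58,699` and `3,79,990` groupings.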
Cloudscraper’s details and install instructions: https://pypi.org/project/cloudscraper/
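As for the `AttributeError` in the question: it is not caused by calling `.text` on a string. `find()` returns `None` when no element matches (for example, a product card that has no rating), and `None.text` raises exactly `'NoneType' object has no attribute 'text'`. Below is a minimal sketch of a hypothetical helper, `text_or_none`, that returns `None` instead of crashing, so one missing field doesn't drop the whole product. It runs on a small made-up HTML snippet instead of the live page, and the `card` class is invented for the example.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_=None):
    """Return the stripped text of the first matching element, or None if nothing matches."""
    # Only pass class_ when given; find() returns None on no match, and calling
    # .text on that None is what raises the AttributeError
    tag = parent.find(name, class_=class_) if class_ else parent.find(name)
    return tag.get_text(strip=True) if tag is not None else None


# Two made-up product cards; the second one has no rating element
html = """
<div class="card"><h2>Laptop A</h2>
  <span class="a-icon-alt">4.2 out of 5 stars</span>
  <span class="a-price-whole">58,699</span></div>
<div class="card"><h2>Laptop B</h2>
  <span class="a-price-whole">27,990</span></div>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
for card in soup.find_all('div', class_='card'):
    rows.append({
        'Laptop': text_or_none(card, 'h2'),
        'Rating': text_or_none(card, 'span', 'a-icon-alt'),
        'Price': text_or_none(card, 'span', 'a-price-whole'),
    })

print(rows)
```

Here the second product is kept with `'Rating': None`, whereas a `try`/`except` around the whole card skips it entirely.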