My question is the same as "Scraping all mobiles of Flipkart.com". I tried the solution given there, but changing the `start` variable does not work, and I can only scrape the information for the first twenty mobiles.
```python
import re  # regular expression for data manipulation
import urllib.request  # Python 3; Python 2 used urllib.urlopen
from bs4 import BeautifulSoup

url = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start=50"

regex = '<a href=(.+?)>'  # captures the href part of each anchor tag
pattern = re.compile(regex)

htmlfile = urllib.request.urlopen(url)
htmltext = htmlfile.read()

docSoup = BeautifulSoup(htmltext, "html.parser")
abc = docSoup.find_all('a')

# `c` was never defined in the original snippet; run the regex over
# the string form of all collected <a> tags
c = "".join(str(a) for a in abc)
title = re.findall(pattern, c)

for i in title:
    print(i)
```
The initial value of `start` was 21, so I increased it to 50, but I still get the same result.
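If you want to vary `start` programmatically instead of editing the URL by hand, a small helper can rewrite just that query parameter while keeping `sid` intact. This is a minimal sketch using only `urllib.parse`; `with_start` is a hypothetical helper name. It only builds the URL and does not by itself explain why the server's response stays the same.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_start(url, start):
    """Return `url` with its `start` query parameter set to `start`.

    Other parameters (such as `sid`) are kept and re-encoded by urlencode.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["start"] = [str(start)]  # replace (or add) the offset parameter
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


base = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start=21"
for start in (21, 50):
    print(with_start(base, start))
```

Printing both URLs lets you confirm that the request really carries the new offset before looking at the response.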
Answer
There are four AJAX requests for the page (check the screenshot). Write code that changes `start` dynamically in each request, and use try/except to handle exceptions.
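A minimal sketch of that advice, under stated assumptions: you have copied the paginated request URL from the browser's network tab (the real AJAX endpoint is not given here, so the question's URL is used only as a placeholder), and each request returns 20 results, as the question observes. `fetch` and `scrape_pages` are hypothetical helper names; parsing each page with BeautifulSoup is left as in the question.

```python
import urllib.request


def fetch(url, timeout=10):
    """Download `url` and return the body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_pages(url_template, first_start, step, max_pages, get=fetch):
    """Request successive pages by changing `start` on each request.

    `url_template` must contain a `{start}` placeholder (an assumption about
    how the endpoint is paginated). Stops early if a request fails or a page
    comes back empty; returns the list of response bodies collected so far.
    """
    pages = []
    for n in range(max_pages):
        start = first_start + n * step
        url = url_template.format(start=start)
        try:
            body = get(url)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # report the failure and stop instead of crashing the scraper
            print("request for start=%d failed: %s" % (start, exc))
            break
        if not body.strip():
            break  # assume an empty response means there are no more results
        pages.append(body)
    return pages
```

Usage, with the placeholder template: `scrape_pages("http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start={start}", first_start=1, step=20, max_pages=4)`. Passing the download function as `get` keeps the pagination logic testable without hitting the site.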