My question is the same as "Scraping all mobiles of Flipkart.com". I tried the solution given there, but changing the start variable has no effect: I can only scrape the first twenty mobiles.
import urllib  # use urllib.request on Python 3.x
import re  # regular expressions for pulling out the links
from bs4 import BeautifulSoup

url = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start=50"
regex = '<a href=(.+?)>'  # captures the href value of each anchor tag
pattern = re.compile(regex)
htmlfile = urllib.urlopen(url)  # urllib.request.urlopen on Python 3.x
htmltext = htmlfile.read()
docSoup = BeautifulSoup(htmltext)
abc = docSoup.findAll('a')
title = re.findall(pattern, htmltext)  # search the raw HTML of the page
for i in title:
    print i
The initial value of start was 21, so I increased it to 50, but I still get the same result.
Answer
The page makes 4 AJAX requests (check the screenshot). Try writing code that changes the start value dynamically for each request, and use try/except to handle any exceptions.
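A rough sketch of that idea, written for Python 3 with only the standard library. Several things here are assumptions, not facts about Flipkart: the URL pattern is copied from the question and may not be the real AJAX endpoint (check the browser's network tab for that), the step of 20 is guessed from the "twenty mobiles" per page, and the names `LinkCollector`, `extract_links`, `fetch` and `scrape_all` are made up for illustration.

```python
import urllib.request
from html.parser import HTMLParser

# Assumption: URL pattern taken from the question; the real AJAX URL may differ.
BASE_URL = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start={start}"


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all anchor hrefs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(starts, fetch_page=fetch):
    """Request one page per start value and merge the links, skipping failures."""
    seen = []
    for start in starts:
        url = BASE_URL.format(start=start)
        try:
            html = fetch_page(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print("skipping start=%d: %s" % (start, exc))
            continue
        for link in extract_links(html):
            if link not in seen:
                seen.append(link)
    return seen


if __name__ == "__main__":
    # Assumption: 20 items per request, so four requests use start=1, 21, 41, 61.
    for link in scrape_all(range(1, 81, 20)):
        print(link)
```

The `fetch_page` parameter is there only so the loop can be tried with a stand-in function before pointing it at the live site. If the listing is filled in by JavaScript, the same loop should still apply once `BASE_URL` is replaced with the request URL seen in the network tab.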