I am trying to scrape image links from a website with a Scrapy spider.
# image urls
look_inside_image_urls = response.xpath('//ul[@class="list-unstyled pages"]/li').extract_first()

for i in look_inside_image_urls:
    print("============> look_inside_image_urls ===============>", i)
But I get a None value. The ul can contain any number of li items with image links, and I want to download each of them in a loop.
This is my HTML code
<div class="lookInsideDiv" style="display: block;">
    <div class="exitBtn"><i class="ion-close-round"></i></div>
    <div class="pagesArea">
        <ul class="list-unstyled pages">
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/11f94595e_117698-2.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/555959ec2_117698-3.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/81b071d0c_117698-4.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/30ef8b806_117698-5.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/6cb40391f_117698-6.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/a41c97880_117698-7.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/d1a4bff6e_117698-8.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/9503cfda1_117698-9.jpg"></li>
            <li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/54f1774ee_117698-10.jpg"></li>
        </ul>
    </div>
</div>
I just want to get all the links inside the li elements, like this:
https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg
https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg
https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg
https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg
Answer
Try this. To extract all the image URLs, select the src attribute of each img and use extract(), which returns a list of every match, instead of extract_first(), which returns only the first match.
look_inside_image_urls = response.xpath('//ul[@class="list-unstyled pages"]/li/img/@src').extract()

for i in look_inside_image_urls:
    print("============> look_inside_image_urls ===============>", i)
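To see why the loop in the question misbehaves, it can help to look at the types involved: extract_first() gives back a single string (or None when nothing matches), while extract() gives back a list of strings. The sketch below uses plain Python stand-ins for those return values; they are hard-coded for illustration with hypothetical example.com URLs, not produced by Scrapy.

```python
# Stand-ins for Scrapy return values (hard-coded assumptions, not real Scrapy output).
first_match = '<li><img src="https://example.com/a.jpg"></li>'  # like extract_first(): one string
no_match = None                                                  # like extract_first() with no match
all_matches = [                                                  # like extract() on .../li/img/@src
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
]

# Looping over a string walks it character by character, not item by item.
chars = [c for c in first_match]
print(chars[:4])

# Looping over None raises a TypeError.
error_message = ""
try:
    for item in no_match:
        pass
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Looping over a list yields one URL per iteration, which is what the question asks for.
urls = [u for u in all_matches]
print(urls)
```

So the fix has two parts: point the XPath at the src attribute rather than the li element, and ask for every match rather than the first one.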
Edit
from scrapy.selector import Selector

html = """<div class="lookInsideDiv" style="display: block;">
<div class="exitBtn"><i class="ion-close-round"></i></div>
<div class="pagesArea">
<ul class="list-unstyled pages">
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/11f94595e_117698-2.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/555959ec2_117698-3.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/81b071d0c_117698-4.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/30ef8b806_117698-5.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/6cb40391f_117698-6.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/a41c97880_117698-7.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/d1a4bff6e_117698-8.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/9503cfda1_117698-9.jpg"></li>
<li><img src="https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/54f1774ee_117698-10.jpg"></li>
</ul>
</div>
</div>"""

data = Selector(text=html)
look_inside_image_urls = data.xpath('//*/ul[@class="list-unstyled pages"]/li/img/@src').extract()
for i in look_inside_image_urls:
    print("============> look_inside_image_urls ===============>", i)

Output:

============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/11f94595e_117698-2.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/555959ec2_117698-3.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/81b071d0c_117698-4.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/30ef8b806_117698-5.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/6cb40391f_117698-6.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/a41c97880_117698-7.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/d1a4bff6e_117698-8.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/9503cfda1_117698-9.jpg
============> look_inside_image_urls ===============> https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/54f1774ee_117698-10.jpg
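Since the question mentions downloading each image in a loop, one possible next step is to derive a local file name from each extracted URL. The sketch below is only an illustration using the standard library; the helper name filename_from_url is hypothetical, and the actual download step is left out.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL (hypothetical helper, not part of Scrapy)."""
    return posixpath.basename(urlsplit(url).path)


# Two of the URLs from the HTML in the question.
urls = [
    "https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/fc955fd4b_117698-1.jpg",
    "https://s3-ap-southeast-1.amazonaws.com/rokomari110/LookInside20190827/11f94595e_117698-2.jpg",
]

names = [filename_from_url(u) for u in urls]
print(names)  # ['fc955fd4b_117698-1.jpg', '11f94595e_117698-2.jpg']
```

Splitting the URL first keeps any query string or fragment out of the file name, which a plain split on "/" would not do.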