I’d like to grab the links from this page and put them in a list.
I have this code:
import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/236').read()
soup = bs.BeautifulSoup(source, 'lxml')

links = soup.find_all('a', attrs={'class': 'view'})
print(links)
It produces the following output:
[<a class="view" href="/en/catalog/view/514">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>,

"""There are 28 lines more"""

<a class="view" href="/en/catalog/view/565">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>]
I need to get the following: ['/en/catalog/view/514', ..., '/en/catalog/view/565']
But when I add the following line, I get an error: href_value = links.get('href')
Answer
Try:
soup = bs.BeautifulSoup(source, 'lxml')

links = [i.get("href") for i in soup.find_all('a', attrs={'class': 'view'})]
print(links)
Output:
['/en/catalog/view/514', '/en/catalog/view/515', '/en/catalog/view/179080', '/en/catalog/view/45518', '/en/catalog/view/521', '/en/catalog/view/111429', '/en/catalog/view/522', '/en/catalog/view/182223', '/en/catalog/view/168153', '/en/catalog/view/523', '/en/catalog/view/524', '/en/catalog/view/60228', '/en/catalog/view/525', '/en/catalog/view/539', '/en/catalog/view/540', '/en/catalog/view/31642', '/en/catalog/view/553', '/en/catalog/view/558', '/en/catalog/view/559', '/en/catalog/view/77672', '/en/catalog/view/560', '/en/catalog/view/55377', '/en/catalog/view/55379', '/en/catalog/view/32001', '/en/catalog/view/561', '/en/catalog/view/562', '/en/catalog/view/72185', '/en/catalog/view/563', '/en/catalog/view/564', '/en/catalog/view/565']
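As for why the original attempt failed: find_all returns a list-like ResultSet rather than a single tag, so .get has to be called on each element, not on the collection. Here is a minimal sketch of that difference. The inline HTML string is a made-up stand-in for the real page, and html.parser is used only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, shaped like the output shown in the question.
html = """
<a class="view" href="/en/catalog/view/514"><img alt="View details"/></a>
<a class="view" href="/en/catalog/view/565"><img alt="View details"/></a>
<a class="other" href="/en/about">About</a>
"""

# html.parser ships with Python, so lxml is not required for this sketch.
soup = BeautifulSoup(html, "html.parser")
anchors = soup.find_all("a", attrs={"class": "view"})

# The collection itself has no .get method, so this raises AttributeError.
try:
    anchors.get("href")
    failed = False
except AttributeError:
    failed = True

# Calling .get on each individual tag works.
hrefs = [a.get("href") for a in anchors]
print(failed, hrefs)
```

Note that .get returns None for a tag without an href attribute, so filtering may be needed if some matching anchors lack one.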
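The extracted values are site-relative paths. If full URLs are needed, one possible follow-up is to join each path with the page address using urljoin from the standard library. This is only a sketch: the names base_url, relative_links and absolute_links are illustrative, and the list is shortened to two entries from the output above.

```python
from urllib.parse import urljoin

# The page address from the question, used as the base for resolving paths.
base_url = "http://www.gcoins.net/en/catalog/236"

# Two of the relative hrefs from the output, shortened for illustration.
relative_links = ["/en/catalog/view/514", "/en/catalog/view/565"]

# A path starting with "/" replaces the whole path of the base URL.
absolute_links = [urljoin(base_url, href) for href in relative_links]
print(absolute_links)
```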