I am trying to get the string value of each link (for example, Pennsylvania).
<li class="facetbox-shownrow ">
<a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&s=1&q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
Pennsylvania <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span> </a>
</li>
Because the link also has title and id attributes, I am a bit confused about how to do it. I get a null result when I display my array. Here is my code:
for link in links_array:

    main_url_link = base_url_link + link
    html_page_link = requests.get(main_url_link)
    soup_link = BeautifulSoup(html_page_link.text, 'html.parser')
    allData_link = soup_link.findAll('li',{'class':'facetbox-shownrow'})

    distric = [y.text_content() for y in allData_link]
    district_array.append(distric)

district_array
Answer
Use .stripped_strings to get the whitespace-stripped strings of each element in your selection, then pick or slice the result. In this case, pick the first string to get Pennsylvania:
[list(x.stripped_strings)[0] for x in soup.find_all('li',{'class':'facetbox-shownrow'})]
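To see why picking the first stripped string isolates the state name, here is a minimal sketch comparing get_text() with .stripped_strings on a single row. The HTML is a shortened, made-up version of the snippet in the question (the href is a placeholder), and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for one row from the question; the href is a placeholder.
html = """
<li class="facetbox-shownrow ">
<a href="/example" id="facetItemcosponsor-statePennsylvania">
Pennsylvania <span class="count">[1]</span> </a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "facetbox-shownrow"})

# get_text() joins every text node, so the result includes the count.
full_text = li.get_text(" ", strip=True)

# stripped_strings yields each non-empty text node with whitespace trimmed.
parts = list(li.stripped_strings)

print(full_text)  # Pennsylvania [1]
print(parts)      # ['Pennsylvania', '[1]']
```

The state name and the count are separate text nodes, so index 0 is the name and index 1 is the count.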
Note: In new code, use find_all(). findAll() still works, but it is the legacy Beautiful Soup 3 spelling.
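As a side note, the same rows can be selected in a few equivalent ways. The sketch below uses a small made-up HTML fragment and assumes a Beautiful Soup 4 install that includes CSS selector support (select()); pick whichever spelling reads best to you.

```python
from bs4 import BeautifulSoup

# Made-up fragment: one matching row and one row that should be ignored.
html = (
    '<ul>'
    '<li class="facetbox-shownrow "><a href="/a">Pennsylvania '
    '<span class="count">[1]</span></a></li>'
    '<li class="other"><a href="/b">Skip</a></li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

# Attribute dictionary, as used in the answer.
by_dict = soup.find_all("li", {"class": "facetbox-shownrow"})

# class_ keyword argument (class is a reserved word in Python).
by_keyword = soup.find_all("li", class_="facetbox-shownrow")

# CSS selector.
by_css = soup.select("li.facetbox-shownrow")

# next() takes the first stripped string without building a full list.
names = [next(li.stripped_strings) for li in by_css]
print(names)
```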
To get the href:
[x.a['href'] for x in soup.find_all('li',{'class':'facetbox-shownrow'})]
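The href values are relative, so you will probably want to turn them into absolute URLs before requesting them. Here is a minimal sketch using only the standard library. The base_url value is a placeholder standing in for the asker's base_url_link, and reading the state back out of the q parameter is an optional extra that assumes q always holds URL-encoded JSON, as it does in the sample link.

```python
import json
from urllib.parse import parse_qs, urljoin, urlsplit

# Placeholder base URL (assumption); substitute your own base_url_link.
base_url = "https://www.example.com"

# href copied from the sample link in the question.
href = (
    "/bill/116th-congress/house-bill/9043/cosponsors?r=1&s=1"
    "&q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D"
    "%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D"
)

# urljoin resolves the relative path against the base URL.
full_url = urljoin(base_url, href)

# parse_qs percent-decodes the query string; q then holds a JSON document.
query = parse_qs(urlsplit(full_url).query)
constraints = json.loads(query["q"][0])

print(full_url)
print(constraints["cosponsor-state"])  # Pennsylvania
```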
Example
With multiple li tags:
from bs4 import BeautifulSoup

html = """
<li class="facetbox-shownrow ">
<a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&s=1&q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
Pennsylvania <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span> </a>
</li>
<li class="facetbox-shownrow ">
<a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&s=1&q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
Main <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span> </a>
</li>
<li class="facetbox-shownrow ">
<a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&s=1&q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
California <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span> </a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

# Take the first stripped string of each matching <li>
print([list(x.stripped_strings)[0] for x in soup.find_all('li',{'class':'facetbox-shownrow'})])
Output
['Pennsylvania', 'Main', 'California']
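Applied to the loop in the question, the idea could look roughly like the sketch below. The helper name extract_first_strings and the pages list are hypothetical. In the real loop, each HTML string would come from requests.get(main_url_link).text instead of the hard-coded samples used here to keep the example runnable offline.

```python
from bs4 import BeautifulSoup


def extract_first_strings(html):
    """Return the first stripped string of every li.facetbox-shownrow in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("li", {"class": "facetbox-shownrow"})
    # next(..., None) avoids an error if a row contains no text at all.
    return [next(row.stripped_strings, None) for row in rows]


# Hard-coded sample pages (made up) standing in for the fetched responses.
pages = [
    '<li class="facetbox-shownrow "><a href="/x">Pennsylvania '
    '<span class="count">[1]</span></a></li>'
    '<li class="facetbox-shownrow "><a href="/y">California '
    '<span class="count">[2]</span></a></li>',
    "<p>No facet rows on this page</p>",
]

# One list of names per page, mirroring district_array in the question.
district_array = [extract_first_strings(page) for page in pages]
print(district_array)  # [['Pennsylvania', 'California'], []]
```

A page without matching rows yields an empty list rather than an error, which makes it easier to spot links that returned unexpected content.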