How to use BeautifulSoup to find an HTML element that contains spaces in its attributes
```html
<h1 class='td p1'>
title that i want
</h1>
<h1 class='td p2'>
title that i don't want
</h1>
<h1 class='p1'>
title that i don't want
</h1>
```
I would like to know how to use soup.find to find the title that I want.
The problem is that BeautifulSoup stores the attrs of the title that I want like this: {'class': ['td', 'p1']}, not like this: {'class': ['td p1']}.
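To see this concretely, here is a minimal sketch that prints how BeautifulSoup stores the class attribute. The one-line markup is trimmed from the snippet above, and the built-in "html.parser" backend is assumed. BeautifulSoup treats class as a multi-valued attribute, so its value is split on whitespace into a list.

```python
from bs4 import BeautifulSoup

# Trimmed version of the wanted heading from the question
html = "<h1 class='td p1'>title that i want</h1>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.h1

# "class" is multi-valued, so BeautifulSoup splits it on whitespace
print(tag["class"])  # ['td', 'p1']
print(tag.attrs)     # {'class': ['td', 'p1']}
```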
Answer
Note: these are two different approaches, but both select the classes explicitly.
find()
```python
soup.find('h1', attrs={'class':'td p1'})
```
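A short runnable check of the find() approach, assuming the three headings from the question are collapsed onto single lines. When the value you search for contains a space, BeautifulSoup compares it against the whole class attribute value rather than against the individual classes, so only the element whose class is exactly "td p1" matches. The class_ keyword argument should behave the same as attrs={'class': ...}.

```python
from bs4 import BeautifulSoup

data = """
<h1 class='td p1'>title that i want</h1>
<h1 class='td p2'>title that i don't want</h1>
<h1 class='p1'>title that i don't want</h1>
"""
soup = BeautifulSoup(data, "html.parser")

# A search string containing a space is compared against the full class value,
# so "td p2" and "p1" do not match
by_attrs = soup.find('h1', attrs={'class': 'td p1'})
# Same search expressed with the class_ keyword argument
by_keyword = soup.find('h1', class_='td p1')

print(by_attrs.get_text())     # title that i want
print(by_attrs is by_keyword)  # True
```

Because this is an exact comparison of the class value, it would likely miss an element written as class='p1 td' or one carrying an extra class.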
select_one()
```python
soup.select_one('h1.td.p1')
```
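One practical difference between the two approaches, as far as I can tell: the CSS selector only requires each class to be present, in any order, while the exact-string form of find() compares the whole class value. The markup below is hypothetical, with the wanted classes written in reverse order to show this. select_one() relies on the soupsieve package, which is installed as a dependency of beautifulsoup4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as the wanted title, but in reverse order
data = "<h1 class='p1 td'>reordered title</h1><h1 class='td p2'>other</h1>"
soup = BeautifulSoup(data, "html.parser")

# The CSS selector requires both classes, regardless of their order
css_match = soup.select_one('h1.td.p1')
# The exact-string form compares the whole class value, so "p1 td" does not match
exact_match = soup.find('h1', attrs={'class': 'td p1'})

print(css_match.get_text())  # reordered title
print(exact_match)           # None
```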
Example
```python
from bs4 import BeautifulSoup

data="""
<h1 class='td p1'>
title that i want
</h1>
<h1 class='td p2'>
title that i don't want
</h1>
<h1 class='p1'>
title that i don't want
</h1>
"""
soup=BeautifulSoup(data,"html.parser")

# First <h1> that has both the "td" and "p1" classes
title = soup.select_one('h1.td.p1')

print(title)
```
Output
```
<h1 class="td p1">
title that i want
</h1>
```