Hi all. I have written a Python program to retrieve the title of a page. It works fine, but for some pages it also returns unwanted text. How can I avoid that?
Here is my program:
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)
Here is my output:
atlas obscura - curious and wondrous travel destinations
aoc-full-screen
aoc-heart-solid
aoc-compass
aoc-flipboard
aoc-globe
aoc-pocket
aoc-share
aoc-cancel
aoc-video
aoc-building
aoc-clock
aoc-clipboard
aoc-help
aoc-arrow-right
aoc-arrow-left
aoc-ticket
aoc-place-entry
aoc-facebook
aoc-instagram
aoc-reddit
aoc-rss
aoc-twitter
aoc-accommodation
aoc-activity-level
aoc-add-a-photo
aoc-add-box
aoc-add-shape
aoc-arrow-forward
aoc-been-here
aoc-chat-bubbles
aoc-close
aoc-expand-more
aoc-expand-less
aoc-forum-flag
aoc-group-size
aoc-heart-outline
aoc-heart-solid
aoc-home
aoc-important
aoc-knife-fork
aoc-library-books
aoc-link
aoc-list-circle-bullets
aoc-list
aoc-location-add
aoc-location
aoc-mail
aoc-map
aoc-menu
aoc-more-horizontal
aoc-my-location
aoc-near-me
aoc-notifications-alert
aoc-notifications-mentions
aoc-notifications-muted
aoc-notifications-tracking
aoc-open-in-new
aoc-pencil
aoc-person
aoc-pinned
aoc-plane-takeoff
aoc-plane
aoc-print
aoc-reply
aoc-search
aoc-shuffle
aoc-star
aoc-subject
aoc-trip-style
aoc-unpinned
aoc-send
aoc-phone
aoc-apps
aoc-lock
aoc-verified
Instead of this, I expect to receive only this line:
"atlas obscura - curious and wondrous travel destinations"
Please help me with some ideas. Most other websites work fine; only some websites give this problem.
Answer
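Your problem is that find_all('title') returns every <title> element in the page, not just the one in <head>. The extra entries (aoc-full-screen and so on) are most likely <title> elements nested inside inline SVG icons. Beautiful Soup has an attribute, soup.title, that returns only the first <title> tag in the document, which is
specifically what you're trying to get. Here's your modified code: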
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')
title_data = soup.title.text.lower()

# displaying the title
print("Title of the website is : ")
print(title_data)
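
To see the difference without hitting the network, here is a minimal sketch that runs both approaches on a small inline page. The SAMPLE_HTML string and the extract_title helper are made up for illustration (they are not from the real site), and the sketch assumes the extra titles come from SVG icons. It also guards against pages with no <title> at all, where soup.title is None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a page whose SVG icons carry their own <title> tags.
SAMPLE_HTML = """
<html>
  <head><title>Atlas Obscura - Curious and Wondrous Travel Destinations</title></head>
  <body>
    <svg><symbol id="a"><title>aoc-full-screen</title></symbol></svg>
    <svg><symbol id="b"><title>aoc-compass</title></symbol></svg>
  </body>
</html>
"""


def extract_title(html):
    """Return the lowercased, stripped text of the first <title> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        # No <title> anywhere in the document.
        return None
    return soup.title.get_text().lower().strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all picks up the page title and both icon titles.
all_titles = [t.get_text() for t in soup.find_all("title")]
print(len(all_titles))  # 3

# soup.title only returns the first <title> in document order.
print(extract_title(SAMPLE_HTML))  # atlas obscura - curious and wondrous travel destinations
print(extract_title("<html><body>no title here</body></html>"))  # None
```

If a page ever places an icon <title> before the one in <head>, soup.head.title would be a stricter choice, since it only looks inside the <head> element.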