Hi all, I have written a Python program to retrieve the title of a page. It works fine, but for some pages it also returns some unwanted text. How can I avoid that?
Here is my program:
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)

Here is my output:
atlas obscura - curious and wondrous travel destinations aoc-full-screen aoc-heart-solid aoc-compass aoc-flipboard aoc-globe aoc-pocket aoc-share aoc-cancel aoc-video aoc-building aoc-clock aoc-clipboard aoc-help aoc-arrow-right aoc-arrow-left aoc-ticket aoc-place-entry aoc-facebook aoc-instagram aoc-reddit aoc-rss aoc-twitter aoc-accommodation aoc-activity-level aoc-add-a-photo aoc-add-box aoc-add-shape aoc-arrow-forward aoc-been-here aoc-chat-bubbles aoc-close aoc-expand-more aoc-expand-less aoc-forum-flag aoc-group-size aoc-heart-outline aoc-heart-solid aoc-home aoc-important aoc-knife-fork aoc-library-books aoc-link aoc-list-circle-bullets aoc-list aoc-location-add aoc-location aoc-mail aoc-map aoc-menu aoc-more-horizontal aoc-my-location aoc-near-me aoc-notifications-alert aoc-notifications-mentions aoc-notifications-muted aoc-notifications-tracking aoc-open-in-new aoc-pencil aoc-person aoc-pinned aoc-plane-takeoff aoc-plane aoc-print aoc-reply aoc-search aoc-shuffle aoc-star aoc-subject aoc-trip-style aoc-unpinned aoc-send aoc-phone aoc-apps aoc-lock aoc-verified
Instead of this, I expect to receive only this line:
"atlas obscura - curious and wondrous travel destinations"
Please help me with some ideas. Most other websites work fine; only some websites give this problem.
Answer
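Your problem is that you're finding all the occurrences of "title" in the page. find_all('title') returns every <title> tag in the document, and on this page that appears to include the <title> tags of its embedded icons (the aoc-... names in your output). Beautiful Soup has an attribute, title, specifically for what you're trying to do: soup.title returns only the first <title> tag. Here's your modified code: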
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')
title_data = soup.title.text.lower()

# displaying the title
print("Title of the website is : ")
print(title_data)
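To see the difference without hitting the network, here is a minimal sketch that runs on a small inline HTML string. The markup in SAMPLE_HTML is a hypothetical stand-in (not the real Atlas Obscura page), assuming the extra text comes from <title> tags nested inside inline SVG icons:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one real page title plus two SVG icon titles.
SAMPLE_HTML = """
<html>
  <head><title>Example Site - Home</title></head>
  <body>
    <svg><symbol id="icon-a"><title>icon-a</title></symbol></svg>
    <svg><symbol id="icon-b"><title>icon-b</title></symbol></svg>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all('title') collects every <title> tag, including the icon titles.
all_titles = [t.get_text().strip() for t in soup.find_all("title")]

# soup.title returns only the first <title> tag (or None if there is none).
page_title = soup.title.get_text().strip() if soup.title else ""

print(all_titles)
print(page_title)
```

The `if soup.title else ""` guard is an addition for pages that have no <title> tag at all, in which case soup.title is None and calling .text on it would raise an AttributeError.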