Skip to content
Advertisement

how to remove unwanted text from retrieving title of a page using python

Hi All I have written a python program to retrieve the title of a page it works fine but with some pages, it also receives some unwanted text how to avoid that

here is my program

# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeaitifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)

here is my output

atlas obscura - curious and wondrous travel destinations
aoc-full-screen
aoc-heart-solid
aoc-compass
aoc-flipboard
aoc-globe
aoc-pocket
aoc-share
aoc-cancel
aoc-video
aoc-building
aoc-clock
aoc-clipboard
aoc-help
aoc-arrow-right
aoc-arrow-left
aoc-ticket
aoc-place-entry
aoc-facebook
aoc-instagram
aoc-reddit
aoc-rss
aoc-twitter
aoc-accommodation
aoc-activity-level
aoc-add-a-photo
aoc-add-box
aoc-add-shape
aoc-arrow-forward
aoc-been-here
aoc-chat-bubbles
aoc-close
aoc-expand-more
aoc-expand-less
aoc-forum-flag
aoc-group-size
aoc-heart-outline
aoc-heart-solid
aoc-home
aoc-important
aoc-knife-fork
aoc-library-books
aoc-link
aoc-list-circle-bullets
aoc-list
aoc-location-add
aoc-location
aoc-mail
aoc-map
aoc-menu
aoc-more-horizontal
aoc-my-location
aoc-near-me
aoc-notifications-alert
aoc-notifications-mentions
aoc-notifications-muted
aoc-notifications-tracking
aoc-open-in-new
aoc-pencil
aoc-person
aoc-pinned
aoc-plane-takeoff
aoc-plane
aoc-print
aoc-reply
aoc-search
aoc-shuffle
aoc-star
aoc-subject
aoc-trip-style
aoc-unpinned
aoc-send
aoc-phone
aoc-apps
aoc-lock
aoc-verified

instead of this I suppose to receive only this line

"atlas obscura - curious and wondrous travel destinations"

please help me with some idea all other websites are working only some websites gives these problem

Advertisement

Answer

Your problem is that you’re finding all the occurences of “title” in the page. Beautiful soup has an attribute title specifically for what you’re trying to do. Here’s your modified code:

# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = 'https://atlasobscura.com'

# making requests instance
reqs = requests.get(url)

# using the BeaitifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')
title_data = soup.title.text.lower()

# displaying the title
print("Title of the website is : ")
print(title_data)
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement