Tag: beautifulsoup

Beautifulsoup/Writer returns an empty cell when exported to a CSV

beautifulsoup csvwriter python web-scraping

I’m scraping a website to get the name, birth + death dates, and the name of the cemetery someone is buried in. For the most part, it is working quite well; however, when I exported the text to a CSV, I noticed that there’s a blank cell inserted in the name column after each page. I have a feeling this
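The excerpt is cut off before the cause is identified, so the following is only a hedged sketch of one possible workaround: skip records whose name field is empty before writing them. The sample records and column names are hypothetical, and an in-memory buffer stands in for the real output file.

```python
import csv
import io

# Hypothetical scraped records; the empty tuple stands in for the
# blank cell that shows up after each page.
records = [
    ("Jane Doe", "1901", "1980", "Oak Hill Cemetery"),
    ("", "", "", ""),
    ("John Roe", "1910", "1995", "Oak Hill Cemetery"),
]

# In a real script this would be open("out.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "birth", "death", "cemetery"])

for record in records:
    # Skip any record whose name is empty or only whitespace.
    if not record[0].strip():
        continue
    writer.writerow(record)

csv_text = buffer.getvalue()
print(csv_text)
```

Filtering at write time hides the symptom; if the blank record comes from a stray element matched on each page, tightening the selector would be the cleaner fix.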

How to get the a href link from under a div class using Beautiful Soup?

beautifulsoup css-selectors html python web-scraping

I am trying to scrape the href attribute from links on a page, but I end up with [] as the output. The HTML code is: My desired output is: https://www.pigiame.co.ke/listings/nissan-latio-2016-36000-kms-5300124 Answer: You can try: Prints:
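The HTML and the answer's code are missing from the excerpt, so this is a minimal sketch rather than the original answer. It assumes the link sits inside a div whose class is `listing-card`; that class name and the markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the page's real HTML is not shown in the excerpt,
# so the class name "listing-card" is an assumption for illustration.
html = """
<div class="listing-card">
  <a href="https://www.pigiame.co.ke/listings/nissan-latio-2016-36000-kms-5300124">Nissan Latio</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the href of every <a> that has one and sits inside the div.
links = [a["href"] for a in soup.select("div.listing-card a[href]")]
print(links)
```

An empty list usually means the selector does not match the markup the script actually received, so printing part of the fetched HTML is a reasonable first check.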

Can’t get href from Selenium webdriver scraping youtube

beautifulsoup python python-requests selenium

I am trying to scrape YouTube videos from a channel using the code below; however, it seems that my element_titles don’t have an href attribute. This worked about a year ago and I am unsure why it doesn’t work now. Did YouTube change the way we can get the href? The following attributes are what is found in the

Amazon Web Scraping – retrieving price data

beautifulsoup jupyter-notebook python web-scraping

I’m currently working on my first project experimenting with web scraping in Python. I am attempting to retrieve price data from an Amazon URL but am having some issues. When I print the price variable, my output is a bit weird: there’s a lot of whitespace, and the numbers are formatted in a weird way with a lot of newlines. How do
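The printed output is not included in the excerpt, so the sample string below is hypothetical. As a sketch, scattered whitespace and newlines in scraped text can be collapsed with `str.split()` and `str.join()`:

```python
def collapse_whitespace(text: str, sep: str = " ") -> str:
    """Split on any run of whitespace and rejoin the pieces with sep."""
    return sep.join(text.split())


# Hypothetical scraped value; the real output is not shown in the question.
raw_price = "\n\n   $\n  19\n .\n 99  \n"

# An empty separator removes the whitespace entirely.
price = collapse_whitespace(raw_price, sep="")
print(price)
```

Passing `sep=" "` instead keeps a single space between the pieces, which suits multi-word text better than prices.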

How to check if there is a picture on a website or not with Python and Selenium

beautifulsoup python selenium

I want to check with a boolean request whether there is a picture on the website: https://portal.dnb.de/opac/mvb/cover?isbn=9783442472352 or not: https://portal.dnb.de/opac/mvb/cover?isbn=3499239663 I don’t know how that is possible. Thank you for your help! Answer: Looks like you should deal with the response status. For Selenium, take a read: How to get status code by using selenium.py (python code) Alternative approach, get
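The answer's suggestion to look at the response status could be sketched as below. This assumes the server signals a missing cover with a non-200 status or a non-image Content-Type, which has not been verified against the portal; the helper name `looks_like_image` is hypothetical.

```python
from typing import Optional


def looks_like_image(status_code: int, content_type: Optional[str]) -> bool:
    """Return True when the response appears to carry an image.

    Assumption: a missing picture yields a non-200 status or a
    Content-Type header that does not start with "image/".
    """
    if status_code != 200:
        return False
    return (content_type or "").lower().startswith("image/")


# With the third-party requests library this might be used as:
#   response = requests.get(url, timeout=10)
#   has_picture = looks_like_image(
#       response.status_code, response.headers.get("Content-Type"))
print(looks_like_image(200, "image/jpeg"))
print(looks_like_image(404, "text/html"))
```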

Replace span tags with whitespace or parse contents as new column with pandas.read_html

beautifulsoup pandas python selenium

I want to scrape Congressional stock trades from Capitol Trades. I can scrape the data, but the column that contains stock tickers has a span tag that separates company names from company tickers. pandas.read_html() removes this span tag, which concatenates company names and tickers and makes it difficult to recover tickers. For example, company names that end with an “INC”
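One way to keep the ticker separate is to pull the span out of each cell with BeautifulSoup before building any table. The sketch below uses hypothetical markup and company names, since the real page structure is not shown in the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: a company name followed by a span holding the ticker.
html = """
<table>
  <tr><td>Example Holdings INC<span>EXH:US</span></td></tr>
  <tr><td>Sample Corp<span>SMPL:US</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for cell in soup.select("td"):
    span = cell.find("span")
    # extract() removes the span from the cell and returns it,
    # so the remaining cell text is only the company name.
    ticker = span.extract().get_text(strip=True) if span else None
    rows.append((cell.get_text(strip=True), ticker))

print(rows)
```

The resulting list of (name, ticker) pairs can then be handed to `pandas.DataFrame` with two column names of your choosing.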

find_element(By.CLASS_NAME…) InvalidSelectorException

beautifulsoup python selenium

I need to navigate to an object with a special class that changes on every page refresh, so I decided to use bs to find the element’s class. That works, but Selenium raises an exception about an invalid selector. The class exists; I can find it in the page source. There are some spaces at the beginning and at the ending of class name
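One plausible cause is that `By.CLASS_NAME` expects a single class name, so a value containing spaces is rejected. A hedged sketch of a workaround is to turn the class attribute into a CSS selector; the helper name `class_attr_to_css` and the sample value are hypothetical.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Convert a raw class attribute such as "  foo bar " into ".foo.bar".

    Assumption: the class names contain no characters that need CSS
    escaping; names with special characters would need extra handling.
    """
    names = class_attr.split()  # drops leading, trailing and repeated spaces
    if not names:
        raise ValueError("class attribute is empty")
    return "." + ".".join(names)


selector = class_attr_to_css("  item-box active  ")
print(selector)

# With Selenium the selector might then be used as:
#   driver.find_element(By.CSS_SELECTOR, selector)
```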

Getting the number of children of an element with no ID with BeautifulSoup

beautifulsoup python web-scraping

So basically I’m trying to scrape a webpage with Python and I’m getting stuck at finding the number of children of one element in a list using BeautifulSoup. The HTML of the list follows this: In my case I want to get the number of tr inside the tbody tag, but since it has no id, I found no way to
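An id is not needed to count the rows: the tbody can be located by tag name and its direct tr children counted. The markup below is a hypothetical stand-in, since the question's HTML is not shown in the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the real page's HTML is not included in the excerpt.
html = """
<table>
  <tbody>
    <tr><td>a</td></tr>
    <tr><td>b</td></tr>
    <tr><td>c</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("tbody")

# recursive=False counts only direct <tr> children, ignoring nested tables.
row_count = len(tbody.find_all("tr", recursive=False))
print(row_count)
```

If the page has several tables, a more specific selector than `soup.find("tbody")` would be needed to reach the right one.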

Scraping data through changing Xpaths

beautifulsoup python selenium web-scraping xpath

I can’t figure out how to scrape data. I am trying to scrape the product name, price and other information from the website. The product names are easy to access, as they have similar XPaths with only one tag that changes, but for the prices there are multiple changes to the tags. Is there an alternative to how I can

How do I export a read_html df to Excel, when it relates to table ID rather than data in the code?

beautifulsoup dataframe pandas python web-scraping

I am experiencing this error with the code below: I want to save the table I am scraping from Wikipedia to an Excel file, but I can’t work out how to adjust the code to get the data list from the terminal to the Excel file using to_excel. I can see it works for a similar problem when a