I’m scraping a website to get the name, birth and death dates, and the name of the cemetery someone is buried in. For the most part, it is working quite well; however, when I exported the text to a CSV, I noticed that a blank cell is inserted in the name column after each page. I have a feeling this
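One way to keep empty rows out of the CSV is to drop any record whose name is blank before writing it. This is a minimal sketch. It assumes each scraped person is a dict with the hypothetical keys `name`, `birth`, `death`, and `cemetery`, and that the blank cell comes from an empty element picked up at a page boundary, which the excerpt does not confirm. Opening the file with `newline=""` follows the `csv` module documentation; without it, Windows can show extra blank lines between rows.

```python
import csv
from typing import Iterable, Mapping

# Hypothetical column names; adjust to whatever the scraper actually collects.
FIELDS = ["name", "birth", "death", "cemetery"]


def write_records(path: str, records: Iterable[Mapping[str, str]]) -> int:
    """Write scraped records to CSV, skipping any whose name is empty or whitespace."""
    written = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for record in records:
            if not (record.get("name") or "").strip():
                continue  # possibly an empty element scraped at a page boundary
            writer.writerow(record)
            written += 1
    return written
```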
Tag: beautifulsoup
How to get the a href link from under a div class using Beautiful Soup?
I am trying to scrape the href attribute from links on a page, but I end up with [] as the output. The HTML code is: My desired output is: https://www.pigiame.co.ke/listings/nissan-latio-2016-36000-kms-5300124 Answer: You can try: Prints:
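A minimal BeautifulSoup sketch for collecting `href` values from `<a>` tags nested under a `div` with a given class. The class name `listing-card` and the sample markup are hypothetical, since the question's HTML is not shown. If a real page still returns `[]` with a correct selector, the links may be inserted by JavaScript and so be absent from the downloaded HTML.

```python
from bs4 import BeautifulSoup


def hrefs_under_div(html: str, div_class: str) -> list[str]:
    """Return the href of every <a href=...> inside <div class="div_class">."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: any <a> that has an href, at any depth below the div.
    return [a["href"] for a in soup.select(f"div.{div_class} a[href]")]


# Hypothetical markup; "listing-card" stands in for the real class name.
sample = """
<div class="listing-card">
  <a href="https://www.pigiame.co.ke/listings/nissan-latio-2016-36000-kms-5300124">Nissan Latio</a>
</div>
<a href="/elsewhere">not inside the div</a>
"""
print(hrefs_under_div(sample, "listing-card"))
```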
Can’t get href from Selenium WebDriver when scraping YouTube
I am trying to scrape YouTube videos from a channel with the code below; however, it seems that my element_titles don’t have an href attribute. This worked about a year ago, and I am unsure why it doesn’t work now. Did YouTube change the way we can get href? The following attributes are what is found in the
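When the element a locator returns has no `href`, the link may sit on an enclosing `<a>` instead. This minimal sketch falls back to the nearest ancestor `<a>` via XPath. It assumes that is where YouTube now puts the link, which the excerpt does not confirm; if the `<a>` is a sibling or child instead, the XPath would need to change. The string `"xpath"` is the value of Selenium's `By.XPATH`, written out so the helper has no import-time dependency.

```python
def href_of(element):
    """Return an element's href, or the href of its nearest <a> ancestor.

    `element` is expected to behave like a Selenium WebElement
    (get_attribute / find_element). If no ancestor <a> exists,
    Selenium raises NoSuchElementException.
    """
    href = element.get_attribute("href")
    if href:
        return href
    # "xpath" == By.XPATH; ancestor::a[1] selects the closest <a> above the element.
    return element.find_element("xpath", "./ancestor::a[1]").get_attribute("href")
```

Usage, assuming `element_titles` from the question is a list of WebElements: `links = [href_of(t) for t in element_titles]`.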
Amazon Web Scraping – retrieving price data
I’m currently working on my first project experimenting with web scraping in Python. I am attempting to retrieve price data from an Amazon URL but am having some issues. When I print the price variable, my output is a bit weird: there’s a lot of whitespace, and the numbers are formatted in a weird way with a lot of newlines. How do
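The stray whitespace and newlines often come from a price that is split across several nested tags, so the text of the whole container also picks up the markup's indentation. A minimal sketch for normalising such a string into a number. It assumes a US-style format such as `$1,299.99` (comma as thousands separator, dot as decimal point); other locales would need a different pattern.

```python
import re
from typing import Optional


def parse_price(raw: str) -> Optional[float]:
    """Turn scraped price text like '$ 19 . 99' spread over several lines into 19.99.

    Assumes a US-style format: ',' groups thousands and '.' marks decimals.
    Returns None if no number is found.
    """
    compact = "".join(raw.split())  # drop every space, tab and newline
    match = re.search(r"\d[\d,]*(?:\.\d+)?", compact)
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```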
How to check if there is a picture on a website or not with Python and Selenium
I want to check, as a boolean, whether there is a picture on the website (https://portal.dnb.de/opac/mvb/cover?isbn=9783442472352) or not (https://portal.dnb.de/opac/mvb/cover?isbn=3499239663). I don’t know how that is possible. Thank you for your help! Answer: It looks like you should deal with the response status; for Selenium, take a read: How to get status code by using selenium.py (python code). Alternative approach, get
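Instead of Selenium, you can use a plain HTTP request: fetch the cover URL and count it as a picture only if the server answers `200` with an `image/*` Content-Type. This is a minimal standard-library sketch. It assumes that the no-cover URL answers with an error status or a non-image type, which has not been verified against portal.dnb.de. If the server sends a placeholder image instead, you would also need to compare the response size or bytes.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import urlopen


def is_image_response(status: int, content_type: Optional[str]) -> bool:
    """True if an HTTP response looks like an actual picture."""
    return status == 200 and (content_type or "").lower().startswith("image/")


def has_picture(url: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and report whether it returned an image (needs network access)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return is_image_response(response.status, response.headers.get_content_type())
    except HTTPError:
        # urllib raises HTTPError for 4xx/5xx responses: treat as "no picture".
        return False


# Usage (makes real requests):
# has_picture("https://portal.dnb.de/opac/mvb/cover?isbn=9783442472352")
# has_picture("https://portal.dnb.de/opac/mvb/cover?isbn=3499239663")
```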
Replace span tags with whitespace or parse contents as new column with pandas.read_html
I want to scrape Congressional stock trades from Capitol Trades. I can scrape the data, but the column that contains stock tickers has a span tag that separates company names from company tickers. pandas.read_html() removes this span tag, which concatenates company names and tickers and makes it difficult to recover tickers. For example, company names that end with an “INC”
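One way to keep the ticker recoverable is to read the table cells with BeautifulSoup instead of `pandas.read_html`. If you join each cell's text pieces with a separator, the boundary that the `<span>` marked survives. This is a minimal sketch; the sample markup (company name as text, ticker inside a `<span>`) is a hypothetical stand-in for the Capitol Trades table. The resulting rows can be passed to `pandas.DataFrame`, and `split_issuer` can then give the ticker its own column.

```python
from bs4 import BeautifulSoup


def table_rows(html: str, sep: str = "|") -> list[list[str]]:
    """Extract table rows, joining the separate text nodes inside each cell with `sep`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text(separator=...) puts `sep` between the name text and the <span> text
        cells = [cell.get_text(separator=sep, strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def split_issuer(cell: str, sep: str = "|") -> tuple[str, str]:
    """Split 'Company Name|TICKER' into (name, ticker); ticker is '' if absent."""
    if sep not in cell:
        return cell, ""
    # rpartition: the ticker is assumed to be the last text piece in the cell
    name, _, ticker = cell.rpartition(sep)
    return name, ticker


sample = "<table><tr><td>Apple Inc<span>AAPL:US</span></td><td>buy</td></tr></table>"
rows = table_rows(sample)
print(rows)                      # [['Apple Inc|AAPL:US', 'buy']]
print(split_issuer(rows[0][0]))  # ('Apple Inc', 'AAPL:US')
```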
find_element(By.CLASS_NAME…) InvalidSelectorException
I need to navigate to an element with a special class that changes on every page refresh, so I decided to use bs (BeautifulSoup) to find the element’s class. That works, but Selenium raises an exception about an invalid selector. The class exists; I can find it in the page source. There are some spaces at the beginning and at the end of the class name
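`By.CLASS_NAME` expects a single class name. A value with spaces in it (several classes, or leading/trailing padding) either fails as an invalid selector or matches the wrong thing. This minimal sketch turns the raw attribute value found with BeautifulSoup into a CSS selector that requires all of its classes. It assumes the class names need no CSS escaping; names that start with a digit or contain characters such as `:` would.

```python
def class_selector(class_attr: str) -> str:
    """Build a CSS selector such as '.a.b' from a raw class attribute like '  a b '."""
    classes = class_attr.split()  # split() also discards leading/trailing whitespace
    if not classes:
        raise ValueError("class attribute is empty")
    return "".join(f".{name}" for name in classes)


# Usage with Selenium (driver and class_attr come from the question's setup):
# from selenium.webdriver.common.by import By
# element = driver.find_element(By.CSS_SELECTOR, class_selector(class_attr))
```

If the class was read with BeautifulSoup as `tag["class"]`, it is already a list, so `" ".join(tag["class"])` gives a string this helper accepts.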
Getting the number of children of an element with no ID with BeautifulSoup
So basically I’m trying to scrape a webpage with Python, and I’m stuck on finding the number of children of one element in a list using BeautifulSoup. The HTML of the list looks like this: In my case I want to get the number of tr inside the tbody tag, but since it has no id, I found no way to
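An element without an `id` can still be reached by position, or by a selector relative to something else on the page. This minimal BeautifulSoup sketch counts the `<tr>` rows directly inside the first `<tbody>`. Using the first one is an assumption; if the page has several tables, pick another index or use a selector on a parent's class.

```python
from bs4 import BeautifulSoup


def count_body_rows(html: str, table_index: int = 0) -> int:
    """Count the <tr> children of the table_index-th <tbody> in the document."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find_all("tbody")[table_index]
    # recursive=False counts direct children only, not rows of nested tables
    return len(tbody.find_all("tr", recursive=False))


sample = """
<table>
  <thead><tr><th>Name</th></tr></thead>
  <tbody>
    <tr><td>a</td></tr>
    <tr><td>b</td></tr>
    <tr><td>c</td></tr>
  </tbody>
</table>
"""
print(count_body_rows(sample))  # 3
```

Browsers insert a `<tbody>` automatically, but `html.parser` does not. A `tbody` visible in the browser's developer tools may therefore be missing from the downloaded HTML.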
Scraping data with changing XPaths
I can’t figure out how to scrape the data. I am trying to scrape the product name, price, and other information from the website. The product names are easy to access because they have similar XPaths with only one tag that changes, but for the prices there are multiple changes to the tags. Is there an alternative to how I can
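Rather than writing one absolute XPath per field, it is usually more robust to loop over each product's container. You then look up name and price relative to it, matching on a stable part of the class instead of on tag positions. This minimal sketch uses BeautifulSoup CSS selectors in place of XPath; the classes `product` and `product-name`, and price classes containing `price`, are hypothetical. The XPath equivalent of the price lookup would be `.//*[contains(@class, 'price')]`.

```python
from bs4 import BeautifulSoup


def scrape_products(html: str) -> list[dict]:
    """Return name/price pairs, searching inside each product container."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):  # hypothetical container class
        name = card.select_one(".product-name")
        # [class*='price'] matches any class attribute containing "price",
        # so differently nested price tags are still found
        price = card.select_one("[class*='price']")
        products.append({
            "name": name.get_text(strip=True) if name else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


sample = """
<div class="product"><h2 class="product-name">Kettle</h2>
  <div><span class="price-new">19.99</span></div></div>
<div class="product"><h2 class="product-name">Toaster</h2>
  <p><b class="sale-price">24.50</b></p></div>
"""
print(scrape_products(sample))
```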
How do I export a read_html df to Excel when it relates to a table ID rather than data in the code?
I am experiencing this error with the code below: I want to save the table I am scraping from Wikipedia to an Excel file, but I can’t work out how to adjust the code to get the data list from the terminal into the Excel file using to_excel. I can see it works for a similar problem when a
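`pandas.read_html` returns a list of DataFrames, not a single DataFrame, so `to_excel` has to be called on one element of that list. Calling it on the list itself is one common cause of errors at this step, though the question's actual error is not shown. The `attrs` argument selects tables by HTML attributes such as `id`. This is a minimal sketch. The table id and file name are hypothetical; `read_html` needs a parser such as lxml installed, and `to_excel` needs an Excel writer such as openpyxl.

```python
from io import StringIO

import pandas as pd


def table_by_id(html: str, table_id: str) -> pd.DataFrame:
    """Parse the single <table id=table_id> from an HTML string."""
    tables = pd.read_html(StringIO(html), attrs={"id": table_id})  # always a list
    return tables[0]


def export_table(html: str, table_id: str, path: str) -> pd.DataFrame:
    """Write the table with the given id to an .xlsx file and return it."""
    df = table_by_id(html, table_id)
    df.to_excel(path, index=False)  # called on the DataFrame, not on the list
    return df


# Usage with a live page (hypothetical table id), fetching the HTML first:
# import requests
# html = requests.get("https://en.wikipedia.org/wiki/...").text
# export_table(html, "some-table-id", "table.xlsx")
```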