Problem with python web-scraping when reading a URL using “get” function

I am having trouble using the “get” function to read the URL stored in cell A1 of the Excel file “tennis3.xlsx”. I have tried different solutions and have no idea how to get it to read the value and use it to fetch a webpage response. The problem probably starts at ‘sheet["A1"].value’.

I run this program through Visual Studio, using the Chrome browser. The URL in cell A1 is https://www.betexplorer.com/tennis/atp-singles/paris/evans-daniel-nakashima-brandon/WAqNf5ao/.

Edit: the actual issue turned out to be that I forgot to include the call to save the workbook.

import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

workbook = load_workbook(filename="tennis3.xlsx")
sheet = workbook.active
urlcell = sheet["A1"].value

response = requests.get(urlcell)
webpage = response.content

soup = BeautifulSoup(webpage, "html.parser")

sheet["B1"] = soup.select('h1 a')[0].text.replace(' ','_')


Answer

You need to save the changes you made:

import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

filename = r"tennis3.xlsx"

# load the workbook and read the URL stored in cell A1
workbook = load_workbook(filename=filename)
sheet = workbook["Sheet1"]
urlcell = sheet["A1"].value
print(urlcell)

# fetch the page and parse it
response = requests.get(urlcell)
webpage = response.content

soup = BeautifulSoup(webpage, "html.parser")

# write the scraped heading into B1, then save the workbook so the change persists
sheet["B1"] = soup.select('h1 a')[0].text.replace(' ', '_')
workbook.save(filename=filename)
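
If you want the script to fail loudly when something goes wrong, a slightly more defensive variant is sketched below (not part of the original answer): it checks that A1 actually contains a value, verifies the HTTP status, and only writes and saves when the 'h1 a' selector matches something. The filename, sheet name, and selector are the same assumptions as in the code above.

import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

filename = r"tennis3.xlsx"

workbook = load_workbook(filename=filename)
sheet = workbook["Sheet1"]
urlcell = sheet["A1"].value

# fail early if the cell is empty, instead of passing None to requests
if not urlcell:
    raise ValueError("Cell A1 does not contain a URL")

# raise an exception if the request times out or returns an error status
response = requests.get(urlcell, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.content, "html.parser")

# soup.select() returns a list; only write B1 and save if the selector matched
links = soup.select("h1 a")
if links:
    sheet["B1"] = links[0].text.replace(" ", "_")
    workbook.save(filename=filename)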