I am rather new to web scraping. I have scraped one of the zip files seen here, and the goal is to append its contents into a final data frame called `final_df`. Below is a snippet of my code, which runs well:
```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

zip_url = "https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_2017.zip"

dfs = []
with ZipFile(BytesIO(requests.get(zip_url, verify=False).content)) as zf:
    for file in zf.namelist():
        df = pd.read_csv(
            zf.open(file),
            sep=";",
            skiprows=1,
            skipfooter=1,
            engine="python",
            header=None,
        )
        dfs.append(df)

final_df = pd.concat(dfs)

# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))
```
This works well for one year of zip files, such as 2017; however, I am curious whether we could get them all in one swoop. My thinking is to build the URL with an f-string and change the year in each iteration:
```python
date_list = ['2017', '2018', '2019', '2020', '2021']
dfs = []
for dates in date_list:
    with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
        for file in zf.namelist():
            df = pd.read_csv(
                zf.open(file),
                sep=";",
                skiprows=1,
                skipfooter=1,
                engine="python",
                header=None,
            )
            dfs.append(df)

final_df = pd.concat(dfs)

# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))
```
If we isolate the f-string, we see output such as:

```
"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_2020.zip"
"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_2021.zip"
```

…etc.
Yet when I feed this into the loop above, I get this error: `InvalidSchema: No connection adapters were found for '"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_2017.zip"'`
What would be the best workaround?
Answer
This error means that the `requests` module cannot identify which protocol your request needs (e.g. http, https, ftp). This happens in your case because you have a leading `"` character in your URL:
```python
with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
#                                 ^^^
```
Requests is looking for an adapter for the `"https` protocol, which doesn't exist :)
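To make this concrete, here is a small sketch that imitates the lookup. It assumes, as a simplification, that requests picks a transport adapter by checking whether the URL starts with one of its mounted prefixes (by default `https://` and `http://`). The helper `find_adapter_prefix` is hypothetical and not part of the requests API.

```python
from typing import Optional

# Hypothetical helper: a simplified imitation of how requests picks a transport
# adapter, by checking which mounted prefix the URL starts with.
MOUNTED_PREFIXES = ("https://", "http://")


def find_adapter_prefix(url: str) -> Optional[str]:
    """Return the mounted prefix that matches url, or None if none does."""
    for prefix in MOUNTED_PREFIXES:
        if url.lower().startswith(prefix):
            return prefix
    return None


dates = "2017"
# The f-string from the question: the inner double quotes become part of the URL.
bad_url = f'"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_{dates}.zip"'
# The same f-string with the extra quotes removed.
good_url = f"https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_{dates}.zip"

print(repr(bad_url[:7]))              # '"https:'
print(find_adapter_prefix(bad_url))   # None
print(find_adapter_prefix(good_url))  # https://
```

Printing with `repr()` is a quick way to spot stray quote characters that a plain `print()` makes easy to overlook.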
Just delete the extra double quotes inside the f-string so that the URL starts with `https://`.
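A corrected version of the loop might look like the sketch below. The helper names (`build_url`, `read_zip_frames`, `fetch_all`) are hypothetical. As an alternative to hand-writing the query string, it builds the URL with `urllib.parse.urlencode`, which escapes `[` and `]` as `%5B` and `%5D`. It also adds `ignore_index=True` so the combined frame gets a fresh index, and it keeps `verify=False` from the question, which disables TLS certificate checking.

```python
from io import BytesIO
from typing import Iterable, List
from urllib.parse import urlencode
from zipfile import ZipFile

import pandas as pd

BASE_URL = "https://www.omie.es/es/file-download"


def build_url(year: int) -> str:
    """Build one year's download URL with no embedded quote characters."""
    query = urlencode(
        {"parents[0]": "marginalpdbc", "filename": f"marginalpdbc_{year}.zip"}
    )
    return f"{BASE_URL}?{query}"


def read_zip_frames(content: bytes) -> List[pd.DataFrame]:
    """Parse every file in one zip archive with the read_csv options from the question."""
    frames = []
    with ZipFile(BytesIO(content)) as zf:
        for name in zf.namelist():
            with zf.open(name) as fh:  # close each member handle after parsing
                frames.append(
                    pd.read_csv(
                        fh,
                        sep=";",
                        skiprows=1,
                        skipfooter=1,
                        engine="python",
                        header=None,
                    )
                )
    return frames


def fetch_all(years: Iterable[int]) -> pd.DataFrame:
    """Download each year's archive and concatenate all parsed files."""
    import requests  # imported here so the helpers above can be used without it

    dfs = []
    for year in years:
        resp = requests.get(build_url(year), verify=False)  # verify=False as in the question
        resp.raise_for_status()  # fail loudly instead of trying to unzip an error page
        dfs.extend(read_zip_frames(resp.content))
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(dfs, ignore_index=True)
```

Usage would then be `final_df = fetch_all(range(2017, 2022))` followed by `print(final_df.head(10))`; `range(2017, 2022)` covers the same years as `date_list` in the question.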