I want to build a df with a historical dataset by scraping a website, but I am struggling to accumulate the full period inside the loop. I can download a single day, but when I loop over the dates to store each iteration, the data does not accumulate in the DataFrame.
The df I want to create, covering `start_date` to `end_date`, is as follows:
| Fecha | PeríodeTU | TM°C | HRM% |
|---|---|---|---|
| single_date | | | |
where `Fecha` is a column added with the `single_date` value from the code below, and the remaining columns are the actual data scraped from the website.
I have tried this:
```python
from datetime import date, timedelta

import pandas as pd
import requests
from bs4 import BeautifulSoup


def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)


start_date = date(2020, 6, 1)
end_date = date(2021, 3, 3)
data = pd.DataFrame()

for single_date in daterange(start_date, end_date):
    # Meteo.cat API URL for the given date
    url = "https://www.meteo.cat/observacions/xema/dades?codi=V3&dia=" + str(single_date) + "T00:00Z"
    # GET request to the API
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    table = soup.find_all('table')[2]
    df_table = pd.read_html(str(table))[0]
    df_table['Fecha'] = single_date
    data['Fecha'] = df_table['Fecha']
    data['Hora'] = df_table['PeríodeTU']
    data['Temperatura_Media'] = df_table['TM°C']
    data['Humedad_Relativa'] = df_table['HRM%']
    data.to_csv('Data/tempset.csv', header=True, index=False)
```
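Because `range` is half-open, `daterange` yields every day from `start_date` up to, but not including, `end_date`. A quick standalone check with the dates from the question (no scraping involved):

```python
from datetime import date, timedelta


def daterange(start_date, end_date):
    # range() is half-open, so the last value yielded is end_date - 1 day.
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)


days = list(daterange(date(2020, 6, 1), date(2021, 3, 3)))
print(len(days), days[0], days[-1])  # 275 2020-06-01 2021-03-02
```

To include 3 March 2021 as well, pass `end_date + timedelta(1)` as the upper bound.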
`df_table` only keeps the last date, but I want to keep the full period I iterate over.
Does anyone know how to deal with this situation?
Answer
You can collect each day's DataFrame in a list and then concatenate them:
```python
dfs = []
for single_date in daterange(start_date, end_date):
    # Meteo.cat API URL for the given date
    url = "https://www.meteo.cat/observacions/xema/dades?codi=V3&dia=" + str(single_date) + "T00:00Z"
    # GET request to the API
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    table = soup.find_all('table')[2]
    # Parse the day's table, tag every row with its date and keep it
    dfs.append(pd.read_html(str(table))[0].assign(Fecha=single_date))
```
And finally, after running the loop:

```python
df_table = pd.concat(dfs)
```
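This creates a single `df_table` containing every observation from the daily DataFrames collected in the loop. Each day keeps its own 0-based index, so pass `ignore_index=True` to `pd.concat` if you want one continuous index.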
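To try the list-and-concat pattern without hitting the website, here is a minimal offline sketch. `fake_day_table` is a hypothetical stand-in for the scraped table (made-up values, two periods per day); the rest mirrors the answer, plus the column renaming and single CSV write the question was aiming for.

```python
from datetime import date, timedelta

import pandas as pd


def daterange(start_date, end_date):
    # Same helper as in the question: yields start_date .. end_date - 1 day.
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)


def fake_day_table(day):
    # Hypothetical stand-in for pd.read_html(...)[0]: made-up values using the
    # column names from the question. The day argument is unused here.
    return pd.DataFrame({
        "PeríodeTU": ["00:00 - 00:30", "00:30 - 01:00"],
        "TM°C": [12.1, 11.8],
        "HRM%": [80, 82],
    })


dfs = []
for single_date in daterange(date(2020, 6, 1), date(2020, 6, 4)):
    # Tag every row of the day's table with its date and keep the frame.
    dfs.append(fake_day_table(single_date).assign(Fecha=single_date))

# One concat after the loop; ignore_index=True renumbers rows 0..n-1.
df_table = pd.concat(dfs, ignore_index=True)

# Select and rename the columns the question wanted to save.
data = df_table[["Fecha", "PeríodeTU", "TM°C", "HRM%"]].rename(columns={
    "PeríodeTU": "Hora",
    "TM°C": "Temperatura_Media",
    "HRM%": "Humedad_Relativa",
})

# With no path, to_csv returns the CSV as a string; pass a path such as
# 'Data/tempset.csv' to write the file instead.
csv_text = data.to_csv(header=True, index=False)
print(csv_text)
```

Writing the CSV once, after the concat, also avoids rewriting the file on every iteration as the loop in the question does.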