I’ve created a list named `list_data` which contains DataFrames of variables read from several files. I also have a DataFrame named `observation_data`. I’m trying to merge the two on the key `time`, but every attempt fails. Here is my code and my results:
```python
import pandas as pd

path = "v9/As CA-Previsions-"
path_previsions = ["D S.csv", "Map.csv", "We.csv", "Wu.csv"]
path_observations = "v9/As CA-Observations.csv"

def get_forecast(path, path_previsions, path_observations):
    list_data = []
    for forecaster in path_previsions:
        dataframe = pd.read_csv(path + forecaster, sep=";").dropna(subset=["temperature"]).dropna()
        dataframe["time"] = pd.to_datetime(dataframe['time'], format='%d-%m-%Y %H:%M:%S')
        dataframe = dataframe.sort_values(by=['time'])  # sort_values returns a copy, so assign it back
        dataframe['time'] = dataframe['time'].apply(lambda x: x.replace(minute=0, second=0))  # Keep only the hour
        dataframe = dataframe.groupby(['time']).mean()  # one row per hour
        # Suffix every column with the forecaster name, e.g. "hu_D S"
        dataframe.columns = [x + "_" + forecaster.split('.')[0] for x in dataframe.columns]
        list_data.append(dataframe)

    observation_data = pd.read_csv(path_observations, sep=";", index_col=False).drop(columns=["station"]).dropna()
    observation_data["time"] = pd.to_datetime(observation_data['time'], format='%d-%m-%Y %H:%M:%S')
    observation_data = observation_data.sort_values(by='time')
    observation_data['time'] = observation_data['time'].apply(lambda x: x.replace(minute=0, second=0))
    observation_data = observation_data.groupby(['time']).mean()
    # Note: index=str applies str() to every index label, so the DatetimeIndex becomes strings
    observation_data = observation_data.rename(index=str, columns={"humidity": "humidity_Y",
                                                                   "precipitation": "precipitation_Y",
                                                                   "temperature": "temperature_Y"})
    return list_data, observation_data
```
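For reference, here is a minimal sketch of the per-file hourly aggregation step on a tiny, made-up sample (the rows and the single `"D S.csv"` source name are hypothetical). It uses `Series.dt.floor("h")`, which truncates each timestamp to the start of its hour; unlike `replace(minute=0, second=0)`, it also clears any sub-second part.

```python
import pandas as pd

# Hypothetical sample standing in for one forecaster CSV (values are made up)
source = "D S.csv"
raw = pd.DataFrame({
    "time": ["20-02-2019 12:10:00", "20-02-2019 12:40:00", "20-02-2019 13:05:00"],
    "temperature": [10.0, 12.0, 11.0],
    "humidity": [58.0, 56.0, 55.0],
})

raw["time"] = pd.to_datetime(raw["time"], format="%d-%m-%Y %H:%M:%S")
# sort_values returns a new frame, so the result must be assigned back
raw = raw.sort_values(by="time")
# Truncate each timestamp to the start of its hour
raw["time"] = raw["time"].dt.floor("h")
# One row per hour: average all readings that fall in the same hour
hourly = raw.groupby("time").mean()
# Suffix the columns with the source name, mirroring get_forecast
hourly.columns = [c + "_" + source.split(".")[0] for c in hourly.columns]
print(hourly)
```

The resulting frame keeps `time` as a `DatetimeIndex`, which is what the later concatenation aligns on.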
And I’ve tried:
```python
list_data, observation_data = get_forecast(path, path_previsions, path_observations)
X = pd.concat(list_data, axis=1, join='inner')
Y = observation_data
df_forcast_cap = pd.concat([X, Y], axis=1, join='inner')
```
This returns a DataFrame with 0 rows and 35 columns.
I’ve also tried:
```python
X = [list_data]
X = pd.merge(X, how='inner')
```
with no success either:

```
TypeError: merge() missing 1 required positional argument: 'right'
```
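The error comes from the fact that `pd.merge` joins exactly two frames, `left` and `right`; it does not accept a single list. A sketch of merging on the shared time index, assuming (as in the printed output) that `time` is the index of every frame; the two sample frames are hypothetical. To merge a whole list, one option is to fold the pairwise merge over it with `functools.reduce`.

```python
from functools import reduce

import pandas as pd

# Hypothetical hourly frames standing in for items of list_data
idx = pd.to_datetime(["2019-02-20 12:00", "2019-02-20 13:00", "2019-02-20 14:00"])
a = pd.DataFrame({"hu_D S": [58.0, 55.0, 53.0]}, index=idx)
b = pd.DataFrame({"hu_Map": [60.0, 57.0]}, index=idx[:2])

# pd.merge needs both a left and a right frame; join them on their indexes
ab = pd.merge(a, b, how="inner", left_index=True, right_index=True)

# For a list of frames, apply the same pairwise merge repeatedly
frames = [a, b]
merged = reduce(
    lambda left, right: pd.merge(left, right, how="inner",
                                 left_index=True, right_index=True),
    frames,
)
print(merged)  # only the hours present in every frame survive
```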
Before the merge and concat attempts, `list_data` and `observation_data` are not empty. Here is an example:
`list_data` (list):

```
[[                     cl_co_D S     hu_D S
time
2019-02-20 12:00:00        0.00  58.000000
2019-02-20 13:00:00        0.00  55.000000
2019-02-20 14:00:00        0.00  53.000000
```
`observation_data` (pandas.core.frame.DataFrame):

```
                     humidity_Y  precipitation_Y  temperature_Y
time
2019-02-28 10:00:00   61.000000              0.0      16.125000
2019-02-28 11:00:00   45.250000              0.0      19.925000
```
I’ve also tried to convert my list into a DataFrame:
```python
X = pd.DataFrame(list_data)
print(X)
```
but I get something like this, which is not what I want at all:
```
                                0
0   cloud_cover_Dark Sky  hum...
1   cloud_cover_OpenWeatherMa...
2   cloud_cover_Weatherbit  h...
3   cloud_cover_Wunderground ...
```
How can I merge this list and the DataFrame together?
Answer
If `list_data` is a list of pandas DataFrames, you can use `pd.concat` to concatenate them all into a single DataFrame. Use `axis=0` to concatenate along the row axis, or `axis=1` to concatenate along the column axis.
```python
all_list_data = pd.concat(list_data, axis=...)
```
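As a small illustration with two hypothetical forecast frames sharing a `time` index (the values are made up), `axis=1` lines the frames up on the index and places their columns side by side, while `axis=0` stacks the rows and fills columns a frame does not have with NaN:

```python
import pandas as pd

# Hypothetical stand-ins for the items of list_data
idx = pd.DatetimeIndex(["2019-02-20 12:00", "2019-02-20 13:00"], name="time")
list_data = [
    pd.DataFrame({"hu_D S": [58.0, 55.0]}, index=idx),
    pd.DataFrame({"hu_Map": [60.0, 57.0]}, index=idx),
]

# axis=1: align on the time index, one column per forecaster variable
wide = pd.concat(list_data, axis=1, join="inner")
# axis=0: stack the rows; each frame's missing column becomes NaN
tall = pd.concat(list_data, axis=0)
print(wide.shape, tall.shape)
```

For per-forecaster columns keyed by hour, as in the question, `axis=1` is the shape that matches.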
This guide may also be useful to you.
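One likely reason the original `pd.concat([X, Y], axis=1, join='inner')` returned 0 rows: `observation_data.rename(index=str, ...)` applies `str` to every index label, so `Y` ends up with string labels while `X` keeps Timestamps, and an inner join finds no label in common. The sketch below, with made-up values, reproduces this and shows that renaming only the columns lets the rows align. It is also worth checking that the forecast and observation periods overlap at all, since the excerpts above start on different dates.

```python
import pandas as pd

# Hypothetical hourly data; the values are made up
idx = pd.DatetimeIndex(["2019-02-28 10:00", "2019-02-28 11:00"], name="time")
X = pd.DataFrame({"hu_D S": [61.0, 47.0]}, index=idx)
obs = pd.DataFrame({"humidity": [61.0, 45.25]}, index=idx)

# rename(index=str, ...) turns each Timestamp label into a plain string
obs_str = obs.rename(index=str, columns={"humidity": "humidity_Y"})
empty = pd.concat([X, obs_str], axis=1, join="inner")
print(len(empty))  # no index labels in common, so no rows

# Renaming only the columns keeps the DatetimeIndex, so the rows align
Y = obs.rename(columns={"humidity": "humidity_Y"})
df_forcast_cap = pd.concat([X, Y], axis=1, join="inner")
print(df_forcast_cap)
```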