I’ve created a list named “list_data” which contains variables from many files. I also have a DataFrame named “observation_data”. I’m trying to merge these two objects on the key “time”, but nothing works: all my attempts fail. Here is my code and my results:
import pandas as pd

path = "v9/As CA-Previsions-"
path_previsions = ["D S.csv", "Map.csv", "We.csv", "Wu.csv"]
path_observations = "v9/As CA-Observations.csv"

def get_forecast(path, path_previsions, path_observations):
    list_data = []
    for forecaster in path_previsions:
        dataframe = pd.read_csv(path + forecaster, sep=";").dropna(subset=["temperature"]).dropna()
        dataframe["time"] = pd.to_datetime(dataframe['time'], format='%d-%m-%Y %H:%M:%S')
        dataframe = dataframe.sort_values(by=['time'])
        dataframe['time'] = dataframe['time'].apply(lambda x: x.replace(minute=0, second=0))  # Keep only the hour
        dataframe = dataframe.groupby(['time']).mean()
        dataframe.columns = [x + "_" + forecaster.split('.')[0] for x in dataframe.columns]
        list_data.append(dataframe)
    observation_data = pd.read_csv(path_observations, sep=";", index_col=False).drop(columns=["station"]).dropna()
    observation_data["time"] = pd.to_datetime(observation_data['time'], format='%d-%m-%Y %H:%M:%S')
    observation_data = observation_data.sort_values(by='time')
    observation_data['time'] = observation_data['time'].apply(lambda x: x.replace(minute=0, second=0))
    observation_data = observation_data.groupby(['time']).mean()
    observation_data = observation_data.rename(index=str, columns={"humidity": "humidity_Y", "precipitation": "precipitation_Y", "temperature": "temperature_Y"})
    return list_data, observation_data
And I’ve tried:
list_data, observation_data = get_forecast(path, path_previsions, path_observations)
X = pd.concat(list_data, axis=1, join='inner')
Y = observation_data
df_forcast_cap = pd.concat([X,Y], axis=1, join='inner')
This returns a DataFrame with 0 rows and 35 columns.
I’ve also tried:
X = [list_data]
X = pd.merge(X, how='inner')
with no success either:
TypeError: merge() missing 1 required positional argument: 'right'
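For context, `pd.merge` combines exactly two DataFrames, `left` and `right`, which is why calling it with a single list raises the `TypeError` above. A minimal sketch of merging a whole list pairwise with `functools.reduce`, assuming (as in `get_forecast`) that every frame is indexed by `time` and has distinct column names; the two small frames are made-up stand-ins for `list_data`:

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for two entries of list_data, indexed by hour.
times = pd.to_datetime(["2019-02-20 12:00:00", "2019-02-20 13:00:00", "2019-02-20 14:00:00"])
dark_sky = pd.DataFrame({"hu_D S": [58.0, 55.0, 53.0]}, index=pd.Index(times, name="time"))
wunderground = pd.DataFrame({"hu_Wu": [60.0, 57.0]}, index=pd.Index(times[:2], name="time"))
list_data = [dark_sky, wunderground]

# pd.merge takes two frames at a time, so fold it over the list.
# left_index/right_index join on the "time" index; how="inner" keeps
# only the hours present in every frame.
X = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True, how="inner"),
    list_data,
)
print(X)
```

Here the 14:00 row is dropped because only one of the two frames has it.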
Before the merge and concat attempts, neither list_data nor observation_data is empty. Here is an example:
list_data : (list)
[[ cl_co_D S hu_D S
time
2019-02-20 12:00:00 0.00 58.000000
2019-02-20 13:00:00 0.00 55.000000
2019-02-20 14:00:00 0.00 53.000000
observation_data : (pandas.core.frame.DataFrame)
humidity_Y precipitation_Y temperature_Y
time
2019-02-28 10:00:00 61.000000 0.0 16.125000
2019-02-28 11:00:00 45.250000 0.0 19.925000
I’ve also tried to convert my list into a DataFrame:
X = pd.DataFrame(list_data)
print(X)
but I get something like this, which is not what I want at all:
0
0 cloud_cover_Dark Sky hum
1 cloud_cover_OpenWeatherMa
2 cloud_cover_Weatherbit h
3 cloud_cover_Wunderground
What could I do to merge this list and the dataframe together?
Answer
If list_data is a list of pandas DataFrames, you can use pd.concat to concatenate them all into a single DataFrame. Use axis=0 to concatenate along the row axis (stacking the rows), or axis=1 to concatenate along the column axis (placing the frames side by side, aligned on their index).
all_list_data = pd.concat(list_data, axis=1)
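To see the difference between the two axes, here is a small sketch with two made-up forecast frames shaped like the ones in the question (indexed by `time`, one column per forecaster). With `axis=1` the frames are aligned on `time` and placed side by side; with `axis=0` they are stacked, and columns missing from a frame are filled with NaN.

```python
import pandas as pd

# Hypothetical forecast frames, indexed by hour like those built by get_forecast.
idx = pd.DatetimeIndex(["2019-02-20 12:00:00", "2019-02-20 13:00:00"], name="time")
dark_sky = pd.DataFrame({"hu_D S": [58.0, 55.0]}, index=idx)
weatherbit = pd.DataFrame({"hu_We": [61.0, 59.0]}, index=idx)
list_data = [dark_sky, weatherbit]

# axis=1: align rows on the shared "time" index, one column per forecaster.
side_by_side = pd.concat(list_data, axis=1)
print(side_by_side.shape)  # (2, 2)

# axis=0: stack rows; each frame lacks the other's column, so NaNs appear.
stacked = pd.concat(list_data, axis=0)
print(stacked.shape)  # (4, 2)
```

Since each forecaster contributes different columns for the same hours, axis=1 is usually the shape you want here.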
This guide may also be useful to you.
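The combined forecasts can then be joined with `observation_data` on their shared `time` index. One thing worth checking in `get_forecast` (a point beyond the original answer): `rename(index=str, columns=...)` passes `str` as the index mapper, so every `Timestamp` label of `observation_data` becomes a plain string and no longer matches the `DatetimeIndex` of the forecasts, which can leave an inner join with no rows. A minimal sketch with made-up data, renaming only the columns:

```python
import pandas as pd

# Hypothetical combined forecasts (the result of pd.concat(list_data, axis=1)).
fc_idx = pd.DatetimeIndex(
    ["2019-02-28 10:00:00", "2019-02-28 11:00:00", "2019-02-28 12:00:00"], name="time"
)
X = pd.DataFrame({"hu_D S": [60.0, 47.0, 44.0]}, index=fc_idx)

# Hypothetical hourly observations, as produced by the groupby in get_forecast.
obs_idx = pd.DatetimeIndex(["2019-02-28 10:00:00", "2019-02-28 11:00:00"], name="time")
observation_data = pd.DataFrame(
    {"humidity": [61.0, 45.25], "temperature": [16.125, 19.925]}, index=obs_idx
)

# rename(index=str) turns every Timestamp label into a string...
string_labels = observation_data.rename(index=str)
print(type(string_labels.index[0]))  # <class 'str'>

# ...whereas passing only columns= keeps the DatetimeIndex intact.
Y = observation_data.rename(columns={"humidity": "humidity_Y", "temperature": "temperature_Y"})

# Inner join on the index: keep only the hours present in both frames.
df_forcast_cap = pd.concat([X, Y], axis=1, join="inner")
print(df_forcast_cap)
```

Even with matching index types, an inner join only keeps hours that appear in both frames, so the forecast and observation periods must overlap to get any rows.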