How to merge a list composed of many variables and a DataFrame into a single Python DataFrame?

I’ve created a list named “list_data” that contains variables from many files. I also have a DataFrame named “observation_data”. I’m trying to merge the two on the key “time”, but all my attempts fail. Here is my code and my results:

import pandas as pd

path = "v9/As CA-Previsions-"
path_previsions = ["D S.csv", "Map.csv", "We.csv", "Wu.csv"]
path_observations = "v9/As CA-Observations.csv"

def get_forecast(path, path_previsions, path_observations):
    list_data = []
    for forecaster in path_previsions:
        dataframe = pd.read_csv(path + forecaster, sep=";").dropna(subset=["temperature"]).dropna()
        dataframe["time"] = pd.to_datetime(dataframe['time'], format='%d-%m-%Y %H:%M:%S')
        dataframe.sort_values(by=['time'])
        dataframe['time'] = dataframe['time'].apply(lambda x: x.replace(minute=0, second=0)) #Conserve just hour
        dataframe = dataframe.groupby(['time']).mean()
        dataframe.columns = [x + "_" + forecaster.split('.')[0] for x in dataframe.columns]
        list_data.append(dataframe)   
    
    observation_data = pd.read_csv(path_observations, sep=";", index_col=False).drop(columns=["station"]).dropna()
    observation_data["time"] = pd.to_datetime(observation_data['time'], format='%d-%m-%Y %H:%M:%S')
    observation_data.sort_values(by='time')
    observation_data['time'] = observation_data['time'].apply(lambda x: x.replace(minute=0, second=0))
    observation_data = observation_data.groupby(['time']).mean()
    observation_data=observation_data.rename(index=str, columns={"humidity": "humidity_Y", "precipitation": "precipitation_Y", "temperature":"temperature_Y"})
    
    return list_data, observation_data


And I’ve tried:

list_data, observation_data = get_forecast(path, path_previsions, path_observations)
X = pd.concat(list_data, axis=1, join='inner')
Y = observation_data
df_forcast_cap = pd.concat([X,Y], axis=1, join='inner')


This returns an empty DataFrame with 0 rows and 35 columns.

I’ve also tried:

X = [list_data]
X = pd.merge(X, how='inner')


This did not work either:

TypeError: merge() missing 1 required positional argument: 'right'

Before the merge and concat attempts, my list_data and observation_data are not empty. Here is an example:

list_data : (list)

[[                                 cl_co_D S        hu_D S  
time                                                           
2019-02-20 12:00:00                  0.00          58.000000   
2019-02-20 13:00:00                  0.00          55.000000   
2019-02-20 14:00:00                  0.00          53.000000


observation_data : (pandas.core.frame.DataFrame)

                    humidity_Y      precipitation_Y  temperature_Y
time                                                           
2019-02-28 10:00:00   61.000000              0.0      16.125000
2019-02-28 11:00:00   45.250000              0.0      19.925000


I’ve also tried to convert my list into a DataFrame:

X = pd.DataFrame(list_data) 
print(X)


but I get something like this, which is not what I want:

                                                   0
0                       cloud_cover_Dark Sky  hum...
1                       cloud_cover_OpenWeatherMa...
2                       cloud_cover_Weatherbit  h...
3                       cloud_cover_Wunderground ...


What could I do to merge this list and the dataframe together?

Answer

If list_data is a list of pandas data frames, you can use pd.concat to concatenate them all into a single data frame. Use axis=0 to concatenate along the row axis, or axis=1 to concatenate along the column axis.

all_list_data = pd.concat(list_data, axis=...)
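
As a minimal sketch of the two options, assuming each frame is indexed by time as in the question (the toy frames, values and column names below are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for two forecast files,
# each indexed by hourly timestamps.
idx_a = pd.to_datetime(["2019-02-28 10:00", "2019-02-28 11:00", "2019-02-28 12:00"])
idx_b = pd.to_datetime(["2019-02-28 11:00", "2019-02-28 12:00", "2019-02-28 13:00"])
forecast_a = pd.DataFrame({"temperature_A": [15.0, 18.0, 20.0]}, index=idx_a)
forecast_b = pd.DataFrame({"temperature_B": [16.0, 19.0, 21.0]}, index=idx_b)
list_data = [forecast_a, forecast_b]

# axis=1 places the frames side by side and aligns them on the index;
# join="inner" keeps only the timestamps present in every frame.
wide = pd.concat(list_data, axis=1, join="inner")

# axis=0 stacks the frames on top of each other; columns that a frame
# does not have are filled with NaN.
tall = pd.concat(list_data, axis=0)

print(wide)
print(tall)
```

With join="inner", a timestamp survives only if every frame in the list contains it, so a single frame with no overlapping timestamps is enough to empty the result.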


This guide may also be useful to you.
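
Regarding the 0-row result in the question, one possible cause is the `rename(index=str, ...)` call on observation_data: it converts the index labels to strings, so they may no longer match the datetime index of the forecast frames. Another possibility is simply that the files share no common timestamps. The sketch below illustrates the first case with made-up data; it assumes a recent pandas version, and older versions may align the indexes differently.

```python
import pandas as pd

# Hypothetical toy frames sharing the same two timestamps.
idx = pd.to_datetime(["2019-02-28 10:00", "2019-02-28 11:00"])
X = pd.DataFrame({"temperature_A": [15.0, 18.0]}, index=idx)
Y = pd.DataFrame({"temperature": [16.125, 19.925]}, index=idx)

# rename(index=str) applies str() to every index label, so the
# timestamps become plain strings.
Y_str = Y.rename(index=str, columns={"temperature": "temperature_Y"})
print(type(Y_str.index[0]))  # a str, no longer a Timestamp

# Inner join between datetime labels and string labels: nothing matches.
empty = pd.concat([X, Y_str], axis=1, join="inner")
print(empty.shape)

# Renaming only the columns keeps the datetime index, so the rows align.
Y_dt = Y.rename(columns={"temperature": "temperature_Y"})
merged = pd.concat([X, Y_dt], axis=1, join="inner")
print(merged)
```

Comparing `type(df.index[0])` for every frame before concatenating is a quick way to check whether the indexes can match at all.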
