Skip to content
Advertisement

python pandas merge multiple csv files

I have around 600 csv file datasets, all have the very same column names [‘DateTime’, ‘Actual’, ‘Consensus’, ‘Previous’, ‘Revised’], all economic indicators and all-time series data sets.

the aim is to merge them all together in one csv file.

With ‘DateTime’ as an index.

The way I wanted this file to indexed in is the time line way which means let’s say the first event in the first csv dated in 12/18/2017 10:00:00 and first event in the second csv dated in 12/29/2017 09:00:00 and first event in the third csv dated in 12/20/2017 09:00:00.

So, I want to index them the later first and the newer after it, etc. despite the source csv it originally from.

I tried to merge just 3 of them as an experiment and the problem is the ‘DateTime’ because it prints the 3 of them together like this (’12/18/2017 10:00:00′, ’12/29/2017 09:00:00′, ’12/20/2017 09:00:00′) Here is the code:

import pandas as pd


df1 = pd.read_csv("E:BusinessEconomic IndicatorsConsumer Price Index - Core (YoY) - European Monetary Union.csv")
df2 = pd.read_csv("E:BusinessEconomic IndicatorsPrivate loans (YoY) - European Monetary Union.csv")
df3 = pd.read_csv("E:BusinessEconomic IndicatorsCurrent Account s.a - European Monetary Union.csv")

df = pd.concat([df1, df2, df3], axis=1, join='inner')
df.set_index('DateTime', inplace=True)

print(df.head())
df.to_csv('df.csv')

Advertisement

Answer

Consider using read_csv() args, index_col and parse_dates, to create indices during import and format as datetime. Then run your needed horizontal merge. Below assumes date is in first column of csv. And at the end use sort_index() on final dataframe to sort the datetimes.

df1 = pd.read_csv(r"E:BusinessEconomic IndicatorsConsumer Price Index - Core (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"E:BusinessEconomic IndicatorsPrivate loans (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"E:BusinessEconomic IndicatorsCurrent Account s.a - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])

finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()

And for DRY-er approach especially across the hundreds of csv files, use a list comprehension

import os
...
os.chdir('E:\Business\Economic Indicators')

dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
        for f in os.listdir(os.getcwd()) if f.endswith('csv')]

finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement