I have around 600 csv file datasets, all have the very same column names [‘DateTime’, ‘Actual’, ‘Consensus’, ‘Previous’, ‘Revised’], all economic indicators and all-time series data sets.
the aim is to merge them all together in one csv file.
With ‘DateTime’ as an index.
The way I wanted this file to indexed in is the time line way which means let’s say the first event in the first csv dated in 12/18/2017 10:00:00 and first event in the second csv dated in 12/29/2017 09:00:00 and first event in the third csv dated in 12/20/2017 09:00:00.
So, I want to index them the later first and the newer after it, etc. despite the source csv it originally from.
I tried to merge just 3 of them as an experiment and the problem is the ‘DateTime’ because it prints the 3 of them together like this (’12/18/2017 10:00:00′, ’12/29/2017 09:00:00′, ’12/20/2017 09:00:00′) Here is the code:
import pandas as pd df1 = pd.read_csv("E:BusinessEconomic IndicatorsConsumer Price Index - Core (YoY) - European Monetary Union.csv") df2 = pd.read_csv("E:BusinessEconomic IndicatorsPrivate loans (YoY) - European Monetary Union.csv") df3 = pd.read_csv("E:BusinessEconomic IndicatorsCurrent Account s.a - European Monetary Union.csv") df = pd.concat([df1, df2, df3], axis=1, join='inner') df.set_index('DateTime', inplace=True) print(df.head()) df.to_csv('df.csv')
Advertisement
Answer
Consider using read_csv()
args, index_col and parse_dates, to create indices during import and format as datetime. Then run your needed horizontal merge. Below assumes date is in first column of csv. And at the end use sort_index()
on final dataframe to sort the datetimes.
df1 = pd.read_csv(r"E:BusinessEconomic IndicatorsConsumer Price Index - Core (YoY) - European Monetary Union.csv", index_col=[0], parse_dates=[0]) df2 = pd.read_csv(r"E:BusinessEconomic IndicatorsPrivate loans (YoY) - European Monetary Union.csv", index_col=[0], parse_dates=[0]) df3 = pd.read_csv(r"E:BusinessEconomic IndicatorsCurrent Account s.a - European Monetary Union.csv", index_col=[0], parse_dates=[0]) finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
And for DRY-er approach especially across the hundreds of csv files, use a list comprehension
import os ... os.chdir('E:\Business\Economic Indicators') dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0]) for f in os.listdir(os.getcwd()) if f.endswith('csv')] finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()