Skip to content
Advertisement

Merging csv files in order of date created Python Pandas

I am merging 3700 csv files with a total of 10 million rows. The files do not have a sequential naming but the date in which they were created(Descending) is sequential. I use the following code to merge them but do not know how to add pick them in that sequence.

import pandas as pd
import glob

path = r'C:UsersUser' #  path

all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:

    df = pd.read_csv(filename, index_col=None, header=0)

    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

The following are names of files arranged in date modified(oldest to newest) from the os

V100(1005).csv V100(1).csv V100(778).csv V100(2).csv

The file names are not sequential because while downloading other files were deleted in between.

Advertisement

Answer

If I understand the question correctly, you need this before the loop (based on an answer here), which sorts list of files by creation/modification date in Python 3:

import os
all_files = sorted(all_files, key=os.path.getmtime)
#all_files = sorted(all_files, key=os.path.getctime) #works too

Test on MacOs:

  • I created 5 files similar to OP: enter image description here

  • Using glob to read files

import glob
all_files = glob.glob('V*.csv')
print(all_files)
#['V100-778.csv', 'V100-778-.csv', 'V100-1005-1.csv', 'V100-1.csv', 'V100-2.csv']
  • Sorting the files based on modification time:
import os
all_files = sorted(all_files, key=os.path.getmtime)
print(all_files)
#['V100-778-.csv', 'V100-1005-1.csv', 'V100-1.csv', 'V100-778.csv', 'V100-2.csv']
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement