Skip to content
Advertisement

Read multiple csv files into a single dataframe and rename columns based on file of origin – Pandas

I have around 100 csv files with each one containing the same three columns. There are several ways to read the files into a single dataframe, but is there a way that I could append the file name to the column names in order to keep track of the origin of the columns?

I have now tried to import the files using the following code:

import glob
import os
import pandas as pd   

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))

For example, if the inital files are:

“A_reduced.csv” and “B_increased.csv” and each file contains three columns (Time, X, Y)

The expected output would be:

Time X_A_reduced X_B_increased Y_A_reduced Y_B_increased
1 34
2 42

Advertisement

Answer

You coud add a prefix (or suffix) to the column names prior to concatenating the dataframes, eg:

def f(i):
    return pd.read_csv(i).add_prefix(i.split('_')[0] + '_')

df = pd.concat(map(f, glob.glob(os.path.join('', "my_files*.csv")))
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement