I have around 100 csv files with each one containing the same three columns. There are several ways to read the files into a single dataframe, but is there a way that I could append the file name to the column names in order to keep track of the origin of the columns?
I have now tried to import the files using the following code:
import glob import os import pandas as pd df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
For example, if the inital files are:
“A_reduced.csv” and “B_increased.csv” and each file contains three columns (Time, X, Y)
The expected output would be:
Time | X_A_reduced | X_B_increased | Y_A_reduced | Y_B_increased |
---|---|---|---|---|
1 | 34 | |||
2 | 42 |
Advertisement
Answer
You coud add a prefix (or suffix) to the column names prior to concatenating the dataframes, eg:
def f(i): return pd.read_csv(i).add_prefix(i.split('_')[0] + '_') df = pd.concat(map(f, glob.glob(os.path.join('', "my_files*.csv")))