I have multiple csv files (which I've loaded into pandas dataframes) in a folder, each of which holds monthly website data:
January.csv:
URL Value
page1 10
page2 52
page3 17
February.csv:
URL Value
page1 20
page2 7
page3 15
March.csv:
URL Value
page1 7
page2 15
page3 23
and I need to combine them by copying the Value column from each to make a new dataframe (which will ultimately be exported to another csv):
URL January February March
page1 10 20 7
page2 52 7 15
page3 17 15 23
A new csv file will be added to the folder each month, so I need to keep it as dynamic as possible. I’m currently using
all_filenames = [i for i in glob.glob('*.{}'.format('csv'))]
to get the files, with the hope that I can then use something like pd.read_csv(f)['URL'] for f in all_filenames, but that may be totally the wrong approach?
Can anyone point me in the right direction?
Thanks
Answer
While @robinood's answer is correct, I think it's more efficient to use a generator expression (in round brackets):
import glob
import pandas as pd
all_filenames = glob.glob('*.csv')
result = pd.concat((pd.read_csv(f, index_col='URL', usecols=['URL', 'Value']) for f in all_filenames), axis=1)
Note that the above snippet also sets URL as the index, to avoid potential concat errors when the rows are in a different order in each file.
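If you also want the month names as the column headers (rather than every column being called Value), you can rename each Value column to the file's stem before concatenating, then export the result. A minimal sketch, assuming the files really are named January.csv, February.csv, etc., and that combined.csv is just a placeholder output name:

import glob
from pathlib import Path

import pandas as pd

def read_month(path):
    # Read one monthly file, keep only the URL and Value columns,
    # index on URL, and rename Value to the file's stem (e.g. "January").
    df = pd.read_csv(path, index_col='URL', usecols=['URL', 'Value'])
    return df.rename(columns={'Value': Path(path).stem})

all_filenames = glob.glob('*.csv')
result = pd.concat((read_month(f) for f in all_filenames), axis=1)

# The URL index becomes the first column of the exported csv.
result.to_csv('combined.csv')

concat aligns the rows on the URL index, so a page missing from one month simply gets NaN for that month. Be aware that glob doesn't guarantee any particular file order, so if you need the columns in calendar order you'll have to reorder them yourself (e.g. against a list of month names), and if you write combined.csv into the same folder, exclude it from the glob so it isn't picked up on the next run.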