Reading Csvs, adding a column, putting data into the new column and then saving csvs in python

Tags: ,



I currently have a folder with about 300 csv files in them. What I want to do is to read all the csv files within the folder.

The Csv files currently have two columns in them. I want to add a third column to each file.

In this third column I want there to be either a 1 or a 0 depending on whether the corresponding value is greater than or equal to or less than 64

So I have this code which does what I want to an individual file but I need help in how to iterate over all the files in the directory and do it for all of them

import pandas as pd

df = pd.read_csv("test.csv")

df['Above_64'] = ''

df.loc[df.value < 64, 'Above_64'] = '0'
df.loc[df.value >= 64, 'Above_64'] = '1'

df.to_csv('test2.csv')

Answer

Have you started working on this and have code already that you are stuck on? That would help people give some specific help.

Generally, though, you can use the glob module (https://docs.python.org/3/library/glob.html) to help identify all files in a folder. You can do a for-loop in that folder to grab each file and perform the transformations on it.

If you do glob.glob(filepath) (with whatever wildcards), that will return a list of filepaths-as-strings that you can then loop (or iterate) through. So you can do something like

You can use pandas (https://pandas.pydata.org/) to pull the data from the csv into a “dataframe” in python, then pandas can add columns on whatever your criteria is, and then you re-save the file.

Then your for-loop goes into the next file, on and on to the end.

files = glob.glob(filepath)
for file in files:
   df = pandas.read_csv(file) #makes a dataframe from the file contents
   df['new column'] = df[column_name] >= 64 #makes the new column a true/false based on the criteria. you can put whatever new column name in the quotes, but you need the old column name in the second-half
   df['new column'].replace([True,False],[1,0],inplace=True) #replace true/false with 1/0. inplace=True makes it change the column rather than return the info as a series
   df.to_csv(file) #save this new dataframe as the same file you read it from


Source: stackoverflow