I have a root folder
/home/project/data
It contains multiple nested folders that ultimately lead to CSV files:
/home/project/data/2020-12-05/John_Smith/data.csv
/home/project/data/2020-12-05/Robert_White/data.csv
/home/project/data/2020-12-06/John_Smith/data.csv
/home/project/data/2020-12-06/Sam_Walberg/data.csv
/home/project/data/2020-12-06/Garry_Oswald/data.csv
...
I managed to create a DataFrame containing all the CSV files concatenated, using the following code:
import os
import pandas as pd

rootdir = "/home/project/data"
full_path = []
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        full_path.append(os.path.join(subdir, file))
dfs = [pd.read_csv(csv_path) for csv_path in full_path]
df = pd.concat(dfs)
Result:
df=
   pr_id  quantity
0     27         4
1     89         1
2     33         2
3      8         3
4     16         1
...
But I am now struggling to add the respective date and name to the DataFrame, so that it would look like this:
df=
   pr_id  quantity          name        date
0     27         4    John_Smith  2020-12-05
1     89         1  Robert_White  2020-12-05
2     33         2    John_Smith  2020-12-06
3      8         3   Sam_Walberg  2020-12-06
4     16         1  Garry_Oswald  2020-12-06
...
How can I do it?
Answer
With pathlib, you can go one and two directories up from each file to get the name and the date, respectively. Since two values are involved, an explicit for loop might be more readable than a list comprehension:
from pathlib import Path

# ...the code above stays the same

dfs = []
for csv_path in full_path:
    # generate a `Path` object and get its parents
    p = Path(csv_path)
    parents = p.parents

    # get the desired values from the "parent" dirs
    name = parents[0].name
    date = parents[1].name

    # read in the CSV as is
    frame = pd.read_csv(csv_path)

    # assign the `name` and `date` columns
    frame["name"] = name
    frame["date"] = date

    # store in the list
    dfs.append(frame)

# lastly, concatenate as you did
df = pd.concat(dfs)
Or equivalently, the list comprehension counterpart is:
dfs = [pd.read_csv(csv_path).assign(name=csv_path.parents[0].name,
                                    date=csv_path.parents[1].name)
       for csv_path in map(Path, full_path)]
df = pd.concat(dfs)
Here, assign adds the name and date columns to each DataFrame and returns a new DataFrame with the extra columns.
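As a quick illustration of what assign does with a scalar value, here is a minimal sketch on a small in-memory frame. The sample values are taken from the question; the variable names frame and tagged are only for illustration:

```python
import pandas as pd

# a tiny stand-in for one CSV file's contents
frame = pd.DataFrame({"pr_id": [27, 89], "quantity": [4, 1]})

# a scalar passed to assign is repeated for every row;
# the original frame is left untouched and a new one is returned
tagged = frame.assign(name="John_Smith", date="2020-12-05")

print(tagged)
```

Because assign returns a new DataFrame instead of modifying the existing one, it can be chained directly after pd.read_csv inside a comprehension.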
Whether to use the explicit for loop or the list comprehension is up to you.
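If you want to check which index of parents corresponds to which folder, a small sketch like the following may help. It uses one of the paths from the question; the helper name_and_date is a hypothetical name introduced here, and PurePosixPath is used only so the check behaves the same on any operating system:

```python
from pathlib import PurePosixPath


def name_and_date(csv_path):
    """Return (name, date) from the two directories above the file.

    Hypothetical helper for illustration; it assumes the layout
    <root>/<date>/<name>/<file>.csv shown in the question.
    """
    parents = PurePosixPath(csv_path).parents
    # parents[0] is the folder holding the file, parents[1] is one level up
    return parents[0].name, parents[1].name


sample = "/home/project/data/2020-12-05/John_Smith/data.csv"
name, date = name_and_date(sample)
print(name, date)
```

Note that this only works when every CSV sits exactly two levels below the root folder; a file at a different depth would yield the wrong folder names.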