I have a dataframe (df) that looks like this (highly simplified):
ID A B C VALUE 1 10 462 2241 217 2 11 498 6953 217 3 67 120 6926 654 4 68 898 7153 654 5 87 557 4996 654 6 88 227 6475 911 7 47 875 5097 911 8 48 143 8953 111 9 65 157 4470 111 10 66 525 9328 111
The 'VALUE'
column contains a variable number of rows with identical values. I am trying to output a series of csv files that contain all of the rows that contain a 'VALUE'
length == 2, ==3 etc. For example:
to_csv('/Path/to/VALUE_len_2.csv') ID A B C VALUE 1 10 462 2241 217 2 11 498 6953 217 6 88 227 6475 911 7 47 875 5097 911 to_csv('/Path/to/VALUE_len_3.csv') ID A B C VALUE 3 67 120 6926 654 4 68 898 7153 654 5 87 557 4996 654 to_csv('/Path/to/VALUE_len_4.csv') ID A B C VALUE 7 47 875 5097 111 8 48 143 8953 111 9 65 157 4470 111 10 66 525 9328 111
I can get the desired output of one length value at a time, e.g., using:
df = pd.concat(v for _, v in df.groupby("VALUE") if len(v) == 2) df.to_csv("/Path/to/VALUE_len_2.csv")
However, I have dozens of values to test. I would like to put this in a for loop on the order of:
mylist = [2,3,4,5,6,7,8,9] or len([2,3,4,5,6,7,8,9]) grouped = df.groupby(['VALUE']) output = '/Path/to/VALUE_len_{}.csv' for loop here: if item in my list found in grouped: output rows to csv else: pass
- I’ve tried various constructions to iterate the groupby object using the items in the list, and I haven’t been able to make anything work.
- It might be an issue with trying to use a groupby object this way, but it is more than likely it is my inability to get the syntax right to complete the iteration.
Advertisement
Answer
- It doesn’t make sense to use a predetermined list to create the filenames.
df_len
will be used to generate a filename using anf-string
.Path.exists()
is used to determine if the file exists or not
import pandas as pd from pathlib import Path # test data data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'A': [10, 11, 67, 68, 87, 88, 47, 48, 65, 66], 'B': [462, 498, 120, 898, 557, 227, 875, 143, 157, 525], 'C': [2241, 6953, 6926, 7153, 4996, 6475, 5097, 8953, 4470, 9328], 'VALUE': [217, 217, 654, 654, 654, 911, 911, 111, 111, 111]} df = pd.DataFrame(data) # groupby value for group, data in df.groupby('VALUE'): # get the length of the dataframe df_len = len(data) # create a filename with df_len file = Path(f'/path/to/VALUE_len_{df_len}.csv') # if the file exists, append without the header if file.exists(): data.to_csv(file, index=False, mode='a', header=False) # create a new file else: data.to_csv(file, index=False)
- If you must only create a file for dataframes of a specific length
desired_length = [2, 3, 4, 5, 6, 7, 8, 9] # groupby value for group, data in df.groupby('VALUE'): # get the length of the dataframe df_len = len(data) # create a filename with df_len file = Path(f'/path/to/VALUE_len_{df_len}.csv') # check if the length of the dataframe is in the desired length if df_len in desired_length: # if the file exists, append without the header if file.exists(): data.to_csv(file, index=False, mode='a', header=False) # create a new file else: data.to_csv(file, index=False)