How to create files from a groupby object, based on the length of the dataframe

Question

I have a dataframe (df) that looks like this (highly simplified): The 'VALUE' column contains groups of rows with identical values, and the number of rows per group varies. I am trying to output a series of CSV files, each containing all of the rows whose 'VALUE' group has a given length (length == 2, length == 3, etc.). For example: I can get the desired output for one length value

Accepted Answer

It doesn't make sense to use a predetermined list to create the filenames.

df_len will be used to generate a filename using an f-string.
Path.exists() is used to determine if the file exists or not.

    import pandas as pd
    from pathlib import Path

    # test data
    data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'A': [10, 11, 67, 68, 87, 88, 47, 48, 65, 66],
            'B': [462, 498, 120, 898, 557, 227, 875, 143, 157, 525],
            'C': [2241, 6953, 6926, 7153, 4996, 6475, 5097, 8953, 4470, 9328],
            'VALUE': [217, 217, 654, 654, 654, 911, 911, 111, 111, 111]}

    df = pd.DataFrame(data)

    # groupby value
    for group, data in df.groupby('VALUE'):

        # get the length of the dataframe
        df_len = len(data)

        # create a filename with df_len
        file = Path(f'/path/to/VALUE_len_{df_len}.csv')

        # if the file exists, append without the header
        if file.exists():
            data.to_csv(file, index=False, mode='a', header=False)

        # create a new file
        else:
            data.to_csv(file, index=False)

If you must only create a file for dataframes of a specific length:

    desired_length = [2, 3, 4, 5, 6, 7, 8, 9]

    # groupby value
    for group, data in df.groupby('VALUE'):

        # get the length of the dataframe
        df_len = len(data)

        # create a filename with df_len
        file = Path(f'/path/to/VALUE_len_{df_len}.csv')

        # check if the length of the dataframe is in the desired length
        if df_len in desired_length:

            # if the file exists, append without the header
            if file.exists():
                data.to_csv(file, index=False, mode='a', header=False)

            # create a new file
            else:
                data.to_csv(file, index=False)
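Because the approach above appends to a file whenever it already exists, running it a second time adds the same rows again. One possible variation, offered only as a sketch and not as part of the accepted answer, is to compute each row's group size first and then group by that size, so every file is written exactly once. The function name write_by_group_length, its lengths parameter, and the temporary output directory are hypothetical choices made for this illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_by_group_length(df, out_dir, column="VALUE", lengths=None):
    """Write one CSV per distinct group length and return {length: path}.

    Hypothetical helper for illustration; `lengths` optionally restricts
    output to specific group lengths.
    """
    out_dir = Path(out_dir)
    # For every row, the number of rows sharing its `column` value.
    sizes = df.groupby(column)[column].transform("size").rename("group_len")
    written = {}
    for length, rows in df.groupby(sizes):
        if lengths is not None and length not in lengths:
            continue
        path = out_dir / f"VALUE_len_{length}.csv"
        # Overwrites any existing file, so re-running does not duplicate rows.
        rows.to_csv(path, index=False)
        written[int(length)] = path
    return written


# Same test data as in the answer above.
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'A': [10, 11, 67, 68, 87, 88, 47, 48, 65, 66],
        'B': [462, 498, 120, 898, 557, 227, 875, 143, 157, 525],
        'C': [2241, 6953, 6926, 7153, 4996, 6475, 5097, 8953, 4470, 9328],
        'VALUE': [217, 217, 654, 654, 654, 911, 911, 111, 111, 111]}
df = pd.DataFrame(data)

out_dir = Path(tempfile.mkdtemp())  # assumed scratch location for the demo
written = write_by_group_length(df, out_dir)
```

Within each file the rows keep their original dataframe order, which differs from the append-based loop, where rows arrive in the sorted order of 'VALUE'. Passing lengths=[2] would write only the length-2 file.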
