
What is the best way to combine dataframes that have been created through a for loop?

I am trying to combine dataframes with 2 columns into a single dataframe. The initial dataframes are generated through a for loop and stored in a list. I am having trouble getting the data from the list of dataframes into a single dataframe. Right now when I run my code, it treats each full dataframe as a row.

def linear_reg_function(category):
    df = pd.read_csv(file)
    df = df[df['category_column'] == category]
    df1 = df[['category_column', 'value_column']]
    df_export.append(df1)


df_export = []

for category in category_list:
    linear_reg_function(category)

When I run this block of code, I get a list of dataframes that each have 2 columns. When I try to convert df_export to a dataframe, it ends up with 12 rows (the number of categories in category_list). I tried:

df_export = pd.DataFrame()

but the result was:

[image: output screenshot, not recoverable]

I would like to have a single dataframe with 2 columns, [Category, Value] that includes the values of all 12 categories generated in the for loop.


Answer

You can use pd.concat to concatenate a list of DataFrames into a single DataFrame, stacking their rows.

import glob

import pandas as pd

appended_data = []
for infile in glob.glob("*.xlsx"):
    data = pd.read_excel(infile)
    # store each DataFrame in the list
    appended_data.append(data)
# stack all DataFrames row-wise; see the pd.concat documentation for more info
appended_data = pd.concat(appended_data)
# write the combined DataFrame to an Excel sheet
appended_data.to_excel('appended.xlsx')

You can adapt this to your own needs.
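Applied to the question's setup, the same idea might look like the sketch below. It is only an illustration under some assumptions: the small in-memory `source` frame stands in for the CSV read in the question, `select_category` is a hypothetical helper that returns its result instead of appending to a global list, and the final `Category`/`Value` column names are taken from the desired output described in the question.

```python
import pandas as pd

# Hypothetical stand-in for the data the question loads with pd.read_csv(file).
source = pd.DataFrame({
    "category_column": ["a", "b", "a", "c", "b"],
    "value_column": [1, 2, 3, 4, 5],
    "other_column": [10, 20, 30, 40, 50],
})
category_list = ["a", "b", "c"]


def select_category(df, category):
    """Return the two columns of interest for the rows of one category."""
    subset = df[df["category_column"] == category]
    return subset[["category_column", "value_column"]]


# Build the list of per-category DataFrames, as the for loop in the question does.
frames = [select_category(source, category) for category in category_list]

# Stack the frames row-wise into one DataFrame with a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)
combined.columns = ["Category", "Value"]
print(combined)
```

Passing `ignore_index=True` gives the combined frame a new consecutive index. Without it, each row keeps the index label it had in its original frame.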

User contributions licensed under: CC BY-SA