Skip to content
Advertisement

How to concatenate dataframes without overwriting column values

I am trying to create a list of DataFrames in a for loop, and then concatenate them outside of the for loop and write to excel. I have gotten the code to be close, but for some reason the column created inside the for loop ends up being the final value rather each individual value as it loops through. I am sure this is extremely simple and I’m just missing something easy but cannot seem to figure it out tonight. Here is some dummy code to illustrate and help with problem solving. At the end, I’d like an excel output with wks 1 – 9. However, in my actual output all values in the wk column are equal to 9.

import pandas as pd
sample = []
area = pd.Series({'California':423967, 'Texas':695662})
pop = pd.Series({'California':38332521, 'Texas':26448193})

data = pd.DataFrame({'area':area, 'pop':pop})
print(data)

for i in range(10):
    data['wk'] = i
    sample.append(data)
    print(data)

res = pd.concat(sample)
res.to_excel("Concatenating DataFrames.xlsx", index=False)

Advertisement

Answer

The problem lies here that when you append the data to list it does not create the separate memory for it rather they all refer to the same memory that’s why after last updation in the loop it becomes 9 or final value.

just replace

sample.append(data)

with

sample.append(data.copy())

.copy() will enforce pandas to create new memory for each df in loop

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement