Let's say I have a dataframe like this:
| Column1 | Column2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Platform_key | 
|---|---|---|---|---|---|---|---|
| amazonwebservicesaws | asiapacificmumbai | 38.33 | nan | nan | nan | nan | amazonwebservicesaws_asiapacificmumbai | 
| amazonwebservicesaws | asiapacificmumbai | nan | nan | nan | nan | 1.83 | amazonwebservicesaws_asiapacificmumbai | 
| amazonwebservicesaws | asiapacificmumbai | nan | nan | nan | 5 | nan | amazonwebservicesaws_asiapacificmumbai | 
| amazonwebservicesaws | asiapacificmumbai | nan | nan | 2.21 | nan | nan | amazonwebservicesaws_asiapacificmumbai | 
| amazonwebservicesaws | asiapacificmumbai | nan | 20.83 | nan | nan | nan | amazonwebservicesaws_asiapacificmumbai | 
I want to combine all these rows (there are 5 in the example, but more in the real dataset) into a single row per platform key, keeping the non-NaN value from each column (the real dataset also has more columns than shown above). So like this:
| Column1 | Column2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Platform_key | 
|---|---|---|---|---|---|---|---|
| amazonwebservicesaws | asiapacificmumbai | 38.33 | 20.83 | 2.21 | 5 | 1.83 | amazonwebservicesaws_asiapacificmumbai | 
What is the best way to do this?
Answer
We can just use groupby with first, which picks the first non-NaN value per column within each group:
out = df.groupby('Platform_key', as_index=False).first()
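To try this end to end, here is a minimal sketch that rebuilds the five example rows from the question and collapses them. The frame construction and the final column reordering are my own additions for illustration, not something the question supplied; the reordering is there because with as_index=False the grouping column comes out first rather than last.

```python
import numpy as np
import pandas as pd

# Rebuild the example rows from the question (values copied from the table).
key = "amazonwebservicesaws_asiapacificmumbai"
df = pd.DataFrame({
    "Column1": ["amazonwebservicesaws"] * 5,
    "Column2": ["asiapacificmumbai"] * 5,
    "Column 3": [38.33, np.nan, np.nan, np.nan, np.nan],
    "Column 4": [np.nan, np.nan, np.nan, np.nan, 20.83],
    "Column 5": [np.nan, np.nan, np.nan, 2.21, np.nan],
    "Column 6": [np.nan, np.nan, 5, np.nan, np.nan],
    "Column 7": [np.nan, 1.83, np.nan, np.nan, np.nan],
    "Platform_key": [key] * 5,
})

# first() skips NaN, so each column keeps its first non-null value per key.
out = df.groupby("Platform_key", as_index=False).first()

# Optional: restore the original column order (Platform_key last).
out = out[df.columns]

print(out)
```

Note that if a key has more than one non-NaN value in the same column, first() keeps only the earliest one and silently drops the rest, so it is worth checking that assumption holds for the real data.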