I’m looking for a way to drop duplicate rows based on a certain column subset, but merge some of the data so it doesn’t get removed.
```python
import pandas as pd

# Example DataFrame
data = {
    "Parcel": ['001', '002', '003', '003'],
    "Res": ['Henry', 'Nick', 'Paul', 'Bill'],
    "Bill": ['4,100', '2,300', '5,200', '4,000'],
    "Year": ['1995', '1990', '2008', '2008'],
}
df = pd.DataFrame.from_dict(data=data)
```
Parcel | Res | Bill | Year |
---|---|---|---|
001 | Henry | 4,100 | 1995 |
002 | Nick | 2,300 | 1990 |
003 | Paul | 5,200 | 2008 |
003 | Bill | 4,000 | 2008 |
Some pseudocode would look something like this:

```python
df = df.drop_duplicates(subset='Parcel', keep_data=['Res', 'Bill'])
```
Parcel | Res | Bill | Year |
---|---|---|---|
001 | Henry | 4,100 | 1995 |
002 | Nick | 2,300 | 1990 |
003 | Paul, Bill | 5,200, 4,000 | 2008 |
I’m not sure where to begin with this, but any tips as to where to look would be appreciated.
Answer
You can use `.groupby` with `.agg`:
```python
df = (
    df.groupby("Parcel")
    .agg({"Res": ", ".join, "Bill": ", ".join, "Year": "first"})
    .reset_index()
)
print(df)
```
Prints:
```
  Parcel         Res          Bill  Year
0    001       Henry         4,100  1995
1    002        Nick         2,300  1990
2    003  Paul, Bill  5,200, 4,000  2008
```
EDIT: If you have many columns, you can aggregate every column with `"first"` and then update the result for just the columns you want joined:
```python
g = df.groupby("Parcel")
x = g.agg("first")
x.update(g.agg({"Res": ", ".join, "Bill": ", ".join}))
print(x.reset_index())
```

which prints:

```
  Parcel         Res          Bill  Year
0    001       Henry         4,100  1995
1    002        Nick         2,300  1990
2    003  Paul, Bill  5,200, 4,000  2008
```
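
If you'd rather not spell out each joined column by hand, a minimal sketch (assuming the same example `df` and that only `Res` and `Bill` need merging; `join_cols`, `agg_map`, and `out` are illustrative names) is to build the aggregation mapping programmatically:

```python
import pandas as pd

# Recreate the example frame so the snippet runs on its own.
df = pd.DataFrame({
    "Parcel": ['001', '002', '003', '003'],
    "Res": ['Henry', 'Nick', 'Paul', 'Bill'],
    "Bill": ['4,100', '2,300', '5,200', '4,000'],
    "Year": ['1995', '1990', '2008', '2008'],
})

join_cols = ["Res", "Bill"]  # assumed: columns to merge into comma-separated strings
agg_map = {c: ", ".join for c in join_cols}
# every other non-key column just keeps its first value per group
agg_map.update({c: "first" for c in df.columns if c not in join_cols + ["Parcel"]})

out = df.groupby("Parcel", as_index=False).agg(agg_map)
print(out)
```

Here `as_index=False` keeps `Parcel` as an ordinary column, so the separate `reset_index()` call isn't needed.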