
How to combine rows that have the same values in two columns (Python)?

I currently have a CSV file like the one below. The first line is the header with the column names.

"f","p","g"
"foo","in","void"
"foo","out","void"
"foo","length","void"
...

For a given f value, the g value is always the same; only p differs between rows. Using Python, how can I combine those rows into one, as follows:

"foo","in","out","length","void"

Note that the real CSV file is much larger, and some f values may have more p values than others. For example:

"goo","a","int"
"goo","b","int"
"goo","c","int"
"goo","d","int"
"goo","e","int"
"goo","f","int"
...
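The answer below groups on both f and g, which only yields one row per f if the statement above holds (each f maps to a single g). A minimal standard-library sketch for checking that assumption first; the function name and the inline sample text are hypothetical, standing in for the real file:

```python
import csv
import io

# Hypothetical sample input mirroring the question's data.
SAMPLE = '''"f","p","g"
"foo","in","void"
"foo","out","void"
"foo","length","void"
"goo","a","int"
"goo","b","int"
'''

def g_is_constant_per_f(text):
    """Return True if every value in column f is paired with exactly one g value."""
    g_values = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Collect the distinct g values seen for each f value.
        g_values.setdefault(row["f"], set()).add(row["g"])
    return all(len(values) == 1 for values in g_values.values())

print(g_is_constant_per_f(SAMPLE))  # True
```

If this returns False, grouping by ["f", "g"] would produce more than one combined row for the offending f value.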


Answer

If I've understood your question correctly, you can use pandas to group by the "f" and "g" columns and collect the "p" values of each group into a list:

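import pandas as pd
df = pd.read_csv("data.csv")  # hypothetical file name; header row is f, p, g
x = df.groupby(["f", "g"], as_index=False)["p"].agg(list)  # one list of p values per (f, g)
for vals in x.apply(lambda row: [row["f"], *row["p"], row["g"]], axis=1):
    print(vals)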

Prints:

['foo', 'in', 'out', 'length', 'void']
['goo', 'a', 'b', 'c', 'd', 'e', 'f', 'int']
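If you want the combined rows written back in the same quoted CSV style as the input, or want to avoid the pandas dependency, here is a standard-library sketch of the same (f, g) grouping. The function names and the inline sample are hypothetical; for a real file you would pass the contents of something like open("data.csv", newline="").read() instead.

```python
import csv
import io

def combine_rows(text):
    """Group rows by (f, g) and return [f, p1, p2, ..., g] lists in first-seen order."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault((row["f"], row["g"]), []).append(row["p"])
    return [[f, *ps, g] for (f, g), ps in groups.items()]

def to_csv(rows):
    """Serialize variable-length rows, quoting every field like the input file."""
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical sample input mirroring the question's data.
SAMPLE = '''"f","p","g"
"foo","in","void"
"foo","out","void"
"foo","length","void"
"goo","a","int"
"goo","b","int"
'''

rows = combine_rows(SAMPLE)
print(rows)                   # [['foo', 'in', 'out', 'length', 'void'], ['goo', 'a', 'b', 'int']]
print(to_csv(rows), end="")   # "foo","in","out","length","void" / "goo","a","b","int"
```

One design difference: pandas groupby sorts the group keys by default, whereas this version keeps groups in the order they first appear in the file.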
User contributions licensed under: CC BY-SA