I found an answer to my question in another request on this site, however the answer provided doesn’t work for me so I’m asking in a different request. I will use the same data and show results I’m getting.
So basically, I have a dataframe that one column has repeated values that I want to group in a single row, and I want to concatenate the values of another column as the value of the combined rows.
Here’s the example:
data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]]) df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])
The output is as expected:
FIRST_NM LAST_NM PAGE_NUM 0 John Smith 1 1 John Smith 7 2 Eric Adams 9 3 Jane Doe 14 4 Jane Doe 16 5 John Smith 19
This is what I want to get once I group on values for FIRST_NM
and LAST_NM
, and concatenate the values of PAGE_NUM
:
FIRST_NM LAST_NM PAGE_NUM 0 John Smith 1,7,19 1 Eric Adams 9 2 Jane Doe 14,16
The proposed solution, and it makes a lot of sense to me, is to use this:
df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))
However and unfortunately, this seems to have no effect at all, I’m getting the exact same results:
FIRST_NM LAST_NM PAGE_NUM 0 John Smith 1 1 John Smith 7 2 Eric Adams 9 3 Jane Doe 14 4 Jane Doe 16 5 John Smith 19
Anyone can help pointing out what I’m doing wrong?
Many thanks in advance!
JF
Advertisement
Answer
try via groupby()
and agg()
:
out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':','.join})
If needed unique value then use:
out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':set}) out['PAGE_NUM']=out['PAGE_NUM'].astype(str).str[1:-1].str.replace("'",'')
Output of out
:
FIRST_NM LAST_NM PAGE_NUM 0 John Smith 1,7,19 1 Eric Adams 9 2 Jane Doe 14,16