Grouping and concatening values in Pandas dataframes

I found an answer to my question in another request on this site, however the answer provided doesn’t work for me so I’m asking in a different request. I will use the same data and show results I’m getting.

So basically, I have a dataframe that one column has repeated values that I want to group in a single row, and I want to concatenate the values of another column as the value of the combined rows.

Here’s the example:

data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])

df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

JavaScript
​x
 
data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])
​
df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])
​

The output is as expected:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19

JavaScript
 
    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19
​

This is what I want to get once I group on values for FIRST_NM and LAST_NM, and concatenate the values of PAGE_NUM:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16

JavaScript
 
    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16
​

The proposed solution, and it makes a lot of sense to me, is to use this:

df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))

JavaScript
 
df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))
​

However and unfortunately, this seems to have no effect at all, I’m getting the exact same results:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19

JavaScript
 
    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19
​

Anyone can help pointing out what I’m doing wrong?

Many thanks in advance!

Answer

try via groupby() and agg():

out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':','.join})

JavaScript
 
out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':','.join})
​

If needed unique value then use:

out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':set})
out['PAGE_NUM']=out['PAGE_NUM'].astype(str).str[1:-1].str.replace("'",'')

JavaScript
 
out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':set})
out['PAGE_NUM']=out['PAGE_NUM'].astype(str).str[1:-1].str.replace("'",'')
​

Output of out:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16

JavaScript
 
    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16
​

Advertisement

Answer