How to handle Pandas columns where elements are Lists?

I have loaded some JSON API data into a Pandas DataFrame, so some columns contain lists. The same columns also contain some NaN values.

First, I want to replace each NaN with a single word such as 'empty', while the rest of the data is already in list form. Ultimately I want to create a new column that takes each list and turns it into a string, since I will use those strings for mapping logic later on.

Here is some sample data and logic:

import pandas as pd
import numpy as np

df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]
                            })

# The issue is here: I try to coerce every element to a list.
# For the row where NaN was replaced, list() splits the string 'empty' into a list of characters.
df_test['name'] = df_test['name'].fillna('empty').apply(list)

# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key. 
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())
print(df_test.head())

   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2               [e, m, p, t, y]               e, m, p, t, y
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy

Any ideas on how to handle the NaNs so that they are still 'list-like'? I can't apply my lambda function to the column as it is, because NaN is a float and cannot be sorted or joined like a list.
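
For reference, the character splitting in row 1 can be reproduced without Pandas. This is a minimal sketch using only built-ins (the variable names are illustrative): list() iterates over its argument, so a string is broken into characters, whereas a list literal keeps the string as one element.

```python
# list() iterates over its argument, so a string is split into characters.
chars = list('empty')

# Wrapping the string in a list literal keeps it as a single element.
single = ['empty']

print(chars)
print(single)

# The sort/join step from the question then behaves as intended on `single`.
joined = ", ".join(sorted(single)).lower()
print(joined)
```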

EDIT: Solution provided by @SimonHawe in the comments. Instead of using fillna at all, the solution is to use an if/else expression inside the lambda function to handle the NaN case.

SOLUTION:

import pandas as pd
import numpy as np

df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]
                            })


# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key. 
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower() if isinstance(x,list) else 'empty')
print(df_test.head())

   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2                           NaN                       empty
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy

Answer

IIUC, you can select the rows with NaN, fill them with the string "['empty']", and then pass that string through the eval function to turn it into a list:

m = df_test['name'].isna()
df_test.loc[m, 'name'] = df_test.loc[m, 'name'].fillna("['empty']").apply(eval)
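
If you would rather not use eval, here is a sketch of an alternative. It assumes every non-missing entry in the column is already a Python list, and the helper name to_list is hypothetical. Anything that is not a list (such as NaN) is replaced with ['empty'] directly, so the column stays list-like and the original sort/join lambda works unchanged.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame(data={'id': [1, 2, 3, 4],
                             'name': [['amanda', 'jen', 'edward', 'ralph'],
                                      np.nan,
                                      ['megan', 'roger', 'greg', 'donald'],
                                      ['teddy', 'ellie', 'greg', 'jamie']]})


def to_list(value):
    # Hypothetical helper: keep real lists, turn anything else (e.g. NaN) into ['empty'].
    return value if isinstance(value, list) else ['empty']


# Every row now holds a list, so no special case is needed afterwards.
df_test['name'] = df_test['name'].apply(to_list)
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())
print(df_test)
```

This is not the approach from the answer above, just a variation on the isinstance check from the question's edit.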
User contributions licensed under: CC BY-SA