I have loaded some JSON API data into a Pandas DataFrame, and as a result some columns come out as lists. Some rows are also NaN.
First and foremost, I want to replace each NaN with a single word such as 'empty', but the rest of the data is already in list form. Ultimately I want to create a new column that operates on this list
structure and turns it into a string, since I will be using the strings for mapping logic later on.
Here is some sample data and logic:
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]})
# issue is here trying to coerce the element of data to a list.
# it takes in the elements of the string and creates a list of characters for the one I replace NaNs on
df_test['name'] = df_test['name'].fillna('empty').apply(list)
# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key.
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())
   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2               [e, m, p, t, y]               e, m, p, t, y
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
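As far as I can tell, the character split in row 1 comes from list() iterating over the string 'empty' rather than wrapping it as a single element. A minimal sketch of the difference in plain Python (the variable names are just for illustration):

```python
# list() iterates over its argument, so a string is split into characters.
as_chars = list('empty')

# A list literal wraps the whole string as one element instead.
as_single = ['empty']

print(as_chars)
print(as_single)
```

So .apply(list) after fillna('empty') seems to be what produces [e, m, p, t, y] for the NaN row.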
Any ideas on how to handle the NaNs so that they remain 'list-like'? I can't apply my lambda function to the column because NaN is treated as a float.
EDIT: Solution provided by @SimonHawe in the comments. Instead of using fillna at all, the solution is to use an if/else inside the lambda function to handle the NaN case.
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]})
# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key.
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower() if isinstance(x,list) else 'empty')
   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2                           NaN                       empty
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
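Since the end goal is to use name_str as a dictionary key, the later mapping step might look roughly like the sketch below. The helper names_to_key, the frame df_demo, and the group_lookup dictionary with its values are all made up for illustration; only the sort/join/lower logic comes from the lambda above.

```python
import numpy as np
import pandas as pd


def names_to_key(names):
    """Return a sorted, comma-separated, lower-cased key; 'empty' for non-lists."""
    if isinstance(names, list):
        return ", ".join(sorted(names)).lower()
    return "empty"


df_demo = pd.DataFrame({
    "id": [1, 2],
    "name": [["amanda", "jen", "edward", "ralph"], np.nan],
})
df_demo["name_str"] = df_demo["name"].apply(names_to_key)

# Hypothetical lookup table keyed by the generated strings.
group_lookup = {
    "amanda, edward, jen, ralph": "group_a",
    "empty": "no_group",
}
df_demo["group"] = df_demo["name_str"].map(group_lookup)
print(df_demo[["id", "name_str", "group"]])
```

Pulling the lambda out into a named function is optional, but it makes the NaN branch easier to read and to reuse.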
IIUC, you can select all the rows with NaN, fill them with the string "['empty']", and then pass the result through eval to turn it into a list:
# boolean mask of the rows where 'name' is NaN
m = df_test['name'].isna()
df_test.loc[m, 'name'] = df_test.loc[m, 'name'].fillna("['empty']").apply(eval)
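If you would rather not go through eval, a similar result can probably be had by wrapping the missing values directly. This is only a rough sketch, and it assumes every non-missing value in the column is already a list; df_alt is a small stand-in frame, not the original data.

```python
import numpy as np
import pandas as pd

df_alt = pd.DataFrame({
    "id": [1, 2, 3],
    "name": [["amanda", "jen"], np.nan, ["megan", "roger"]],
})

# Replace anything that is not a list (here: NaN) with a one-element list.
df_alt["name"] = df_alt["name"].apply(
    lambda x: x if isinstance(x, list) else ["empty"]
)

# Every row is now list-like, so the sort/join lambda works without a special case.
df_alt["name_str"] = df_alt["name"].apply(lambda x: ", ".join(sorted(x)).lower())
print(df_alt)
```

The column stays list-like throughout, and no strings are evaluated as code.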