I have loaded some JSON API data into a pandas DataFrame, so some columns come out as lists. I also have some NaN values.
First and foremost, I want to replace the NaN values with a single word such as 'empty', but the rest of the data is already in list form. Ultimately I want to create a new column that operates on this list structure and turns it into a string, since I will be using the strings for mapping logic later on.
Here is some sample data and logic:
    import pandas as pd
    import numpy as np

    df_test = pd.DataFrame(data={'id': [1, 2, 3, 4],
                                 'name': [['amanda', 'jen', 'edward', 'ralph'],
                                          np.nan,
                                          ['megan', 'roger', 'greg', 'donald'],
                                          ['teddy', 'ellie', 'greg', 'jamie']]})

    # issue is here trying to coerce the element of data to a list.
    # it takes in the elements of the string and creates a list of characters
    # for the one I replace NaNs on
    df_test['name'] = df_test['name'].fillna('empty').apply(list)

    # here I take the lists and sort and rearrange them into a string so I can
    # later use this format as a dictionary key.
    # Maybe there is a smarter way to do this
    df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())

    print(df_test.head())

       id                          name                    name_str
    0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
    1   2               [e, m, p, t, y]               e, m, p, t, y
    2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
    3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
Any ideas on how to handle the NaNs so that they are still 'list-like'? I can't apply my lambda function to the column as-is, since NaN is treated as a float.
EDIT: Solution provided by @SimonHawe in the comments. Instead of using fillna at all, the solution is to use an if/else inside the lambda function to handle the NaN case.
SOLUTION:
    import pandas as pd
    import numpy as np

    df_test = pd.DataFrame(data={'id': [1, 2, 3, 4],
                                 'name': [['amanda', 'jen', 'edward', 'ralph'],
                                          np.nan,
                                          ['megan', 'roger', 'greg', 'donald'],
                                          ['teddy', 'ellie', 'greg', 'jamie']]})

    # here I take the lists and sort and rearrange them into a string so I can
    # later use this format as a dictionary key.
    # Rows that are not lists (the NaN case) become 'empty'.
    df_test['name_str'] = df_test['name'].apply(
        lambda x: ", ".join(sorted(x)).lower() if isinstance(x, list) else 'empty')

    print(df_test.head())

       id                          name                    name_str
    0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
    1   2                           NaN                       empty
    2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
    3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
Answer
IIUC, you can select all the rows where name is NaN, fill them with the string "['empty']", and then pass them through the eval function to turn that string into a one-element list:
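    # boolean mask of the rows where 'name' is NaN
    m = df_test['name'].isna()
    # fill only those rows with the string "['empty']" and eval it into a list
    df_test.loc[m, 'name'] = df_test.loc[m, 'name'].fillna("['empty']").apply(eval)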
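If you would rather not go through eval, a possible alternative is to wrap the non-list values directly. The sketch below is only one way to do it and assumes every valid entry in the column is a Python list; the helper name as_name_list is made up for illustration. It also shows why the original attempt went wrong: list('empty') splits the string into characters, while ['empty'] keeps it as a single element.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': [['amanda', 'jen', 'edward', 'ralph'],
             np.nan,
             ['megan', 'roger', 'greg', 'donald'],
             ['teddy', 'ellie', 'greg', 'jamie']],
})


def as_name_list(value):
    """Hypothetical helper: keep lists as they are, wrap anything else (here, NaN) as ['empty']."""
    return value if isinstance(value, list) else ['empty']


# list('empty') would give ['e', 'm', 'p', 't', 'y'], so the placeholder
# is wrapped in a list literal instead of being passed to list().
df_test['name'] = df_test['name'].apply(as_name_list)

# Every row is now a list, so the original join/sort lambda works unchanged.
df_test['name_str'] = df_test['name'].apply(
    lambda names: ", ".join(sorted(names)).lower())

print(df_test)
```

With this approach the name column stays list-like in every row, so the NaN row ends up with name_str equal to 'empty' without a special case in the lambda.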