I have this pandas DataFrame:
```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)
```

```
   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN
```
I want to convert every row into a dict:
```python
def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k: v for k, v in result.items() if v is not np.nan}
    print(final_result)
```
So I use the function above with this trick:

```python
df.apply(lambda row: convert_to_dict(row), axis=1)
```

But the result I get is weird:
```
{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}
```
The last row has both Language and Discount as NaN, and I expected both to be filtered out, but only Language is removed. How do I filter out all columns whose values are NaN from the final result?
Answer
Use `pd.notna` to filter out missing values:

```python
final_result = {k: v for k, v in result.items() if pd.notna(v)}
```
Applied to the whole DataFrame via `to_dict('records')`:

```python
final_result = [{k: v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)
```

```
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'},
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
```
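As for why the original `v is not np.nan` check let Discount through but caught Language: `is` tests object identity, not value. A minimal sketch of the pitfall (assuming the Discount column is numeric, so pandas stores it as a freshly created float NaN rather than the `np.nan` singleton, while the object-dtype Language column keeps the original `np.nan` object):

```python
import numpy as np
import pandas as pd

# A newly constructed float NaN is a different object from np.nan,
# so an identity check fails to recognise it as missing.
a = float('nan')
print(a is np.nan)       # False - different object, same "value"
print(np.nan is np.nan)  # True  - the same singleton object

# pd.notna checks for missing-ness by value, so it catches both.
print(pd.notna(a))       # False - correctly flagged as missing
print(pd.notna(np.nan))  # False
print(pd.notna(1000.0))  # True
```

This is why `pd.notna` (or `pd.isna`) is the reliable test: it works regardless of which NaN object a column happens to hold.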