I have this pandas DataFrame:
```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)
```

```
   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN
```
I want to convert every row into a dict:
```python
def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k: v for k, v in result.items() if v is not np.nan}
    print(final_result)
```
So I use the function above with this trick:

```python
df.apply(lambda row: convert_to_dict(row), axis=1)
```

But the result I get is weird:
```
{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}
```
The last row has both Language and Discount as NaN, and I expected both to be filtered out, but only Language is removed. How do I filter out all columns whose values are NaN from the final result?
Answer
Use `pd.notna` to filter out missing values:

```python
final_result = {k: v for k, v in result.items() if pd.notna(v)}
```
Applied to the whole DataFrame via `to_dict('records')`:

```python
final_result = [{k: v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)
```

```
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'},
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
```
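As for why the original `v is not np.nan` check let Discount through but caught Language: `is` tests object identity, not value. A minimal sketch of the pitfall (assuming the Discount column is numeric, so pandas stores it as a freshly created float NaN rather than the `np.nan` singleton, while the object-dtype Language column keeps the original `np.nan` object):

```python
import numpy as np
import pandas as pd

# A newly constructed float NaN is a different object from np.nan,
# so an identity check fails to recognise it as missing.
a = float('nan')
print(a is np.nan)       # False - different object, same "value"
print(np.nan is np.nan)  # True  - the same singleton object

# pd.notna checks for missing-ness by value, so it catches both.
print(pd.notna(a))       # False - correctly flagged as missing
print(pd.notna(np.nan))  # False
print(pd.notna(1000.0))  # True
```

This is why `pd.notna` (or `pd.isna`) is the reliable test: it works regardless of which NaN object a column happens to hold.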