I have this pandas DataFrame:
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan)
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)

   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN
I want to convert every row into a dict, leaving out the entries whose value is NaN.
def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k: v for k, v in result.items() if v is not np.nan}
    print(final_result)
I apply the function above to every row like this:
df.apply(lambda row: convert_to_dict(row), axis=1)
But the result I get is not what I expected:
{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}
The last row has NaN in both Discount and Language.
I expected both to be filtered out, but only Language is removed.
How do I filter every NaN-valued column out of the final result?
Answer
Use pd.notna to filter out missing values:
final_result = {k: v for k, v in result.items() if pd.notna(v)}
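
The likely reason the original filter misses Discount is that `v is not np.nan` tests object identity, not whether the value is NaN. A NaN read back from a float column is usually a newly created float object rather than the `np.nan` object itself, whereas an object column such as Language can still hold the original `np.nan`. The sketch below illustrates the difference with plain scalars; the variable names are made up for the example, and the exact objects pandas hands to your function may vary between versions.

```python
import numpy as np
import pandas as pd

# np.nan is one particular float object, so identity holds only for that object.
same_object = np.nan is np.nan        # True

# A NaN created elsewhere is a different object, so the identity test misses it.
other_nan = float("nan")
is_identical = other_nan is np.nan    # False

# NaN never compares equal to anything, so == does not help either.
is_equal = other_nan == np.nan        # False

# pd.notna looks at the value itself, so every kind of missing value is caught.
samples = [np.nan, other_nan, np.float64("nan"), None, 1000.0, "Scala"]
kept_by_notna = [pd.notna(v) for v in samples]
print(kept_by_notna)  # [False, False, False, False, True, True]
```

This is why a value-based check such as `pd.notna` (or `pd.isna`) is generally the safer way to test for missing data.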

To build the whole list of dicts in one step, iterate over df.to_dict('records'):

final_result = [{k: v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)

[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'},
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
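
If you would rather keep a per-row function as in the question, one possible variant is to let pandas drop the missing entries with `Series.dropna` and convert what remains with `Series.to_dict`, returning the dict instead of printing it. This is only a sketch: `row_to_dict` and `records` are illustrative names, and it assumes the same sample data as the question.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(
    technologies,
    columns=["Courses", "Fee", "Duration", "Discount", "Language"],
)


def row_to_dict(row: pd.Series) -> dict:
    # Drop the missing entries of this row, then convert the rest to a dict.
    return row.dropna().to_dict()


# iterrows yields (index, row) pairs; only the row is needed here.
records = [row_to_dict(row) for _, row in df.iterrows()]
print(records)
```

Returning the dict (rather than printing inside the function) lets you collect the results and reuse them later.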