I’d like help plotting the percentage of NaN values in each column of a pandas DataFrame. I calculated the percentages using this code.
per_1 = df_1.isna().mean().round(4) * 100
It gave me this result.
HR              7.94
O2Sat          10.36
Temp           66.06
SBP            15.20
MAP             9.17
Age             0.00
Gender          0.00
ICULOS          0.00
SepsisLabel     0.00
Patient_iD      0.00
Now I want to plot these percentages against the column names of the DataFrame. Can anyone help me?
Regards.
Update: The graph looks like this. How can I tidy it up so that the column names are clearly readable?
Also, is it possible to show the percentage on each bar, as in the graph below?
Update: The only issue is with HR percentage:
Answer
You can plot a bar chart using the following code snippet:
import matplotlib.pyplot as plt
plt.bar(per_1.keys(), per_1.values)
plt.show()
Sample output:
UPDATE:
As per your update to the question, here is a solution that retains only columns having percentage greater than zero. Also the plot has been beautified as requested with values displayed over each bar.
f, ax = plt.subplots()
for i, item in enumerate(zip(per_1.keys(), per_1.values)):
    if (item[1] > 0):
        ax.bar(item[0], item[1], label=item[0])
        ax.text(i - 0.25, item[1] + 1.5, str(item[1]))
ax.set_xticklabels([])
ax.set_xticks([])
plt.ylim(0, 80)
plt.ylabel('Percentage')
plt.xlabel('Columns')
plt.legend()
plt.show()
Sample Output:
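As a possible alternative to placing each label by hand, here is a minimal sketch that filters the Series first and lets matplotlib position the labels. It assumes matplotlib 3.4 or newer (for `Axes.bar_label`), keeps the column names on the x-axis with rotated tick labels instead of a legend, and rebuilds `per_1` from the values posted in the question. The function name `plot_nan_percentages` is made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Percentages copied from the question, standing in for the real per_1
per_1 = pd.Series({
    "HR": 7.94, "O2Sat": 10.36, "Temp": 66.06, "SBP": 15.20, "MAP": 9.17,
    "Age": 0.0, "Gender": 0.0, "ICULOS": 0.0, "SepsisLabel": 0.0,
    "Patient_iD": 0.0,
})

def plot_nan_percentages(per):
    """Bar chart of the non-zero percentages, one label above each bar."""
    nonzero = per[per > 0]  # drop columns that have no missing values
    fig, ax = plt.subplots()
    bars = ax.bar(list(nonzero.index), nonzero.values)
    # bar_label needs matplotlib >= 3.4; fmt fixes two decimal places
    labels = ax.bar_label(bars, fmt="%.2f", padding=3)
    ax.set_ylim(0, 80)
    ax.set_ylabel("Percentage")
    ax.set_xlabel("Columns")
    # Rotate the column names so longer ones do not overlap
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax, labels

fig, ax, labels = plot_nan_percentages(per_1)
```

Call `plt.show()` afterwards to display the figure. Filtering before plotting means the bar positions and the labels always line up, even when a column with zero missing values sits between two non-zero ones.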
UPDATE 2:
To round the values to two decimal places, replace the ax.text line in the earlier code with this one:
ax.text(i - 0.25, item[1] + 1.5 , str(np.round(item[1],2)))
You will need to import numpy if not already done:
import numpy as np
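If you would rather not import numpy just for the label text, a format string gives the same two-decimal result. This is only a sketch: the helper name `format_pct` is hypothetical, and the long HR value below is an assumed example of floating-point noise, since the screenshot showing the actual problem is not available here.

```python
def format_pct(value, decimals=2):
    """Format a percentage as fixed-decimal text for a bar label."""
    return f"{value:.{decimals}f}"

# Hypothetical values; the HR entry imitates floating-point noise
percentages = {"HR": 7.9399999999999995, "O2Sat": 10.36, "Temp": 66.06}
labels = {name: format_pct(value) for name, value in percentages.items()}
```

In the earlier loop this would be used as `ax.text(i - 0.25, item[1] + 1.5, format_pct(item[1]))`. Unlike `str(np.round(...))`, it also keeps trailing zeros, so 15.2 is shown as 15.20.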