
Pandas: Remove Column Based on Threshold Criteria

I have to solve this problem:

Objective: Drop columns in which most rows are missing.

Inputs:
1. Dataframe df: Pandas dataframe
2. threshold: Determines which columns will be dropped. If threshold is .9, the columns with 90% missing values will be dropped.

Outputs:
1. Dataframe df with dropped columns (if no columns are dropped, return the same dataframe)

[Image: Excel Doc Screenshot]

I’ve coded this:

[code block not preserved]

The function has to take “self, df, and threshold” as its parameters, and I cannot add more. The code must pass the test cases below:

[code block not preserved]

When I run VT.drop_nan_col(df, 0.9).head() (I cannot change this line of code), I get:

[code block not preserved]

If I change the shape to have 0 instead of 1, which I don’t think is correct for what I’m doing, I get:

[code block not preserved]

Can anyone help me understand how I can fix this?


Answer

I think you need to change from

df = df.drop(i)

to

df = df.drop(i, axis=1)

With axis=1, drop() looks for the label among the columns. Without it, drop() looks among the rows (axis=0), which is the default. See here for the same error: https://stackoverflow.com/a/44931865/5184851

Also, to be able to call .head() on the result, the function drop_nan_col(...) needs to return the dataframe, i.e. df.
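Putting both points together, here is a minimal sketch of what the function could look like. The original code is not shown, so this is only an illustration: the class name Transformer and the sample dataframe are made up, and it assumes "90% missing" means a missing fraction of at least the threshold (>=).

```python
import numpy as np
import pandas as pd


class Transformer:  # hypothetical class name; the question only shows an instance called VT
    def drop_nan_col(self, df, threshold):
        # Fraction of missing values per column (a Series indexed by column name).
        missing_fraction = df.isna().mean()
        # Assumption: a column is dropped when its missing fraction is >= threshold.
        to_drop = missing_fraction[missing_fraction >= threshold].index
        # axis=1 makes drop() look up column labels instead of row labels.
        # Returning the result is what allows .head() to be chained on the call.
        return df.drop(to_drop, axis=1)


# Made-up sample data: column "b" is entirely missing, "c" is 25% missing.
VT = Transformer()
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0, 4.0],
})
result = VT.drop_nan_col(df, 0.9)
print(result.head())
```

If no column reaches the threshold, to_drop is empty and the dataframe comes back with all of its columns. Writing df.drop(columns=to_drop) is equivalent to passing axis=1 and may read more clearly.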
