I have CSV files like following
ID Result1 Result2 Result3 0 1 1 0 1 0 1 0 2 0 0 0 3 0 0 0 4 1 0 0 5 0 0 0
I want to replace Result1, Result2, and Result3 columns by all zero if the sum of their column values is greater than 1. I tried this script
new = [] df1 = pd.read_csv(r"C:Userstest.csv") df= df1.loc[:, df1.columns != 'ID'] l = list(df.columns.values) for l in l: if sum(df[l]) > 1: df[l].loc[(df[l] > 0)] = 0 new.append(l) df2 = df[new]
When I do this operation Result3 column is dropped since it has all zero values. How do I do this operation only on the columns which satisfy the condition without affecting another column which didn’t satisfy the condition?
Advertisement
Answer
data = {'Result1': {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0}, 'Result2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0}, 'Result3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}} df = pd.DataFrame.from_dict(data) for col in df.columns: if df[col].sum() > 1: df[col] = 0 print(df)
gives
Result1 Result2 Result3 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0
In your example dataframe, this makes all columns zero. To see that this is not always the case, put a 1 in the last column and verify that it survives the operation.