Trying to filter ‘time’ data into ‘time_filtered’ based on lut_lst ranges, ergo if ‘time’ value falls in any of the ranges, exchange with NaN otherwise copy value into new column.
import numpy as np # creates look up list for ranges that need to be excluded lut_lst = [] for i in range(0,2235,15): a= range(i,2+i) b= range(14+i, 15+i) lut_lst.append(a) lut_lst.append(b) lut_lst [range(0, 2), range(14, 15), range(15, 17), range(29, 30), range(30, 32), range(44, 45), range(45, 47), range(59, 60), ... range(2190, 2192), range(2204, 2205), range(2205, 2207), range(2219, 2220), range(2220, 2222), range(2234, 2235)] ## if 'time' value falls in any of the ranges of lut_lst, replace values with NaN (drop row) data_cols = ['filename', 'time'] data_vals = [['cell1', 0.0186], ['cell1', 0.0774], ['cell1', 2.2852], ['cell1', 2.3788], ['cell1', 14.62], ['cell1', 15.04], ['cell2', 20.3416], ['cell2', 20.9128], ['cell2', 29.6784], ['cell2', 30.1194], ['cell2', 32.3304]] df = pd.DataFrame(data_vals, columns=data_cols) # trying to filter 'time' but can't get INTO the ranges df['time_filtered'] = df['time'].apply(lambda x: x if (x not in lut_lst) else np.nan)
The output for df is not filtered. I tried using any(lut_lst) or all(lut_lst) but that just threw an error.
df filename record time time_filtered 0 cell1 1 0.0186 0.0186 1 cell1 1 0.0774 0.0774 2 cell1 1 2.2852 2.2852 3 cell1 25 2.3788 2.3788 4 cell1 25 14.6200 14.6200 5 cell1 101 15.0400 15.0400 6 cell2 2 20.3416 20.3416 7 cell2 2 20.9128 20.9128 8 cell2 50 29.6784 29.6784 9 cell2 50 30.1194 30.1194 10 cell2 80 32.3304 32.3304
Advertisement
Answer
Use tuples instead of ranges in lut_lst, and change your filter slightly:
import numpy as np # creates look up list for ranges that need to be excluded lut_lst = [] for i in range(0,2235,15): a= i,2+i b= 14+i, 15+i lut_lst.append(a) lut_lst.append(b) ## if 'time' value falls in any of the ranges of lut_lst, replace values with NaN (drop row) data_cols = ['filename', 'time'] data_vals = [['cell1', 0.0186], ['cell1', 0.0774], ['cell1', 2.2852], ['cell1', 2.3788], ['cell1', 14.62], ['cell1', 15.04], ['cell2', 20.3416], ['cell2', 20.9128], ['cell2', 29.6784], ['cell2', 30.1194], ['cell2', 32.3304]] df = pd.DataFrame(data_vals, columns=data_cols) df['time_filtered'] = df['time'].apply(lambda x: x if not any([a < x < b for a,b in lut_lst]) else np.nan) df
Output:
filename time time_filtered 0 cell1 0.0186 NaN 1 cell1 0.0774 NaN 2 cell1 2.2852 2.2852 3 cell1 2.3788 2.3788 4 cell1 14.6200 NaN 5 cell1 15.0400 NaN 6 cell2 20.3416 20.3416 7 cell2 20.9128 20.9128 8 cell2 29.6784 NaN 9 cell2 30.1194 NaN 10 cell2 32.3304 32.3304