Skip to content
Advertisement

Filter pandas column based on ranges in a huge list

Trying to filter ‘time’ data into ‘time_filtered’ based on lut_lst ranges, ergo if ‘time’ value falls in any of the ranges, exchange with NaN otherwise copy value into new column.

import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
    a= range(i,2+i)
    b= range(14+i, 15+i)
    lut_lst.append(a)
    lut_lst.append(b)

lut_lst
[range(0, 2),
 range(14, 15),
 range(15, 17),
 range(29, 30),
 range(30, 32),
 range(44, 45),
 range(45, 47),
 range(59, 60),
...
 range(2190, 2192),
 range(2204, 2205),
 range(2205, 2207),
 range(2219, 2220),
 range(2220, 2222),
 range(2234, 2235)]


## if 'time' value falls in any of the ranges of lut_lst, replace values with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
        ['cell1', 0.0774],
        ['cell1', 2.2852],
        ['cell1', 2.3788],
        ['cell1', 14.62],
        ['cell1', 15.04],
        ['cell2', 20.3416],
        ['cell2', 20.9128],
        ['cell2', 29.6784],
        ['cell2', 30.1194],
        ['cell2', 32.3304]]
        
df = pd.DataFrame(data_vals, columns=data_cols)

# trying to filter 'time' but can't get INTO the ranges
df['time_filtered'] = df['time'].apply(lambda x: x if (x not in lut_lst) else np.nan)

The output for df is not filtered. I tried using any(lut_lst) or all(lut_lst) but that just threw an error.

df
filename    record  time    time_filtered
0   cell1   1   0.0186  0.0186
1   cell1   1   0.0774  0.0774
2   cell1   1   2.2852  2.2852
3   cell1   25  2.3788  2.3788
4   cell1   25  14.6200 14.6200
5   cell1   101 15.0400 15.0400
6   cell2   2   20.3416 20.3416
7   cell2   2   20.9128 20.9128
8   cell2   50  29.6784 29.6784
9   cell2   50  30.1194 30.1194
10  cell2   80  32.3304 32.3304

Advertisement

Answer

Use tuples instead of ranges in lut_lst, and change your filter slightly:

import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
    a= i,2+i
    b= 14+i, 15+i
    lut_lst.append(a)
    lut_lst.append(b)

## if 'time' value falls in any of the ranges of lut_lst, replace values 
with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
    ['cell1', 0.0774],
    ['cell1', 2.2852],
    ['cell1', 2.3788],
    ['cell1', 14.62],
    ['cell1', 15.04],
    ['cell2', 20.3416],
    ['cell2', 20.9128],
    ['cell2', 29.6784],
    ['cell2', 30.1194],
    ['cell2', 32.3304]]
    
df = pd.DataFrame(data_vals, columns=data_cols)


df['time_filtered'] = df['time'].apply(lambda x: x if not any([a < x < b 
for a,b in lut_lst]) else np.nan)

df

Output:

    filename    time    time_filtered
0   cell1   0.0186  NaN
1   cell1   0.0774  NaN
2   cell1   2.2852  2.2852
3   cell1   2.3788  2.3788
4   cell1   14.6200 NaN
5   cell1   15.0400 NaN
6   cell2   20.3416 20.3416
7   cell2   20.9128 20.9128
8   cell2   29.6784 NaN
9   cell2   30.1194 NaN
10  cell2   32.3304 32.3304
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement