In the dataset df below, I want to flag the anomalies in all columns except A, B, C and L.
Any value less than 1500 or greater than 400000 is regarded as an anomaly.
import pandas as pd

# initialise data of lists
data = {
    'A': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'B': [1, 1, 1, 1, 1],
    'C': [1, 2, 3, 5, 9],
    'D': [12005, 18190, 1034, 15310, 31117],
    'E': [11021, 19112, 19021, 12, 24509],
    'F': [10022, 19910, 19113, 19999, 25519],
    'G': [14029, 29100, 39022, 24509, 412262],
    'H': [52119, 32991, 52883, 69359, 57835],
    'J': [41218, 52991, 55121, 69152, 79355],
    'K': [43211, 8199991, 56881, 212, 77342],
    'L': [1, 0, 1, 0, 0],
    'M': [31211, 42901, 53818, 62158, 69325],
}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
df
Attempt:
exclude_cols = ['A','B','C','L']

def flag_outliers(s, exclude_cols):
    if s.name in exclude_cols:
        return ''  # or None, or whatever df.style() needs
    else:
        s = pd.to_numeric(s, errors='coerce')
        indexes = (s<1500)|(s>400000)
        return ['background-color: red' if v else '' for v in indexes]

df.style.apply(lambda s: flag_outliers(s, exclude_cols), axis=1)
Result of the code:
Desired output should look like this:
Thanks for the effort!
Answer
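If you pass the columns to check as the subset argument of the apply function, you will get what you want. With axis=1 the function receives whole rows, so s.name is the row label rather than a column name and the exclude_cols check never matches; without subset, the values in B, C and L (all below 1500) are flagged as well.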
exclude_cols = ['A','B','C','L']

def flag_outliers(s, exclude_cols):
    if s.name in exclude_cols:
        print(s.name)
        return ''  # or None, or whatever df.style() needs
    else:
        s = pd.to_numeric(s, errors='coerce')
        indexes = (s<1500)|(s>400000)
        return ['background-color: yellow' if v else '' for v in indexes]

# subset limits the styling to the columns to check ('M' is included, since only A, B, C and L are excluded)
df.style.apply(lambda s: flag_outliers(s, exclude_cols), axis=1, subset=['D','E','F','G','H','J','K','M'])
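As a variation on the answer above, here is a minimal sketch that derives the subset from the DataFrame's columns instead of typing it out, and styles column by column (axis=0). The names LOW, HIGH, EXCLUDE_COLS, outlier_css and highlight_outliers are hypothetical helpers introduced for illustration, not part of pandas, and the demo frame reuses only three columns from the question's data.

```python
import pandas as pd

LOW, HIGH = 1500, 400000             # thresholds stated in the question
EXCLUDE_COLS = ['A', 'B', 'C', 'L']  # columns that must not be checked

def outlier_css(col, css='background-color: yellow'):
    """Return one CSS string per cell: css for anomalies, '' otherwise."""
    values = pd.to_numeric(col, errors='coerce')   # non-numeric cells become NaN
    is_outlier = (values < LOW) | (values > HIGH)  # NaN compares False, so it is never flagged
    return [css if flag else '' for flag in is_outlier]

def highlight_outliers(df, exclude_cols=EXCLUDE_COLS):
    """Build a Styler that checks every column not listed in exclude_cols."""
    subset = [c for c in df.columns if c not in exclude_cols]
    return df.style.apply(outlier_css, axis=0, subset=subset)

# Small demo frame built from three columns of the question's data
demo = pd.DataFrame({
    'A': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'D': [12005, 18190, 1034, 15310, 31117],
    'K': [43211, 8199991, 56881, 212, 77342],
})
print(outlier_css(demo['D']))  # ['', '', 'background-color: yellow', '', '']
```

In a notebook, calling highlight_outliers(df) on the full DataFrame from the question should render the highlighted table; df.style needs the optional jinja2 dependency to be installed.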