Let’s say I have the following dataframe, which contains values over time (one per date):
import pandas as pd
df = pd.DataFrame(data={'date':['2020-10-16','2020-10-17','2020-10-18','2020-10-19','2020-10-20','2020-10-21','2020-10-22','2020-10-23','2020-10-24','2020-10-25','2020-10-26','2020-10-27','2020-10-28','2020-10-29','2020-10-30','2020-10-31','2020-11-01','2020-11-02','2020-11-03','2020-11-04','2020-11-05','2020-11-06','2020-11-07','2020-11-08','2020-11-09','2020-11-10','2020-11-11','2020-11-12','2020-11-13','2020-11-14','2020-11-15'],
'value':[161967, 161270, 148508, 152442, 157504, 157118, 155674, 134522, 213384, 163242, 217415, 221502, 146267, 143621, 145875, 139488, 104466, 94825, 143686, 151952, 161074, 161417, 135042, 148768, 131428, 127816, 151905, 180498, 177899, 193950, 12]})
df
Inspired by this answer, I tried to detect peaks and valleys with the code below:
from scipy.signal import find_peaks
import numpy as np
import matplotlib.pyplot as plt

# Input signal
t = df.date
x = df.value

# Threshold value (for height of peaks and valleys)
thresh = 0.95

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh, distance=10)

# Find indices of valleys (from inverting the signal)
valley_idx, _ = find_peaks(-x, height=thresh, distance=10)

# Plot signal
plt.figure(figsize=(14, 12))
plt.plot(t, x, color='b', label='data')
plt.scatter(t, x, s=10, c='b', label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh, thresh], '--', color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [-thresh, -thresh], '--', color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (green)
plt.plot(t[peak_idx], x[peak_idx], "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title('data over time for username=target')
plt.legend(loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()
This is the output:
The problems:
- I can’t figure out from the find_peaks() documentation how to configure it so that it returns only the meaningful/drastic peaks and valleys relative to a threshold, i.e. the global outliers. I also checked this post and the other libraries suggested here, but neither helped me find a cheap solution.
- The upper threshold (red dashed line) is missing!
Answer
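- You need to specify height in the same units as your data. The values here are on the order of 100,000, so with height=0.95 every local maximum of x clears the threshold, while the inverted signal -x is entirely negative and never reaches it.
- The upper threshold is not missing; it is on the plot. Both threshold lines sit at +0.95 and -0.95, which is indistinguishable from 0 at the scale of the data, so they overlap at the bottom of the figure.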
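To make the first point concrete, here is a minimal sketch. The toy array is made up for illustration (it is not the question's data) and is only meant to be on a similar scale; the height of 200000 is likewise an arbitrary illustrative choice.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical toy signal on roughly the same scale as the question's data
x = np.array([150000, 160000, 150000, 220000, 150000,
              100000, 150000, 155000, 150000])

# height=0.95 is far below every sample, so all local maxima pass
low_peaks, _ = find_peaks(x, height=0.95)

# The inverted signal is entirely negative, so nothing reaches +0.95
low_valleys, _ = find_peaks(-x, height=0.95)

# A height expressed in the data's own units keeps only the large peak
high_peaks, _ = find_peaks(x, height=200000)

print("peaks with height=0.95:   ", low_peaks)
print("valleys with height=0.95: ", low_valleys)
print("peaks with height=200000: ", high_peaks)
```

Only the last call, where the height is given in the units of the data, separates the large peak from the ordinary local maxima.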
# t and x are the date and value columns defined in the question's code.
# Thresholds in the data's own units: median +/- 1 standard deviation
thresh_top = np.median(x) + 1 * np.std(x)
thresh_bottom = np.median(x) - 1 * np.std(x)
# (you may want to use std calculated on 10-90 percentile data, without outliers)

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh_top)

# Find indices of valleys (from inverting the signal, so the height is negated too)
valley_idx, _ = find_peaks(-x, height=-thresh_bottom)

# Plot signal
plt.figure(figsize=(14, 12))
plt.plot(t, x, color='b', label='data')
plt.scatter(t, x, s=10, c='b', label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh_top, thresh_top], '--', color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [thresh_bottom, thresh_bottom], '--', color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (green)
plt.plot(t[peak_idx], x[peak_idx], "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title('data over time for username=target')
plt.legend(loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()
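The comment in the code above suggests computing the standard deviation on the 10-90 percentile data so that outliers do not inflate it. One possible way to do that is sketched below. The helper name robust_thresholds and its parameters k, lo and hi are hypothetical (they are not part of SciPy or of the original answer), and keeping the median of the full data as the centre is an assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def robust_thresholds(values, k=1.0, lo=10, hi=90):
    """Hypothetical helper: return (bottom, top) = median -/+ k * std,
    with std computed only on values inside the lo..hi percentile band."""
    values = np.asarray(values, dtype=float)
    p_lo, p_hi = np.percentile(values, [lo, hi])
    core = values[(values >= p_lo) & (values <= p_hi)]  # drop the tails
    spread = np.std(core)
    center = np.median(values)
    return center - k * spread, center + k * spread

# The 'value' column from the question
x = np.array([161967, 161270, 148508, 152442, 157504, 157118, 155674,
              134522, 213384, 163242, 217415, 221502, 146267, 143621,
              145875, 139488, 104466, 94825, 143686, 151952, 161074,
              161417, 135042, 148768, 131428, 127816, 151905, 180498,
              177899, 193950, 12])

thresh_bottom, thresh_top = robust_thresholds(x)

# Peaks above the top threshold, valleys below the bottom threshold
peak_idx, _ = find_peaks(x, height=thresh_top)
valley_idx, _ = find_peaks(-x, height=-thresh_bottom)

print("thresholds:", thresh_bottom, thresh_top)
print("peak indices:", peak_idx)
print("valley indices:", valley_idx)
```

Note that find_peaks() compares each sample with its neighbours on both sides, so the first and last samples are never reported. The final value of 12 will therefore not show up as a valley, however low it is; if end points matter, they would need a separate check against the thresholds.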