Problem with plotting peaks using find_peaks from SciPy to detect drastic up/down turns or global outliers



Let’s say I have following dataframe contains value over time or date:

import pandas as pd

df = pd.DataFrame(data={'date':['2020-10-16','2020-10-17','2020-10-18','2020-10-19','2020-10-20','2020-10-21','2020-10-22','2020-10-23','2020-10-24','2020-10-25','2020-10-26','2020-10-27','2020-10-28','2020-10-29','2020-10-30','2020-10-31','2020-11-01','2020-11-02','2020-11-03','2020-11-04','2020-11-05','2020-11-06','2020-11-07','2020-11-08','2020-11-09','2020-11-10','2020-11-11','2020-11-12','2020-11-13','2020-11-14','2020-11-15'],
                        'value':[161967, 161270, 148508, 152442, 157504, 157118, 155674, 134522, 213384, 163242, 217415, 221502, 146267, 143621, 145875, 139488, 104466, 94825, 143686, 151952, 161074, 161417, 135042, 148768, 131428, 127816, 151905, 180498, 177899, 193950, 12]})
df

I inspired from this answer to detect peaks and valleys via below code:

from scipy.signal import find_peaks
import numpy as np
import matplotlib.pyplot as plt

# Input signal
t = df.date
x = df.value

# Threshold value (for height of peaks and valleys)
thresh = 0.95

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh, distance=10)

# Find indices of valleys (from inverting the signal)
valley_idx, _ = find_peaks(-x, height=thresh, distance=10 )

# Plot signal
plt.figure(figsize=(14,12))
plt.plot(t, x   , color='b', label='data')
plt.scatter(t, x, s=10,c='b',label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh, thresh],   '--',  color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [-thresh, -thresh], '--',  color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (blue)
plt.plot(t[peak_idx], x[peak_idx],     "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title(f'data over time for username=target')
plt.legend( loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()
plt.show()

This is the output:

img

The problems:

  • I can’t figure out how I can configure find_peaks() documentation to reach meaningful/drastic peaks & valley with respect to threshold as global outliers. I also checked this post but couldn’t help me to find the cheap solution as well as other libraries offered here.
  • The upper threshold with red dashed is missing!

Answer

  1. You need to specify height in the same domain as your data
  2. Upper thresohld is not missing, it is on the plot, just all those lines are close to 0 and clutter on the bottom.
thresh_top = np.median(x) + 1 * np.std(x)
thresh_bottom = np.median(x) - 1 * np.std(x)
# (you may want to use std calculated on 10-90 percentile data, without outliers)

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh_top)

# Find indices of valleys (from inverting the signal)
valley_idx, _ = find_peaks(-x, height=-thresh_bottom)

# Plot signal
plt.figure(figsize=(14,12))
plt.plot(t, x   , color='b', label='data')
plt.scatter(t, x, s=10,c='b',label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh_top, thresh_top],   '--',  color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [thresh_bottom, thresh_bottom], '--',  color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (blue)
plt.plot(t[peak_idx], x[peak_idx],     "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title(f'data over time for username=target')
plt.legend( loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()

enter image description here



Source: stackoverflow