I have a dataset from the GPS log of my Google account, and I'd like to remove outliers from the CSV that clearly do not belong there.
For example, the GPS shows you at 1,1 > 1,2 > 9,6 > 1,2 > 1,1, i.e. a large jump in location that, a couple of seconds later, returns to approximately where it was a few seconds before.
I have already tried filtering by GPS velocity, but that could remove points that were recorded while flying. It also did not work when the GPS was normal, then updated a little later and jumped up to 500 km away, stayed there for 10 minutes, and then corrected itself, because the velocity between consecutive points would then be low enough to pass the “speed test”.
How would I detect these in a dataset of around 430k rows? Cases like traveling in a plane with very infrequent GPS updates would have to be handled as well.
Answer
I have settled on a hybrid solution.
- Velocity limit: I used the distance function of the geopy module to work out the distance between two GPS points. From the CSV timestamps and that distance I then calculated the velocity between the points; if it is over a certain threshold, which can be adjusted to your needs, the point is not written to the output CSV.
Code
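    from datetime import datetime
    from geopy import distance

    # coords_1, coords_2: (lat, lon) of the previous and current row
    d1 = distance.distance(coords_1, coords_2).m  # distance in meters
    FMT = "%Y-%m-%d %H:%M:%S"  # formatting so it matches the CSV
    elapsed = (datetime.strptime(cur_line["Time"], FMT)
               - datetime.strptime(pre_line["Time"], FMT)).total_seconds()
    # Duplicate timestamps would divide by zero, so treat them as too fast
    velocity = d1 / elapsed if elapsed > 0 else float("inf")
    if velocity < 800:  # m/s; set this to your needs
        pass  # DO stuff, e.g. write cur_line to the output CSV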
- Law of Cosines: Calculate the angle formed by three consecutive points; if the angle at the middle point is too narrow, remove that point.
Code:
    from math import degrees
    from geopy import distance
    from trianglesolver import solve

    # Side lengths of the triangle in meters
    d1 = distance.distance(coords_1, coords_2).m
    d2 = distance.distance(coords_2, coords_3).m
    d3 = distance.distance(coords_3, coords_1).m
    degTresh = 30.0
    spike = False  # default when the points are too close to form a triangle
    if d1 > 0.01 and d2 > 0.01 and d3 > 0.01:  # if they are 0, there will be an error
        a, b, c, A, B, C = solve(a=d1, b=d2, c=d3)  # calculate the angles from the sides
        A, B, C = degrees(A), degrees(B), degrees(C)  # convert radians to degrees
        # C is opposite d3, i.e. the angle at coords_2
        spike = (360.0 - degTresh) < C or C < degTresh
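The angle test can also be written directly from the law of cosines, without a triangle-solving library. This is only a minimal sketch, assuming the three side lengths are already known in meters; `angle_at_middle` is a hypothetical helper name, not part of geopy or trianglesolver.

```python
from math import acos, degrees


def angle_at_middle(d1, d2, d3):
    """Angle in degrees at the middle of three consecutive GPS points.

    d1: distance previous -> middle
    d2: distance middle -> next
    d3: distance next -> previous (the side opposite the middle point)
    Returns None when the middle point coincides with a neighbour,
    because the angle is undefined in that case.
    """
    if d1 <= 0 or d2 <= 0:
        return None
    # Law of cosines: d3^2 = d1^2 + d2^2 - 2*d1*d2*cos(C)
    cos_c = (d1 ** 2 + d2 ** 2 - d3 ** 2) / (2 * d1 * d2)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    cos_c = max(-1.0, min(1.0, cos_c))
    return degrees(acos(cos_c))
```

A point that shoots out 1 km and comes straight back, with the previous and next points only 10 m apart, gives a very small angle and would fall below a 30 degree threshold, while a point on a straight path gives an angle close to 180 degrees.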
These two methods combined worked fairly well, and most of the time they even remove small GPS spikes when standing still.
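As a rough sketch of how the two checks might be chained over a whole track (an illustration under several assumptions, not the exact script used here): it swaps geopy's distance for a standard-library haversine formula, which is slightly less accurate; the row dicts with `pos` and `Time` keys and the names `haversine_m`, `is_spike` and `filter_track` are hypothetical; and each point is compared against the last kept point rather than the raw previous row, so one dropped spike does not also condemn the good point after it.

```python
from datetime import datetime
from math import acos, asin, cos, degrees, radians, sin, sqrt

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format, as in the velocity check
MAX_SPEED = 800.0          # m/s, velocity threshold (adjust to your needs)
MIN_ANGLE = 30.0           # degrees, angle threshold (adjust to your needs)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * asin(sqrt(h))


def is_spike(prev, cur, nxt):
    """True if cur is reached too fast from prev or forms a very sharp turn."""
    d1 = haversine_m(prev["pos"], cur["pos"])
    d2 = haversine_m(cur["pos"], nxt["pos"])
    d3 = haversine_m(nxt["pos"], prev["pos"])
    elapsed = (datetime.strptime(cur["Time"], FMT)
               - datetime.strptime(prev["Time"], FMT)).total_seconds()
    # Velocity check
    if elapsed > 0 and d1 / elapsed >= MAX_SPEED:
        return True
    # Angle check (law of cosines); skipped when points nearly coincide
    if d1 > 0.01 and d2 > 0.01:
        cos_c = (d1 ** 2 + d2 ** 2 - d3 ** 2) / (2 * d1 * d2)
        angle = degrees(acos(max(-1.0, min(1.0, cos_c))))
        return angle < MIN_ANGLE
    return False


def filter_track(rows):
    """Return rows without interior spikes; first and last rows are kept."""
    if len(rows) < 3:
        return list(rows)
    kept = [rows[0]]
    for i in range(1, len(rows) - 1):
        # Compare against the last kept point, not the raw previous row
        if not is_spike(kept[-1], rows[i], rows[i + 1]):
            kept.append(rows[i])
    kept.append(rows[-1])
    return kept
```

Calling `filter_track` on a list of such rows returns a new list that could then be written to the output CSV. Note that a sharp but genuine U-turn would also be dropped by the angle check, so the thresholds need tuning against real data.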