I have a .csv file containing x y data from transects (.csv file here). The file can contain a few dozen transects (example only 4).
I want to calculate the elevation change from each transect and then select the transect with the highest elevation change.
x y lines 0 3.444 1 0.009 3.445 1 0.180 3.449 1 0.027 3.449 1 ... 0 2.115 2 0.008 2.115 2 0.017 2.115 2 0.027 2.116 2
I’ve tried to calculate the change with pandas.dataframe.diff but I’m unable to select the highest elevation change from this.
UPDATE: I found a way to calculate the height difference for 1 transect. The goal is now to loop this script through the different other transects and let it select the transect with the highest difference. Not sure how to create a loop from this…
import numpy as np import matplotlib.pyplot as plt import pandas as pd from scipy.signal import savgol_filter, find_peaks, find_peaks_cwt from pandas import read_csv import csv df = pd.read_csv('transect4.csv', delimiter=',', header=None, names=['x', 'y', 'lines']) df_1 = df ['lines'] == 1 df1 = df[df_1] plt.plot(df1['x'], df1['y'], label='Original Topography') #apply a Savitzky-Golay filter smooth = savgol_filter(df1.y.values, window_length = 351, polyorder = 5) #find the maximums peaks_idx_max, _ = find_peaks(smooth, prominence = 0.01) #reciprocal, so mins will become max smooth_rec = 1/smooth #find the mins now peaks_idx_mins, _ = find_peaks(smooth_rec, prominence = 0.01) plt.xlabel('Distance') plt.ylabel('Height') plt.plot(df1['x'], smooth, label='Smoothed Topography') #plot them plt.scatter(df1.x.values[peaks_idx_max], smooth[peaks_idx_max], s = 55, c = 'green', label = 'Local Max Cusp') plt.scatter(df1.x.values[peaks_idx_mins], smooth[peaks_idx_mins], s = 55, c = 'black', label = 'Local Min Cusp') plt.legend(loc='upper left') plt.show() #Export to csv df['Cusp_max']=False df['Cusp_min']=False df.loc[df1.x[peaks_idx_max].index, 'Cusp_max']=True df.loc[df1.x[peaks_idx_mins].index, 'Cusp_min']=True data=df[df['Cusp_max'] | df['Cusp_min']] data.to_csv(r'Cusp_total.csv') #Calculate height difference my_data=pd.read_csv('Cusp_total.csv', delimiter=',', header=0, names=['ID', 'x', 'y', 'lines']) df_1 = df ['lines'] == 1 df1 = df[df_1] df1_diff=pd.DataFrame(my_data) df1_diff['Diff_Cusps']=df1_diff['y'].diff(-1) #Only use positive numbers for average df1_pos = df_diff[df_diff['Diff_Cusps'] > 0] print("Average Height Difference: ", (df1_pos['Diff_Cusps'].mean()), "m")
Ideally, the script would select the transect with the highest elevation change from an unknown number of transects in the .csv file, which will then be exported to a new .csv file.
Advertisement
Answer
You need to groupby
by column lines
.
Not sure if this is what you meant when you say elevation change but this gives difference of elevations (max(y) – min(y)) for each group, where groups are formed by all rows sharing same value of ‘line’each group representing one such value. This should help you with what you are missing in your logic, (sorry can’t put more time in).
frame = pd.read_csv('transect4.csv', header=None, names=['x', 'y', 'lines']) groups = frame.groupby('lines') groups['y'].max() - groups['y'].min() # Should give you max elevations of each group.