Skip to content
Advertisement

Analysing height difference from columns and selecting max difference in Python

I have a .csv file containing x y data from transects (.csv file here). The file can contain a few dozen transects (example only 4).

I want to calculate the elevation change from each transect and then select the transect with the highest elevation change.

x         y      lines
0       3.444      1
0.009   3.445      1
0.180   3.449      1
0.027   3.449      1
...
0       2.115      2
0.008   2.115      2
0.017   2.115      2
0.027   2.116      2 

I’ve tried to calculate the change with pandas.dataframe.diff but I’m unable to select the highest elevation change from this.

UPDATE: I found a way to calculate the height difference for 1 transect. The goal is now to loop this script through the different other transects and let it select the transect with the highest difference. Not sure how to create a loop from this…

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import savgol_filter, find_peaks, find_peaks_cwt
from pandas import read_csv
import csv

df = pd.read_csv('transect4.csv', delimiter=',', header=None, names=['x', 'y', 'lines'])
df_1 = df ['lines'] == 1
df1 = df[df_1]

plt.plot(df1['x'], df1['y'], label='Original Topography')

#apply a Savitzky-Golay filter
smooth = savgol_filter(df1.y.values, window_length = 351, polyorder = 5)

#find the maximums
peaks_idx_max, _ = find_peaks(smooth, prominence = 0.01)

#reciprocal, so mins will become max
smooth_rec = 1/smooth

#find the mins now
peaks_idx_mins, _ = find_peaks(smooth_rec, prominence = 0.01)

plt.xlabel('Distance')
plt.ylabel('Height')

plt.plot(df1['x'], smooth, label='Smoothed Topography')

#plot them
plt.scatter(df1.x.values[peaks_idx_max], smooth[peaks_idx_max], s = 55,
            c = 'green', label = 'Local Max Cusp')
plt.scatter(df1.x.values[peaks_idx_mins], smooth[peaks_idx_mins], s = 55,
            c = 'black', label = 'Local Min Cusp')
plt.legend(loc='upper left')
plt.show()

#Export to csv
df['Cusp_max']=False
df['Cusp_min']=False
df.loc[df1.x[peaks_idx_max].index, 'Cusp_max']=True
df.loc[df1.x[peaks_idx_mins].index, 'Cusp_min']=True

data=df[df['Cusp_max'] | df['Cusp_min']]
data.to_csv(r'Cusp_total.csv')

#Calculate height difference
my_data=pd.read_csv('Cusp_total.csv', delimiter=',', header=0, names=['ID', 'x', 'y', 'lines'])
df_1 = df ['lines'] == 1
df1 = df[df_1]

df1_diff=pd.DataFrame(my_data)
df1_diff['Diff_Cusps']=df1_diff['y'].diff(-1)

#Only use positive numbers for average
df1_pos = df_diff[df_diff['Diff_Cusps'] > 0]
print("Average Height Difference: ", (df1_pos['Diff_Cusps'].mean()), "m")

Ideally, the script would select the transect with the highest elevation change from an unknown number of transects in the .csv file, which will then be exported to a new .csv file.

Advertisement

Answer

You need to groupby by column lines.

Not sure if this is what you meant when you say elevation change but this gives difference of elevations (max(y) – min(y)) for each group, where groups are formed by all rows sharing same value of ‘line’each group representing one such value. This should help you with what you are missing in your logic, (sorry can’t put more time in).

frame = pd.read_csv('transect4.csv', header=None, names=['x', 'y', 'lines'])
groups = frame.groupby('lines')
groups['y'].max() - groups['y'].min()
# Should give you max elevations of each group.
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement