Please, I have a dataframe that is listed in ascending order. My goal is to average similar numbers (numbers that are within 10% of each other in ‘both directions’) and concate their ‘Bell’ name together. For example, the image shows the input and output dataframe. I tried coding it but I stuck on how to progress.
JavaScript
x
5
1
def full_data_compare(self, df_full = pd.DataFrame()):
2
for k in range(df_full): #current rows
3
for j in range(df_full): #future rows
4
if int(df_full['Size'][k]) - int(df_full['Size'][k])*(1/10) <= int(df_full['Size'][j]) <= int(df_full['Size'][k]) + int(df_full['Size'][k])*(1/10) & int(df_full['Size'][k]) - int(df_full['Size'][k])*(1/10) <= int(df_full['Size'][j]) <= int(df_full['Size'][k]) + int(df_full['Size'][k])*(1/10):
5
Advertisement
Answer
Assuming you really want to check in both directions that the consecutive values are within 10%, you need to compute two Series with pct_change
. Then use it to groupby.agg
:
JavaScript
1
10
10
1
#df = df.sort_values(by='Size') for non-consecutive grouping
2
3
m1 = df['Size'].pct_change().abs().gt(0.1)
4
m2 = df['Size'].pct_change(-1).abs().shift().gt(0.1)
5
6
out = (df
7
.groupby((m1|m2).cumsum())
8
.agg({'Bell': ' '.join, 'Size': 'mean'})
9
)
10
NB. If you want to group non-consecutive values, you first need to sort them: sort_values(by='Size')
Output:
JavaScript
1
9
1
Bell Size
2
Size
3
0 A1 A2 1493.500000
4
1 A1 A2 A3 5191.333333
5
2 A1 A3 A2 35785.333333
6
3 A2 45968.000000
7
4 A1 78486.000000
8
5 A3 41205.000000
9