Is there any function to find out first ‘n‘ largest numbers and take an average of those two into a different column in pandas.
Note: Time or any non-numeric column to be ignored.
time |
n1 | n2 | n3 | n4 | average_largest_2 |
---|---|---|---|---|---|
11:50 | 1 | 2 | 3 |
4 |
3.5 |
12:50 | 5 | 6 | 7 |
8 |
7.5 |
13:50 | 8 |
7 |
6 | 5 | 7.5 |
Use this code if need be:
import pandas as pd import numpy as np time = ['11:50', '12:50', '13:50'] data_1 = {'time': time, 'n1': [1, 5, 8], 'n2': [2, 6 ,7], 'n3': [3, 7 ,6], 'n4': [4, 8, 5], } df1 = pd.DataFrame(data = data_1) df1
Expected Result: Average_largest_2_column
Advertisement
Answer
You can use nlargest
per row and get the mean:
df1['average_largest_2'] = (df1.select_dtypes('number') .apply(lambda r: r.nlargest(2).mean(), axis=1) )
Or using the underlying numpy array:
a = df1.select_dtypes('number').to_numpy() df1['average_largest_2'] = np.sort(a)[:,-2:].mean(1)
Output:
time n1 n2 n3 n4 average_largest_2 0 11:50 1 2 3 4 3.5 1 12:50 5 6 7 8 7.5 2 13:50 8 7 6 5 7.5