Is there any function to find out first ‘n‘ largest numbers and take an average of those two into a different column in pandas.
Note: Time or any non-numeric column to be ignored.
| time | n1 | n2 | n3 | n4 | average_largest_2 | 
|---|---|---|---|---|---|
| 11:50 | 1 | 2 | 3 | 4 | 3.5 | 
| 12:50 | 5 | 6 | 7 | 8 | 7.5 | 
| 13:50 | 8 | 7 | 6 | 5 | 7.5 | 
Use this code if need be:
import pandas as pd
import numpy as np
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 8],
          'n2': [2, 6 ,7],
          'n3': [3, 7 ,6],
          'n4': [4, 8, 5],
        }
df1 = pd.DataFrame(data = data_1)
df1
Expected Result: Average_largest_2_column
Advertisement
Answer
You can use nlargest per row and get the mean:
df1['average_largest_2'] = (df1.select_dtypes('number')
                            .apply(lambda r: r.nlargest(2).mean(), axis=1)
                            )
Or using the underlying numpy array:
a = df1.select_dtypes('number').to_numpy()
df1['average_largest_2'] = np.sort(a)[:,-2:].mean(1)
Output:
time n1 n2 n3 n4 average_largest_2 0 11:50 1 2 3 4 3.5 1 12:50 5 6 7 8 7.5 2 13:50 8 7 6 5 7.5