Skip to content
Advertisement

Take the average of nlargest columns ignoring the non-numeric columns

Is there any function to find out first ‘n‘ largest numbers and take an average of those two into a different column in pandas.

Note: Time or any non-numeric column to be ignored.

time n1 n2 n3 n4 average_largest_2
11:50 1 2 3 4 3.5
12:50 5 6 7 8 7.5
13:50 8 7 6 5 7.5

Use this code if need be:

import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 8],
          'n2': [2, 6 ,7],
          'n3': [3, 7 ,6],
          'n4': [4, 8, 5],
        }

df1 = pd.DataFrame(data = data_1)
df1

Expected Result: Average_largest_2_column

Advertisement

Answer

You can use nlargest per row and get the mean:

df1['average_largest_2'] = (df1.select_dtypes('number')
                            .apply(lambda r: r.nlargest(2).mean(), axis=1)
                            )

Or using the underlying numpy array:

a = df1.select_dtypes('number').to_numpy()

df1['average_largest_2'] = np.sort(a)[:,-2:].mean(1)

Output:

    time  n1  n2  n3  n4  average_largest_2
0  11:50   1   2   3   4                3.5
1  12:50   5   6   7   8                7.5
2  13:50   8   7   6   5                7.5
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement