I have a text file that has a long 2D array as follows:
[[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585] ..... [6, 47]]
The first element of each has numbers between 1 to 7. I want to read information of second element for all and find the maximum and minimum amount for each group between 1 to 7 separately. For example output like this:
Mix for first element with 1: 500 Max for first element with 1: 2 average: 251 Min for row with 2: 0 Max for row with 2: 5000 average: 2500 and so on
What is the most efficient way of getting min, max, and average values by grouping based on the first element of the array?
file = open("myfile.txt", "r") list_of_lists = file.read() unique_values = set([list[1] for list in list_of_lists]) group_list = [[list[0] for list in list_of_lists if list[1] == value] for value in unique_values] print(group_list)
Advertisement
Answer
We can use pandas
for this:
import numpy as np import pandas as pd file_data = [[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585], [6, 47]] file_data = np.array(file_data) df = pd.DataFrame(data = {'num': file_data[:, 0], 'data': file_data[:, 1]}) for i in np.sort(df['num'].unique()): print('Min for', i, ':', df.loc[df['num'] == i, 'data'].min()) print('Max for', i, ':', df.loc[df['num'] == i, 'data'].max()) temp_df = df.loc[df['num'] == i, 'data'] print("Average for", i, ":", temp_df.sum()/len(temp_df.index))
This gives us:
Min for 1 : 2 Max for 1 : 500 Average for 1 : 251.0 Min for 2 : 0 Max for 2 : 5000 Average for 2 : 1889.3333333333333 Min for 3 : 28 Max for 3 : 89 Average for 3 : 64.0 Min for 4 : 28 Max for 4 : 163 Average for 4 : 86.66666666666667 Min for 5 : 585 Max for 5 : 906 Average for 5 : 707.6666666666666 Min for 6 : 47 Max for 6 : 358 Average for 6 : 202.5 Min for 7 : 258 Max for 7 : 585 Average for 7 : 421.5