I have a text file that has a long 2D array as follows:
[[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585] ..... [6, 47]]
The first element of each has numbers between 1 to 7. I want to read information of second element for all and find the maximum and minimum amount for each group between 1 to 7 separately. For example output like this:
Mix for first element with 1: 500 Max for first element with 1: 2 average: 251 Min for row with 2: 0 Max for row with 2: 5000 average: 2500 and so on
What is the most efficient way of getting min, max, and average values by grouping based on the first element of the array?
file = open("myfile.txt", "r")
list_of_lists = file.read()
unique_values = set([list[1] for list in list_of_lists])
group_list = [[list[0] for list in list_of_lists if list[1] == value] for value in unique_values]
print(group_list)
Advertisement
Answer
We can use pandas for this:
import numpy as np
import pandas as pd
file_data = [[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585], [6, 47]]
file_data = np.array(file_data)
df = pd.DataFrame(data = {'num': file_data[:, 0], 'data': file_data[:, 1]})
for i in np.sort(df['num'].unique()):
print('Min for', i, ':', df.loc[df['num'] == i, 'data'].min())
print('Max for', i, ':', df.loc[df['num'] == i, 'data'].max())
temp_df = df.loc[df['num'] == i, 'data']
print("Average for", i, ":", temp_df.sum()/len(temp_df.index))
This gives us:
Min for 1 : 2 Max for 1 : 500 Average for 1 : 251.0 Min for 2 : 0 Max for 2 : 5000 Average for 2 : 1889.3333333333333 Min for 3 : 28 Max for 3 : 89 Average for 3 : 64.0 Min for 4 : 28 Max for 4 : 163 Average for 4 : 86.66666666666667 Min for 5 : 585 Max for 5 : 906 Average for 5 : 707.6666666666666 Min for 6 : 47 Max for 6 : 358 Average for 6 : 202.5 Min for 7 : 258 Max for 7 : 585 Average for 7 : 421.5