I have a text file that has a long 2D array as follows:
[[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585] .. [6, 47]]
The first element of each pair is a number from 1 to 7. I want to read the second element of every pair and find the minimum, maximum, and average for each group (1 to 7) separately. For example, output like this:
Min for first element with 1: 2
Max for first element with 1: 500
average: 251

Min for row with 2: 0
Max for row with 2: 5000
average: 2500

and so on
What is the most efficient way of getting min, max, and average values by grouping based on the first element of the array?
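import ast

# Parse the file's contents (a list literal) into a list of [group, value] pairs
with open("myfile.txt", "r") as file:
    list_of_lists = ast.literal_eval(file.read())

# Collect the second elements for each distinct first element
unique_values = set(pair[0] for pair in list_of_lists)
group_list = [[pair[1] for pair in list_of_lists if pair[0] == value] for value in unique_values]

print(group_list)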
Answer
We can use pandas for this:
import numpy as np
import pandas as pd

file_data = [[1, 2], [5, 585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4, 163], [3, 85], [5, 906], [2, 5000], [6, 358], [4, 69], [3, 89], [7, 258], [5, 632], [7, 585], [6, 47]]

file_data = np.array(file_data)

# Column 0 is the group number, column 1 is the value to summarise
df = pd.DataFrame(data={'num': file_data[:, 0], 'data': file_data[:, 1]})

# Report min, max, and average for each group in ascending order
for i in np.sort(df['num'].unique()):
    temp_df = df.loc[df['num'] == i, 'data']
    print('Min for', i, ':', temp_df.min())
    print('Max for', i, ':', temp_df.max())
    print("Average for", i, ":", temp_df.sum() / len(temp_df.index))
This gives us:
Min for 1 : 2
Max for 1 : 500
Average for 1 : 251.0
Min for 2 : 0
Max for 2 : 5000
Average for 2 : 1889.3333333333333
Min for 3 : 28
Max for 3 : 89
Average for 3 : 64.0
Min for 4 : 28
Max for 4 : 163
Average for 4 : 86.66666666666667
Min for 5 : 585
Max for 5 : 906
Average for 5 : 707.6666666666666
Min for 6 : 47
Max for 6 : 358
Average for 6 : 202.5
Min for 7 : 258
Max for 7 : 585
Average for 7 : 421.5
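The per-group loop in the answer can probably be shortened with pandas' `groupby`, which computes all three statistics in one call. This is a minimal sketch using the same sample data; the names `pairs` and `stats` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer
pairs = [[1, 2], [5, 585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28],
         [3, 28], [4, 163], [3, 85], [5, 906], [2, 5000], [6, 358],
         [4, 69], [3, 89], [7, 258], [5, 632], [7, 585], [6, 47]]

df = pd.DataFrame(pairs, columns=["num", "data"])

# Group rows by the first element, then aggregate the second element
stats = df.groupby("num")["data"].agg(["min", "max", "mean"])
print(stats)
```

The result is a DataFrame indexed by group number, so a single value can be looked up with, for example, `stats.loc[2, "max"]`.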
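The answer hard-codes the data, while the question asks about reading it from a text file. If pandas is not available, a plain dictionary should also do the job. The sketch below assumes the file holds one complete Python-style list literal (the `..` elision shown in the question would not parse). `group_stats` and `read_pairs` are hypothetical helper names introduced only for illustration.

```python
import ast
from collections import defaultdict


def group_stats(pairs):
    """Return {key: (min, max, mean)} for a list of [key, value] pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {
        key: (min(values), max(values), sum(values) / len(values))
        for key, values in sorted(groups.items())
    }


def read_pairs(path):
    """Parse a file containing a list literal such as [[1, 2], [5, 585]]."""
    with open(path, "r") as handle:
        return ast.literal_eval(handle.read())


if __name__ == "__main__":
    # Inline sample so the sketch runs without a file; with real data,
    # call read_pairs("myfile.txt") instead (file name taken from the question).
    sample = [[1, 2], [2, 0], [1, 500], [2, 668], [2, 5000]]
    for key, (low, high, mean) in group_stats(sample).items():
        print("Min for", key, ":", low)
        print("Max for", key, ":", high)
        print("Average for", key, ":", mean)
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, which is safer for file contents of uncertain origin.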