So I'm trying to count the most repeated values in a text file. Using Counter
returns exactly what I'm looking for.
file.txt
12334
99965
99965
44144
99965
00000
44144
script.py
from collections import Counter

pArray=[]
with open("file.txt") as my_file:
    for line in my_file:
        # drop the trailing newline from each line
        pArray.append((line.split('\n'))[0])
    dictn = Counter(pArray)
    print(dictn)
    for key, value in dictn.items():
        print("KEY",key)
        print("VALUE",value)
    print(dictn)
OUTPUT
Counter({'99965': 3, '44144': 2, '12334': 1, '00000': 1})
KEY 12334
VALUE 1
KEY 99965
VALUE 3
KEY 44144
VALUE 2
KEY 00000
VALUE 1
['12334', '99965', '44144', '00000']
But as you can see, the output of the final array is not in the same order as the dictionary
(the values should be in descending order of count).
I am expecting an output like
['99965', '44144', '12334', '00000']
I also tried list(dictn.keys()) but I got the same output :/
Why is the order changing and how can I fix it?
Answer
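From the docs, we see that Counter is a subclass of dict. Older versions of the docs describe it as an "unordered collection", and since Python 3.7 it remembers the order in which keys were first inserted, just like a regular dictionary. Either way, iterating over .items() (or .keys()) won't give the elements in order of count. Only the printed representation of the Counter is sorted by count, which is why print(dictn) looks ordered while the loop does not.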
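As a small illustration of that difference, here is a sketch that assumes Python 3.7 or later and hard-codes the values from file.txt into a list (the names values, counts, first_seen_order and by_count are only for this example):

```python
from collections import Counter

# Same values, in the same order, as file.txt in the question.
values = ["12334", "99965", "99965", "44144", "99965", "00000", "44144"]
counts = Counter(values)

# Iterating over a Counter follows the order keys were first seen (Python 3.7+).
first_seen_order = list(counts)

# most_common() returns (element, count) pairs sorted by count, highest first.
by_count = counts.most_common()

print(first_seen_order)
print(by_count)
```

The first list matches the order the question complains about; the second is sorted by count.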
However, we can simply use .most_common(), which returns a list of tuples, each containing an element and its count. Most importantly, that list is ordered from the most common element to the least common.
So all we need to do is use a list comprehension to extract the first element of each tuple in the returned list. That can be done with:
[t[0] for t in dictn.most_common()]
which gave:
['99965', '44144', '12334', '00000']
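but on older Python versions it could also give the following, because the counts for '12334' and '00000' are the same:
['99965', '44144', '00000', '12334']
Before Python 3.7 this is unavoidable due to the nature of how dictionaries (and Counters) work. From 3.7 onwards, elements with equal counts come out in the order they were first encountered. If the order of ties is important, just let me know and I can update the answer.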
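If the order of ties does matter, one possible approach (a sketch, not the only option) is to sort the keys yourself with an explicit tie-break. The rule chosen here, count descending and then key ascending, is an assumption for the example, as are the names ordered_keys and keys_tie_broken:

```python
from collections import Counter

values = ["12334", "99965", "99965", "44144", "99965", "00000", "44144"]
counts = Counter(values)

# Keys by descending count; the order of ties is whatever most_common() gives.
ordered_keys = [key for key, _ in counts.most_common()]

# Explicit tie-break: highest count first, then keys in ascending string order.
keys_tie_broken = sorted(counts, key=lambda k: (-counts[k], k))

print(ordered_keys)
print(keys_tie_broken)
```

With this rule the two single-count values always come out as '00000' then '12334', regardless of where they appear in the file.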
Note that not all of your code needs to be inside the with statement: once you have created pArray, you can exit the with statement. Also, basic Python uses lists, not arrays!
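Putting those notes together, the script could be restructured roughly like this. It is only a sketch: the function name keys_by_count is made up for the example, and skipping blank lines is an assumption about what you want.

```python
from collections import Counter


def keys_by_count(path):
    """Return the distinct lines of the file at `path`, most frequent first."""
    with open(path) as my_file:
        # strip() removes the trailing newline; blank lines are skipped
        # (an assumption for this sketch).
        values = [line.strip() for line in my_file if line.strip()]
    # The file is already closed here; counting does not need it open.
    counts = Counter(values)
    return [key for key, _ in counts.most_common()]
```

Calling keys_by_count("file.txt") and printing the result should then give the list in descending order of count.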