So I’m trying to count the most repeated values in a text file. Using Counter
gives me exactly the counts I’m looking for.
file.txt
12334
99965
99965
44144
99965
00000
44144
script.py
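from collections import Counter

pArray = []
with open("file.txt") as my_file:
    for line in my_file:
        # Drop the trailing newline from each line
        pArray.append((line.split('\n'))[0])
    dictn = Counter(pArray)
    print(dictn)
    for key, value in dictn.items():
        print("KEY", key)
        print("VALUE", value)
    # Print the keys as a list (this is the line whose order is in question)
    print(list(dictn))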
OUTPUT
Counter({'99965': 3, '44144': 2, '12334': 1, '00000': 1})
KEY 12334
VALUE 1
KEY 99965
VALUE 3
KEY 44144
VALUE 2
KEY 00000
VALUE 1
['12334', '99965', '44144', '00000']
But as you can see, the final list is not in the same order as the printed Counter (the values should be in descending order of count).
I am expecting an output like
['99965', '44144', '12334', '00000']
I also tried list(dictn.keys()), but I got the same output :/
Why is the order changing and how can I fix it?
Answer
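From the docs, we see that Counter is a subclass of dict. Older versions of the docs describe it as an “unordered collection”, and since Python 3.7 it remembers insertion order, just like a regular dictionary. Either way, iterating over .items() won’t give the elements in order of count; in your output they come back in the order they were first seen in the file.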
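As a quick check of that ordering, here is a minimal sketch using the values from the question. The variable names are illustrative rather than taken from the original script, and it assumes Python 3.7 or later, where Counter (as a dict subclass) remembers insertion order:

```python
from collections import Counter

# Values in the same order as they appear in file.txt from the question.
values = ["12334", "99965", "99965", "44144", "99965", "00000", "44144"]
counts = Counter(values)

# Iteration follows the order in which each key was first seen,
# regardless of how large its count is (assumes Python 3.7+).
first_seen_order = list(counts)
print(first_seen_order)
print(list(counts.items()))
```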
However, we can simply use .most_common(), which returns a list of (element, count) tuples. Most importantly, that list is ordered from the most common element to the least common.
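To make the return value concrete, here is a small sketch (again with illustrative variable names) of what .most_common gives for the question's data, with and without its optional n argument:

```python
from collections import Counter

counts = Counter(["12334", "99965", "99965", "44144", "99965", "00000", "44144"])

# No argument: every (element, count) pair, highest count first.
ranked = counts.most_common()
print(ranked)

# With n: only the n most common pairs.
top_two = counts.most_common(2)
print(top_two)
```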
So all we need to do is use a list-comprehension
to extract the first element of each tuple in the list returned. That can be done with:
[t[0] for t in dictn.most_common()]
which gave:
['99965', '44144', '12334', '00000']
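but on Python versions before 3.7 it could also give the following, as the counts for '12334' and '00000' are the same:
['99965', '44144', '00000', '12334']
On those older versions this is unavoidable due to the nature of how dictionaries (and Counters) work. From Python 3.7 on, elements with equal counts come back in the order they were first encountered. But if this is important, just let me know and I can update the answer.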
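If the order among equal counts does matter, one option is to sort explicitly with a secondary key instead of relying on .most_common. The sketch below is one possible approach, not the only one; breaking ties alphabetically by key is an assumption chosen purely for illustration:

```python
from collections import Counter

counts = Counter(["12334", "99965", "99965", "44144", "99965", "00000", "44144"])

# Sort by count descending, then by key ascending, so that ties
# are resolved the same way on every Python version.
ordered_keys = sorted(counts, key=lambda k: (-counts[k], k))
print(ordered_keys)  # ['99965', '44144', '00000', '12334']
```

Swapping the second element of the key tuple changes the tie-breaking rule without affecting the ordering by count.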
Note that not all of your code needs to be inside the with statement; once you have created pArray, you can exit the with statement. Also, basic Python uses lists, not arrays!
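Putting both points together, the script could be restructured roughly as follows. This is only a sketch: the function name keys_by_frequency is hypothetical, and using line.strip() (which also skips blank lines here) in place of split('\n')[0] is an assumption about what the input looks like:

```python
from collections import Counter


def keys_by_frequency(path):
    """Return the distinct lines of the file, most frequent first."""
    # Only the file reading needs to happen inside the with block.
    with open(path) as my_file:
        values = [line.strip() for line in my_file if line.strip()]

    # The file is already closed here; counting works on the plain list.
    counts = Counter(values)
    return [value for value, _count in counts.most_common()]
```

Calling keys_by_frequency("file.txt") on the file from the question should return the keys ordered by count, with the two tied values in first-seen order on Python 3.7 and later.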