
Python sorted not sorting Counter output properly

I am trying to find the minimum and maximum page load times in an Apache access log file. After parsing the log file and sorting the results with sorted(), I am seeing strange ordering.

#!/usr/bin/python3

from collections import Counter 
import re
import sys

logfile = sys.argv[1]

def abcd(match):
    """Return the regex matches found in each line of the log file."""
    clean_log = []

    with open(logfile) as f:
        for line in f:
            found = re.findall(match, line)  # list of matches on this line
            if found:
                clean_log.append(found)
    return clean_log


# The serve time is the trailing number on each log line
serve_time = r"\d+$"

print(sorted(Counter(map(tuple, abcd(serve_time))).most_common(), key=lambda i: i[0]))

The code above sorts the four-digit values first and only then the three-digit values:

$ ./log-parser.py access.log
[(('1660',), 1), (('1971',), 1), (('2020',), 1), (('2358',), 1), (('2384',), 1), (('2523',), 1), (('2976',), 1), (('3939',), 1), (('455',), 1), (('677',), 1)]

As you can see, 455 and 677 end up at the end, even though the four-digit values are in the right order among themselves, and so are the three-digit ones.

Can someone shed some light on this, please?

BTW, if I don't map the results to tuples, Counter raises "TypeError: unhashable type: 'list'", so I need to work with tuples. Sorting with the method below gives the same result:

    print(sorted(abcd(serve_time)))

[['1660'], ['1971'], ['2020'], ['2358'], ['2384'], ['2523'], ['2976'], ['3939'], ['455'], ['677']]
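A minimal sketch of one way to avoid the tuple conversion, assuming (as the regex \d+$ already does) that the serve time is the last field on each line. re.findall returns a list per line, and lists are unhashable; re.search returns a single match object, so collecting match.group() yields plain strings that Counter can count directly. The helper name serve_times and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern/helper: assumes the serve time is the trailing number.
# Without re.MULTILINE, "$" also matches just before a trailing newline.
SERVE_TIME = re.compile(r"\d+$")


def serve_times(lines):
    """Yield the trailing number of each line as a string (hashable, unlike a list)."""
    for line in lines:
        m = SERVE_TIME.search(line)
        if m:
            yield m.group()


# Made-up sample lines standing in for an access log
sample = [
    '127.0.0.1 - - [01/Jan/2020:00:00:00 +0000] "GET / HTTP/1.1" 200 512 455\n',
    '127.0.0.1 - - [01/Jan/2020:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 1660\n',
    '127.0.0.1 - - [01/Jan/2020:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 455\n',
]

counts = Counter(serve_times(sample))  # keys are strings, no tuple() needed
print(counts)
```

The keys are still strings, so sorting them still needs a numeric key.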


Answer

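It's sorting by string, not by number.

Strings are compared character by character, so only the first differing character matters: '4' > '3' puts '455' after '3939', just as '3' > '2' puts '3939' after '2976'.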
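A minimal illustration of the difference, using a few of the values from the question's output:

```python
# Strings compare character by character, so '455' > '3939' because '4' > '3'
values = ['1660', '3939', '455', '677']

print(sorted(values))              # ['1660', '3939', '455', '677']
print(sorted(values, key=int))     # ['455', '677', '1660', '3939']
print('455' > '3939', 455 > 3939)  # True False
```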

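If you want to sort by number, convert inside the key. Each item returned by most_common() is a pair such as (('1660',), 1), so the number string is at i[0][0], not i[0] (calling int() on the tuple i[0] would raise a TypeError):

key=lambda i: int(i[0][0])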
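A sketch applying the corrected key to data shaped like the question's. The list matches is a hypothetical stand-in for abcd(serve_time), using sample values from the question with 455 repeated to show a count above 1. Since the original goal was the minimum and maximum load time, the sketch also shows min()/max() on the converted numbers, which needs no sorting at all.

```python
from collections import Counter

# Hypothetical stand-in for abcd(serve_time): one single-element list per line
matches = [['1660'], ['455'], ['3939'], ['677'], ['455']]

counts = Counter(map(tuple, matches)).most_common()
# Each item looks like (('455',), 2): the number string is at item[0][0]
by_time = sorted(counts, key=lambda item: int(item[0][0]))
print(by_time)  # [(('455',), 2), (('677',), 1), (('1660',), 1), (('3939',), 1)]

# For just the fastest and slowest load times, convert to int and use min/max
times = [int(m[0]) for m in matches]
print(min(times), max(times))  # 455 3939
```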
User contributions licensed under: CC BY-SA