I am trying to find the minimum and maximum page load times in an Apache access log file. After parsing the log file and sorting the values with sorted(), I am getting an unexpected order.
    #!/usr/bin/python3
    from collections import Counter
    import re
    import sys

    logfile = sys.argv[1]

    # Collect the list of regex matches for every log line that has one.
    def abcd(match):
        clean_log = []
        for line in open(logfile):
            try:
                if re.findall(match, line):
                    clean_log.append(re.findall(match, line))
            except ValueError:
                pass
        return(clean_log)

    # Raw string so the backslash in \d reaches the regex engine.
    serve_time = r"\d+$"
    print(sorted(Counter(map(tuple, abcd(serve_time))).most_common(), key = lambda i: (i[0])))
The code above sorts the values in the thousands first and only then the hundreds:
    $ ./log-parser.py access.log
    [(('1660',), 1), (('1971',), 1), (('2020',), 1), (('2358',), 1), (('2384',), 1), (('2523',), 1), (('2976',), 1), (('3939',), 1), (('455',), 1), (('677',), 1)]
As you can see, 455 and 677 end up last, even though the four-digit values are correctly ordered among themselves, and so are the three-digit ones.
Can someone shed light on this, please?
BTW, if I don't map the results to tuples, Counter raises "TypeError: unhashable type: 'list'", so I need to work with tuples. Sorting with the method below gives the same result:
    print(sorted(abcd(serve_time)))
    [['1660'], ['1971'], ['2020'], ['2358'], ['2384'], ['2523'], ['2976'], ['3939'], ['455'], ['677']]
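The tuple conversion is only needed because `re.findall` returns a list for every line, and lists are unhashable. A minimal sketch, assuming (as the `\d+$` pattern implies) that the serve time is the trailing run of digits on each line, uses `re.search` instead, which yields one plain string per line that `Counter` can hash directly. The helper name `serve_times` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Hypothetical helper: return the trailing digits of each line as a string.
def serve_times(lines):
    pattern = re.compile(r"\d+$")
    times = []
    for line in lines:
        match = pattern.search(line.rstrip())  # drop the newline / trailing whitespace
        if match:  # skip lines that do not end in digits
            times.append(match.group())  # a str, which is hashable
    return times

# Hypothetical sample lines standing in for an access log.
sample = [
    '127.0.0.1 - - "GET / HTTP/1.1" 200 512 1660\n',
    '127.0.0.1 - - "GET /a HTTP/1.1" 200 128 455\n',
    '127.0.0.1 - - "GET / HTTP/1.1" 200 512 1660\n',
]
counts = Counter(serve_times(sample))
print(counts)  # Counter({'1660': 2, '455': 1})
```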
Answer
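It's sorting by string, not by number. Strings are compared character by character, so '455' sorts after '3939' simply because '4' > '3', regardless of length.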
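A quick check on some of the values from the question's output shows the difference: the default sort compares the strings character by character, while `key=int` compares them as numbers.

```python
values = ['1660', '455', '3939', '677', '2020']

# Default: lexicographic string comparison, so '455' > '3939' because '4' > '3'.
print(sorted(values))           # ['1660', '2020', '3939', '455', '677']

# Numeric: int() is applied to each string only to compute the sort key.
print(sorted(values, key=int))  # ['455', '677', '1660', '2020', '3939']
```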
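If you want to sort by number, convert to int in the key. Each item returned by most_common() looks like (('1660',), 1), so i[0] is the one-element tuple ('1660',) and the string itself is i[0][0]. Change your lambda to:
    key=lambda i: int(i[0][0])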
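Since the original goal was the minimum and maximum page load time, the Counter can be skipped entirely once each value is converted to an integer while parsing. A minimal sketch, again assuming the serve time is the trailing run of digits on each line; the names `load_times` and `main` are hypothetical.

```python
import re
import sys

# Assumes the serve time is the trailing run of digits on each line,
# the same assumption the question's \d+$ pattern makes.
SERVE_TIME = re.compile(r"\d+$")

def load_times(lines):
    """Hypothetical helper: return the trailing integer of each matching line."""
    times = []
    for line in lines:
        match = SERVE_TIME.search(line.rstrip())
        if match:
            times.append(int(match.group()))  # int, so comparisons are numeric
    return times

def main(path):
    with open(path) as log:
        times = load_times(log)
    if not times:
        print("no serve times found")
        return
    print("min:", min(times))
    print("max:", max(times))

# Only run when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Invoked as `./log-parser.py access.log`, this prints the smallest and largest values as numbers; `sorted(times)` would likewise give the numeric order directly.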