I wish to sort the below list first by the number, then by the text.
lst = ['b-3', 'a-2', 'c-4', 'd-2'] # result: # ['a-2', 'd-2', 'b-3', 'c-4']
Attempt 1
res = sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0]))
I was not happy with this since it required splitting a string twice, to extract the relevant components.
Attempt 2
I came up with the below solution. But I am hoping there is a more succinct solution via Pythonic lambda
statements.
def sorter_func(x): text, num = x.split('-') return int(num), text res = sorted(lst, key=sorter_func)
I looked at Understanding nested lambda function behaviour in python but couldn’t adapt this solution directly. Is there a more succinct way to rewrite the above code?
Advertisement
Answer
In almost all cases I would simply go with your second attempt. It’s readable and concise (I would prefer three simple lines over one complicated line every time!) – even though the function name could be more descriptive. But if you use it as local function that’s not going to matter much.
You also have to remember that Python uses a key
function, not a cmp
(compare) function. So to sort an iterable of length n
the key
function is called exactly n
times, but sorting generally does O(n * log(n))
comparisons. So whenever your key-function has an algorithmic complexity of O(1)
the key-function call overhead isn’t going to matter (much). That’s because:
O(n*log(n)) + O(n) == O(n*log(n))
There’s one exception and that’s the best case for Pythons sort
: In the best case the sort
only does O(n)
comparisons but that only happens if the iterable is already sorted (or almost sorted). If Python had a compare function (and in Python 2 there really was one) then the constant factors of the function would be much more significant because it would be called O(n * log(n))
times (called once for each comparison).
So don’t bother about being more concise or making it much faster (except when you can reduce the big-O without introducing too big constant factors – then you should go for it!), the first concern should be readability. So you should really not do any nested lambda
s or any other fancy constructs (except maybe as exercise).
Long story short, simply use your #2:
def sorter_func(x): text, num = x.split('-') return int(num), text res = sorted(lst, key=sorter_func)
By the way, it’s also the fastest of all proposed approaches (although the difference isn’t much):
Summary: It’s readable and fast!
Code to reproduce the benchmark. It requires simple_benchmark
to be installed for this to work (Disclaimer: It’s my own library) but there are probably equivalent frameworks to do this kind of task, but I’m just familiar with it:
# My specs: Windows 10, Python 3.6.6 (conda) import toolz import iteration_utilities as it def approach_jpp_1(lst): return sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0])) def approach_jpp_2(lst): def sorter_func(x): text, num = x.split('-') return int(num), text return sorted(lst, key=sorter_func) def jpp_nested_lambda(lst): return sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-'))) def toolz_compose(lst): return sorted(lst, key=toolz.compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-'))) def AshwiniChaudhary_list_comprehension(lst): return sorted(lst, key=lambda x: [(int(num), text) for text, num in [x.split('-')]]) def AshwiniChaudhary_next(lst): return sorted(lst, key=lambda x: next((int(num), text) for text, num in [x.split('-')])) def PaulCornelius(lst): return sorted(lst, key=lambda x: tuple(f(a) for f, a in zip((int, str), reversed(x.split('-'))))) def JeanFrançoisFabre(lst): return sorted(lst, key=lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))]) def iteration_utilities_chained(lst): return sorted(lst, key=it.chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0]))) from simple_benchmark import benchmark import random import string funcs = [ approach_jpp_1, approach_jpp_2, jpp_nested_lambda, toolz_compose, AshwiniChaudhary_list_comprehension, AshwiniChaudhary_next, PaulCornelius, JeanFrançoisFabre, iteration_utilities_chained ] arguments = {2**i: ['-'.join([random.choice(string.ascii_lowercase), str(random.randint(0, 2**(i-1)))]) for _ in range(2**i)] for i in range(3, 15)} b = benchmark(funcs, arguments, 'list size') %matplotlib notebook b.plot_difference_percentage(relative_to=approach_jpp_2)
I took the liberty to include a function composition approach of one of my own libraries iteration_utilities.chained
:
from iteration_utilities import chained sorted(lst, key=chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0])))
It’s quite fast (2nd or 3rd place) but still slower than using your own function.
Note that the key
overhead would be more significant if you used a function that had O(n)
(or better) algorithmic complexity, for example min
or max
. Then the constant factors of the key-function would be more significant!