Given a string, find the first non-repeating character in it and return its index. If it doesn’t exist, return -1.
```python
first_unique('leetcode')      # 0
first_unique('loveleetcode')  # 2
```
I came up with the following solution. How can I make it more efficient for very long input strings?
```python
def first_unique(s):
    if s == '':
        return -1
    for item in s:
        if s.count(item) == 1:
            return s.index(item)
    return -1
```
Answer
Your version isn’t bad for “nice” strings where a unique character shows up early… but `count` rescans the entire string on every call, which gets quite expensive for long “bad” strings. I’d suggest you cache the characters you’ve already checked, for instance:
```python
def f1(s):
    if s == '':
        return -1
    for item in s:
        if s.count(item) == 1:
            return s.index(item)
    return -1


def f2(s):
    cache = set()
    if s == '':
        return -1
    for item in s:
        if item not in cache:
            if s.count(item) == 1:
                return s.index(item)
            else:
                cache.add(item)
    return -1


import timeit
import random
import string

random.seed(1)
K, N = 500, 100000
data = ''.join(random.choice(string.ascii_uppercase + string.digits)
               for _ in range(K))

print(timeit.timeit('f1(data)', setup='from __main__ import f1, data', number=N))
print(timeit.timeit('f2(data)', setup='from __main__ import f2, data', number=N))
```
The results on my laptop are:
```
32.05926330029437
4.267771588590406
```
The version using the cache gives you roughly an 8x speedup over yours, which calls the `count` function for every single character. Since `count` scans the whole string, your version is quadratic in the worst case, while the cached one calls `count` at most once per distinct character. So, my general advice would be… cache as much as possible, wherever possible.
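To make the “bad string” case concrete, here’s a small check you could append to the script above (the `bad` variable and the repeat count are my own illustration; absolute timings will vary by machine). A string with no unique character at all forces `f1` to call `count` once per position, while `f2` calls it only once per distinct character:

```python
# Worst case: every character repeats, so both functions scan to the end
# and return -1. f1 calls s.count() at every position (quadratic overall);
# thanks to the cache, f2 calls it just twice: once for 'a', once for 'b'.
bad = 'ab' * 500

print(timeit.timeit('f1(bad)', setup='from __main__ import f1, bad', number=100))
print(timeit.timeit('f2(bad)', setup='from __main__ import f2, bad', number=100))
```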
EDIT:
I’ve added Patrick Haugh’s version to the benchmark and it gave 10.92784585620725.
EDIT2:
I’ve added Mehmet Furkan Demirel’s version to the benchmark and it gave 10.325440507549331.
EDIT3:
I’ve added wim’s version to the benchmark and it gave 12.47985351744839.
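For reference, the answers benchmarked in the edits above aren’t quoted here; a `collections.Counter` solution of the kind the conclusion refers to typically takes this two-pass shape (a sketch of the common pattern, not the exact code that was timed):

```python
from collections import Counter

def first_unique_counter(s):
    # Pass 1: tally every character in one sweep.
    counts = Counter(s)
    # Pass 2: return the index of the first character that occurs exactly once.
    for i, item in enumerate(s):
        if counts[item] == 1:
            return i
    return -1
```

On paper this is O(n), but on the data above the simple set cache still came out ahead, most likely because the alphabet is tiny (36 symbols) and `str.count` runs as a tight C loop.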
CONCLUSION:
I’d use the version I proposed initially, with a simple cache and without relying on Python’s `Counter` module; it isn’t necessary (in terms of performance).