Given a string, find the first non-repeating character in it and return its index. If it doesn’t exist, return -1.
```python
first_unique('leetcode')      # 0
first_unique('loveleetcode')  # 2
```
I came up with the following solution. How can I make it more efficient for very long input strings?
```python
def first_unique(s):
    if s == '':
        return -1
    for item in s:
        if s.count(item) == 1:
            return s.index(item)
    return -1
```
Answer
Your version isn’t bad for “nice” strings where a unique character shows up early… but `count` rescans the entire string on every call, which gets quite expensive for long “bad” strings. I’d suggest you cache the characters you’ve already checked, for instance:
```python
def f1(s):
    if s == '':
        return -1
    for item in s:
        if s.count(item) == 1:
            return s.index(item)
    return -1


def f2(s):
    cache = set()
    if s == '':
        return -1
    for item in s:
        if item not in cache:
            if s.count(item) == 1:
                return s.index(item)
            else:
                cache.add(item)
    return -1


import timeit
import random
import string

random.seed(1)
K, N = 500, 100000
data = ''.join(random.choice(string.ascii_uppercase + string.digits)
               for _ in range(K))

print(timeit.timeit('f1(data)', setup='from __main__ import f1, data', number=N))
print(timeit.timeit('f2(data)', setup='from __main__ import f2, data', number=N))
```
The results on my laptop are:
```
32.05926330029437
4.267771588590406
```
The version using the cache gives you roughly an 8x speedup over yours, which calls the `count` function for every single character. Since `count` scans the whole string, your version is quadratic in the worst case, while the cached one calls `count` at most once per distinct character. So, my general advice would be… cache as much as possible, wherever possible.
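To make the “bad string” case concrete, here’s a small check you could append to the script above (the `bad` variable and the repeat count are my own illustration; absolute timings will vary by machine). A string with no unique character at all forces `f1` to call `count` once per position, while `f2` calls it only once per distinct character:

```python
# Worst case: every character repeats, so both functions scan to the end
# and return -1. f1 calls s.count() at every position (quadratic overall);
# thanks to the cache, f2 calls it just twice: once for 'a', once for 'b'.
bad = 'ab' * 500

print(timeit.timeit('f1(bad)', setup='from __main__ import f1, bad', number=100))
print(timeit.timeit('f2(bad)', setup='from __main__ import f2, bad', number=100))
```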
EDIT:
I’ve added Patrick Haugh’s version to the benchmark and it gave 10.92784585620725.
EDIT2:
I’ve added Mehmet Furkan Demirel’s version to the benchmark and it gave 10.325440507549331.
EDIT3:
I’ve added wim’s version to the benchmark and it gave 12.47985351744839.
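For reference, the answers benchmarked in the edits above aren’t quoted here; a `collections.Counter` solution of the kind the conclusion refers to typically takes this two-pass shape (a sketch of the common pattern, not the exact code that was timed):

```python
from collections import Counter

def first_unique_counter(s):
    # Pass 1: tally every character in one sweep.
    counts = Counter(s)
    # Pass 2: return the index of the first character that occurs exactly once.
    for i, item in enumerate(s):
        if counts[item] == 1:
            return i
    return -1
```

On paper this is O(n), but on the data above the simple set cache still came out ahead, most likely because the alphabet is tiny (36 symbols) and `str.count` runs as a tight C loop.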
CONCLUSION:
I’d use the version I proposed initially, with a simple cache and without relying on Python’s `Counter` module; it isn’t necessary (in terms of performance).