I have a Python class that contains a list which can be updated by calling a function. This list has several associated lists, all of which are rebuilt whenever the main list is updated. One of these lists has roughly 40,000 items, so rebuilding everything on every update takes a very long time. Instead of remaking all the associated lists each time the main list changes, I want to work out what has been added to the main list and what has been removed, and only update those specific parts.
Below is some code that finds which items have been removed, where self.buildings is the old list and globalvars[self.name] is the current list:
def updatelist(self):
    globalvars = globals()
    removed = []
    for building in self.buildings:
        if building not in globalvars[self.name]:
            removed.append(building)
However, while this code works, it takes a very long time on the list with 40,000 items. I am certain I could find which items have been removed without using in on a plain list, since its linear-time membership test makes this approach unusable at that size.
Finally, I want some code that can also identify what has been appended to the list. An item will never be inserted at a random position; new items only ever appear at the end, which should make this easier to code (a rough reference version follows the sample output below).
A sample input would be:
self.buildings = [[3, 2, 3, 5], [0, 0, 1, 1], [1, 1, 5, 5], [6, 2, 3, 3]]
globalvars[self.name] = [[0, 0, 1, 1], [1, 1, 5, 5], [8, 2, 6, 3], [6, 2, 7, 0]]
An ideal output would be:
removed = [[3, 2, 3, 5], [6, 2, 3, 3]]
appended = [[8, 2, 6, 3], [6, 2, 7, 0]]
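For reference, a straightforward but unoptimised version that produces exactly this output (diff_lists is just a placeholder name, not code I expect to keep) would be:

def diff_lists(old, new):
    # O(n*m) reference implementation: fine for small lists,
    # far too slow for ~40,000 items
    removed = [b for b in old if b not in new]
    appended = [b for b in new if b not in old]
    return removed, appended

removed, appended = diff_lists(
    [[3, 2, 3, 5], [0, 0, 1, 1], [1, 1, 5, 5], [6, 2, 3, 3]],
    [[0, 0, 1, 1], [1, 1, 5, 5], [8, 2, 6, 3], [6, 2, 7, 0]],
)
# removed  -> [[3, 2, 3, 5], [6, 2, 3, 3]]
# appended -> [[8, 2, 6, 3], [6, 2, 7, 0]]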
How would I achieve this in an optimised way?
Thanks in advance.
Answer
As I was discussing in the comments section, using sets will speed up your code, but your entries have to be hashable. If the input really is a list of lists, then converting it to a set of tuples will do the trick; if the entries are instances of a more complicated class, you must define a __hash__ function for it.
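For example, a minimal sketch of that second case (Building is a hypothetical class invented here for illustration; any immutable fields would do), where defining __eq__ and __hash__ makes instances usable in a set:

class Building:
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height

    def __eq__(self, other):
        # equal if all four fields match
        return isinstance(other, Building) and \
            (self.x, self.y, self.width, self.height) == \
            (other.x, other.y, other.width, other.height)

    def __hash__(self):
        # hash the same tuple used for equality, so equal objects hash equally
        return hash((self.x, self.y, self.width, self.height))

# instances can now be stored in sets and diffed with set operations
old = {Building(3, 2, 3, 5), Building(0, 0, 1, 1)}
new = {Building(0, 0, 1, 1), Building(8, 2, 6, 3)}
removed = old - new   # contains only Building(3, 2, 3, 5)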
The code below shows the use of sets for the content in the question:
from random import randint

# construct the large random arrays
large_list_size = 10000
small_list_size = 4
first_list = []
second_list = []
first_list.append([1,1,1,1])
second_list.append([1,1,1,1])
for i in range(large_list_size):
    small_list_instance = [1,1,1,1]
    while small_list_instance in first_list:
        small_list_instance = []
        for j in range(small_list_size):
            small_list_instance.append(randint(0,10))
    first_list.append(small_list_instance)
    small_list_instance = [1, 1, 1, 1]
    while small_list_instance in second_list:
        small_list_instance = []
        for j in range(small_list_size):
            small_list_instance.append(randint(0, 10))
    second_list.append(small_list_instance)

# list implementation in question
def updatelist(first, second):
    removed = []
    for building in first:
        if building not in second:
            removed.append(building)
    return removed

# sets implementation
def updatelist_sets(first, second):
    first_set = set([(*x,) for x in first])
    second_set = set([(*x,) for x in second])
    removed = first_set.difference(second_set)
    return removed

# measure lists time
import time
retries = 10
t1 = time.time()
for i in range(retries):
    a = updatelist(first_list, second_list)
t2 = time.time()
time_lists = (t2 - t1) / retries
print('using lists = ', time_lists)

# measure sets time
t1 = time.time()
for i in range(retries):
    b = updatelist_sets(first_list, second_list)
t2 = time.time()
time_sets = (t2 - t1) / retries
print('using sets = ', time_sets)
print('speedup = ', time_lists / time_sets)

# convert b to be like a in structure
b = [list(x) for x in b]

# compare equality of both outputs
if sorted(a) == sorted(b):
    print('both are equal')
You get:
using lists =  1.0185745239257813
using sets =  0.004686903953552246
speedup =  217.32353255367963
both are equal
Needless to say, most of the time is spent constructing the set of tuples, so if you change your original data structure to hold hashable objects in the first place, the code will be even faster.
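If you do keep the building entries as tuples from the start, a sketch covering both halves of the question could look like this (update_diff is a hypothetical name; iterating the lists while testing membership in a set keeps the original order and also gives you the appended items, which in your case only ever appear at the end):

def update_diff(old_buildings, new_buildings):
    # old_buildings / new_buildings are lists of tuples, e.g. (3, 2, 3, 5)
    old_set = set(old_buildings)
    new_set = set(new_buildings)
    # preserve list order while using O(1) membership tests
    removed = [b for b in old_buildings if b not in new_set]
    appended = [b for b in new_buildings if b not in old_set]
    return removed, appended

old = [(3, 2, 3, 5), (0, 0, 1, 1), (1, 1, 5, 5), (6, 2, 3, 3)]
new = [(0, 0, 1, 1), (1, 1, 5, 5), (8, 2, 6, 3), (6, 2, 7, 0)]
removed, appended = update_diff(old, new)
# removed  == [(3, 2, 3, 5), (6, 2, 3, 3)]
# appended == [(8, 2, 6, 3), (6, 2, 7, 0)]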
Edit: fixed last comparison as @Kelly Bundy suggested.