
Tag: large-data

How to efficiently filter a large Python list?

I have a relatively large array called allListings and want to filter out all rows where allListings[:][14] == listingID. This is the code I am using: tempRows = list(filter(lambda x: x[14] == listingID, allListings)). The filtering is repeated in a for loop over all the different listingID values. Profiling shows that this line consumes 95% of the runtime in the loop. Is…
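A common way to avoid the repeated linear scans is to group the rows by listing ID in a single pass, so each later lookup is constant time. A minimal sketch using collections.defaultdict, with hypothetical sample data standing in for the real allListings:

from collections import defaultdict

# Hypothetical sample data: each row has 15 columns,
# with the listing ID in column 14.
allListings = [
    [0] * 14 + ["A"],
    [0] * 14 + ["B"],
    [1] * 14 + ["A"],
]

# Build the index once: listingID -> rows. Re-filtering costs
# O(n) per ID; one pass makes every subsequent lookup O(1).
rowsByListing = defaultdict(list)
for row in allListings:
    rowsByListing[row[14]].append(row)

# Replaces list(filter(lambda x: x[14] == listingID, allListings))
tempRows = rowsByListing["A"]
print(tempRows)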

Finding identical numbers in large files in Python

I have two data files in Python, each containing two-column data as below: There are about 10M entries in each file (~400 MB). I have to sort through each file and check whether any number in the first column of one file matches any number in the first column of the other file. The code I currently have converted the files to…
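At this scale, loading the first column of one file into a set and streaming the other file against it is a common approach, since set membership checks are O(1). A minimal sketch, assuming whitespace-separated columns and hypothetical file names file1.txt and file2.txt:

def first_column(path):
    """Yield the first whitespace-separated field of each line."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                yield fields[0]

# 10M short keys fit comfortably in memory as a set.
keys1 = set(first_column("file1.txt"))
matches = {k for k in first_column("file2.txt") if k in keys1}
print(f"{len(matches)} matching first-column values")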

Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers. These data structures take quite a bit of RAM (~16 GB total). If I start 12 sub-processes using: Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be…
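The original start-up snippet is elided above, so the details depend on it, but the usual answer is: on Linux, fork gives copy-on-write pages, so nothing is copied up front, although touching the objects (even just reference-count updates) gradually copies pages. For integer data that must be genuinely shared without copies, multiprocessing offers ctypes-backed shared arrays. A minimal sketch, with hypothetical names:

import multiprocessing as mp

def worker(shared_ints, index):
    # Reads go straight to the shared buffer; no per-process copy.
    print(shared_ints[index])

if __name__ == "__main__":
    # A ctypes-backed Array lives in shared memory, so all
    # sub-processes see the same buffer rather than a copy.
    l2 = mp.Array("i", range(10), lock=False)
    procs = [mp.Process(target=worker, args=(l2, i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()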
