I have a relatively large array called allListings and want to filter out all rows where allListings[:][14] == listingID. This is the code I am using: tempRows = list(filter(lambda x: x[14] == listingID, allListings)) The filtering is repeated in a for loop over all the different listingIDs. Profiling shows that this line consumes 95% of the runtime in the loop. Is ...
Tag: large-data
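A common way to speed up the repeated filter() call above is to group the rows by column 14 once, so that each listingID lookup becomes a dictionary access instead of a full scan. A minimal sketch, assuming allListings is a list of rows with the listing ID in column 14 (the toy data here is illustrative, not from the original question):

    from collections import defaultdict

    # Toy stand-in for the real data: each row has 15 columns,
    # with the listing ID in column 14.
    allListings = [
        [0] * 14 + ["A"],
        [1] * 14 + ["B"],
        [2] * 14 + ["A"],
    ]

    # Build the groups once: O(n) total, instead of O(n) per listingID.
    rows_by_listing = defaultdict(list)
    for row in allListings:
        rows_by_listing[row[14]].append(row)

    # Inside the per-listingID loop, the filter() call becomes a dict lookup.
    for listingID in ("A", "B"):
        tempRows = rows_by_listing[listingID]
        print(listingID, len(tempRows))

This trades one extra pass over the data for O(1) lookups afterwards.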
How to do hyperparameter optimization on large data?
I have almost finished my time series model and collected enough data, and now I am stuck at hyperparameter optimization. After a lot of googling I found a new and promising library called ultraopt, but the problem is how large a fragment of my total data (~150 GB) I should use for hyperparameter tuning. And I want to try lots of ...
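One practical pattern, sketched below under assumed file and column names, is to tune on a manageable sample drawn from the full dataset and fit the final model on all ~150 GB afterwards; for time series, a contiguous slice is usually safer than random rows.

    import pandas as pd

    # Hypothetical file name; read the large CSV in chunks and keep a small,
    # evenly spread sample for hyperparameter tuning.
    sample_parts = []
    for chunk in pd.read_csv("timeseries.csv", chunksize=1_000_000):
        # Keep ~1% of each chunk; for strict time-series validation, prefer
        # contiguous windows over randomly sampled rows.
        sample_parts.append(chunk.sample(frac=0.01, random_state=0))

    tuning_df = pd.concat(sample_parts, ignore_index=True)
    print(len(tuning_df), "rows selected for tuning")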
Finding identical numbers in large files in Python
I have two data files in Python, each containing two-column data as below: There are about 10M entries in each file (~400 MB). I have to sort through each file and check whether any number in the first column of one file matches any number in the first column of the other file. The code I currently have converted the files to ...
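Since ~400 MB per file fits comfortably in memory, one approach is to load the first column of each file into a set and intersect the two sets. A minimal sketch, assuming whitespace-separated columns and placeholder file names:

    # Placeholder file names; each file is assumed to have two
    # whitespace-separated columns per line.
    def first_column(path):
        with open(path) as fh:
            return {line.split()[0] for line in fh if line.strip()}

    common = first_column("file_a.txt") & first_column("file_b.txt")
    print(len(common), "values appear in the first column of both files")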
Writing large Pandas DataFrames to a CSV file in chunks
How do I write out large data files to a CSV file in chunks? I have a set of large data files (1M rows x 20 cols). However, only 5 or so columns of the data files are of interest to me. I want to make things easier by making copies of these files with only the columns of ...
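A sketch of one way to do this with pandas, using chunksize to stream the input and usecols to keep only the interesting columns (the file and column names here are placeholders):

    import pandas as pd

    # Placeholder names: keep only 5 of the 20 columns and append each
    # processed chunk to the output file.
    wanted = ["col_a", "col_b", "col_c", "col_d", "col_e"]
    first_chunk = True
    for chunk in pd.read_csv("big_input.csv", usecols=wanted, chunksize=100_000):
        chunk.to_csv("slim_output.csv",
                     mode="w" if first_chunk else "a",
                     header=first_chunk,
                     index=False)
        first_chunk = False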
Shared memory in multiprocessing
I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers. These data structures take quite a bit of RAM (~16 GB total). If I start 12 sub-processes using: Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be ...
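Whether the lists are copied depends on the start method (fork vs. spawn), and even with fork, CPython's reference counting can gradually touch the copy-on-write pages. One way to guarantee sharing for the integer data is multiprocessing.Array; a minimal sketch with illustrative sizes, not the original l1/l2/l3:

    from multiprocessing import Process, Array

    def worker(idx, arr):
        # Reads go straight to the shared buffer; no per-process copy is made.
        print(f"worker {idx} sees first element {arr[0]}")

    if __name__ == "__main__":
        # A flat C-int array placed in shared memory; size and values are
        # illustrative, not taken from the original question.
        shared_ints = Array("i", range(1000), lock=False)
        procs = [Process(target=worker, args=(i, shared_ints)) for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()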