
Tag: large-data

How to efficiently filter a large Python list?

I have a relatively large array called allListings and want to filter out all rows where allListings[:][14] == listingID. This is the code I am using: tempRows = list(filter(lambda x: x[14] == listingID, allListings)). The filtering is repeated in a for loop over all the different listingID values. Profiling shows that this line consumes 95% of the runtime in the loop. Is…
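A common way to avoid the repeated linear scans is to group the rows by listing ID in a single pass, so each later lookup is constant time. A minimal sketch using collections.defaultdict, with hypothetical sample data standing in for the real allListings:

from collections import defaultdict

# Hypothetical sample data: each row has 15 columns,
# with the listing ID in column 14.
allListings = [
    [0] * 14 + ["A"],
    [0] * 14 + ["B"],
    [1] * 14 + ["A"],
]

# Build the index once: listingID -> rows. Re-filtering costs
# O(n) per ID; one pass makes every subsequent lookup O(1).
rowsByListing = defaultdict(list)
for row in allListings:
    rowsByListing[row[14]].append(row)

# Replaces list(filter(lambda x: x[14] == listingID, allListings))
tempRows = rowsByListing["A"]
print(tempRows)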

Finding identical numbers in large files in Python

I have two data files in Python, each containing two-column data as below: There are about 10M entries in each file (~400 MB). I have to sort through each file and check whether any number in the first column of one file matches any number in the first column of the other file. The code I currently have converted the files to…
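At this scale, loading the first column of one file into a set and streaming the other file against it is a common approach, since set membership checks are O(1). A minimal sketch, assuming whitespace-separated columns and hypothetical file names file1.txt and file2.txt:

def first_column(path):
    """Yield the first whitespace-separated field of each line."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                yield fields[0]

# 10M short keys fit comfortably in memory as a set.
keys1 = set(first_column("file1.txt"))
matches = {k for k in first_column("file2.txt") if k in keys1}
print(f"{len(matches)} matching first-column values")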

Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers. These data structures take quite a bit of RAM (~16 GB total). If I start 12 sub-processes using: Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be…
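The original start-up snippet is elided above, so the details depend on it, but the usual answer is: on Linux, fork gives copy-on-write pages, so nothing is copied up front, although touching the objects (even just reference-count updates) gradually copies pages. For integer data that must be genuinely shared without copies, multiprocessing offers ctypes-backed shared arrays. A minimal sketch, with hypothetical names:

import multiprocessing as mp

def worker(shared_ints, index):
    # Reads go straight to the shared buffer; no per-process copy.
    print(shared_ints[index])

if __name__ == "__main__":
    # A ctypes-backed Array lives in shared memory, so all
    # sub-processes see the same buffer rather than a copy.
    l2 = mp.Array("i", range(10), lock=False)
    procs = [mp.Process(target=worker, args=(l2, i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()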
