Skip to content
Advertisement

How to implement multiprocessing in my python script?

So the thing I’m trying to do is to have multiple cores process some stuff….

I have a py script which basically makes a feature vector for my ML and DL models. Now the thing I’m doing is that I take a txt file, I compare it with a list that I have initialized at runtime. So the code takes the first line in the txt and checks where that line is in the unique list and in that position, it does +1 in another list. Then that list is appended to another list which will then be written in a csv file, but that happens at the end of the execution. Now without threading, I was able to process 1 file at around 500ms to 1.5sec. When I pull up the activity monitor on my sys, I can see that at a time, any 1 core will be pinned to 100%. But when I do the multithreading stuff, all 10 cores would be sitting at around 10% to 20% only. And the execution isn’t that fast compared to my non-threading stuff.

I have 6464 files and my sys is either i5 8300H (4 cores) or i9 10900 (10 cores)

Here in find_files(), I basically made the script to find the files that I need. Some of the variables are initialized. I simply didn’t put it here to make the code complicated. Now at the end, you can see a thread helper function which basically is used to divide the calls to multiple threads.

JavaScript

Here you can see the thread helper takes the ‘count’ arg and divides the calls to specific threads to evenout the load. I’ve even tried doing thread.join() too. But same issue.

JavaScript

So in this function I basically take a file, open it, take the input lines one by one and see where it is in the unique list and I +1 the value in the feature vector list at that position corresponding to the unique list.

JavaScript

I’m not sure where I messed up. So can you guys helpz me divide the load to multiple CPU cores? Also please do correct me if I messed up terms while asking this question….

Advertisement

Answer

Because of python’s GIL, only one thread is executed at once, making threading effective only for I/O bounded applications, not CPU-bounded, like yours.

To parallelize your program, I recommend using multiprocessing module, which is quite similar to threading in its use but implements true parallelism.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement