
How can I improve processing time with threads on Spyder?

I’m trying to change the date format of a column in a CSV. I know there are easier ways to do this, but the goal here is to get the threads working properly. I work with Spyder and Python 3.8. My code works as follows:

  • I create a thread class with a function to change the date format
  • I split my dataframe into several dataframes according to the number of threads
  • I assign to each thread a part of the dataframe
  • each thread changes the date formats in its dataframe
  • at the end, I concatenate all the dataframes into one
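A minimal sketch of the steps above, not the poster's actual code. It assumes a column named `date` holding `DD/MM/YYYY` strings that should become `YYYY-MM-DD`; the class name `DateFormatThread`, the function `convert_with_threads`, and both formats are hypothetical.

```python
import threading
from datetime import datetime

import numpy as np
import pandas as pd


class DateFormatThread(threading.Thread):
    """Reformats the date column of one chunk (hypothetical class name)."""

    def __init__(self, chunk):
        super().__init__()
        # Work on a copy so the original dataframe is left untouched.
        self.chunk = chunk.copy()

    def run(self):
        # Assumed formats: DD/MM/YYYY in, YYYY-MM-DD out.
        self.chunk["date"] = [
            datetime.strptime(value, "%d/%m/%Y").strftime("%Y-%m-%d")
            for value in self.chunk["date"]
        ]


def convert_with_threads(serie, n_threads=4):
    # Split the dataframe into n_threads contiguous, roughly equal parts.
    bounds = np.linspace(0, len(serie), n_threads + 1, dtype=int)
    chunks = [serie.iloc[bounds[i]:bounds[i + 1]] for i in range(n_threads)]

    # One thread per part: start them all, then wait for all of them.
    threads = [DateFormatThread(chunk) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Concatenate the converted parts back into one dataframe.
    return pd.concat([thread.chunk for thread in threads])
```

Calling `convert_with_threads(serie, n_threads=80)` would mirror the 80-thread run described in the question.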

“serie” is my original dataframe. Here is my code:

[code block not preserved]

It works, but I find the execution time quite long. For a total of 100,000 values, it takes about 1 min 30 s without threads and about 30 seconds with 80 threads; with 200 or 400 threads it stagnates at 30 seconds. Is my code bad, or am I limited by something?


Answer

Have you tried just letting Pandas do the work over the series?

[code block not preserved]

On my Macbook, this processes a million entries in 5 seconds.
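One way letting Pandas do the work over the series could look, as a sketch under assumptions rather than the answer's exact code: the column name `date`, the input format `DD/MM/YYYY`, the output format `YYYY-MM-DD`, and the function name `convert_vectorized` are all hypothetical.

```python
import pandas as pd


def convert_vectorized(serie, column="date"):
    # Parse the whole column at once; with an explicit format,
    # pd.to_datetime raises on values that are not valid dates.
    parsed = pd.to_datetime(serie[column], format="%d/%m/%Y")
    # Write the dates back out as strings in the target format.
    return serie.assign(**{column: parsed.dt.strftime("%Y-%m-%d")})
```

No threads or manual splitting are involved here; the parsing and formatting are applied to the entire column in a single call each.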

Another way to do the same (without date validation, though) is

[code block not preserved]

which finishes the job in about 3.3 seconds.
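A conversion without date validation can be sketched with plain string slicing. This is an illustration only, again assuming a hypothetical `date` column of fixed-width `DD/MM/YYYY` strings and a hypothetical function name `convert_by_slicing`.

```python
import pandas as pd


def convert_by_slicing(serie, column="date"):
    # Rearrange fixed character positions: DD/MM/YYYY -> YYYY-MM-DD.
    # Nothing checks that the result is a real calendar date.
    text = serie[column].str
    reordered = text[6:10] + "-" + text[3:5] + "-" + text[0:2]
    return serie.assign(**{column: reordered})
```

Because the characters are only moved around, an impossible value such as `99/99/2020` passes through as `2020-99-99` instead of raising an error.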

User contributions licensed under: CC BY-SA