
Create a separate logger for each process when using concurrent.futures.ProcessPoolExecutor in Python

I am cleaning up a massive CSV data dump. I was able to split the single large file into smaller ones with gawk, initially following a Unix SE answer, using the following flow:

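A sketch of that kind of gawk split, assuming a header row that should be repeated at the top of every chunk (the input and output file names here are placeholders, not the original command):

```sh
# Split big_dump.csv into ~170K-line chunks, copying the header
# row into each chunk (file names are assumptions).
gawk -v chunk=170000 '
    NR == 1 { header = $0; next }
    (NR - 2) % chunk == 0 { fname = sprintf("split_%02d.csv", ++n); print header > fname }
    { print > fname }
' big_dump.csv
```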

This produced about 12 split CSV files, each with ~170K lines.

I am using Python 3.7.7 on a Windows 10 machine.

Code

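A minimal sketch of the setup being described, with placeholder bodies for convert_files and convert_raw_data (the splits/ directory and all per-row logic here are assumptions):

```python
import concurrent.futures
from pathlib import Path


def convert_raw_data(row):
    # Placeholder for the real per-row cleaning logic.
    return row.rstrip("\n").split(",")


def convert_files(csv_path):
    # Runs inside a worker process; parses one split file.
    with open(csv_path, newline="") as fh:
        for row in fh:
            convert_raw_data(row)
    return csv_path


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where workers are spawned.
    files = sorted(Path("splits").glob("*.csv"))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for done in executor.map(convert_files, files):
            print(f"finished {done}")
```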

Requirements

I wish to set up a logger named f_name.log within each process spawned by the ProcessPoolExecutor and store the logs under the name of the respective parsed file. I am not sure if I should use something like:

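i.e. a per-process logging.basicConfig call, sketched here with f_name derived from the file being parsed (an assumption):

```python
import logging
from pathlib import Path


def convert_files(csv_path):
    # Configure logging inside the worker so this file's messages
    # go to <f_name>.log (deriving f_name this way is an assumption).
    f_name = Path(csv_path).stem
    logging.basicConfig(filename=f"{f_name}.log", level=logging.INFO)
    logging.info("started %s", csv_path)
    ...
```

One real caveat with basicConfig here: it only takes effect the first time it runs in a given process, so a pool worker that is reused for a second file would keep appending to the first file's log.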

or whether there are caveats to using the logging module in a multiprocessing environment?


Answer

I found a simple way to achieve this:

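A sketch of what such a create_log_handler can look like (only the function name comes from the answer; the handler and format details are assumptions):

```python
import logging


def create_log_handler(f_name):
    # One logger per parsed file, writing to <f_name>.log.
    logger = logging.getLogger(f_name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(f"{f_name}.log", mode="w")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```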

I called create_log_handler within my convert_files(.....) function and then used logger.info and logger.error accordingly.

By passing the logger as a parameter to convert_raw_data, I was able to log even the erroneous data points in each of my CSV files in each process.
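Wired together under the same assumptions, the per-process flow might look like:

```python
from pathlib import Path

EXPECTED_COLUMNS = 12  # assumption; whatever the cleaned schema expects


def convert_raw_data(row, logger):
    fields = row.rstrip("\n").split(",")
    if len(fields) != EXPECTED_COLUMNS:
        logger.error("bad data point: %r", row)
        return None
    return fields


def convert_files(csv_path):
    f_name = Path(csv_path).stem
    logger = create_log_handler(f_name)
    logger.info("processing %s", csv_path)
    with open(csv_path, newline="") as fh:
        for row in fh:
            convert_raw_data(row, logger)
    return csv_path
```

Because logging.getLogger(f_name) keys each logger by file name, a pool worker that is reused for a second file gets a fresh logger and a fresh log file instead of inheriting the first file's configuration.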
