Create a separate logger for each process when using concurrent.futures.ProcessPoolExecutor in Python

Question

I am cleaning up a massive CSV data dump. I initially split the single large file into smaller ones using gawk, following the flow from a Unix SE query. That gave me about 12 split CSV files, each with ~170K lines. I am using python3.7.7 on

Accepted Answer

Found a simple way to achieve this:

    import logging

    def create_log_handler(fname):
        # One named logger per input file, writing to "<fname>.log"
        logger = logging.getLogger(name=fname)
        # With both levels at ERROR, logger.info() records are filtered out;
        # lower them to logging.INFO if those should reach the file too.
        logger.setLevel(logging.ERROR)

        fileHandler = logging.FileHandler(fname + ".log")
        fileHandler.setLevel(logging.ERROR)
        logger.addHandler(fileHandler)

        formatter = logging.Formatter('%(name)s %(levelname)s: %(message)s')
        fileHandler.setFormatter(formatter)

        return logger

I called create_log_handler within my convert_files(.....) function and then used logger.info and logger.error accordingly. By passing the logger as a parameter to convert_raw_data, I was able to log even the erroneous data points in each of my CSV files on each process.
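For anyone wiring this into ProcessPoolExecutor, here is a minimal sketch of how the pieces might fit together. The bodies of convert_files and convert_raw_data are not shown in the answer, so the versions below (the empty-field check, the row counting, the run_all helper) are hypothetical stand-ins. The sketch also makes two changes to the helper, both assumptions on my part: it sets the level to INFO so that logger.info output is recorded, and it skips adding a handler when the logger already has one, since a reused worker process could otherwise write duplicate lines.

```python
import csv
import logging
from concurrent.futures import ProcessPoolExecutor


def create_log_handler(fname):
    """Return a logger writing to '<fname>.log'; the handler is added only once."""
    logger = logging.getLogger(name=fname)
    logger.setLevel(logging.INFO)  # assumption: INFO so logger.info() is kept
    if not logger.handlers:  # a reused worker may ask for the same name again
        file_handler = logging.FileHandler(fname + ".log")
        file_handler.setFormatter(
            logging.Formatter("%(name)s %(levelname)s: %(message)s"))
        logger.addHandler(file_handler)
    return logger


def convert_raw_data(row, logger):
    """Hypothetical per-row check: log and reject rows with an empty field."""
    if any(field.strip() == "" for field in row):
        logger.error("bad row: %r", row)
        return None
    return row


def convert_files(path):
    """Runs inside a worker process and logs to '<path>.log'."""
    logger = create_log_handler(path)
    good = bad = 0
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if convert_raw_data(row, logger) is None:
                bad += 1
            else:
                good += 1
    logger.info("finished: %d good rows, %d bad rows", good, bad)
    return path, good, bad


def run_all(paths):
    """Hypothetical driver: one task per CSV file, spread over worker processes."""
    with ProcessPoolExecutor() as executor:
        return list(executor.map(convert_files, paths))


if __name__ == "__main__":
    import sys
    # File names come from the command line, e.g. the split CSV files.
    for path, good, bad in run_all(sys.argv[1:]):
        print(path, good, bad)
```

The logger is created inside the worker function rather than passed in from the parent, so nothing logging-related has to be pickled, and each file ends up with its own log regardless of which process handles it.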
