Reading files faster in Python

Question

I'm writing a script to read a TXT file where each line is a log entry, and I need to split this log into separate files (for all Hor, Sia, Lmu). Reading each line and writing it to the new files works with no problem on my test file (80 KB), but when I try to apply it to the actual file (177MB

Accepted Answer

The first thing I spot is that you are opening the output files for every line. You could open them once and then process all the lines. The same goes for the regex: you could compile it once, before the for loop, with re.compile().

Here is an example:

    import re

    def process_log(input_file, output_files):
        # Compile the pattern once, outside the loop.
        prog = re.compile(r"Crm|([A-Za-z0-9_]+)|]")
        for i, line in enumerate(input_file):
            if i > 2:  # skip the first three lines
                component = prog.match(line).group(1)
                output_files[component].write(line)

    def open_outputs_files():
        # Open every output file once and leave it open. A "with" block here
        # would close each file before any line is written; the caller
        # closes them when processing is done.
        output_files = {}
        components = ["Crm", "Hor", "Sia", "Lmu", "SiebelSeed"]
        for component in components:
            output_files[component] = open(
                f'HHR_Splitter/output/{component}.txt', 'w+', encoding="UTF-16")
        return output_files

    # path is the input file path from your script.
    with open(path, "r", encoding="UTF-16") as input_file:
        output_files = open_outputs_files()
        try:
            process_log(input_file, output_files)
        finally:
            for output_file in output_files.values():
                output_file.close()
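As a variation on the same idea (open the outputs once, compile the regex once), here is a minimal sketch that uses contextlib.ExitStack from the standard library to keep all the output files open during the loop and close them together at the end, even if an error occurs. The function name split_log, its parameters, and the line layout "Crm|<component>|..." matched by LINE_RE are assumptions made for illustration, not something taken from the real log, so the pattern would need adjusting to the actual format.

```python
import re
from contextlib import ExitStack
from pathlib import Path

# Hypothetical layout: each entry starts with "Crm|<component>|".
# Adjust this pattern to the real log format.
LINE_RE = re.compile(r"Crm\|([A-Za-z0-9_]+)\|")


def split_log(input_path, output_dir, components,
              encoding="UTF-16", skip_lines=3):
    """Copy each log line to <output_dir>/<component>.txt.

    Returns a dict with the number of lines written per component.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    counts = dict.fromkeys(components, 0)

    # ExitStack closes every file registered on it when the block exits.
    with ExitStack() as stack:
        source = stack.enter_context(
            open(input_path, "r", encoding=encoding))
        outputs = {
            name: stack.enter_context(
                open(output_dir / f"{name}.txt", "w", encoding=encoding))
            for name in components
        }

        # Iterating over the file object reads one line at a time,
        # so the whole log is never held in memory.
        for i, line in enumerate(source):
            if i < skip_lines:
                continue  # header lines
            match = LINE_RE.match(line)
            if match is None or match.group(1) not in outputs:
                continue  # unparsable or unknown component: skipped here
            outputs[match.group(1)].write(line)
            counts[match.group(1)] += 1

    return counts
```

It could be called as split_log(path, "HHR_Splitter/output", ["Crm", "Hor", "Sia", "Lmu", "SiebelSeed"]). Lines that do not match are silently skipped in this sketch; logging or raising on them may be preferable while you are still checking the pattern against the real file.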

Answer