merge & write two jsonl (json lines) files into a new jsonl file in python3.6

Question

Hello, I have two jsonl files, one.jsonl and second.jsonl. My goal is to write a new jsonl file (with encoding preserved) named merged_file.jsonl that merges the two. However, my approach is met with this error: TypeError: Object of type generator is not JSON serializable. I would appreciate any hint or help. Thanks.

Accepted Answer

It is possible that extract_json returns a generator instead of a list or dict, and a generator is not JSON serializable. Since the files are jsonl, each line is a valid JSON document, so you just need to tweak your existing code a little bit.

    import json
    import glob

    result = []
    for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
        with open(f, 'r', encoding='utf-8-sig') as infile:
            for line in infile:
                try:
                    result.append(json.loads(line))  # parse each line of the file
                except ValueError:
                    print(f)  # report the file that contains an invalid line

    # This would output jsonl
    with open('merged_file.jsonl', 'w', encoding='utf-8-sig') as outfile:
        # json.dump(result, outfile)
        # write each record as one JSON line
        outfile.write("\n".join(map(json.dumps, result)))

Now that I think about it, you didn't even have to load the lines using json; doing so only helps you sanitize any badly formatted JSON lines. You could collect all the lines in one shot like this:

    import glob

    with open('merged_file.jsonl', 'w', encoding='utf-8-sig') as outfile:
        for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
            with open(f, 'r', encoding='utf-8-sig') as infile:
                for line in infile:
                    # add a newline if a file's last line lacks one,
                    # so records from two files are not glued together
                    outfile.write(line if line.endswith("\n") else line + "\n")
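The two ideas in the answer above (validate each line with json, and write one record per line) can be combined in a single streaming pass that never builds the full list in memory. The following is only a minimal sketch under stated assumptions: the function name merge_jsonl and its parameters are hypothetical and not part of the original question or answer, blank lines are skipped silently, and invalid lines are reported and skipped rather than aborting the merge.

```python
import json


def merge_jsonl(in_paths, out_path, encoding="utf-8-sig"):
    """Merge JSON Lines files into out_path; return the number of records written.

    Hypothetical helper for illustration only.
    """
    written = 0
    with open(out_path, "w", encoding=encoding) as outfile:
        for path in in_paths:
            with open(path, "r", encoding=encoding) as infile:
                for lineno, line in enumerate(infile, start=1):
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    try:
                        record = json.loads(line)
                    except ValueError:
                        # json.JSONDecodeError is a subclass of ValueError
                        print(f"skipping invalid JSON at {path}:{lineno}")
                        continue
                    # ensure_ascii=False keeps non-ASCII characters readable
                    outfile.write(json.dumps(record, ensure_ascii=False) + "\n")
                    written += 1
    return written
```

It could be called as merge_jsonl(["one.jsonl", "second.jsonl"], "merged_file.jsonl"). If the inputs are collected with glob instead, it is probably safer to write the output outside the input folder, so that a second run does not pick up merged_file.jsonl as an input.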
