search for common value(patient ID) from column 1 and if all there values in other column (pathologies) is null delete the rows of these common ID’s

Question

PATIENT_ID PATHOLOGIES 12 null 12 null 3 patho1 3 null 5 patho2 2 patho1 12 null If you can see, patient ID 12 is always null but others can be null or has pathologies if the same ID is always null, I want to delete it with the related rows in all columns note: I have 2 million ID, so

Accepted Answer

To remove all patients with only "null" variables you can use this example:import csvfrom itertools import groupbywith open("input.csv", "r") as f_in:    reader = csv.reader(f_in)    next(reader)  # skip header    out = []    for id_, g in groupby(sorted(reader), lambda k: k[0]):        g = list(g)        if all(pathology == "null" for _, pathology in g):            continue        out.extend(g)with open("output.csv", "w") as f_out:    writer = csv.writer(f_out)    writer.writerow(["PATIENT_ID", "PATHOLOGIES"])    writer.writerows(out)This creates output.csv:PATIENT_IDPATHOLOGIES2patho13null3patho15patho2

Advertisement

Answer