I am trying to sort a text file that looks like this:
byr:1983 iyr:2017 pid:796082981 cid:129 eyr:2030 ecl:oth hgt:182cm iyr:2019 cid:314 eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568 byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry hcl:231d64 cid:124 ecl:gmt eyr:2039 hgt:189in pid:#9c3ea1
and so on (1000+ lines), into this structure:
byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value
The order of byr, iyr, etc. doesn't matter, but every set of key:value pairs has to be separated by a blank line. My main problem is writing code that handles the file properly when a line contains more than one key:value element. I have made some progress, but the result is still not right. The following code:
    result_file = open('testresult.txt', 'w')
    #list_of_lines = [] testing purpose
    with open('input.txt', 'r') as f:
        for line in f:
            if line == "\n":
                #list_of_lines.append('\n') testing
                result_file.writelines('\n')
            else:
                for i in line.split(' '):
                    if i[-1] == "\n":
                        result_file.write(i)
                    else:
                        result_file.write(i + '\n')
                    #print(i) testing purpose
produces the result below:
byr:1983 iyr:2017 pid:796082981 cid:129 eyr:2030 ecl:oth hgt:182cm iyr:2019 cid:314 eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568 byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry
and as you can see, it doesn't work properly. For example, there should be no blank line between the first occurrence of byr and the first occurrence of hgt, and so on. It seemed to me that the last if statement
    if i[-1] == "\n":
        result_file.write(i)
    else:
        result_file.write(i + '\n')
would protect me from that situation, but I don't understand why it doesn't behave as I predicted. Please help. Thanks in advance <3
Answer
Try this:
    with open("file.txt", "r") as f:
        lines = f.readlines()
    print(lines)

    # Split every line on single spaces and collect the pieces in one flat list.
    split_lines = []
    for line in lines:
        split_lines.extend(line.split(" "))
    print("split_lines")
    print(split_lines)

    # Notice that the last piece of every line still ends with '\n'
    # (and a blank line becomes a lone '\n'). That might be causing your
    # more-than-one-newline problem, so let's remove it.
    cleaned_lines = [piece.strip("\n") for piece in split_lines]
    print("Removed newlines")
    print(cleaned_lines)

    # Write one piece per line; an empty piece becomes a blank line.
    with open("output.txt", "w") as f:
        for line in cleaned_lines:
            f.write(line + "\n")
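To see where a stray newline can come from, here is a small check of how `str.split(" ")` treats the line ending, compared with `str.split()` called with no argument. The sample strings are made up for illustration, and the trailing space in the second one is only a guess at what the real input might contain.

```python
# Illustrative strings only; the trailing space in the second one is an
# assumption about what the real file might contain.
line = "byr:1983 iyr:2017\n"
line_with_trailing_space = "ecl:oth hgt:182cm \n"

# Splitting on a single space keeps the newline glued to the last piece.
pieces = line.split(" ")
print(pieces)  # ['byr:1983', 'iyr:2017\n']

# A space right before the newline yields a piece that is only '\n',
# which would be written out as an extra blank line.
print(line_with_trailing_space.split(" "))  # ['ecl:oth', 'hgt:182cm', '\n']

# split() with no argument splits on any whitespace and drops empty pieces.
clean = line_with_trailing_space.split()
print(clean)  # ['ecl:oth', 'hgt:182cm']
```

If the input does contain such trailing spaces, that would explain a blank line appearing inside a set even though the `i[-1] == "\n"` check is there.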
With this in file.txt:
byr:1983 iyr:2017 pid:796082981 cid:129 eyr:2030 ecl:oth hgt:182cm iyr:2019 cid:314 eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568 byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry hcl:231d64 cid:124 ecl:gmt eyr:2039 hgt:189in pid:#9c3ea1
Running the above script gives me this in output.txt:
byr:1983 iyr:2017 pid:796082981 cid:129 eyr:2030 ecl:oth hgt:182cm iyr:2019 cid:314 eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568 byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry hcl:231d64 cid:124 ecl:gmt eyr:2039 hgt:189in pid:#9c3ea1
I hope this is what you needed.
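If it helps, the same idea can be written more compactly by grouping the fields of each set first and only then writing them out. This is a minimal sketch, not a definitive solution: the function name `reformat_records` is hypothetical, and the line layout of `sample` is an assumption, since it only guesses where the real file breaks its lines.

```python
def reformat_records(text):
    """Return text with one key:value per line and a blank line between sets."""
    records = []
    current = []
    for line in text.splitlines():
        fields = line.split()  # splits on any whitespace, so no '\n' is left over
        if fields:
            current.extend(fields)
        elif current:
            # A blank line ends the current set.
            records.append(current)
            current = []
    if current:
        records.append(current)
    return "\n\n".join("\n".join(record) for record in records) + "\n"


# Assumed layout: two sets separated by one blank line.
sample = "byr:1983 iyr:2017\npid:796082981 cid:129\n\niyr:2019\ncid:314\n"
result = reformat_records(sample)
print(result)
```

To use it on real files, read the whole input with `open("input.txt").read()`, pass it to the function, and write the returned string to the output file. Because `split()` discards all whitespace, trailing spaces or repeated spaces in the input should not produce extra blank lines.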