
Sorting a txt file (key:value in every line): a problem with '\n'

I am trying to sort a txt file that looks like this:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

and so on (1000+ lines), into this structure:

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

The order of byr, iyr etc. doesn't matter, but every "set" of key:value pairs has to be separated by a blank line. My main problem is writing code that handles lines containing more than one key:value element correctly. I managed to make some progress, but it is still not right. The following code:

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        if line == "\n":
            #list_of_lines.append('\n') testing
            result_file.writelines('\n')
        else:
            for i in line.split(' '):
                if i[-1] == "\n":
                    result_file.write(i)
                else:
                    result_file.write(i + '\n')
                #print(i) testing purpose

produces the result below:

byr:1983
iyr:2017

pid:796082981
cid:129
eyr:2030

ecl:oth
hgt:182cm


iyr:2019

cid:314

eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568


byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

As you can see, it doesn't work properly: for example, there should be no blank line between the first occurrence of byr and the first occurrence of hgt, and so on. It seemed to me that the last if statement

if i[-1] == "\n":
    result_file.write(i)
else:
    result_file.write(i + '\n')

would protect me from that situation, but I don't understand why the output isn't what I predicted. Please help. Thanks in advance <3


Answer

Try this:

# Read all lines from the input file
with open("file.txt", "r") as f:
    lines = f.readlines()

print(lines)

# Split every line on spaces so each key:value pair becomes its own item
split_lines = []
for line in lines:
    split_lines.extend(line.split(" "))

print("split_lines")
print(split_lines)

# Notice that the items that ended a line still carry a trailing '\n'.
# Writing them out followed by another '\n' might be what causes your
# extra blank lines, so let's remove it.

cleaned_lines = [item.strip("\n") for item in split_lines]

print("Removed \\n")
print(cleaned_lines)

# Write one item per line; an item that was only '\n' becomes a blank line
with open("output.txt", "w") as f:
    for line in cleaned_lines:
        f.write(line + "\n")

With this in file.txt:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

Running the above script gives me this in output.txt:

byr:1983
iyr:2017
pid:796082981
cid:129
eyr:2030
ecl:oth
hgt:182cm

iyr:2019
cid:314
eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568

byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

hcl:231d64
cid:124
ecl:gmt
eyr:2039
hgt:189in
pid:#9c3ea1
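
If the input might contain trailing spaces or several blank lines in a row (this is only a guess about what could produce the extra blank lines in the question's output), splitting on a single space is fragile, because a trailing space leaves a lone '\n' item behind. A minimal sketch of a more tolerant variant follows. The function name normalize_records and the sample string are hypothetical, made up for illustration. It relies on str.split() with no argument, which splits on any run of whitespace and never returns empty items.

```python
def normalize_records(text):
    """Put each key:value pair on its own line, with one blank line between records."""
    records = []   # finished records, each a list of "key:value" strings
    current = []   # pairs collected for the record being read
    for line in text.splitlines():
        pairs = line.split()  # no argument: any whitespace separates, '\n' and stray spaces vanish
        if pairs:
            current.extend(pairs)
        elif current:
            # a blank (or whitespace-only) line closes the current record
            records.append(current)
            current = []
    if current:
        records.append(current)
    return "\n\n".join("\n".join(record) for record in records) + "\n"


# Hypothetical sample: note the trailing space on the first line and the double blank line
sample = "byr:1983 iyr:2017 \npid:796082981 cid:129 eyr:2030\n\n\niyr:2019\ncid:314\n"
print(normalize_records(sample))
```

To use it on the files from the question, read input.txt in one go with f.read(), pass the string to normalize_records, and write the returned string to testresult.txt.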

Hope this is what you needed.

User contributions licensed under: CC BY-SA