So, I have a file which looks like this:
hannah123@gmail.com:h1annah!! - Number of visits: 132 - True - False - True
john123@gmail.com:joh22nny!! - Number of visits: 14814 - True - False - False
kingler123@gmail.com:gin55er.! - Number of visits: 15 - True - False - False
My objective is to order it like this
john123@gmail.com:joh22nny!! - Number of visits: 14814 - True - False - False
hannah123@gmail.com:h1annah!! - Number of visits: 132 - True - False - True
kingler123@gmail.com:gin55er.! - Number of visits: 15 - True - False - False
So it would order the number of visits from higher to lower.
I’ve found a solution, which looks like this.
with open('file.txt') as f, open('file2.txt', 'w') as f2:
    f2.writelines(sorted(f.readlines(), key=lambda s: int(s.rsplit(' ')[-1].strip()), reverse=True))
However, this only works if the last space-separated field on each line is an integer.
So it won't work with the files I need it to, because my lines end in True or False.
My problem is getting the numerical values from the number of visits and ordering the lines by them in descending order (highest first), without removing anything from the file.
Sorry if this is wordy, I don't speak English.
Answer
This solution uses the re module.
The regex pattern I used, r"Number of visits: (\d*) -", is actually larger than it needs to be and could be reduced to r": (\d*) -", but I wanted it to be clear and explicit which digits it should be capturing.
If you aren't familiar with re/Regular Expressions, the parentheses indicate that whatever matches the pattern inside of them should be captured separately from the matching string. \d* means to capture any number of consecutive digits.
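To make the capture group concrete, here is a small sketch that applies the pattern to one sample line from the question. The variable names `whole` and `visits` are only for illustration.

```python
import re

# Sample line copied from the question.
line = "hannah123@gmail.com:h1annah!! - Number of visits: 132 - True - False - True"

match = re.search(r"Number of visits: (\d*) -", line)

whole = match.group(0)   # the entire text matched by the pattern
visits = match.group(1)  # only the digits inside the parentheses (still a string)

print(whole)        # Number of visits: 132 -
print(visits)       # 132
print(int(visits))  # 132, now usable as a numeric sort key
```

Note that group(1) returns a string, which is why it has to go through int() before sorting. Sorted as strings, "15" would come after "14814".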
Each line and the extracted value are then put into a tuple and stored in the list data. I chose to convert the value to an int() at this time, but it could also be done as part of the sort lambda function instead.
import re

infile = "test.txt"
outfile = "output.txt"
data = []

# Read from the input file and use regex.
with open(infile, 'r') as fp:
    while True:
        line = fp.readline()
        if not line:
            break
        # Use re.search to capture the digits we want.
        match = re.search(r"Number of visits: (\d*) -", line)
        # Save data array of tuples with (line, integer)
        data.append((line, int(match.group(1))))

# Sort the list of tuples by the integers.
data.sort(key=lambda e: e[1], reverse=True)

# Write just the lines to the output file.
with open(outfile, 'w') as fp:
    for line in data:
        fp.write(line[0])
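As mentioned above, the int() conversion could instead happen inside the sort key. A minimal sketch of that variant follows. The helper names `visit_count` and `sort_by_visits` are hypothetical, and two things differ from the code above as assumptions of mine: the pattern uses \d+ so that at least one digit is required, and lines without a visit count are sorted to the end rather than raising an error.

```python
import re

# Compiled once; \d+ requires at least one digit (an assumption, see above).
VISITS = re.compile(r"Number of visits: (\d+) -")

def visit_count(line):
    """Return the visit count found in line, or -1 if there is none."""
    match = VISITS.search(line)
    return int(match.group(1)) if match else -1

def sort_by_visits(lines):
    """Return the lines ordered from most to fewest visits, unchanged otherwise."""
    return sorted(lines, key=visit_count, reverse=True)

# Sample data copied from the question.
lines = [
    "hannah123@gmail.com:h1annah!! - Number of visits: 132 - True - False - True\n",
    "john123@gmail.com:joh22nny!! - Number of visits: 14814 - True - False - False\n",
    "kingler123@gmail.com:gin55er.! - Number of visits: 15 - True - False - False\n",
]

for line in sort_by_visits(lines):
    print(line, end="")
```

With real files, the same function could be used as f2.writelines(sort_by_visits(f.readlines())), in the style of the one-liner from the question.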