I have two text files. I want to count how many times each string in “Textfile 1” occurs among the string pairs given in “Textfile 2”.
Textfile_1:
1763_0M73
2610_0M63
7529_12M64
7529_18M64
0091_00M56
Textfile_2:
1763_0M73, 2610_0M63
2610_0M63, 7529_12M64
7529_18M64, 0091_00M56
0091_00M56, 7529_12M64
0267_12M64, 0091_00M56
Expected Output:
1763_0M73, 1
2610_0M63, 2
7529_12M64, 2
7529_18M64, 1
0091_00M56, 3
I tried the following script, but it does not give the expected output.
with open('Textfile_2.txt') as f1:
    lookup = dict([x.strip() for x in line.split(',')] for line in f1)
    print(lookup)
with open('Output.txt', 'w') as out:
    with open('Textfile_1.txt') as f2:
        for line in f2:
            k = line.strip()
            n = lookup[k]
            print(n)
Does anybody know how to do this in Python? I’m quite new to Python programming.
Answer
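A few things are not done correctly in your code. dict() turns each pair into a key/value entry, so lookup maps the first string of each pair to the second and never counts anything. lookup[k] then raises a KeyError for any string that only ever appears second in a pair (7529_12M64 in your data). Here’s a version that uses comprehensions instead.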
# Step 1: Read Textfile_1 and store each string as a dictionary key.
# Strip the trailing \n from each record as you read it.
# The value of each key starts at 0.
with open('Textfile_1.txt', 'r') as f1:
    txt1 = {_.rstrip("\n"): 0 for _ in f1}

# Step 2: Read Textfile_2 and strip the trailing \n from each line.
# Then split each line on the comma, which gives the two values of the pair.
# You will get [[str1, str2], [str3, str4], ...]
with open('Textfile_2.txt', 'r') as f2:
    txt2 = [z.rstrip("\n").split(',') for z in f2]

# Step 3: The strings in the list of lists may have leading or trailing spaces.
# As you iterate through them, remove the leading/trailing spaces,
# then check for that value in the dictionary.
# If found, increment the value by 1.
for i in [y.strip() for x in txt2 for y in x]:
    if i in txt1:
        txt1[i] += 1

# Step 4: Print the final dictionary, which now contains the counts.
print(txt1)

# Step 5: If you want to write this into a file, use the code below.
# Open the file in write mode and iterate through the dictionary using txt1.items().
# For each key and value, write a line to the file. Include \n to end the line.
with open('Textfile_3.txt', 'w') as f3:
    for k, v in txt1.items():
        t3 = k + ', ' + str(v) + '\n'
        f3.write(t3)
The output of this is:
{'1763_0M73': 1, '2610_0M63': 2, '7529_12M64': 2, '7529_18M64': 1, '0091_00M56': 3}
The output written to Textfile_3.txt will be:
1763_0M73, 1
2610_0M63, 2
7529_12M64, 2
7529_18M64, 1
0091_00M56, 3
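If you prefer to let the standard library do the tallying, the same count can be sketched with collections.Counter. This is only one possible variant, not the approach from the answer above. The helper name count_occurrences is made up for illustration, and the sample lists are copied from the question so the snippet runs without any files.

```python
from collections import Counter


def count_occurrences(names, pair_lines):
    """Count how often each name appears among comma-separated pairs.

    `count_occurrences` is a hypothetical helper name used for this sketch.
    Both arguments can be any iterable of lines, e.g. an open file object.
    """
    # Tally every non-empty string that appears on either side of a pair.
    seen = Counter(
        part.strip()
        for line in pair_lines
        for part in line.split(',')
        if part.strip()
    )
    # Keep only the requested names, in their original order.
    # A Counter returns 0 for strings it has never seen.
    return {name: seen[name] for name in (n.strip() for n in names) if name}


# Sample data copied from the question.
names = ["1763_0M73", "2610_0M63", "7529_12M64", "7529_18M64", "0091_00M56"]
pairs = [
    "1763_0M73, 2610_0M63",
    "2610_0M63, 7529_12M64",
    "7529_18M64, 0091_00M56",
    "0091_00M56, 7529_12M64",
    "0267_12M64, 0091_00M56",
]

counts = count_occurrences(names, pairs)
for name, total in counts.items():
    print(f"{name}, {total}")
```

With real files you could pass the two open file objects in place of the lists, since iterating over a file yields its lines. Strings such as 0267_12M64 that are not listed in Textfile_1 are tallied internally but left out of the result.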