I have two text files. I want to count how many times each string in “Textfile 1” occurs among the string pairs given in “Textfile 2”.
Textfile_1:
1763_0M73
2610_0M63
7529_12M64
7529_18M64
0091_00M56
Textfile_2:
1763_0M73, 2610_0M63
2610_0M63, 7529_12M64
7529_18M64, 0091_00M56
0091_00M56, 7529_12M64
0267_12M64, 0091_00M56
Expected Output:
1763_0M73, 1
2610_0M63, 2
7529_12M64, 2
7529_18M64, 1
0091_00M56, 3
I tried the following script, but it does not give the expected output.
with open('Textfile_2.txt') as f1:
    lookup = dict([x.strip() for x in line.split(',')] for line in f1)
    print(lookup)
with open('Output.txt', 'w') as out:
    with open('Textfile_1.txt') as f2:
        for line in f2:
            k = line.strip()
            n = lookup[k]
            print(n)
Does anybody know how to do this in Python? I’m quite new to Python programming.
Answer
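A few things are not done correctly in your code. For one, dict() applied to the pairs maps the first string of each pair to the second, so nothing is counted, and lookup[k] raises a KeyError for any string that never appears first in a pair. Here’s the list comprehension code.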
# Step 1: Read Textfile_1 and store each line as a dictionary key.
# Strip the trailing \n from each record as it is read.
# The value of each key starts at 0.
with open('Textfile_1.txt', 'r') as f1:
    txt1 = {_.rstrip("\n"): 0 for _ in f1}

# Step 2: Read Textfile_2 and strip the trailing \n from each line.
# Then split each line on the comma into a list of two values.
# You will get [[str1, str2], [str3, str4], ...]
with open('Textfile_2.txt', 'r') as f2:
    txt2 = [z.rstrip("\n").split(',') for z in f2]

# Step 3: The strings in the list of lists may have leading or trailing spaces.
# As you iterate through them, remove the leading/trailing spaces,
# then check for that value in the dictionary.
# If found, increment the value by 1.
for i in [y.strip() for x in txt2 for y in x]:
    if i in txt1:
        txt1[i] += 1

# Step 4: Print the final dictionary, which now contains the counts.
print(txt1)

# Step 5: If you want to write this into a file, use the code below.
# Open the file in write mode and iterate through the dictionary using txt1.items().
# For each key and value, write a line to the file. Include \n to get a newline.
with open('Textfile_3.txt', 'w') as f3:
    for k, v in txt1.items():
        t3 = k + ', ' + str(v) + '\n'
        f3.write(t3)
The output of this is:
{'1763_0M73': 1, '2610_0M63': 2, '7529_12M64': 2, '7529_18M64': 1, '0091_00M56': 3}
The output to Textfile_3 will be:
1763_0M73, 1
2610_0M63, 2
7529_12M64, 2
7529_18M64, 1
0091_00M56, 3
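As an alternative, the same counting can be sketched with collections.Counter from the standard library. This is only a minimal sketch, not the approach used above: the helper name count_occurrences is made up for illustration, and the sample data is copied from the question into in-memory lists instead of being read from the two files.

```python
from collections import Counter


def count_occurrences(wanted, pair_lines):
    """Count how often each string in `wanted` appears in comma-separated pair lines.

    `count_occurrences` is a hypothetical helper name used for this sketch.
    """
    # Flatten every pair line into individual strings with whitespace removed.
    seen = Counter(
        part.strip()
        for line in pair_lines
        for part in line.split(',')
        if part.strip()
    )
    # Keep the order of `wanted`; a Counter returns 0 for strings it never saw.
    return {name: seen[name] for name in wanted}


# Sample data copied from the question (assumption: one entry per line in each file).
textfile_1 = ["1763_0M73", "2610_0M63", "7529_12M64", "7529_18M64", "0091_00M56"]
textfile_2 = [
    "1763_0M73, 2610_0M63",
    "2610_0M63, 7529_12M64",
    "7529_18M64, 0091_00M56",
    "0091_00M56, 7529_12M64",
    "0267_12M64, 0091_00M56",
]

counts = count_occurrences(textfile_1, textfile_2)
for name, n in counts.items():
    print(f"{name}, {n}")
```

To use it on the real files, you could pass the stripped lines of Textfile_1.txt as the first argument and the open file object for Textfile_2.txt as the second, since iterating over a file yields its lines.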