s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END

UNWANTED TEXT

Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""
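I want Python code that uses re.sub to remove UNWANTED TEXT only inside paragraphs (from a "Paragraph N" line to the matching "Paragraph END" line) while keeping UNWANTED TEXT outside paragraphs. This is what I tried: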
search_unwanted_only_inparagrap = re.findall('(?s)(?<=Paragraph)(.*?)(?=END)', text_file, flags=re.MULTILINE)
if search_unwanted_only_inparagrap:
    replace_only_insidepara = re.sub(r"UNWANTED TEXT+", " ", text_file)  # replace string substitute
    print(replace_only_insidepara)
else:
    print("not found")
But the output replaces every instance of UNWANTED TEXT throughout the file:
Paragraph 1
some text blah blah
blah blah

some text
Paragraph END



Paragraph 2
some text blah blah
blah blah

Paragraph END
But I expected this:
Paragraph 1
some text blah blah
blah blah

some text
Paragraph END

UNWANTED TEXT

Paragraph 2
some text blah blah
blah blah

Paragraph END
Please help.
Answer
Your demo input could have been more minimal. That said, as I understand your requirement, an approach based on re.split works:
import re

s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END

UNWANTED TEXT

Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""
# The capture group makes split() keep the matched paragraphs in the result list.
reg_para = re.compile(r'(Paragraph\s+\d+.+?END)', re.DOTALL)
paras = reg_para.split(s)
for para in paras:
    if reg_para.match(para):
        # This piece is a paragraph, so remove the unwanted text from it.
        para = re.sub(r"UNWANTED TEXT", " ", para)
        # To replace more words, add further substitutions
        # (or loop over a list of keywords):
        para = re.sub(r"Another WORD", " ", para)
        print(para)
    else:
        # Text between paragraphs is printed unchanged.
        print(para)
Output:
Paragraph 1
some text blah blah
blah blah

some text
Paragraph END


UNWANTED TEXT


Paragraph 2
some text blah blah
blah blah

Paragraph END
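For context, the attempt in the question replaced everything because its re.sub call runs on the whole text; the re.findall result is only used as a yes/no check. As a possible alternative to splitting and printing, the sketch below passes a function to re.sub so that only matched paragraphs are rewritten and the result comes back as a single string. The names remove_inside_paragraphs, PARAGRAPH, unwanted and replacement are made up for this illustration, and the paragraph pattern assumes the same "Paragraph N ... END" layout as above.

```python
import re

# Matches one block from "Paragraph <number>" through the first following "END".
# Assumption: paragraphs always look like the sample in the question.
PARAGRAPH = re.compile(r"Paragraph\s+\d+.+?END", re.DOTALL)


def remove_inside_paragraphs(text, unwanted, replacement=" "):
    """Replace each string in `unwanted`, but only inside paragraph blocks."""
    # Escape the keywords so they match literally, then join them into one pattern.
    unwanted_re = re.compile("|".join(re.escape(word) for word in unwanted))

    def clean(match):
        # Called once per paragraph; text outside paragraphs never reaches here.
        return unwanted_re.sub(replacement, match.group(0))

    return PARAGRAPH.sub(clean, text)


sample = (
    "Paragraph 1\nsome text\nUNWANTED TEXT\nmore text\nParagraph END\n"
    "\nUNWANTED TEXT\n\n"
    "Paragraph 2\nUNWANTED TEXT\nParagraph END"
)
result = remove_inside_paragraphs(sample, ["UNWANTED TEXT", "Another WORD"])
print(result)
```

Only the UNWANTED TEXT line between the two paragraphs should survive. The default replacement is a single space to mirror the answer above; pass replacement="" to delete the text outright.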