s = """Paragraph 1 some text blah blah blah blah UNWANTED TEXT some text Paragraph END UNWANTED TEXT Paragraph 2 some text blah blah blah blah UNWANTED TEXT Paragraph END"""
I want to use re.sub in Python to replace UNWANTED TEXT only inside paragraphs (between "Paragraph" and "END"), and keep any UNWANTED TEXT that appears outside paragraphs. This is my code:
search_unwanted_only_inparagrap = re.findall('(?s)(?<=Paragraph)(.*?)(?=END)', text_file, flags=re.MULTILINE)
if search_unwanted_only_inparagrap:
    replace_only_insidepara = re.sub(r"UNWANTED TEXT+", " ", text_file)  # replace string substitute
    print(replace_only_insidepara)
else:
    print("not found")
But the output replaces every instance of UNWANTED TEXT throughout the file:
Paragraph 1 some text blah blah blah blah some text Paragraph END Paragraph 2 some text blah blah blah blah Paragraph END
But I expected this:
Paragraph 1 some text blah blah blah blah some text Paragraph END UNWANTED TEXT Paragraph 2 some text blah blah blah blah Paragraph END
Please help.
Answer
Your demo input could have been more minimal. Still, as I understand your requirement, re.split works: split the text on the paragraph pattern, then run the substitution only on the pieces that match it.
import re

s = """Paragraph 1 some text blah blah blah blah UNWANTED TEXT some text Paragraph END UNWANTED TEXT Paragraph 2 some text blah blah blah blah UNWANTED TEXT Paragraph END"""

# The capturing group makes re.split keep the matched paragraphs in the result.
reg_para = re.compile(r'(Paragraph\s+\d+.+?END)', re.DOTALL)
paras = reg_para.split(s)
for para in paras:
    if reg_para.match(para):
        # This piece is a paragraph: remove the unwanted text from it.
        para = re.sub(r"UNWANTED TEXT", " ", para)
        # To replace more words, add further substitutions here
        # (or loop over a list of keywords).
        para = re.sub(r"Another WORD", " ", para)
        print(para)
    else:
        # Text outside any paragraph is printed unchanged.
        print(para)
Output:
Paragraph 1 some text blah blah blah blah some text Paragraph END UNWANTED TEXT Paragraph 2 some text blah blah blah blah Paragraph END
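If you would rather get the result back as a single string, one possible variant is to pass a function to re.sub, so the substitution runs only on the text each paragraph match covers. This is just a sketch under the same assumption as above (a paragraph starts with "Paragraph <number>" and ends at the next "END"); the names PARA_RE and strip_inside_paragraphs are made up for illustration.

```python
import re

s = """Paragraph 1 some text blah blah blah blah UNWANTED TEXT some text Paragraph END UNWANTED TEXT Paragraph 2 some text blah blah blah blah UNWANTED TEXT Paragraph END"""

# Assumed paragraph shape: "Paragraph <number> ... END", matched non-greedily.
PARA_RE = re.compile(r"Paragraph\s+\d+.+?END", re.DOTALL)


def strip_inside_paragraphs(text, unwanted="UNWANTED TEXT", replacement=" "):
    """Replace `unwanted` only inside spans matched by PARA_RE."""
    def clean(match):
        # match.group(0) is the whole paragraph; text outside is never seen here.
        return match.group(0).replace(unwanted, replacement)

    return PARA_RE.sub(clean, text)


result = strip_inside_paragraphs(s)
print(result)
```

Because re.sub copies everything outside the matches through unchanged, the UNWANTED TEXT between the two paragraphs is left alone.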