
How to replace and insert a new substring in Python?

This is working (though maybe not very efficient) code that replaces a substring with a modified version of it.

Input strings:

text = ["part1 Pirates (2006)",
        "part2 Pirates (2006)"
]

Output strings:

 Pirates PT1 (2006)

 Pirates PT2 (2006)

It has to replace substrings like ‘part1’, ‘part2’ and so on with ‘PT’ plus the same number, and place that between the title and the year. Code:

import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0


#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa):
        return stringa[stringa.find("(")+1:stringa.find(")")]



#Start
for title in text:

    #Does the year exist? Try to get it ---------> '2006'
    yearStr = getYear(title)

    #Get integer next to 'part' substring  -------> '1'
    intPartStr = re.findall(r'part(\d+)', title)

    #Delete 'part' substring  --------------------> ' Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title)

    #Build a new string  -------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"

    #Update title with new string newStr ---------> " Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)

    #End
    print(result)

But when the list looks like this:

text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
]

I don't know how to extract the integer that follows ‘part’ or ‘pt’, with or without a space in between (‘part 2’), and so on.
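A minimal sketch of just the number-extraction step, assuming the prefix is always ‘part’ or ‘pt’ followed by optional whitespace. The helper name get_part_number is hypothetical; \s* makes the space optional and \b keeps ‘pt’ from matching inside another word.

```python
import re

# Hypothetical helper: the number after "part" or "pt",
# with optional whitespace in between ("part1", "pt1", "part 2").
PART_RE = re.compile(r'\b(?:part|pt)\s*(\d+)')

def get_part_number(title):
    match = PART_RE.search(title)
    # None when the title has no part/pt prefix
    return match.group(1) if match else None

for title in ["pt1 Pirates (2006)", "part 2 Pirates (2006)", "Pirates (2006)"]:
    print(title, "->", get_part_number(title))
```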

EDIT:

I assumed this string would behave the same way, but it doesn't, sorry.

How can I solve this?

"part 2 the day sports stood still (2021)"

\w+ doesn’t grab all the words (it stops at the first space).
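To illustrate the difference: \w+ matches one run of letters, digits and underscores, so it stops at the first space, whereas a character class such as [\w\s]+ also accepts whitespace and can span the whole multi-word title. A small sketch, assuming the title is always followed by the year in parentheses:

```python
import re

title = "part 2 the day sports stood still (2021)"

# \w+ stops at the first space, so only one word is captured
one_word = re.search(r'part\s?\d+\s(\w+)', title).group(1)

# [\w\s]+ also accepts spaces; .strip() drops the trailing space
# that sits just before the opening parenthesis
all_words = re.search(r'part\s?\d+\s([\w\s]+)\(', title).group(1).strip()

print(one_word)   # the
print(all_words)  # the day sports stood still
```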


Answer

You can do all the substitution at the same time:

import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

for title in text:
    print(re.sub(pattern, substitute, title))

# if you want the result in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]

Regex explanation:

  • (?:part|pt)\s?(\d+) match ‘part’ or ‘pt’ without capturing it, then capture the number (group 1)
  • (\b[\w\s]+\b) capture the title (group 2)
  • \((\d+)\) capture the year inside the parentheses (group 3)
  • '\2 PT\1 (\3)' rebuild the string from the group numbers
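The same pattern can be written with re.VERBOSE so each piece carries its explanation inline, and with named groups instead of numbers. The group names part, title and year and the function name format_title are illustrative choices, not part of the answer above; this is a sketch that should behave like the numbered version on the sample titles.

```python
import re

# The answer's regex, split up with re.VERBOSE and named groups.
# In VERBOSE mode unescaped whitespace in the pattern is ignored,
# which is fine here because every real space is matched with \s.
PATTERN = re.compile(r"""
    (?:part|pt)\s?            # 'part' or 'pt' (not captured), optional space
    (?P<part>\d+)\s?          # the part number
    (?P<title>\b[\w\s]+\b)    # the title: words and spaces between word boundaries
    \s?\((?P<year>\d+)\)      # the year inside parentheses
""", re.VERBOSE)

def format_title(title):
    # re.sub returns titles that don't match unchanged
    return PATTERN.sub(r'\g<title> PT\g<part> (\g<year>)', title)

text = [
    "part1 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)",
]
for t in text:
    print(format_title(t))
```

Named groups make the replacement string self-documenting and stay correct if another group is later added in front of the existing ones.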
User contributions licensed under: CC BY-SA