This is working, though maybe not very efficient, code that replaces a substring with a modified version of itself.
Input string :
text = ["part1 Pirates (2006)", "part2 Pirates (2006)" ]
Output string:
Pirates PT1 (2006)
Pirates PT2 (2006)
It has to replace substrings like 'part1', 'part2' and so on with 'PT1', 'PT2', and move them between the title and the year. Code:
import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0

#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa) is True:
        return stringa[stringa.find("(")+1:stringa.find(")")]

#Start
for title in text:
    #Does the year exist ? try to Get it ---------> '2006'
    yearStr = getYear(title)
    #Get integer next to 'part' substring -------> '1'
    intPartStr = re.findall(r'part(\d+)', title)
    #Delete 'part' Substring --------------------> 'Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title)
    #Build a new string -------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"
    #Update title with new String newStr --------> "Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)
    #End
    print(result)
But when the list looks like this:
text = ["pt1 Pirates (2006)", "part 2 Pirates (2006)" ]
I don't know how to extract the integer next to 'part', 'pt', 'part ' (with a space) and so on.
I assumed this string would behave the same, but it doesn't, sorry.
How can I solve this?
"part 2 the day sports stood still (2021)"
\w+ doesn't grab all the words
You can do all the substitutions at the same time:
import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

for title in text:
    # title now holds the reformatted string
    title = re.sub(pattern, substitute, title)

# if you want the result in a new array:
text_formatted = [re.sub(pattern, substitute, title) for title in text]
Regex explanation:
(?:part|pt)\s?(\d+)
ignore the text 'part' or 'pt' and capture the value (group 1)
(\b[\w\s]+\b)
capture the title (group 2)
\((\d+)\)
capture the year in parentheses (group 3)
'\2 PT\1 (\3)'
recreate your string from the group numbers
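
To check the pattern against the inputs from the question, here is a minimal sketch that wraps it in a helper. The function name `move_part` and the `re.IGNORECASE` flag are assumptions added for illustration; they are not part of the answer above.

```python
import re

# Same pattern as in the answer, compiled once.
# IGNORECASE is an added assumption so "Part 1" or "PT2" are accepted too.
PATTERN = re.compile(
    r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)',
    re.IGNORECASE,
)

def move_part(title):
    """Hypothetical helper: move the part number between title and year."""
    # sub() returns the input unchanged when the pattern does not match
    return PATTERN.sub(r'\2 PT\1 (\3)', title)

titles = [
    "pt1 Pirates (2006)",
    "part 2 the day sports stood still (2021)",
    "Pirates (2006)",  # no part marker: returned as-is
]
formatted = [move_part(t) for t in titles]
print(formatted)
```

Note that `[\w\s]+` only covers letters, digits, underscores and whitespace, so a title containing a colon or an apostrophe would not match and would come back unchanged.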