    import re


    def one_day_or_another_day_relative_to_a_date_func(input_text):
        #print(repr(input_text)) # print what was captured and should be replaced
        return "aaaaaaaa"


    def identify(input_text):
        some_text = r"(?:(?!\.\s*?\n)[^;])*"

        date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"

        previous_days = r"(\d+)\s*(?:dias|dia)\s*(?:antes|previos|previo|antes|atrás|atras)\s*"
        after_days = r"(\d+)\s*(?:dias|dia)\s*(?:después|despues|luego)\s*"

        n_patterns = [
            previous_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + previous_days,
            after_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + previous_days,
            previous_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + after_days,
            after_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + after_days]

        # Iterate over the list of search patterns so that the program tries them one by one
        for n_pattern in n_patterns:
            # This is my attempt at the replacement, although it has problems with non-greedy modifiers
            input_text = re.sub(n_pattern, one_day_or_another_day_relative_to_a_date_func , input_text, re.IGNORECASE)


    input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
                   "2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
                   "a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
                   ]


    #Testing...
    for input_text in input_texts:
        #print(input_text)
        print(one_day_or_another_day_relative_to_a_date_func(input_text))
This is the incorrect output I am getting; since the substrings are captured incorrectly, the replacements are incorrect as well:
    "aaaaaaaa"
    "aaaaaaaa"
    "aaaaaaaa"
The limits are well defined, so I do not understand why this capture pattern tries to capture beyond them.
The output I need is:
    "aaaaaaaa"
    "aaaaaaaa, dia en donde ocurrio"
    "a tan solo aaaaaaaa, mmm no recuerdo bien"
Answer
There are several errors in your code, among which:
- You are printing the result of the `one_day_or_another_day_relative_to_a_date_func` function. Print the result of `identify` instead.
- In the `identify` function you are not returning the result text. Add `return input_text` at the end of it.
- Make the “o…” suffix optional.
- Use regex alternation instead of multiple patterns, otherwise you may get unexpected results.
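To illustrate the last two points on a smaller scale, here is a minimal sketch. It is not the code from the question: the pattern, the `replace_offsets` helper, and the sample strings are simplified assumptions, chosen only to show a single alternation for both directions and an optional `o ...` suffix.

```python
import re

# Simplified, hypothetical pattern: "<n> dia(s) antes|despues", optionally
# followed by " o <n> dia(s) antes|despues".
direction = r"(?:antes|despues)"
offset = r"\d+\s*dias?\s*" + direction
pattern = offset + r"(?:\s*o\s*" + offset + r")?"


def replace_offsets(text):
    # One pattern covers every antes/despues combination, so the text is
    # scanned once instead of being rewritten by several patterns in turn.
    return re.sub(pattern, "X", text, flags=re.IGNORECASE)


print(replace_offsets("2 dias despues o 1 dia antes, creo"))
print(replace_offsets("3 dias antes, seguro"))
```

With these sample inputs the sketch prints `X, creo` and `X, seguro`: the suffix is consumed when it is present and skipped when it is not, and the text after the match is left untouched.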
Fixed code (I’ve also made it more compact):
    import re


    def one_day_or_another_day_relative_to_a_date_func(input_text):
        #print(repr(input_text)) # print what was captured and should be replaced
        return "aaaaaaaa"


    def identify(input_text):
        some_text = r"(?:(?!\.\s*?\n)[^;])*"
        date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"
        previous_days = r"antes|previos|previo|antes|atrás|atras"
        after_days = r"después|despues|luego"
        # A single alternation covers both the "before" and the "after" wording
        prev_or_after = r"(\d+)\s*(?:dias|dia)\s*(?:" + previous_days + "|" + after_days + r")\s*"
        preposition = r"(?:del|de\s*el|de|al|a)\s*"
        # The trailing "o ..." part is optional
        suffix = "(?:" + r"\s*(?:,\s*o|o)\s*" + some_text + prev_or_after + ")?"
        pattern = prev_or_after + some_text + preposition + date_capture_pattern + suffix
        # Pass flags by keyword: the fourth positional parameter of re.sub is count
        input_text = re.sub(pattern, one_day_or_another_day_relative_to_a_date_func, input_text, flags=re.IGNORECASE)
        return input_text


    input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
                   "2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
                   "a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
                   ]


    #Testing...
    for input_text in input_texts:
        #print(input_text)
        print(identify(input_text))
Result:
    aaaaaaaa
    aaaaaaaa, dia en donde ocurrio
    a tan solo aaaaaaaa, mmm no recuerdo bien
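A side note on the `re.sub` signature, since it is easy to trip over: the fourth positional parameter is `count`, not `flags`, so a flag such as `re.IGNORECASE` is best passed by keyword. The sketch below uses a made-up sample string (an assumption for illustration only) to show the difference between the two call styles.

```python
import re
import warnings

text = "Dia uno, DIA dos, dia tres"  # hypothetical sample input

# Positional form: re.IGNORECASE has the integer value 2, so this call
# means "replace at most 2 matches, case-sensitively". Recent Python
# versions may emit a DeprecationWarning for it, which is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    as_count = re.sub(r"dia", "X", text, re.IGNORECASE)

# Keyword form: the flag is applied and matching ignores case.
as_flags = re.sub(r"dia", "X", text, flags=re.IGNORECASE)

print(as_count)
print(as_flags)
```

The first call only replaces the lowercase `dia`, while the second replaces all three occurrences. The sample sentences in this question are all lowercase, which is why the positional form goes unnoticed here.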