import re

#Examples:
input_text = "Recien el 2021-10-12 despues de 3 dias 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj. \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12" #PASS

some_text = r"[\s|]*" # <--- I NEED TO MODIFY THIS PATTERN
date_format = r"\d*-\d{2}-\d{2}"

check_00 = re.search(date_format + some_text + r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))", input_text, re.IGNORECASE)
check_01 = re.search(r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))" + some_text + date_format, input_text, re.IGNORECASE)

if not check_00 and not check_01:
    print("1")
else:
    print("0")
I need to set in the variable some_text a pattern that identifies any alphanumeric substring, possibly containing symbols such as : $ # & ? ¿ ! ¡ | ° , . ( ) ] [ } { and both uppercase and lowercase characters. The only things that must not be present, not even once, are ; and .\n or .[\s|]*\n*
In this case I need to determine which cases do NOT meet the condition, hence the if not conditionals in the code.
The output you should get if everything in the algorithm works fine would be this:
0 #for example 1
0 #for example 2
0 #for example 3
0 #for example 4
1 #for example 5
1 #for example 6
Is it possible, within the same pattern that I want to place in the some_text variable, to give a list of the symbols that I do NOT want to appear in that part of the match (in this case ; and .[\s|]*\n*)?
Answer
but the only symbols that should not to be present, not even once, are ; and .\n or .[\s|]*\n*
To disallow ; you can simply use the negated character class [^;].
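As a minimal sketch of that first point, the snippet below checks whole strings against a negated character class. The name no_semicolon and the sample strings are illustrative only and do not come from the question.

```python
import re

# [^;] matches any single character except a semicolon (newlines included).
# no_semicolon is a hypothetical name used only for this illustration.
no_semicolon = re.compile(r"[^;]*")

# fullmatch succeeds only when the entire string is free of semicolons.
print(bool(no_semicolon.fullmatch("hsah555sahsdhj. Ya")))   # True
print(bool(no_semicolon.fullmatch("hsah555sahsdhj; mmm")))  # False
```

Used with search instead of fullmatch, the same class simply stops consuming at the first semicolon, which is the behaviour needed between the date and the day count.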
Regarding the other two "patterns": [\s|] rests on a wrong assumption, because a pipe symbol inside a character class is interpreted literally. It seems you want it to indicate that the \s is optional, but the asterisk already ensures this. The period must also be escaped, which gives \.\s*?\n. To disallow this sequence, put it in a negative look-ahead: (?!\.\s*?\n).
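Both observations can be checked in isolation. The sketch below assumes the escaped forms \s, \. and \n are what is meant, and uses a hypothetical name, no_dot_newline, for a pattern that consumes characters one at a time while the look-ahead guards each position.

```python
import re

# Inside a character class the pipe is a literal character, not alternation.
print(re.findall(r"[\s|]", "a|b c"))  # ['|', ' ']

# Consume one character at a time, but only at positions where a period
# followed by optional whitespace and a newline does NOT begin.
# DOTALL lets "." consume newlines, so only the look-ahead can stop the match.
no_dot_newline = re.compile(r"(?:(?!\.\s*?\n).)*", re.DOTALL)

print(no_dot_newline.match("abc. def").group())     # abc. def
print(no_dot_newline.match("abc.\n def").group())   # abc
print(no_dot_newline.match("abc. \n def").group())  # abc
```

A period followed by ordinary text passes through, while a period that ends a line (with or without trailing whitespace) stops the match just before it.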
This leads to:
some_text = r"(?:(?!\.\s*?\n)[^;])*"
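To see the pattern against the six examples from the question, here is a sketch under a few assumptions: the escapes \d, \s and \n are restored where the page dropped them, the sixth example is taken to contain a period, a space and three newlines after the first word, and classify, link, days and elapsed are hypothetical names introduced only to avoid repeating the long alternation.

```python
import re

# Assumption: escapes (\d, \s, \n) restored where the page text lost them.
some_text = r"(?:(?!\.\s*?\n)[^;])*"
date_format = r"\d*-\d{2}-\d{2}"

# Hypothetical helper fragments; same alternations as in the question.
link = r"(?:pasados|pasado|despues del|despues de el|despues de|despues|tras)"
days = r"(?:días|día|dias|dia)"
elapsed = rf"(?:{link} (\d+) {days}|(\d+) {days} {link})"

def classify(input_text):
    """Return "0" when a date and a day count are connected, else "1"."""
    check_00 = re.search(date_format + some_text + elapsed, input_text, re.IGNORECASE)
    check_01 = re.search(elapsed + some_text + date_format, input_text, re.IGNORECASE)
    return "1" if not check_00 and not check_01 else "0"

examples = [
    "Recien el 2021-10-12 despues de 3 dias 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    # Assumed reconstruction: period, space, three newlines after the first word.
    "Recien el 2021-10-12 hsah555sahsdhj. \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
]

results = [classify(text) for text in examples]
print(results)  # ['0', '0', '0', '0', '1', '1']
```

In the last two examples both directions are blocked: the path from the first date to the day count is cut by ; or by a period ending a line, and the path from the day count to the second date is cut by the period and newline, so neither search matches.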