
Set alphanumeric regex pattern not accepting certain specific symbols

```python
import re

# Examples:
input_text = "Recien el 2021-10-12 despues de 3 dias 2021-10-12"  # NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12"  # NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12"  # NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12"  # NOT PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12"  # PASS
input_text = "Recien el 2021-10-12 hsah555sahsdhj.    \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12"  # PASS


some_text = r"[\s|]*"  # <--- I NEED TO MODIFY THIS PATTERN
date_format = r"\d*-\d{2}-\d{2}"

check_00 = re.search(date_format + some_text + r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))", input_text, re.IGNORECASE)
check_01 = re.search(r"(?:(?:pasados|pasado|despues del|despues de el|despues de|despues|tras) (\d+) (?:días|día|dias|dia)|(\d+) (?:días|día|dias|dia) (?:pasados|pasado|despues del|despues de el|despues de|despues|tras))" + some_text + date_format, input_text, re.IGNORECASE)

if not check_00 and not check_01:
    print("1")
else:
    print("0")
```

I need to set the variable some_text to a pattern that matches any alphanumeric substring, in both uppercase and lowercase, which may also contain symbols such as :, $, #, &, ?, ¿, !, ¡, |, °, commas, periods, (, ), ], [, }, and {. The only symbols that must not be present, not even once, are ; and .\n (or, more generally, .[\s|]*\n*).

In this case I need to determine which examples do NOT match, hence the `if not` conditions in the code.

If everything in the algorithm works correctly, the output should be:

0  #for example 1
0  #for example 2
0  #for example 3
0  #for example 4
1  #for example 5
1  #for example 6

Is it possible, within the same pattern that I place in the some_text variable, to specify a list of symbols that must NOT appear in that part of the match (in this case ; and .[\s|]*\n*)?


Answer

> but the only symbols that should not be present, not even once, are ; and .\n or .[\s|]*\n*

To disallow ;, you can simply use the negated character class [^;].
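A minimal check of [^;] on its own; the sample strings are fragments of the question's examples, used only for illustration:

```python
import re

# [^;] matches any single character except a semicolon (newlines included).
no_semicolon = re.compile(r"[^;]*")

print(bool(no_semicolon.fullmatch("hsah555sahsdhj. Ya ")))  # True
print(bool(no_semicolon.fullmatch("hsah555sahsdhj; mmm")))  # False
```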

Regarding the other two “patterns”: [\s|] rests on a wrong assumption, because a pipe symbol inside a character class is interpreted literally. It seems you meant it to make the \s optional, but the asterisk already ensures that. The period must also be escaped, which gives \.\s*?\n. To disallow that sequence, you can put it in a negative lookahead: (?!\.\s*?\n).
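Both points can be checked directly. The first call shows that | inside a class is just another literal character; the `blocked` pattern (a name introduced here for illustration) shows the lookahead rejecting a period followed by optional whitespace and a newline, while a period followed by ordinary text still passes:

```python
import re

# Inside [...] the pipe is a literal character, not alternation.
print(bool(re.fullmatch(r"[\s|]*", " | ")))  # True: spaces and a literal "|"

# Match a period only if "\.\s*?\n" does NOT start at this position.
blocked = re.compile(r"(?!\.\s*?\n)\.")

print(bool(blocked.match(". Ya")))     # True: no newline after the period
print(bool(blocked.match(".\n")))      # False
print(bool(blocked.match(".    \n")))  # False: \s*? absorbs the spaces
```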

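Combining the two, each character is consumed by [^;] only if the forbidden sequence does not start at that position. This leads to: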

```python
some_text = r"(?:(?!\.\s*?\n)[^;])*"
```
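To check this against all six examples from the question, here is a sketch that loops over them instead of reassigning input_text. The `EXAMPLES` list and `classify` helper are introduced for illustration, and the `.\n` and `\n\n\n` in examples 4 to 6 are assumed to be real newlines (the reading that matches the expected output):

```python
import re

some_text = r"(?:(?!\.\s*?\n)[^;])*"
date_format = r"\d*-\d{2}-\d{2}"

# The "after N days" phrase from the question, split into reusable parts.
after = r"(?:pasados|pasado|despues del|despues de el|despues de|despues|tras)"
days = r"(?:días|día|dias|dia)"
phrase = rf"(?:{after} (\d+) {days}|(\d+) {days} {after})"

# date ... phrase, or phrase ... date, with only allowed text in between.
check_00 = re.compile(date_format + some_text + phrase, re.IGNORECASE)
check_01 = re.compile(phrase + some_text + date_format, re.IGNORECASE)

EXAMPLES = [
    "Recien el 2021-10-12 despues de 3 dias 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj. Ya despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; despues de 3 dias hjsdfhjdsfhjdsf 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj; mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
    "Recien el 2021-10-12 hsah555sahsdhj.    \n\n\n mmm... creo que ya despues de 3 dias hjsdfhjdsfhjdsf.\n 2021-10-12",
]


def classify(text):
    """Return "1" when neither pattern matches, else "0" (as in the question)."""
    if not check_00.search(text) and not check_01.search(text):
        return "1"
    return "0"


for text in EXAMPLES:
    print(classify(text))  # prints 0, 0, 0, 0, 1, 1 (one per line)
```

Because [^;] also matches newlines, a bare line break is still allowed between the date and the phrase; only a period followed by optional whitespace and a newline is rejected.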
User contributions licensed under: CC BY-SA