I am trying to write a regex that captures alphanumeric words containing special characters. The search runs on a short string of 4-5 words at most and should extract one word. The target can be anywhere in the string but will be separated from the other words by spaces.
Eg:
"Bill No: THRD/20-21/110" "CRN No: GSTASP/20-21/066" "Identifier value: PCPL-2021-000152"
Need to get these values
THRD/20-21/110 GSTASP/20-21/066 PCPL-2021-000152
The special characters are limited to "/" and "-". So far all my approaches have failed.
Answer
You can use a lookahead to require that the next non-space substring has either / or - in it:
(?<=[ \t])(?=[^ \t]*[/-])([0-9a-zA-Z/-]+)
That only works for a substring that follows a space or tab ([ \t]), taking literally your statement that the target can be anywhere in the string but is always separated by spaces.
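The question does not name a language, so here is a minimal sketch in Python (an assumption) that runs the lookbehind pattern against the three sample strings. The helper name `extract` is made up for illustration.

```python
import re

# Pattern from the answer: the token must follow a space or tab
# and must contain at least one '/' or '-'.
PATTERN = re.compile(r"(?<=[ \t])(?=[^ \t]*[/-])([0-9a-zA-Z/-]+)")

samples = [
    "Bill No: THRD/20-21/110",
    "CRN No: GSTASP/20-21/066",
    "Identifier value: PCPL-2021-000152",
]


def extract(text):
    """Return the first qualifying token, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


results = [extract(s) for s in samples]
print(results)

# The lookbehind needs a preceding space or tab, so a token that
# opens the string is not found by this variant.
print(extract("THRD/20-21/110 paid"))
```

The last call returns None because nothing precedes the token, which is the limitation described above.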
If you want to capture potentially at the start of the string, remove the lookbehind:
(?=[^ \t]*[/-])([0-9a-zA-Z/-]+)
That will capture any substring from that character set that has at least one / or - in it (at the cost of losing the efficiency of a [ \t] delimiter as an anchor).
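As a rough sketch of the variant without the lookbehind, again assuming Python; `find_codes` is a hypothetical helper name. It also shows one side effect worth knowing about when a token contains characters outside the allowed set.

```python
import re

# Same pattern without the lookbehind, so a match may begin
# at the very start of the string.
ANYWHERE = re.compile(r"(?=[^ \t]*[/-])([0-9a-zA-Z/-]+)")


def find_codes(text):
    """Return every capture the pattern produces in text."""
    return [m.group(1) for m in ANYWHERE.finditer(text)]


print(find_codes("THRD/20-21/110 is the bill"))
print(find_codes("Identifier value: PCPL-2021-000152"))

# Caveat: the lookahead scans the whole space-delimited token, so a
# token such as "ref:AB-12" also yields "ref", the part before the ':'.
print(find_codes("ref:AB-12"))
```

If tokens like `ref:AB-12` can occur in your data, the lookbehind version or an explicit split on whitespace may be the safer choice.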
Note: If you use - as a literal character in a character class, it either needs to be escaped or placed at the start or end of the class. Otherwise, the - defines a range in the character class. This is a sneaky bug that has bitten many people writing a regex to capture a literal -.
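To illustrate the hyphen pitfall, here is a small Python sketch (the language is an assumption, and the variable names are only for illustration). It compares a class where - is last, one where it is escaped, and one where it accidentally forms a range.

```python
import re

text = "a,b.c-d"

# Hyphen last in the class: it is a literal '-'.
hyphen_last = re.findall(r"[+/-]", text)

# Hyphen escaped: literal regardless of its position.
hyphen_escaped = re.findall(r"[+\-/]", text)

# Hyphen between two characters: a range from '+' to '/',
# which in ASCII also covers ',', '-' and '.'.
accidental_range = re.findall(r"[+-/]", text)

print(hyphen_last)
print(hyphen_escaped)
print(accidental_range)
```

The first two classes find only the hyphen, while the third also picks up the comma and the period because they fall inside the range.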