I have strings that contain a name, sometimes a username, followed by a datetime stamp:
GN1RLWFH0546-2020-04-10-18-09-52-563945.txt
JOHN-DOE-2020-04-10-18-09-52-563946t64.txt
DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt
I want to extract the usernames from these strings:
GN1RLWFH0546
JOHN-DOE
DESKTOP-OHK45JO
I have tried different regex patterns. The closest I got was the following:
GN1RLWFH0546
DESKTOP
JOHN
Using the following regex pattern:
names = re.search(r"\(?([0-9A-Za-z]+)\)?", agent_str)
print(names.group(1))
Answer
You may get all text up to the first occurrence of - + digits + -:
^.*?(?=-\d+-)
If the number must be exactly 4 digits (say, if it is a year), then replace + with {4}:
^.*?(?=-\d{4}-)
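A minimal sketch of where the two variants differ. The file name below is hypothetical (it is not one of the strings from the question) and was chosen because the username itself contains a short -digits- run:

```python
import re

# Hypothetical file name: the username part "LAB-7-PC" contains "-7-"
s = "LAB-7-PC-2020-04-10-18-09-52-563945.txt"

any_digits = re.compile(r"^.*?(?=-\d+-)")      # stops before the first "-digits-"
four_digits = re.compile(r"^.*?(?=-\d{4}-)")   # stops before the first "-NNNN-"

short_name = any_digits.search(s).group()
full_name = four_digits.search(s).group()

print(short_name)  # LAB
print(full_name)   # LAB-7-PC
```

If usernames may contain digit-only segments like this, the {4} variant is probably the safer choice, assuming the timestamp always starts with a four-digit year.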
See the regex demo
Details
^ – start of string
.*? – any 0+ chars other than line break chars, as few as possible
(?=-\d+-) – up to the first occurrence of - and 1+ digits (or, if \d{4} is used, exactly four digits) and then - (this part is not added to the match value as the positive lookahead is a non-consuming pattern).
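The non-consuming behaviour of the lookahead can be checked with a short sketch. The second pattern, which uses a capturing group instead of a lookahead, is an alternative added here for comparison and is not part of the answer above:

```python
import re

s = "JOHN-DOE-2020-04-10-18-09-52-563946t64.txt"

# Lookahead: "-2020-" is checked but not included in the match
m = re.search(r"^.*?(?=-\d+-)", s)
print(m.group())    # JOHN-DOE
print(s[m.end():])  # -2020-04-10-18-09-52-563946t64.txt

# Alternative: consume the delimiter and read the name from group 1
m2 = re.search(r"^(.*?)-\d+-", s)
print(m2.group(1))  # JOHN-DOE
print(m2.group())   # JOHN-DOE-2020-
```

With the lookahead the whole match is already the username, so no group index is needed.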
See Python demo:
import re

strs = ["GN1RLWFH0546-2020-04-10-18-09-52-563945.txt", "JOHN-DOE-2020-04-10-18-09-52-563946t64.txt", "DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt"]
# Match everything from the start up to (not including) the first "-digits-"
rx = re.compile(r"^.*?(?=-\d+-)")
for s in strs:
    m = rx.search(s)
    if m:  # skip strings that have no "-digits-" part
        print("{} => '{}'".format(s, m.group()))
Output:
GN1RLWFH0546-2020-04-10-18-09-52-563945.txt => 'GN1RLWFH0546'
JOHN-DOE-2020-04-10-18-09-52-563946t64.txt => 'JOHN-DOE'
DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt => 'DESKTOP-OHK45JO'