I'm working with a DataFrame column that contains Date | TimeStamp | Name | Message as a single string:
    59770 [08/10/18, 5:57:43 PM] Luke: Message
    59771 [08/10/18, 5:57:48 PM] Luke: Message
    59772 [08/10/18, 5:57:50 PM] Luke: Message
I use the following function to capture the date:

    def getdate(x):
        res = re.search(r"\d\d/\d\d/\d\d", x)
And I use the following code to capture the rest of the data (TimeStamp | Name | Message) into columns:
    df['Data'].str.extract(r'\s*(.{10})](.*):(.*)')
Is there a way to capture and extract all four entities together?
Please advise.
Answer
As an alternative, you could use regex named groups together with pandas extractall.
    import pandas as pd
    import re

    # Sample data: one chat line per row, stored in column 0
    df = pd.DataFrame(
        [" [08/10/18, 5:57:43 PM] Luke: Message",
         " [08/10/18, 5:57:48 PM] Luke: Message",
         " [08/10/18, 5:57:50 PM] Luke: Message"])

    print(df)

    # Each named group becomes a column in the extracted DataFrame
    regex = re.compile(
        r"(?P<date>\d{2}/\d{2}/\d{2}),\s*"
        r"(?P<timestamp>\d+:\d+:\d+\s[AP]M)\]\s+"
        r"(?P<name>.+?):\s*"
        r"(?P<message>.+)$"
    )

    # extractall adds a "match" index level; drop it to keep the original index
    df_out = df[0].str.extractall(regex).droplevel(1)
    print(df_out)
Output from df_out:

           date   timestamp  name  message
    0  08/10/18  5:57:43 PM  Luke  Message
    1  08/10/18  5:57:48 PM  Luke  Message
    2  08/10/18  5:57:50 PM  Luke  Message
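
Since each row here holds exactly one message, the same named groups could also be passed to str.extract, which returns one row per input row and so needs no droplevel. The sketch below is one possible variation, not the only way to do it. The column name Data comes from the question; the extra id group for the leading number, the when column, and the day-first reading of the date are assumptions made for illustration.

```python
import pandas as pd

# Sample rows in the shape shown in the question (leading number included).
df = pd.DataFrame({
    "Data": [
        "59770 [08/10/18, 5:57:43 PM] Luke: Message",
        "59771 [08/10/18, 5:57:48 PM] Luke: Message",
        "59772 [08/10/18, 5:57:50 PM] Luke: Message",
    ]
})

# One named group per output column. `id` is a hypothetical extra group
# that captures the leading number.
pattern = (
    r"(?P<id>\d+)\s*\["
    r"(?P<date>\d{2}/\d{2}/\d{2}),\s*"
    r"(?P<timestamp>\d+:\d+:\d+\s[AP]M)\]\s+"
    r"(?P<name>.+?):\s*"
    r"(?P<message>.+)$"
)

# str.extract keeps the original index; rows that do not match become NaN.
parts = df["Data"].str.extract(pattern)

# Assumption: dates are day/month/year. Swap %d and %m if they are month-first.
parts["when"] = pd.to_datetime(
    parts["date"] + " " + parts["timestamp"],
    format="%d/%m/%y %I:%M:%S %p",
)

print(parts)
```

If a cell could contain more than one message, extractall (as in the answer above) would still be the safer choice, because extract only returns the first match per row.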