I have a column in a dataframe with different strings.
Additional Information | IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray Undefined System Error Specific groupname=CUSTGR1 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck
What I want to do is to create new columns, IP Address and MAC Address with the corresponding values from the column above.
So that the expected output looks like this:
Additional Information |IP Address | MAC Address | IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn |192.168.1.1 |00:0a:95:9d:68:16| IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray|192.168.0.1 |00:0a:95:9d:68:17| Undefined System Error | | | Specific groupname=CUSTGR1 | | | IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck |192.168.1.2 |00:1B:44:11:3A:B7|
The problem is, that I cannot deal with the rows that does not contain IP and MAC. I tried splitting using np.where as well as finding partial matches but didn’t succeed.
Advertisement
Answer
Idea is use list comprehension with filtering if not missing value or None and exist , and =, pass to DataFrame constructor and last use DataFrame.join to original:
L = [dict(y.split("=") for y in v.split(", "))
if pd.notna(v) and ('=' in v) and (', ' in v)
else {}
for v in df['Additional Information']]
df1 = pd.DataFrame(L, index=df.index)
print (df1)
IP MAC ADDR USER
0 192.168.1.1 00:0a:95:9d:68:16 kwfinn
1 192.168.0.1 00:0a:95:9d:68:17 wattray
2 NaN NaN NaN
3 NaN NaN NaN
4 192.168.1.2 00:1B:44:11:3A:B7 stwnck
df = df.join(df1[['IP','MAC ADDR']])
print (df)
Additional Information IP
0 IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US... 192.168.1.1
1 IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US... 192.168.0.1
2 Undefined System Error NaN
3 Specific groupname=CUSTGR1 NaN
4 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US... 192.168.1.2
MAC ADDR
0 00:0a:95:9d:68:16
1 00:0a:95:9d:68:17
2 NaN
3 NaN
4 00:1B:44:11:3A:B7