I have a column in a dataframe with different strings.
Additional Information | IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray Undefined System Error Specific groupname=CUSTGR1 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck
What I want to do is to create new columns, IP Address and MAC Address with the corresponding values from the column above.
So that the expected output looks like this:
Additional Information |IP Address | MAC Address | IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn |192.168.1.1 |00:0a:95:9d:68:16| IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray|192.168.0.1 |00:0a:95:9d:68:17| Undefined System Error | | | Specific groupname=CUSTGR1 | | | IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck |192.168.1.2 |00:1B:44:11:3A:B7|
The problem is, that I cannot deal with the rows that does not contain IP and MAC. I tried splitting using np.where as well as finding partial matches but didn’t succeed.
Advertisement
Answer
Idea is use list comprehension with filtering if not missing value or None and exist ,
and =
, pass to DataFrame
constructor and last use DataFrame.join
to original:
L = [dict(y.split("=") for y in v.split(", ")) if pd.notna(v) and ('=' in v) and (', ' in v) else {} for v in df['Additional Information']] df1 = pd.DataFrame(L, index=df.index) print (df1) IP MAC ADDR USER 0 192.168.1.1 00:0a:95:9d:68:16 kwfinn 1 192.168.0.1 00:0a:95:9d:68:17 wattray 2 NaN NaN NaN 3 NaN NaN NaN 4 192.168.1.2 00:1B:44:11:3A:B7 stwnck df = df.join(df1[['IP','MAC ADDR']]) print (df) Additional Information IP 0 IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US... 192.168.1.1 1 IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US... 192.168.0.1 2 Undefined System Error NaN 3 Specific groupname=CUSTGR1 NaN 4 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US... 192.168.1.2 MAC ADDR 0 00:0a:95:9d:68:16 1 00:0a:95:9d:68:17 2 NaN 3 NaN 4 00:1B:44:11:3A:B7