My data –
import pandas as pd data = [['1', 'Electrical and Instrumentation -DI (Supervisor)'], ['2', 'CCM - DI'], ['3', 'Business Development and Marketing - DI'], ['4', 'MDs Office'], ['5', 'Administration - PI']] df = pd.DataFrame(data, columns = ['sl_no', 'dept']) df sl_no dept 1 Electrical and Instrumentation -DI (Supervisor) 2 CCM - DI 3 Business Development and Marketing - DI 4 MDs Office 5 Administration - PI
So in this data, I need to split dept column, and need to create another column where my values should be only PI or DI. Is it possible to create a code where it accurately takes out only PI or DI from dept column
My output –
sl_no dept division 1 Electrical and... -DI (Supervisor) DI 2 CCM - DI DI 3 Business Dev.... - DI DI 4 MDs Office None 5 Administration - PI PI
I have tried –
new = df['dept'].str.split(' - ', expand = True) df['division'] = new[1]
but this doesn’t serve the purpose, because ‘ – ‘ this dash is not same, somewhere it is occurring like ‘ -‘, so I have to rework on it, and also I get a few results like – PI (Supervisor) so I have to remove (Supervisor) part. The data is not same, it varies, I only need PI or DI. So I need a code in which, wherever it finds PI or DI , it will extract it and then create a new column where it only stores PI and DI only. Please help.
Advertisement
Answer
You might be able to just use str.extract
here:
df["division"] = df["dept"].str.extract(r'b(DI|PI)b')
For department values not having either DI
or PI
in them, the value of division
would be NaN
.