Split string and put the extracted string to anoth…

My data –

import pandas as pd

data = [['1', 'Electrical and Instrumentation -DI (Supervisor)'], ['2', 'CCM - DI'], 
        ['3', 'Business Development and Marketing - DI'], ['4', 'MDs Office'],
       ['5', 'Administration - PI']]
df = pd.DataFrame(data, columns = ['sl_no', 'dept'])
df

sl_no   dept
1       Electrical and Instrumentation -DI (Supervisor)
2       CCM - DI
3       Business Development and Marketing - DI
4       MDs Office
5       Administration - PI

So in this data, I need to split dept column, and need to create another column where my values should be only PI or DI. Is it possible to create a code where it accurately takes out only PI or DI from dept column

My output –

sl_no   dept                                 division
1       Electrical and... -DI (Supervisor)    DI
2       CCM - DI                              DI
3       Business Dev.... - DI                 DI
4       MDs Office                           None
5       Administration - PI                  PI

I have tried –

new = df['dept'].str.split(' - ', expand = True)
df['division'] = new[1]

but this doesn’t serve the purpose, because ‘ – ‘ this dash is not same, somewhere it is occurring like ‘ -‘, so I have to rework on it, and also I get a few results like – PI (Supervisor) so I have to remove (Supervisor) part. The data is not same, it varies, I only need PI or DI. So I need a code in which, wherever it finds PI or DI , it will extract it and then create a new column where it only stores PI and DI only. Please help.

Answer

You might be able to just use str.extract here:

df["division"] = df["dept"].str.extract(r'b(DI|PI)b')

For department values not having either DI or PI in them, the value of division would be NaN.

Split string and put the extracted string to another new column in python

Advertisement

Answer