Skip to content
Advertisement

Split string and put the extracted string to another new column in python

My data –

import pandas as pd

data = [['1', 'Electrical and Instrumentation -DI (Supervisor)'], ['2', 'CCM - DI'], 
        ['3', 'Business Development and Marketing - DI'], ['4', 'MDs Office'],
       ['5', 'Administration - PI']]
df = pd.DataFrame(data, columns = ['sl_no', 'dept'])
df

sl_no   dept
1       Electrical and Instrumentation -DI (Supervisor)
2       CCM - DI
3       Business Development and Marketing - DI
4       MDs Office
5       Administration - PI

So in this data, I need to split dept column, and need to create another column where my values should be only PI or DI. Is it possible to create a code where it accurately takes out only PI or DI from dept column

My output –

sl_no   dept                                 division
1       Electrical and... -DI (Supervisor)    DI
2       CCM - DI                              DI
3       Business Dev.... - DI                 DI
4       MDs Office                           None
5       Administration - PI                  PI

I have tried –

new = df['dept'].str.split(' - ', expand = True)
df['division'] = new[1]

but this doesn’t serve the purpose, because ‘ – ‘ this dash is not same, somewhere it is occurring like ‘ -‘, so I have to rework on it, and also I get a few results like – PI (Supervisor) so I have to remove (Supervisor) part. The data is not same, it varies, I only need PI or DI. So I need a code in which, wherever it finds PI or DI , it will extract it and then create a new column where it only stores PI and DI only. Please help.

Advertisement

Answer

You might be able to just use str.extract here:

df["division"] = df["dept"].str.extract(r'b(DI|PI)b')

For department values not having either DI or PI in them, the value of division would be NaN.

Advertisement