Skip to content
Advertisement

Split columns by space or dash – python

I have a pandas df with mixed formatting for a specific column. It contains the qtr and year. I’m hoping to split this column into separate columns. But the formatting contains a space or a second dash between qtr and year.

I’m hoping to include a function that splits the column by a blank space or a second dash.

df = pd.DataFrame({              
'Qtr' : ['APR-JUN 2019','JAN-MAR 2019','JAN-MAR 2015','JUL-SEP-2020','OCT-DEC 2014','JUL-SEP-2015'],                           
})

out:

            Qtr
0  APR-JUN 2019 # blank
1  JAN-MAR 2019 # blank
2  JAN-MAR 2015 # blank
3  JUL-SEP-2020 # second dash
4  OCT-DEC 2014 # blank
5  JUL-SEP-2015 # second dash

split by blank

df[['Qtr', 'Year']] = df['Qtr'].str.split(' ', 1, expand=True)

split by second dash

df[['Qtr', 'Year']] = df['Qtr'].str.split('-', 1, expand=True)

intended output:

            Qtr  Year
0       APR-JUN  2019
1       JAN-MAR  2019
2       JAN-MAR  2015
3       JUL-SEP  2020
4       OCT-DEC  2014
5       JUL-SEP  2015 

Advertisement

Answer

You can use a regular expression with the extract function of the string accessor.

df[['Qtr', 'Year']] = df['Qtr'].str.extract(r'(w{3}-w{3}).(d{4})')
print(df)

Result

       Qtr  Year
0  APR-JUN  2019
1  JAN-MAR  2019
2  JAN-MAR  2015
3  JUL-SEP  2020
4  OCT-DEC  2014
5  JUL-SEP  2015
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement