How to split words into different columns in dataframe?

I am new to coding and have only recently started learning. I am currently stuck trying to split one column into several. Please help me.

I have this dataframe

import pandas as pd

data = ['TOOK22JAN1515100HG', 'BOOK22FEB1643200GH', 'TOOK22MAR1742200HG']
df = pd.DataFrame(data)

and I want to split it into

0   TOOK22JAN1515100HG       TOOK    22-01-15   15100  HG
1   BOOK22FEB1643200GH       BOOK    22-02-16   43200  GH
2   TOOK22MAR1742200HG       TOOK    22-03-17   42200  HG

I really appreciate you taking the time to answer my question.

PS: This is just an example of an option symbol, which is a combination of index + date + strike + type (stock market).

Answer

Use str.extract with named capture groups to split each string into columns:

pattern = r'(?P<id>[A-Z]{4})(?P<date>\w{7})(?P<val>\d+)(?P<misc>[A-Z]{2})'

df = df.join(df[0].str.extract(pattern))

# '22JAN15' is read as day, abbreviated month, two-digit year
df['date'] = pd.to_datetime(df['date'], format='%d%b%y')
df['val'] = df['val'].astype(int)
print(df)

# Output
                    0    id       date    val misc
0  TOOK22JAN1515100HG  TOOK 2015-01-22  15100   HG
1  BOOK22FEB1643200GH  BOOK 2016-02-22  43200   GH
2  TOOK22MAR1742200HG  TOOK 2017-03-22  42200   HG
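
If you want the date shown as in the question (22-01-15) rather than as a datetime, one option is to format it back to text with dt.strftime. The following is only a sketch: it assumes the date part is always day + three-letter English month + two-digit year, the tighter date pattern and the date_str column name are my own choices for illustration, and parsing the month abbreviation with %b depends on an English locale.

```python
import pandas as pd

data = ['TOOK22JAN1515100HG', 'BOOK22FEB1643200GH', 'TOOK22MAR1742200HG']
df = pd.DataFrame(data)

# Same idea as above, but the date group spells out its assumed layout:
# two digits, three capital letters, two digits.
pattern = (r'(?P<id>[A-Z]{4})(?P<date>\d{2}[A-Z]{3}\d{2})'
           r'(?P<val>\d+)(?P<misc>[A-Z]{2})')
parts = df[0].str.extract(pattern)

# Assumption: day, abbreviated month name, two-digit year.
parts['date'] = pd.to_datetime(parts['date'], format='%d%b%y')
parts['val'] = parts['val'].astype(int)

# Hypothetical extra column: the date rendered as dd-mm-yy text.
parts['date_str'] = parts['date'].dt.strftime('%d-%m-%y')

result = df.join(parts)
print(result)
```

Keeping the datetime column next to the text column lets you still sort or filter by date; drop whichever one you do not need.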
User contributions licensed under: CC BY-SA