Skip to content
Advertisement

How to split a columns based on the index of the string in the columns while using a efficient method to parse all the Dataframe

I have a column filled with a string value:

col_1
10500
25020
35640
45440
50454
62150
75410

I want to be able to create two other columns with strings values that have been splitted from the first. Also I want an efficient way to do that.

Supposed result :

col_1 col_2 col_3
10500 10 500
25020 25 020
35640 35 640
45440 45 440
50454 50 454
62150 62 150
75410 75 410

So far I was trying to go with vectorization, but hasn’t been able to implement it yet.

For the split part, I parse the row (with iterows, and I know that iterows has to be avoid as much as possible.) and create a list that can be used to populate the new tabs, but in my opinion this way is a too archaic.

Also, how can I efficiently, modify each cell ? Like adding a comma, or operating on them ?

Thank you.

Advertisement

Answer

Use str accessor:

df = df.join(df['col_1'].astype(str).str.extract('(?P<col_2>d{2})(?P<col_3>d{3})'))
print(df)

# Output:
   col_1 col_2 col_3
0  10500    10   500
1  25020    25   020
2  35640    35   640
3  45440    45   440
4  50454    50   454
5  62150    62   150
6  75410    75   410

Or simple in few steps:

df['col_1'] = df['col_1'].astype(str)
df['col_2'] = df['col_1'].str[:2]
df['col_3'] = df['col_1'].str[2:]
print(df)

# Output
   col_1 col_2 col_3
0  10500    10   500
1  25020    25   020
2  35640    35   640
3  45440    45   440
4  50454    50   454
5  62150    62   150
6  75410    75   410

Another example:

df['col_1'] = df['col_1'].astype(str)
df['col_4'] = df['col_1'].str[:2] + '-' + df['col_1'].str[2:]
print(df)

# Output
   col_1   col_4
0  10500  10-500
1  25020  25-020
2  35640  35-640
3  45440  45-440
4  50454  50-454
5  62150  62-150
6  75410  75-410
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement