I have a column filled with a string value:
col_1 |
---|
10500 |
25020 |
35640 |
45440 |
50454 |
62150 |
75410 |
I want to be able to create two other columns with strings values that have been splitted from the first. Also I want an efficient way to do that.
Supposed result :
col_1 | col_2 | col_3 |
---|---|---|
10500 | 10 | 500 |
25020 | 25 | 020 |
35640 | 35 | 640 |
45440 | 45 | 440 |
50454 | 50 | 454 |
62150 | 62 | 150 |
75410 | 75 | 410 |
So far I was trying to go with vectorization, but hasn’t been able to implement it yet.
For the split part, I parse the row (with iterows, and I know that iterows has to be avoid as much as possible.) and create a list that can be used to populate the new tabs, but in my opinion this way is a too archaic.
Also, how can I efficiently, modify each cell ? Like adding a comma, or operating on them ?
Thank you.
Advertisement
Answer
Use str
accessor:
df = df.join(df['col_1'].astype(str).str.extract('(?P<col_2>d{2})(?P<col_3>d{3})')) print(df) # Output: col_1 col_2 col_3 0 10500 10 500 1 25020 25 020 2 35640 35 640 3 45440 45 440 4 50454 50 454 5 62150 62 150 6 75410 75 410
Or simple in few steps:
df['col_1'] = df['col_1'].astype(str) df['col_2'] = df['col_1'].str[:2] df['col_3'] = df['col_1'].str[2:] print(df) # Output col_1 col_2 col_3 0 10500 10 500 1 25020 25 020 2 35640 35 640 3 45440 45 440 4 50454 50 454 5 62150 62 150 6 75410 75 410
Another example:
df['col_1'] = df['col_1'].astype(str) df['col_4'] = df['col_1'].str[:2] + '-' + df['col_1'].str[2:] print(df) # Output col_1 col_4 0 10500 10-500 1 25020 25-020 2 35640 35-640 3 45440 45-440 4 50454 50-454 5 62150 62-150 6 75410 75-410