I have a column filled with a string value:
| col_1 |
|---|
| 10500 |
| 25020 |
| 35640 |
| 45440 |
| 50454 |
| 62150 |
| 75410 |
I want to be able to create two other columns with strings values that have been splitted from the first. Also I want an efficient way to do that.
Supposed result :
| col_1 | col_2 | col_3 |
|---|---|---|
| 10500 | 10 | 500 |
| 25020 | 25 | 020 |
| 35640 | 35 | 640 |
| 45440 | 45 | 440 |
| 50454 | 50 | 454 |
| 62150 | 62 | 150 |
| 75410 | 75 | 410 |
So far I was trying to go with vectorization, but hasn’t been able to implement it yet.
For the split part, I parse the row (with iterows, and I know that iterows has to be avoid as much as possible.) and create a list that can be used to populate the new tabs, but in my opinion this way is a too archaic.
Also, how can I efficiently, modify each cell ? Like adding a comma, or operating on them ?
Thank you.
Advertisement
Answer
Use str accessor:
df = df.join(df['col_1'].astype(str).str.extract('(?P<col_2>d{2})(?P<col_3>d{3})'))
print(df)
# Output:
col_1 col_2 col_3
0 10500 10 500
1 25020 25 020
2 35640 35 640
3 45440 45 440
4 50454 50 454
5 62150 62 150
6 75410 75 410
Or simple in few steps:
df['col_1'] = df['col_1'].astype(str) df['col_2'] = df['col_1'].str[:2] df['col_3'] = df['col_1'].str[2:] print(df) # Output col_1 col_2 col_3 0 10500 10 500 1 25020 25 020 2 35640 35 640 3 45440 45 440 4 50454 50 454 5 62150 62 150 6 75410 75 410
Another example:
df['col_1'] = df['col_1'].astype(str) df['col_4'] = df['col_1'].str[:2] + '-' + df['col_1'].str[2:] print(df) # Output col_1 col_4 0 10500 10-500 1 25020 25-020 2 35640 35-640 3 45440 45-440 4 50454 50-454 5 62150 62-150 6 75410 75-410