I have a column filled with string values:
col_1 |
---|
10500 |
25020 |
35640 |
45440 |
50454 |
62150 |
75410 |
I want to create two other columns of string values split from the first one, and I want an efficient way to do that.
Expected result:
col_1 | col_2 | col_3 |
---|---|---|
10500 | 10 | 500 |
25020 | 25 | 020 |
35640 | 35 | 640 |
45440 | 45 | 440 |
50454 | 50 | 454 |
62150 | 62 | 150 |
75410 | 75 | 410 |
So far I have been trying to use vectorization, but I haven't been able to implement it yet.
For the split, I loop over the rows (with iterrows, which I know should be avoided as much as possible) and build a list that is used to populate the new columns, but in my opinion this approach is too archaic.
Also, how can I efficiently modify each cell, for example by adding a comma or performing an operation on it?
Thank you.
Answer
Use the str accessor:
    # \d matches a digit; the named groups become the new column names
    df = df.join(df['col_1'].astype(str).str.extract(r'(?P<col_2>\d{2})(?P<col_3>\d{3})'))
    print(df)

    # Output:
       col_1 col_2 col_3
    0  10500    10   500
    1  25020    25   020
    2  35640    35   640
    3  45440    45   440
    4  50454    50   454
    5  62150    62   150
    6  75410    75   410
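For reference, here is a self-contained sketch of the extract approach. It assumes the sample values from the question, stored as integers so that the astype(str) step matters, and a pattern that uses the digit class \d. The intermediate name parts is only illustrative.

```python
import pandas as pd

# Sample data from the question, stored as integers on purpose.
df = pd.DataFrame({"col_1": [10500, 25020, 35640, 45440, 50454, 62150, 75410]})

# Named groups become the column names of the extracted frame.
pattern = r"(?P<col_2>\d{2})(?P<col_3>\d{3})"
parts = df["col_1"].astype(str).str.extract(pattern)

# join aligns on the index, so each row gets its own two parts.
df = df.join(parts)
print(df)
```

Rows that do not match the pattern (for example a value with fewer than five digits) should come back as NaN in both new columns, so it may be worth checking for missing values afterwards.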
Or, more simply, in a few steps:
    df['col_1'] = df['col_1'].astype(str)
    df['col_2'] = df['col_1'].str[:2]
    df['col_3'] = df['col_1'].str[2:]
    print(df)

    # Output
       col_1 col_2 col_3
    0  10500    10   500
    1  25020    25   020
    2  35640    35   640
    3  45440    45   440
    4  50454    50   454
    5  62150    62   150
    6  75410    75   410
Another example, inserting a separator between the two parts:
    df['col_1'] = df['col_1'].astype(str)
    df['col_4'] = df['col_1'].str[:2] + '-' + df['col_1'].str[2:]
    print(df)

    # Output
       col_1   col_4
    0  10500  10-500
    1  25020  25-020
    2  35640  35-640
    3  45440  45-440
    4  50454  50-454
    5  62150  62-150
    6  75410  75-410
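On the last part of the question (adding a comma, or operating on the cells), the same vectorized style should carry over. The sketch below assumes a shortened three-row sample; the column names with_comma and tail_doubled are made up for illustration.

```python
import pandas as pd

# Shortened sample; values are already strings here.
df = pd.DataFrame({"col_1": ["10500", "25020", "35640"]})

head = df["col_1"].str[:2]
tail = df["col_1"].str[2:]

# String operation: join the two parts with a comma, element by element.
df["with_comma"] = head.str.cat(tail, sep=",")

# Numeric operation: convert once, then use ordinary vectorized arithmetic.
# Note that the leading zero of "020" is lost when it becomes the integer 20.
df["tail_doubled"] = tail.astype(int) * 2

print(df)
```

Converting to a numeric dtype once and then using plain arithmetic is usually preferable to looping over rows, since the work stays inside pandas.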