I have the following dataframe and I’m trying to extract the string that has the ABC followed by it’s numbers.
Description |
---|
ABC12345679 132465 |
Test ABC12346548 |
Test ABC1231321 4645 |
I have tried:
JavaScript
x
2
1
data['extract'] = 'ABC'+data['Description'].str.split('ABC').str[1]
2
But its giving me what it comes after on instances that there’s more text after the ABC* like so:
Description |
---|
ABC12345679 132465 |
ABC12346548 |
ABC1231321 4645 |
And I need the column to only contain data as following:
Description |
---|
ABC12345679 |
ABC12346548 |
ABC1231321 |
Any assistance on this?
Advertisement
Answer
We can use regex to extract the necessary part of the string.
Here we are checking for atleast one [A-C] and 0 or more[0-9]
JavaScript
1
2
1
data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]*)')
2
or (based on need)
JavaScript
1
2
1
data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]+)')
2
Output
JavaScript
1
5
1
Description extract
2
0 ABC12345679 132465 ABC12345679
3
1 Test ABC12346548 ABC12346548
4
2 Test ABC1231321 4645 ABC1231321
5