I have the following dataframe and I’m trying to extract the string that has the ABC followed by it’s numbers.
Description |
---|
ABC12345679 132465 |
Test ABC12346548 |
Test ABC1231321 4645 |
I have tried:
data['extract'] = 'ABC'+data['Description'].str.split('ABC').str[1]
But its giving me what it comes after on instances that there’s more text after the ABC* like so:
Description |
---|
ABC12345679 132465 |
ABC12346548 |
ABC1231321 4645 |
And I need the column to only contain data as following:
Description |
---|
ABC12345679 |
ABC12346548 |
ABC1231321 |
Any assistance on this?
Advertisement
Answer
We can use regex to extract the necessary part of the string.
Here we are checking for atleast one [A-C] and 0 or more[0-9]
data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]*)')
or (based on need)
data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]+)')
Output
Description extract 0 ABC12345679 132465 ABC12345679 1 Test ABC12346548 ABC12346548 2 Test ABC1231321 4645 ABC1231321