Skip to content
Advertisement

Extracting Specific Text From column in dataframe

I have the following dataframe and I’m trying to extract the string that has the ABC followed by it’s numbers.

Description
ABC12345679 132465
Test ABC12346548
Test ABC1231321 4645

I have tried:

data['extract'] = 'ABC'+data['Description'].str.split('ABC').str[1]

But its giving me what it comes after on instances that there’s more text after the ABC* like so:

Description
ABC12345679 132465
ABC12346548
ABC1231321 4645

And I need the column to only contain data as following:

Description
ABC12345679
ABC12346548
ABC1231321

Any assistance on this?

Advertisement

Answer

We can use regex to extract the necessary part of the string.
Here we are checking for atleast one [A-C] and 0 or more[0-9]

data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]*)')

or (based on need)

data['extract'] = data.Description.str.extract(r'([A-C]+[0-9]+)')

Output

    Description             extract
0   ABC12345679 132465      ABC12345679
1   Test ABC12346548        ABC12346548
2   Test ABC1231321 4645    ABC1231321
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement