Segregate column data based on regex using pandas

I have a dataframe like the one shown below

JavaScript

I would like to create 3 new columns

val_num – will store ONLY NUMBER values that come with symbols, e.g. 1234 (from >1234) and 1000 (from <1000), but WILL NOT STORE 31 (from 31sadj) because it doesn’t have any symbol

val_str – will store only values that are a mix of NUMBERS, symbols and ALPHABETS, or just plain alphabets, e.g. 31sadj. It can contain any symbols except >, <, =

val_symbol – will store ONLY the 3 symbols >, <, =

I tried the code below, but it isn’t accurate

JavaScript

I expect my output to look like the one shown below

enter image description here

Answer

You can use

JavaScript

You want to work on a mixed data type column, so the first operation is to convert the data to string with astype(str).
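As a small illustration of why the conversion matters, here is a minimal sketch; the sample values are hypothetical and not taken from the question's data. On an object column that mixes Python numbers and strings, the .str methods return NaN for the non-string entries, so casting to str first makes every row visible to the regex.

```python
import pandas as pd

# Hypothetical mixed-type column: strings, an int and a float
s = pd.Series([">1234", 31, 2.5, "31sadj"])

# Cast every value to its string form before using .str methods
as_text = s.astype(str)

print(as_text.tolist())
```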

The val_num column is populated with \b(\d+(?:\.\d+)?)\b matches: integer or float numbers matched as whole words (\b stands for a word boundary).
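To see how the word boundaries behave, the number pattern can be tried on its own with Python's re module. This is only a sketch: the helper name first_number and the sample strings are made up for illustration.

```python
import re

# Number pattern described above: an integer or float matched as a whole word
NUM = re.compile(r"\b(\d+(?:\.\d+)?)\b")

def first_number(text):
    """Return the first whole-word number in text, or None (hypothetical helper)."""
    m = NUM.search(text)
    return m.group(1) if m else None

print(first_number(">1234"))   # 1234
print(first_number("<10.5"))   # 10.5
print(first_number("31sadj"))  # None: no word boundary between "31" and "sadj"
```

Note that a bare number such as "31" with no symbol would also match this pattern, so whether that suits the requirement depends on the actual data.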

The val_str column is populated with ([^<>=]*[a-zA-Z][^<>=]*) matches: the pattern matches zero or more chars other than <, > and =, then a letter, and then again zero or more chars other than <, > and =.
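Putting the two patterns together, a minimal sketch with Series.str.extract could look like the following. The sample dataframe is hypothetical, and the val_symbol pattern ([<>=]) is an assumption added for completeness, since it is not described in the explanation above.

```python
import pandas as pd

# Hypothetical sample data, not the dataframe from the question
df = pd.DataFrame({"val": [">1234", "<1000", "31sadj", "=50", "abc"]})

# Work on string values only
text = df["val"].astype(str)

# Whole-word integer or float numbers
df["val_num"] = text.str.extract(r"\b(\d+(?:\.\d+)?)\b", expand=False)

# Values containing at least one letter and none of <, >, =
df["val_str"] = text.str.extract(r"([^<>=]*[a-zA-Z][^<>=]*)", expand=False)

# Assumption: the first occurrence of one of the three symbols
df["val_symbol"] = text.str.extract(r"([<>=])", expand=False)

print(df)
```

Rows where a pattern finds no match are left as NaN in the corresponding column.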

The output I get:

JavaScript