Segregate a column data based on regex using pandas

Question

I have a dataframe as shown below. I would like to create 3 new columns:

val_num - will store ONLY NUMBER values that come along with symbols, e.g. 1234 (from >1234) and 1000 (from <1000), but WILL NOT STORE 31 (from 31sadj) because it doesn't have any symbol

val_str - will store only values that are a mix of NUMBER, symbols, ALPHABETS or

Accepted Answer

You can use

    df['val_SYMBOL'] = df['val'].astype(str).str.extract(r'([<>=]+)').fillna('=')
    df['val_num'] = df['val'].astype(str).str.extract(r'\b(\d+(?:\.\d+)?)\b')
    df['val_str'] = df['val'].astype(str).str.extract(r'([^<>=]*[a-zA-Z][^<>=]*)')

You want to work on a mixed data type column, so the first operation is to convert the data to string with astype(str).

The val_SYMBOL column is populated with ([<>=]+) matches, one or more <, > or = chars; rows without any such symbol are filled with = by fillna('=').

The val_num column is populated with \b(\d+(?:\.\d+)?)\b matches, integer or float numbers matched as whole words (\b stands for a word boundary).

The val_str column is populated with ([^<>=]*[a-zA-Z][^<>=]*) matches, a pattern that searches for zero or more chars other than <, > and =, then a letter, and then again zero or more chars other than <, > and =.

The output I get:

    >>> df
         val val_SYMBOL val_num val_str
    0  >1234          >    1234     NaN
    1     <>         <>     NaN     NaN
    2  <1000          <    1000     NaN
    3
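For anyone who wants to try the three patterns in isolation, here is a minimal, self-contained sketch. The sample values are assumptions made for illustration: only >1234, <> and <1000 appear in the output above, 31sadj comes from the question, and 12.5 is a made-up float. Passing expand=False is also a choice made here, so that each extract call returns a Series rather than a one-column DataFrame; it is not part of the answer's code.

```python
import pandas as pd

# Hypothetical sample data (see the note above on which values are assumed).
df = pd.DataFrame({"val": [">1234", "<>", "<1000", "31sadj", 12.5]})

# Convert once to string so the .str accessor works on the mixed-type column.
s = df["val"].astype(str)

# One or more comparison symbols; rows without any symbol default to '='.
df["val_SYMBOL"] = s.str.extract(r"([<>=]+)", expand=False).fillna("=")

# Integer or float delimited by word boundaries: '31sadj' does not match,
# because there is no word boundary between '1' and 's'.
df["val_num"] = s.str.extract(r"\b(\d+(?:\.\d+)?)\b", expand=False)

# A run of characters other than <, >, = that contains at least one letter.
df["val_str"] = s.str.extract(r"([^<>=]*[a-zA-Z][^<>=]*)", expand=False)

print(df)
```

Note that val_num holds strings at this point; whether to convert it to a numeric dtype afterwards depends on what you need downstream.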

Answer