
Python: extract a substring from each cell in a DataFrame column

I have this data frame with this kind of column:

JavaScript

I need to clean this up and keep only the text from “DCG_” up to where "</div>" begins:

JavaScript

The positions of “DCG_” and "</div>" vary from cell to cell in this column. I’m trying to use the following line of code for this:

JavaScript

but it returns null for every row.


Answer

Use pd.Series.str.extract: you pass a regular expression, and it returns the contents of its capture groups from the first match in each cell:

JavaScript

which gives:

JavaScript
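As a minimal sketch of this approach: the sample rows, the column names `col` and `clean`, and the exact pattern below are assumptions for illustration, not taken from the original post. The pattern captures everything from "DCG_" up to, but not including, the first "</div>" that follows it.

```python
import pandas as pd

# Hypothetical sample data; the real column values are not shown in the post.
df = pd.DataFrame({
    "col": [
        '<div class="x">foo DCG_1234_abc</div>',
        '<div>DCG_99 bar</div><div>other</div>',
        'no match here',
    ]
})

# One capture group: "DCG_" followed by as few characters as possible,
# stopping right before the first "</div>".
# expand=False returns a Series instead of a one-column DataFrame.
df["clean"] = df["col"].str.extract(r"(DCG_.*?)</div>", expand=False)

print(df["clean"].tolist())
```

Rows where the pattern does not match come back as NaN, so a column that is entirely null usually means the pattern never matched any cell.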

Regex explanation (try it online):

JavaScript
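One way to read such a pattern piece by piece is to write it in verbose form with Python's `re` module. The pattern and the `sample` string here are assumed for illustration; the answer's exact regex may differ. The sketch also contrasts the lazy quantifier `.*?` with the greedy `.*`, which matters when a cell contains more than one "</div>".

```python
import re

# Assumed pattern, annotated with re.VERBOSE so each part can carry a comment.
pattern = re.compile(
    r"""
    (           # start of capture group 1
      DCG_      # the literal prefix to keep
      .*?       # any characters, as few as possible (lazy)
    )           # end of capture group 1
    </div>      # the literal closing tag, matched but not captured
    """,
    re.VERBOSE,
)

# Hypothetical cell containing two closing tags.
sample = "x DCG_A1</div> y DCG_B2</div>"

lazy = pattern.search(sample).group(1)
greedy = re.search(r"(DCG_.*)</div>", sample).group(1)

print(lazy)    # stops at the first </div>
print(greedy)  # runs on to the last </div>
```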