Extract specific symbols from pandas cells, then replace them with values from a dict where they are keys

Question

Two of the columns in my data hold MSC codes, which correspond to different areas of science. I need to replace each code with the corresponding subject from the dict here: https://mathscinet.ams.org/msnhtml/msc2020.pdf. Some of the entries are:

00 General and overarching topics; collections
01 History and biography
03 Mathematical logic and foundations
05 Combinatorics

Accepted Answer

Here are a couple of options.

Using a regex to get the first two characters of each "word":

    df["MSC Secondary"] = (
        df["MSC Secondary"]
        .str.extractall(r"[ (](\w{2})")[0]
        .map(d)
        .groupby(level=0).agg(list)
    )

This uses:

- str.extractall to apply the regex [ (](\w{2}), which captures the first two characters of every word in each row
- map to map the dict d over the [0] (zeroth) match group
- groupby(level=0).agg(list) to group the Series by index (level=0) and put the values back into lists (.agg(list))

Through a few chained pandas str methods:

    d = dict(...)
    df["MSC Secondary"] = (
        df["MSC Secondary"]
        .str.strip("()")
        .str.split()
        .explode()
        .str[:2]
        .map(d)
        .groupby(level=0)
        .agg(list)
    )

    #   MSC Primary                                      MSC Secondary
    # 0       05C25  [Combinatorics, Group theory and generalizations]
    # 1       20-04     [Group theory and generalizations, Statistics]
    # 2       13F20  [Nonassociative rings and algebras, Topologica...
    # 3       05Exx  [Group theory and generalizations, Group theor...
    # 4       20G40                                    [Combinatorics]

Here we use:

- pandas.Series.str.strip to remove the parentheses
- pandas.Series.str.split to split the substrings into lists
- pandas.Series.explode to turn every element in each list into its own row
- str[:2] to slice off the first two characters
- map to map your linked dict
- groupby(level=0).agg(list) to group the Series by index (level=0) and put the values back into lists (.agg(list))
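To try both approaches end to end, here is a minimal self-contained sketch. The two-row DataFrame and the three-entry dict are hypothetical stand-ins (the question's actual data and full MSC dict are not reproduced here), and the helper name codes_to_subjects is made up for illustration. The regex is written with \w{2} so that it matches two word characters.

```python
import pandas as pd

# Hypothetical sample; assumes codes are space-separated inside parentheses.
df = pd.DataFrame({
    "MSC Primary": ["05C25", "20-04"],
    "MSC Secondary": ["(05C25 20B25)", "(20C15 62H30)"],
})

# Small excerpt of the two-digit MSC table, used as a stand-in for the full dict.
d = {
    "05": "Combinatorics",
    "20": "Group theory and generalizations",
    "62": "Statistics",
}


def codes_to_subjects(s, mapping):
    """Chained-str approach: strip, split, explode, slice, map, regroup."""
    return (
        s.str.strip("()")      # drop the surrounding parentheses
        .str.split()           # one list of codes per row
        .explode()             # one code per row, original index repeated
        .str[:2]               # keep the two-digit prefix
        .map(mapping)          # look up the subject name
        .groupby(level=0)      # regroup by the original row index
        .agg(list)
    )


via_split = codes_to_subjects(df["MSC Secondary"], d)

# Regex approach: capture two word characters after a space or "(".
via_regex = (
    df["MSC Secondary"]
    .str.extractall(r"[ (](\w{2})")[0]
    .map(d)
    .groupby(level=0)
    .agg(list)
)

print(via_split.tolist())
print(via_regex.tolist())
```

Both variants should give the same lists on this sample. Any two-digit prefix that is missing from the dict maps to NaN, so with the real data it is worth checking that the dict covers every prefix that appears.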

Answer