Skip to content
Advertisement

Applying abbreviation to the column of a dataframe based on another column of the same dataframe

I have two columns in the dataframe, one of which is a class and another is a description. In the description I have some abbreviations. I want to expand these abbreviations based on the class value. I have a dictionary with class as key and in the value I have another dictionary with abbreviations and its full form. Since these abbreviations mean different based on the class. eg :- IT could mean ether Information Transport or Information Technology based on the class label.

I tried groupby, but was not able to get it back in the original dataframe. Any help is much appreciated. Thanks

This is how I was trying:

grouped = df.groupby('class')
for n,j in grouped:
    j['description'].str.split().apply(lambda x: ' '.join([abb[n].get(e, e) for e in x]))

example

Advertisement

Answer

Here is a working example that takes the row as input and looks up the class value in the dictionary, and replaces strings description with the corresponding value in the dict:

import pandas as pd

abb = {'IT':{'SQL':'Structured Query Language'},'Sales':{'SQL':'Sales Qualified Lead'}}

data = [{'class':'IT', 'description':'SQL developer'},{'class':'Sales', 'description':'senior SQL'}]
df = pd.DataFrame(data)

def replace_strings(row):
    text = row['description']
    for key, value in abb[row['class']].items():
        text = text.replace(key, value)
    return text

df['description'] = df.apply(replace_strings, axis=1)
class description
0 IT Structured Query Language developer
1 Sales senior Sales Qualified Lead
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement