I have two columns in the dataframe, one of which is a class and another is a description. In the description I have some abbreviations. I want to expand these abbreviations based on the class value. I have a dictionary with class as key and in the value I have another dictionary with abbreviations and its full form. Since these abbreviations mean different based on the class. eg :- IT could mean ether Information Transport or Information Technology based on the class label.
I tried groupby, but was not able to get it back in the original dataframe. Any help is much appreciated. Thanks
This is how I was trying:
grouped = df.groupby('class') for n,j in grouped: j['description'].str.split().apply(lambda x: ' '.join([abb[n].get(e, e) for e in x]))
Advertisement
Answer
Here is a working example that takes the row as input and looks up the class
value in the dictionary, and replaces strings description
with the corresponding value in the dict:
import pandas as pd abb = {'IT':{'SQL':'Structured Query Language'},'Sales':{'SQL':'Sales Qualified Lead'}} data = [{'class':'IT', 'description':'SQL developer'},{'class':'Sales', 'description':'senior SQL'}] df = pd.DataFrame(data) def replace_strings(row): text = row['description'] for key, value in abb[row['class']].items(): text = text.replace(key, value) return text df['description'] = df.apply(replace_strings, axis=1)
class | description | |
---|---|---|
0 | IT | Structured Query Language developer |
1 | Sales | senior Sales Qualified Lead |