For the below:
    summary_df = (df
        .groupby(['provider', 'master_id'])
        .agg(
            content_type_id=('content_type_id', 'first'),
            title=('title', 'first'),
            release_year=('release_year', 'first'),
            ...
            subs=('burned_in_sub_language', lambda x: str(sorted(i.lower() for i in x.dropna().unique())))
        )
        .reset_index()
    )
What would be the proper way to do this before named aggregates were introduced, including the aliasing of columns?
Answer
As mentioned by Henry Yik, pass .agg() a dict that maps each column to its aggregation function, then alias the resulting columns with .rename().
For example:
    summary_df = (df
        .groupby(['provider', 'master_id'])
        .agg({
            'content_type_id': 'first',
            'title': 'first',
        })
        .rename(columns={
            'content_type_id': 'something else',
            'title': 'changed_name',
        })
    )
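The same pattern should also cover the lambda column from the question. Below is a minimal sketch of that, assuming a small made-up DataFrame (the sample rows are illustrative, not from the original post) and a hypothetical helper named sorted_langs that stands in for the lambda. The dict keys are the source columns, so only the column that needs a new name ('burned_in_sub_language' to 'subs') goes through .rename().

```python
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "provider": ["a", "a", "b"],
    "master_id": [1, 1, 2],
    "content_type_id": [10, 10, 20],
    "title": ["T1", "T1", "T2"],
    "release_year": [2001, 2001, 2002],
    "burned_in_sub_language": ["EN", "fr", None],
})

def sorted_langs(x):
    # Hypothetical helper: same logic as the lambda in the question.
    return str(sorted(i.lower() for i in x.dropna().unique()))

summary_df = (
    df
    .groupby(["provider", "master_id"])
    # Dict form: key is the source column, value is the aggregation.
    .agg({
        "content_type_id": "first",
        "title": "first",
        "release_year": "first",
        "burned_in_sub_language": sorted_langs,
    })
    # Aliasing happens afterwards, since the dict form keeps source names.
    .rename(columns={"burned_in_sub_language": "subs"})
    .reset_index()
)

print(summary_df)
```

One limitation of the dict form is that its keys are column names, so each source column can appear only once per dict; applying several functions to the same column means passing a list of functions and then flattening the resulting MultiIndex columns before renaming.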