Edit: 2022NOV21
How do we chain df.col.str.split()
since this returns the split columns if expand = True
I am trying to split a column after performing .melt()
. If I use assign I end up using the original column and the melted column actually does not even exist.
df = pd.DataFrame().from_dict({ 'id' : [1,2,3,4], '2022_amt' : [10.1,20.2,30.3, 40.4], '2022_qty' : [10,20,30,40] }) df = ( df .melt( id_vars=['id'], value_vars=['2022_amt', '2022_qty'], var_name='fy', value_name='num' ) # can i chain any pd.Series.str.[METHOD] here # .assign( # year=df.fy.str.split('_', expand=True)[0], # t=df.fy.str.split('_', expand=True)[1] # ) ) # i can add the two columns in this way but can we use chain to expand dataframe df df[['year', 't']] = df.fy.str.split('_', expand=True) df = df.drop(columns = ['fy'])
Advertisement
Answer
Using expand
converts it into a DataFrame, which you do not really want here; secondly with chaining, use an anonymous function to refer to the previous dataframe:
(df .melt(id_vars='id',var_name='fy',value_name='num') assign(year = lambda df: df.fy.str.split('_').str[0], t = lambda df: df.fy.str.split('_').str[1]) ) id fy num year t 0 1 2022_amt 10.1 2022 amt 1 2 2022_amt 20.2 2022 amt 2 3 2022_amt 30.3 2022 amt 3 4 2022_amt 40.4 2022 amt 4 1 2022_qty 10.0 2022 qty 5 2 2022_qty 20.0 2022 qty 6 3 2022_qty 30.0 2022 qty 7 4 2022_qty 40.0 2022 qty
For your use case, there are simpler, more efficient ways to do this:
- with
pd.stack
:
df = df.set_index('id') df.columns = df.columns.str.split('_', expand = True) df.columns.names = ['year', 't'] df.stack(['year', 't']).reset_index(name='num') id year t num 0 1 2022 amt 10.1 1 1 2022 qty 10.0 2 2 2022 amt 20.2 3 2 2022 qty 20.0 4 3 2022 amt 30.3 5 3 2022 qty 30.0 6 4 2022 amt 40.4 7 4 2022 qty 40.0
- with
pivot_longer
frompyjanitor
:
# pip install pyjanitor import pandas as pd import janitor as jn df.pivot_longer(index = 'id', names_to = ('year','t'), names_sep = '_') id year t value 0 1 2022 amt 10.1 1 2 2022 amt 20.2 2 3 2022 amt 30.3 3 4 2022 amt 40.4 4 1 2022 qty 10.0 5 2 2022 qty 20.0 6 3 2022 qty 30.0 7 4 2022 qty 40.0