Skip to content
Advertisement

Python Pivoting dataframe that has mulitple ID columns

from a database I get the following table into a python dataframe df:

FunctionID FunctionText FunctionModule UserGroup
1 Fct1 ModX GroupA
2 Fct2 ModX GroupA
2 Fct2 ModX GroupB
3 Fct3 ModY GroupB
3 Fct3 ModY GroupC
.
3000 Fct3000 ModZ GroupF

My goal is to get a pivot-like table that looks like this one:

FunctionID FunctionText FunctionModule GroupA GroupB GroupC GroupF
1 Fct1 ModX X
2 Fct2 ModX X X
3 Fct3 ModY X X
.
3000 Fct3000 ModZ X

So, the first 3 columns shall stay as is whereas the entries of UserGroup column shall be the added columns of the pivot table.

I tried some approaches like

df2 = df.pivot_table(index=df.iloc[:3],columns='UserGroup',values='UserGroup')

but without success. Maybe I have to use pivot() or stack() or unstack() ? But none of those led me to the desired target table.

Advertisement

Answer

IIUC, you could add an extra column and pivot_table with aggfunc='first':

(df
 .assign(value='x') # or value=df['UserGroup'] if you want the names as value
 .pivot_table(index=list(df.columns[:3]),
              columns='UserGroup',
              values='value',
              aggfunc='first',
              fill_value='',
              )
)

output:

UserGroup                              GroupA GroupB GroupC
FunctionID FunctionText FunctionModule                     
1          Fct1         ModX                x              
2          Fct2         ModX                x      x       
3          Fct3         ModY                       x      x

NB. note that index=df.iloc[:3] will also select ‘UserGroup’, I used index=list(df.columns[:3]) here

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement