from a database I get the following table into a python dataframe df:
| FunctionID | FunctionText | FunctionModule | UserGroup | 
|---|---|---|---|
| 1 | Fct1 | ModX | GroupA | 
| 2 | Fct2 | ModX | GroupA | 
| 2 | Fct2 | ModX | GroupB | 
| 3 | Fct3 | ModY | GroupB | 
| 3 | Fct3 | ModY | GroupC | 
| . | … | … | … | 
| 3000 | Fct3000 | ModZ | GroupF | 
My goal is to get a pivot-like table that looks like this one:
| FunctionID | FunctionText | FunctionModule | GroupA | GroupB | GroupC | … | GroupF | 
|---|---|---|---|---|---|---|---|
| 1 | Fct1 | ModX | X | … | |||
| 2 | Fct2 | ModX | X | X | … | ||
| 3 | Fct3 | ModY | X | X | … | ||
| . | … | … | … | … | … | … | … | 
| 3000 | Fct3000 | ModZ | … | X | 
So, the first 3 columns shall stay as is whereas the entries of UserGroup column shall be the added columns of the pivot table.
I tried some approaches like
df2 = df.pivot_table(index=df.iloc[:3],columns='UserGroup',values='UserGroup')
but without success. Maybe I have to use pivot() or stack() or unstack() ? But none of those led me to the desired target table.
Advertisement
Answer
IIUC, you could add an extra column and pivot_table with aggfunc='first':
(df
 .assign(value='x') # or value=df['UserGroup'] if you want the names as value
 .pivot_table(index=list(df.columns[:3]),
              columns='UserGroup',
              values='value',
              aggfunc='first',
              fill_value='',
              )
)
output:
UserGroup GroupA GroupB GroupC FunctionID FunctionText FunctionModule 1 Fct1 ModX x 2 Fct2 ModX x x 3 Fct3 ModY x x
NB. note that index=df.iloc[:3] will also select ‘UserGroup’, I used index=list(df.columns[:3]) here