Make dummy variables for categorical data, based on an ID column with duplicate values in Python

I have the following pandas dataframe:

    ID    value
0   1     A
1   1     B
2   1     C
3   2     B
4   10    C
5   4     C
6   4     A

I want to make dummy variables for the values in the column ‘value’, with one row for each unique value in the column ‘ID’. The result should look like this:

    ID    A    B    C
0   1     1    1    1
1   2     0    1    0
2   10    0    0    1
3   4     1    0    1

How can I do this in Python?

Answer

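Use pd.crosstab to count how often each value occurs per ID, then limit the counts to 1 with DataFrame.clip. Note that crosstab sorts the rows by ID, so 4 appears before 10 in the output: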

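import pandas as pd

# Count each value per ID, then cap the counts at 1 to get 0/1 indicators
df1 = (pd.crosstab(df['ID'], df['value'])
         .clip(upper=1)                # a repeated (ID, value) pair still gives 1
         .reset_index()                # turn the ID index back into a column
         .rename_axis(None, axis=1))   # drop the 'value' label from the columns
print(df1)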
   ID  A  B  C
0   1  1  1  1
1   2  0  1  0
2   4  1  0  1
3  10  0  0  1
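
The output above is sorted by ID, while the question lists the IDs in order of first appearance (1, 2, 10, 4). If that order matters, one possible approach is to reindex the crosstab result by the unique IDs before resetting the index. The following is a minimal, self-contained sketch under that assumption; the variable names `order` and `dummies` are only illustrative, and the sample frame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 10, 4, 4],
    "value": ["A", "B", "C", "B", "C", "C", "A"],
})

# IDs in order of first appearance (illustrative helper, not part of the answer above)
order = pd.Index(df["ID"].unique(), name="ID")

dummies = (pd.crosstab(df["ID"], df["value"])
             .clip(upper=1)               # cap counts at 1 to get 0/1 indicators
             .reindex(order)              # restore first-appearance order of the IDs
             .reset_index()               # turn the ID index back into a column
             .rename_axis(None, axis=1))  # drop the 'value' label from the columns
print(dummies)
```

Because every label in `order` already exists in the crosstab index, the reindex step only reorders rows and does not introduce missing values.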
User contributions licensed under: CC BY-SA