Skip to content
Advertisement

Add row values as new columns in a dataframe

I have a dataframe like follows:

A     B     C
a1    b1    c1
a1    b2    c2
a2    b1    c3
a2    b2    c4

I want to convert it to something like:

A     B1        
a1    c1        
a2    c3        

or

A     B1   B2     
a1    c1   c2     
a2    c3   c4     

The values in B which are repeated are column names whose values are in C column. I want to add all or some of them as the columns of the dataframe

In fact the dataset is created by flattening a tree, there are more columns each of which is an internal node. the first column is root, C are leaves

These are some tries by me:

desired_cols = ["b1","b2"] # desired cols 
df = main_df[main_df['B'] == "b1"]
for c in desired_cols:
    new_col = main_df[main_df['B'] == c]['C']
    df[c] = new_col

However, it doesn’t work and new columns are not aligned vertically. It adds nan in new columns.

Advertisement

Answer

Try:

df.pivot('A', columns='B')

You will get only two rows, but your four rows are 2 rows duplicated anyhow.

    A   C
B       b1  b2
0   a1  c1  c2
1   a2  c3  c4

Update

If you want to dynamically select your pivot index, you can select all columns present in your df, except ‘B’ and ‘C’ as such:

df2 = df.pivot(index=list(df.columns[~df.columns.isin(['B', 'C'])]), columns='B')

Update 2

If you want to only create specific columns from values in B, slice the dataframe first:

df2 = df[df.B.isin(desired_cols)].pivot('A', columns='B').reset_index()

Where desired_cols = ['b1'] or desired_cols = ['b1', 'b2']

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement