I have a dataframe like follows:
A B C a1 b1 c1 a1 b2 c2 a2 b1 c3 a2 b2 c4
I want to convert it to something like:
A B1 a1 c1 a2 c3
or
A B1 B2 a1 c1 c2 a2 c3 c4
The values in B which are repeated are column names whose values are in C column. I want to add all or some of them as the columns of the dataframe
In fact the dataset is created by flattening a tree, there are more columns each of which is an internal node. the first column is root, C are leaves
These are some tries by me:
desired_cols = ["b1","b2"] # desired cols df = main_df[main_df['B'] == "b1"] for c in desired_cols: new_col = main_df[main_df['B'] == c]['C'] df[c] = new_col
However, it doesn’t work and new columns are not aligned vertically. It adds nan
in new columns.
Advertisement
Answer
Try:
df.pivot('A', columns='B')
You will get only two rows, but your four rows are 2 rows duplicated anyhow.
A C B b1 b2 0 a1 c1 c2 1 a2 c3 c4
Update
If you want to dynamically select your pivot index, you can select all columns present in your df, except ‘B’ and ‘C’ as such:
df2 = df.pivot(index=list(df.columns[~df.columns.isin(['B', 'C'])]), columns='B')
Update 2
If you want to only create specific columns from values in B, slice the dataframe first:
df2 = df[df.B.isin(desired_cols)].pivot('A', columns='B').reset_index()
Where desired_cols = ['b1']
or desired_cols = ['b1', 'b2']