Skip to content
Advertisement

How to create columns from rows given key:value pair in the column in pandas?

I have the DF of this kind:

pd.DataFrame({'label':['A','test1: A','test2: A','B','test1: B','test3: B'],
             'value': [1,2,3,4,5,6]})

      label  value
0         A      1
1  test1: A      2
2  test2: A      3
3         B      4
4  test1: B      5
5  test3: B      6

And I need to convert to this:

pd.DataFrame({'label':['A','B'],
             'value': [1,4],
             'test1:':[2,5],
             'test2:':[3,None],
             'test3:':[None,6]})

  label  value  test1:  test2:  test3:
0     A      1       2     3.0     NaN
1     B      4       5     NaN     6.0

I need to keep label for unique value and keys are merged to the right if present in the data. Keys may vary and be of different names for one value.

Feel free to share how to rename the question because I could not find the better way to name the problem.

EDIT: Partly this solution contains what I need however there is no decent way to add columns representing key in the label column. Ideally something like a function with df input is needed.

Advertisement

Answer

Extract information into two data frames and merge them.

df2 = df[df['label'].str.contains('test')]
df3 = df2['label'].str.split(expand=True).rename(columns={0: "test", 1: "label"})
df3['value'] = df2['value']
df3 = df3.pivot_table(index='label', columns='test', values='value')
df2 = df[~df['label'].str.contains('test')]
df4 = pd.merge(df2, df3, on='label')

Output

  label  value  test1:  test2:  test3:
0     A      1     2.0     3.0     NaN
1     B      4     5.0     NaN     6.0
Advertisement