I have the DF of this kind:
pd.DataFrame({'label':['A','test1: A','test2: A','B','test1: B','test3: B'], 'value': [1,2,3,4,5,6]}) label value 0 A 1 1 test1: A 2 2 test2: A 3 3 B 4 4 test1: B 5 5 test3: B 6
And I need to convert to this:
pd.DataFrame({'label':['A','B'], 'value': [1,4], 'test1:':[2,5], 'test2:':[3,None], 'test3:':[None,6]}) label value test1: test2: test3: 0 A 1 2 3.0 NaN 1 B 4 5 NaN 6.0
I need to keep label for unique value and keys are merged to the right if present in the data. Keys may vary and be of different names for one value.
Feel free to share how to rename the question because I could not find the better way to name the problem.
EDIT: Partly this solution contains what I need however there is no decent way to add columns representing key in the label column. Ideally something like a function with df input is needed.
Advertisement
Answer
Extract information into two data frames and merge them.
df2 = df[df['label'].str.contains('test')] df3 = df2['label'].str.split(expand=True).rename(columns={0: "test", 1: "label"}) df3['value'] = df2['value'] df3 = df3.pivot_table(index='label', columns='test', values='value') df2 = df[~df['label'].str.contains('test')] df4 = pd.merge(df2, df3, on='label')
Output
label value test1: test2: test3: 0 A 1 2.0 3.0 NaN 1 B 4 5.0 NaN 6.0