I have a dataframe that looks like the one below. Here is the data in table format, which you can copy/paste:
```
SourceName        SourceType  Edge       TargetName     TargetType
cardiac myosin    DISEASE     induce     myocarditis    DISEASE
cardiac myosin    DISEASE     induce     heart disease  DISEASE
nitric            CHEMICAL    inhibit    chrysin        CHEMICAL
peptide magainin  CHEMICAL    exhibited  tumor          DISEASE
```
And here is the same data as a dictionary, which you can copy/paste:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'SourceName': ['cardiac myosin',
                                  'cardiac myosin',
                                  'nitric',
                                  'peptide magainin'],
                   'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
                   'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
                   'TargetName': ['myocarditis',
                                  'heart disease',
                                  'chrysin',
                                  'tumor'],
                   'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE']})
```
I tried the code below, but some SourceName values end up under the wrong type. For example, 'peptide magainin' should be under CHEMICAL, but it appears under DISEASE.

```python
# Select multiple columns with a list (double brackets); a bare tuple-style
# selection is deprecated and raises an error in recent pandas versions
df1 = df.groupby(["id", "SourceType", "TargetType"])[['SourceName', 'Edge', 'TargetName']].aggregate(lambda x: x).unstack().reset_index()
df1.columns = df1.columns.tolist()
```
The sample output is incorrect. Can someone help me with this? Thanks.
Expected output:
Answer
I don’t understand exactly what you are trying to achieve with the new structure, but it can be done by grouping once by “SourceType” and once by “TargetType”, then merging the resulting dataframes:
```python
import pandas as pd

source_parts = []
target_parts = []

# One column per SourceType, holding the SourceName for that type
for s, sub_df in df.groupby('SourceType'):
    source_sub_df = sub_df[['id', 'SourceName']]
    source_sub_df.columns = ['id', f'SourceType_{s}']
    source_parts.append(source_sub_df)

# One column per TargetType, holding the TargetName for that type
for t, sub_df in df.groupby('TargetType'):
    target_sub_df = sub_df[['id', 'Edge', 'TargetName']]
    target_sub_df.columns = ['id', 'Edge', f'TargetType_{t}']
    target_parts.append(target_sub_df)

# Concatenate once per side instead of growing an empty DataFrame in the loop
source_df = pd.concat(source_parts)
target_df = pd.concat(target_parts)

df_out = source_df.merge(target_df, on='id').sort_values('id').reset_index(drop=True)

print(df_out)
```
Output:
```
   id SourceType_CHEMICAL SourceType_DISEASE       Edge TargetType_CHEMICAL TargetType_DISEASE
0   1                 NaN     cardiac myosin     induce                 NaN        myocarditis
1   2                 NaN     cardiac myosin     induce                 NaN      heart disease
2   3              nitric                NaN    inhibit             chrysin                NaN
3   4    peptide magainin                NaN  exhibited                 NaN              tumor
```
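A likely reason the original attempt misfiles values: `unstack()` called without a level pivots the *last* index level, which in `groupby(["id", "SourceType", "TargetType"])` is `TargetType`. SourceName is therefore spread by its target's type, so 'peptide magainin' lands under DISEASE because its target 'tumor' is a DISEASE. The sketch below reproduces that, then shows an alternative (not the answer's method) that pivots each name column by its own type column with `DataFrame.pivot` and joins the pieces on `id`. It assumes each `id` appears only once, as in the sample data (`pivot` raises on duplicate id/type pairs); the names `wrong`, `src`, `tgt` and `out` are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'SourceName': ['cardiac myosin', 'cardiac myosin', 'nitric', 'peptide magainin'],
    'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
    'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
    'TargetName': ['myocarditis', 'heart disease', 'chrysin', 'tumor'],
    'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE'],
})

# Reproduce the problem: unstack() moves the LAST level (TargetType) into
# the columns, so SourceName is grouped by the target's type.
wrong = df.set_index(['id', 'SourceType', 'TargetType'])['SourceName'].unstack()

# Alternative: spread each name column by its own type column.
src = df.pivot(index='id', columns='SourceType', values='SourceName').add_prefix('SourceType_')
tgt = df.pivot(index='id', columns='TargetType', values='TargetName').add_prefix('TargetType_')

out = (
    src.join(df.set_index('id')['Edge'])  # Edge has one value per id
       .join(tgt)
       .reset_index()                      # turn the 'id' index back into a column
)
out.columns.name = None                    # drop the leftover 'SourceType' axis label
print(out)
```

Compared with the loop-and-merge answer, this keeps the reshaping in two `pivot` calls, and the column order follows the sorted type values (CHEMICAL before DISEASE).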