I have this pandas DataFrame with two columns:
   target               drugs
0  ACE2 gene            [angiotensin II,rosiglitazone,irbesartan,valsar]
1  Elastases            [heparin,prednisolone,montelukast,formoterol]
2  MAPK14 protein       [oxaprozin,nilotinib,imatinib,tocilizumab]
3  TMPRSS2 gene         [enzalutamide,camostat]
4  Toll-like receptors  [rituximab,atorvastatin,artesunate,tazarotene]
5  Ubiquitin            [sunitinib,lapatinib,atorvastatin,edaravone]
6  ezrin                [erlotinib,crizotinib,sorafenib,everolimus]
I want to create a plot that clusters the drugs by their target, so there will be 7 clusters (one per target), but I am not sure how to do it.
This is the df:
import pandas as pd

data = {
    'target': ['ACE2 gene', 'Elastases', 'MAPK14 protein, human', 'TMPRSS2 gene',
               'Toll-like receptors', 'Ubiquitin', 'ezrin'],
    'drugs': [['angiotensin II', 'rosiglitazone', 'irbesartan'],
              ['heparin', 'prednisolone', 'montelukast', 'formoterol'],
              ['oxaprozin', 'nilotinib', 'imatinib', 'tocilizumab'],
              ['enzalutamide', 'camostat'],
              ['rituximab', 'atorvastatin', 'artesunate', 'tazarotene'],
              ['sunitinib', 'lapatinib', 'atorvastatin', 'edaravone'],
              ['erlotinib', 'crizotinib', 'sorafenib', 'everolimus']],
}

df = pd.DataFrame(data)
Answer
You can draw a scatterplot with seaborn, as shown below. (You said in the comments on the other answer that it did not work for you, so this answer takes a different approach.)
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame(data={
    'target': ['ACE2 gene', 'Elastases', 'MAPK14 protein, human', 'TMPRSS2 gene',
               'Toll-like receptors', 'Ubiquitin', 'ezrin'],
    'drugs': [['angiotensin II', 'rosiglitazone', 'irbesartan'],
              ['heparin', 'prednisolone', 'montelukast', 'formoterol'],
              ['oxaprozin', 'nilotinib', 'imatinib', 'tocilizumab'],
              ['enzalutamide', 'camostat'],
              ['rituximab', 'atorvastatin', 'artesunate', 'tazarotene'],
              ['sunitinib', 'lapatinib', 'atorvastatin', 'edaravone'],
              ['erlotinib', 'crizotinib', 'sorafenib', 'everolimus']]})

# Repeat each row once per drug in its list
df['times'] = df['drugs'].apply(len)
df = df.loc[df.index.repeat(df['times'])].reset_index(drop=True)
# Within each target, take the idx-th drug from the idx-th repeated row
df['drug'] = df.groupby('target')['drugs'].transform(lambda x: list(y[idx] for idx, y in enumerate(x)))
df = df.drop(['drugs', 'times'], axis=1)
# Running position of each drug, used as the x coordinate
df['unq_id'] = df.index + 1

fig, axe = plt.subplots(figsize=(20, 10))
axe.axis('off')
sns.scatterplot(data=df, x="unq_id", y="target", hue="target", ax=axe, s=1000)
# Label every point with its drug name
for _, point in df.iterrows():
    axe.text(point['unq_id'] - 0.2, point['target'], point['drug'], rotation=45, size=18)

plt.setp(axe.get_legend().get_texts(), fontsize=22)  # for legend text
plt.setp(axe.get_legend().get_title(), fontsize=32)  # for legend title
plt.show()
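Output: a scatter plot with one row of large points per target, colored by target, with each point labelled by its drug name.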
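As a side note, the reshaping step in the code above (repeat the rows, then pick the matching drug with groupby/transform) can probably be written more compactly with `DataFrame.explode`, which is available in pandas 0.25 and later. This is only a sketch of that alternative, not part of the original answer; the name `long_df` is my own, and it assumes the same `data` dictionary as in the question.

```python
import pandas as pd

data = {
    'target': ['ACE2 gene', 'Elastases', 'MAPK14 protein, human', 'TMPRSS2 gene',
               'Toll-like receptors', 'Ubiquitin', 'ezrin'],
    'drugs': [['angiotensin II', 'rosiglitazone', 'irbesartan'],
              ['heparin', 'prednisolone', 'montelukast', 'formoterol'],
              ['oxaprozin', 'nilotinib', 'imatinib', 'tocilizumab'],
              ['enzalutamide', 'camostat'],
              ['rituximab', 'atorvastatin', 'artesunate', 'tazarotene'],
              ['sunitinib', 'lapatinib', 'atorvastatin', 'edaravone'],
              ['erlotinib', 'crizotinib', 'sorafenib', 'everolimus']],
}
df = pd.DataFrame(data)

# explode() emits one row per list element and repeats the target next to it
# (long_df is a hypothetical name chosen for this sketch)
long_df = (
    df.explode('drugs')
      .reset_index(drop=True)
      .rename(columns={'drugs': 'drug'})
)

# 1-based running position, playing the role of 'unq_id' in the plot code
long_df['unq_id'] = long_df.index + 1

print(long_df.head())
```

The resulting frame has the same three columns (`target`, `drug`, `unq_id`) that the plotting part of the answer expects, so it should be usable in place of the reshaped `df` there.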