My DataFrame has the following columns, which hold the pressure and corresponding volume measured for different samples, e.g. s1_p is the pressure for sample 1 and s1_nv is the corresponding volume for the same sample. I want to show all volume columns on the x-axis and the pressure columns on the y-axis of a single plot (not subplots), with the legend labelled by sample number.
df=
s1_p s1_nv s9_p s9_nv s21_p s21_nv s26_p s26_nv s32_p s32_nv s37_p s37_nv s49_p s49_nv s52_p s52_nv s105_p s105_nv s118_p s118_nv
0 0.977966 0.000544 0.928902 0.000000 1.140129 0.000000 1.002083 0.000000 0.958008 0.000000 1.301460 0.000000 0.964661 0.000000 0.976303 0.001193 1.002914 0.000246 1.008736 0.000129
1 1.022041 0.001087 0.953850 0.000000 1.175056 0.000153 1.079422 0.000208 0.980461 0.001955 1.328903 0.000000 0.986282 0.000000 1.004578 0.003279 1.034515 0.000246 1.038673 0.000385
2 1.050316 0.001268 0.984619 0.000000 1.204163 0.000153 1.140961 0.000208 1.012062 0.002557 1.357178 0.000000 1.015388 0.000125 1.031189 0.004621 1.056137 0.000246 1.061127 0.000513
3 1.082748 0.001268 1.010399 0.000261 1.224953 0.000153 1.249901 0.000208 1.029526 0.002557 1.382958 0.000191 1.033684 0.000125 1.062790 0.004770 1.085243 0.000493 1.094391 0.000513
4 1.109360 0.001268 1.031189 0.000261 1.247406 0.000153 1.314766 0.000208 1.075264 0.003159 1.407074 0.000381 1.066948 0.000125 1.097717 0.004770 1.136803 0.000493 1.130981 0.000513
5 1.127655 0.001268 1.056969 0.000261 1.277344 0.000306 1.459465 0.000417 1.130150 0.003460 1.446159 0.000381 1.113518 0.000250 1.138466 0.004919 1.160919 0.000739 1.149277 0.000641
6 1.160087 0.001268 1.086075 0.000261 1.302292 0.000459 1.629112 0.000624 1.150108 0.003610 1.472771 0.000381 1.140129 0.000250 1.160088 0.005068 1.225784 0.000739 1.177551 0.000898
7 1.209152 0.001268 1.117676 0.000392 1.328072 0.000459 1.658218 0.000624 1.171730 0.003911 1.514351 0.000571 1.209984 0.000250 1.212479 0.005217 1.293144 0.000739 1.247406 0.000898
8 1.259048 0.001268 1.151772 0.000392 1.370483 0.000612 1.748863 0.000624 1.249069 0.005114 1.555100 0.000571 1.278175 0.000250 1.270691 0.005217 1.372978 0.000739 1.310608 0.000898
9 1.283165 0.001268 1.180878 0.000392 1.399590 0.000612 1.920174 0.000624 1.290649 0.005415 1.575890 0.000571 1.297302 0.000375 1.379631 0.005217 1.420380 0.000986 1.334724 0.000898
10 1.362167 0.001268 1.227448 0.000392 1.426201 0.000612 2.064041 0.000833 1.333893 0.005716 1.602501 0.000761 1.351357 0.000500 1.466949 0.005217 1.592522 0.001232 1.507698 0.001283
11 1.446991 0.001449 1.278175 0.000392 1.475266 0.000612 2.252815 0.000833 1.434517 0.006919 1.635765 0.000761 1.385452 0.000500 1.636597 0.005664 1.757179 0.001232 1.666534 0.001796
12 1.473602 0.001630 1.297302 0.000522 1.541794 0.000765 2.432442 0.000833 1.603333 0.010077 1.698967 0.000761 1.518509 0.000625 1.802917 0.005664 1.778801 0.001726 1.698967 0.001796
13 1.667366 0.001630 1.316429 0.000522 1.639923 0.000765 2.614563 0.000833 1.626617 0.010077 1.790444 0.000761 1.693977 0.000750 1.840340 0.005664 1.800423 0.002218 1.870277 0.002181
14 1.837845 0.001630 1.344704 0.000652 1.712273 0.000919 2.812485 0.000833 1.809570 0.010679 1.828697 0.000761 1.715599 0.000750 1.972565 0.006111 1.988365 0.002958 2.044083 0.002181
15 2.042419 0.001630 1.412063 0.000783 1.861130 0.000919 2.984627 0.000833 1.831192 0.010679 1.856972 0.000761 1.876098 0.000750 2.142212 0.006410 2.167160 0.002958 2.083168 0.002438
16 2.222878 0.001630 1.476929 0.000783 2.029114 0.001531 3.014565 0.001041 2.003334 0.011732 1.964249 0.000951 2.058220 0.001000 2.173813 0.006410 2.209572 0.003204 2.250320 0.002566
17 2.256142 0.001630 1.497719 0.000913 2.052398 0.001531 3.169243 0.001041 2.026619 0.011882 2.134727 0.000951 2.265290 0.001125 2.325165 0.006708 2.385040 0.003451 2.417473 0.002695
18 2.422463 0.001630 1.672356 0.001305 2.163834 0.001531 3.354691 0.001041 2.198761 0.013687 2.299385 0.001142 2.439095 0.001125 2.495644 0.007005 2.556351 0.003697 2.449905 0.002695
The following code does the job:
S1_P=df['s1_p']
S1_V=df['s1_nv'] #(similarly for other samples)
plt.plot(S1_P, S1_V, color='r', label='S1')
plt.plot(S9_P, S9_V, color='g', label='S9')
plt.plot(S21_P, S21_V, color='g', label='S21')
The problem is that I have to extract every individual column as a Series and then repeat the plot call for each sample.
df.plot(x=["s1_p", 's9_p', 's21_p'] y=["s1_v", 's9_v', 's21_v']) raised an error.
I want to automate the process so that I don’t have to reference each individual column for the plot.
Any suggestions for plotting the data in a single plot using seaborn or matplotlib?
Answer
Starting from the dataframe you provided, the simplest way I know to draw the plot you want is to reshape the dataframe into a suitable form and then plot it.
Dataframe re-shaping
You need to reshape your data into a dataframe with 3 columns: sample, pressure and volume. To do so, I save the data in a new dataframe DF:
# Unique sample numbers taken from column names such as 's21_p' or 's21_nv'
samples = list(set([col.replace('s', '').replace('_p', '').replace('_nv', '') for col in df.columns]))
frames = []
for sample in samples:
    df_tmp = pd.DataFrame()
    for col in df.columns:
        if f's{sample}_' in col:
            df_tmp['sample'] = len(df[col])*[sample]
            if col.endswith('p'):
                df_tmp['pressure'] = df[col]
            else:
                df_tmp['volume'] = df[col]
    frames.append(df_tmp)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
DF = pd.concat(frames, ignore_index = True)
# Sort numerically by sample, then convert back to str so seaborn treats it as a category
DF['sample'] = DF['sample'].astype(int)
DF = DF.sort_values(by = 'sample', ignore_index = True)
DF['sample'] = DF['sample'].astype(str)
sample pressure volume
0 1 1.127655 0.001268
1 1 0.977966 0.000544
2 1 1.022041 0.001087
3 1 1.050316 0.001268
4 1 1.082748 0.001268
5 1 1.109360 0.001268
6 1 1.160087 0.001268
7 1 1.209152 0.001268
8 1 1.283165 0.001268
9 1 1.259048 0.001268
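As a hedged alternative, the same long-format table can be built by pairing each pressure column with its volume column directly. This is only a sketch: the small df below is a made-up stand-in for the real data, to_long is a hypothetical helper name, and it assumes every column follows the s<N>_p / s<N>_nv naming shown in the question.

```python
import pandas as pd

# Made-up stand-in for the wide table in the question (not the real data).
df = pd.DataFrame({
    "s1_p": [0.98, 1.02, 1.05],
    "s1_nv": [0.0005, 0.0011, 0.0013],
    "s9_p": [0.93, 0.95, 0.98],
    "s9_nv": [0.0, 0.0, 0.0003],
})


def to_long(wide):
    """Hypothetical helper: stack s<N>_p / s<N>_nv pairs into three columns."""
    # Sample ids come from the pressure columns, e.g. "s21_p" -> 21.
    sample_ids = sorted(int(c[1:-2]) for c in wide.columns if c.endswith("_p"))
    frames = []
    for n in sample_ids:
        frames.append(pd.DataFrame({
            "sample": str(n),  # scalar is broadcast to every row
            "pressure": wide[f"s{n}_p"].to_numpy(),
            "volume": wide[f"s{n}_nv"].to_numpy(),
        }))
    # One concat at the end avoids growing a dataframe inside the loop.
    return pd.concat(frames, ignore_index=True)


long_df = to_long(df)
```

Because each sample's rows are appended in their original order, the measurement sequence within a sample is preserved, which matters if you later draw lines instead of scatter points.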
Complete Code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r'data/data.csv')
# Unique sample numbers taken from column names such as 's21_p' or 's21_nv'
samples = list(set([col.replace('s', '').replace('_p', '').replace('_nv', '') for col in df.columns]))
frames = []
for sample in samples:
    df_tmp = pd.DataFrame()
    for col in df.columns:
        if f's{sample}_' in col:
            df_tmp['sample'] = len(df[col])*[sample]
            if col.endswith('p'):
                df_tmp['pressure'] = df[col]
            else:
                df_tmp['volume'] = df[col]
    frames.append(df_tmp)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
DF = pd.concat(frames, ignore_index = True)
# Sort numerically by sample, then convert back to str so seaborn treats it as a category
DF['sample'] = DF['sample'].astype(int)
DF = DF.sort_values(by = 'sample', ignore_index = True)
DF['sample'] = DF['sample'].astype(str)
fig, ax = plt.subplots()
sns.scatterplot(ax = ax, data = DF, x = 'volume', y = 'pressure', hue = 'sample')
plt.show()
Plot
Now you can plot your data; for example, you can use seaborn.scatterplot:
fig, ax = plt.subplots()
sns.scatterplot(ax = ax, data = DF, x = 'volume', y = 'pressure', hue = 'sample')
plt.show()
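If you would rather skip the reshaping step, a plain matplotlib loop over the column pairs may be enough. The sketch below is an assumption-laden illustration, not the method used above: the df is a made-up stand-in, plot_samples is a hypothetical helper name, and it relies on the s<N>_p / s<N>_nv naming from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the wide table in the question (not the real data).
df = pd.DataFrame({
    "s1_p": [0.98, 1.02, 1.05],
    "s1_nv": [0.0005, 0.0011, 0.0013],
    "s9_p": [0.93, 0.95, 0.98],
    "s9_nv": [0.0, 0.0, 0.0003],
})


def plot_samples(wide, ax):
    """Hypothetical helper: one pressure-vs-volume line per sample on one Axes."""
    for col in wide.columns:
        if not col.endswith("_p"):
            continue  # only start from pressure columns; volume is looked up by name
        sample = col[1:-2]  # "s21_p" -> "21"
        ax.plot(wide[f"s{sample}_nv"], wide[col], marker="o", label=f"S{sample}")
    ax.set_xlabel("volume")
    ax.set_ylabel("pressure")
    ax.legend(title="sample")
    return ax


fig, ax = plt.subplots()
plot_samples(df, ax)
```

Call plt.show() afterwards to display the figure. Matplotlib cycles through its default colours automatically, so each sample gets its own colour without listing them by hand.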