
Multiple lines chart from dataframe with looping samples

I have a big dataframe which includes 30 samples, measured one at a time every 6 seconds over several days. It looks something like this:

DATE_TIME	SAMPLE	VALUE
2020-12-10 10:52:48	1	3.22
2020-12-10 10:52:54	2	2.93
2020-12-10 10:53:00	3	2.27
…	…	…
2020-12-10 16:27:13	1	1.66
2020-12-10 16:27:19	2	1.15
2020-12-10 16:27:25	3	1.23

I want to plot the time series for each individual sample (multiple line chart). I tried:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
all_data = pd.read_csv("data.csv")

time_df=pd.DataFrame({'x':all_data['DATE_TIME'],'y1':all_data['SAMPLE']==1,'y2':all_data['SAMPLE']==2})
plt.plot('x','y1', data=time_df, marker= 'o',markerfacecolor='blue', markersize=1, color='skyblue', linewidth=4)
plt.plot('x','y2', data=time_df, marker= 'o',markerfacecolor='green', markersize=1, color='skyblue', linewidth=4)
plt.show()

But it's not working; I get a strange figure:

[image: bad figure]

I also tried making individual dataframes for the samples and it works but I’m sure there must be a more efficient way to do this.

import plotly.graph_objects as go

# One filtered dataframe per sample
SAMPLE1_df = all_data.loc[all_data["SAMPLE"] == 1]
SAMPLE2_df = all_data.loc[all_data["SAMPLE"] == 2]

fig = go.Figure()
fig.add_trace(go.Scatter(x=SAMPLE1_df["DATE_TIME"], y=SAMPLE1_df["VALUE"], mode='lines', name="SAMPLE1"))
fig.add_trace(go.Scatter(x=SAMPLE2_df["DATE_TIME"], y=SAMPLE2_df["VALUE"], mode='lines', name="SAMPLE2"))
fig.show()

[image: idea of the figure I want]

Answer

If you plot a dataframe with several columns, each column is drawn as its own line, which gives the desired result. You can reshape your dataframe into that form (one column per sample) with groupby or set_index:

all_data.groupby(["DATE_TIME", "SAMPLE"])["VALUE"].mean().unstack("SAMPLE").interpolate(method='linear').plot()

or, if you do not have duplicate (DATE_TIME, SAMPLE) pairs:

all_data.set_index(["DATE_TIME", "SAMPLE"])["VALUE"].unstack("SAMPLE").interpolate(method='linear').plot()
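To make the reshaping step easier to follow, here is a minimal sketch that runs the set_index variant on the six rows shown in the question. It assumes the data is comma-separated and is read from an in-memory string rather than data.csv; the names csv_text and wide are only illustrative, and the Agg backend is selected just so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to get an interactive window
import pandas as pd

# The six example rows from the question, written here as comma-separated text.
csv_text = """DATE_TIME,SAMPLE,VALUE
2020-12-10 10:52:48,1,3.22
2020-12-10 10:52:54,2,2.93
2020-12-10 10:53:00,3,2.27
2020-12-10 16:27:13,1,1.66
2020-12-10 16:27:19,2,1.15
2020-12-10 16:27:25,3,1.23
"""

# Parse DATE_TIME as real timestamps so the x-axis is a time axis.
all_data = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE_TIME"])

# Long -> wide: one row per timestamp, one column per sample.
# Each timestamp belongs to a single sample, so the other columns are NaN
# there; linear interpolation fills the gaps between known values.
# Values before a sample's first measurement stay NaN with the defaults.
wide = (
    all_data.set_index(["DATE_TIME", "SAMPLE"])["VALUE"]
    .unstack("SAMPLE")
    .interpolate(method="linear")
)

# DataFrame.plot draws one line per column, i.e. one line per sample.
ax = wide.plot()
ax.set_ylabel("VALUE")
```

Note that method='linear' treats the rows as equally spaced and ignores the actual time gaps, which is probably fine for a regular 6-second cadence.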
