I have a big dataframe containing 30 samples, with one measurement taken every 6 seconds over several days. It looks something like this:
DATE_TIME | SAMPLE | VALUE |
---|---|---|
2020-12-10 10:52:48 | 1 | 3.22 |
2020-12-10 10:52:54 | 2 | 2.93 |
2020-12-10 10:53:00 | 3 | 2.27 |
… | … | … |
2020-12-10 16:27:13 | 1 | 1.66 |
2020-12-10 16:27:19 | 2 | 1.15 |
2020-12-10 16:27:25 | 3 | 1.23 |
I want to plot the time series for each individual sample (multiple line chart). I tried:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    all_data = pd.read_csv("data.csv")
    time_df = pd.DataFrame({'x': all_data['DATE_TIME'],
                            'y1': all_data['SAMPLE'] == 1,
                            'y2': all_data['SAMPLE'] == 2})
    plt.plot('x', 'y1', data=time_df, marker='o', markerfacecolor='blue', markersize=1, color='skyblue', linewidth=4)
    plt.plot('x', 'y2', data=time_df, marker='o', markerfacecolor='green', markersize=1, color='skyblue', linewidth=4)
    plt.show()

But it’s not working; I get a strange figure.
I also tried making individual dataframes for the samples, and it works, but I’m sure there must be a more efficient way to do this:

    import plotly.graph_objects as go

    # One filtered dataframe per sample
    SAMPLE1_df = all_data.loc[all_data["SAMPLE"] == 1]
    SAMPLE2_df = all_data.loc[all_data["SAMPLE"] == 2]

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=SAMPLE1_df["DATE_TIME"], y=SAMPLE1_df["VALUE"], mode='lines', name="SAMPLE1"))
    fig.add_trace(go.Scatter(x=SAMPLE2_df["DATE_TIME"], y=SAMPLE2_df["VALUE"], mode='lines', name="SAMPLE2"))
    fig.show()

Answer
If you plot a dataframe that has one column per sample, you get the desired result: one line per column. You can reshape your dataframe into that form with `groupby` or `set_index`:

    all_data.groupby(["DATE_TIME", "SAMPLE"])["VALUE"].mean().unstack("SAMPLE").interpolate(method='linear').plot()

or, if you do not have duplicate (DATE_TIME, SAMPLE) pairs, which would make `unstack` raise an error:

    all_data.set_index(["DATE_TIME", "SAMPLE"])["VALUE"].unstack("SAMPLE").interpolate(method='linear').plot()
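
To see what the reshaping does, here is a minimal sketch on made-up data. The small `all_data` frame below is a hypothetical stand-in for `data.csv` (3 samples read in rotation, not the real 30), and the names `wide` and `filled` are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 3 samples read in rotation,
# one reading every 6 seconds.
n = 12
all_data = pd.DataFrame({
    "DATE_TIME": pd.date_range("2020-12-10 10:52:48", periods=n,
                               freq=pd.Timedelta(seconds=6)),
    "SAMPLE": np.tile([1, 2, 3], n // 3),
    "VALUE": np.linspace(3.0, 1.0, n),
})

# One row per timestamp, one column per sample. Each timestamp carries a
# reading for a single sample, so the other columns are NaN on that row.
wide = all_data.set_index(["DATE_TIME", "SAMPLE"])["VALUE"].unstack("SAMPLE")

# interpolate() fills the gaps between a sample's own readings; without it,
# the isolated points would not be joined into lines when plotted.
filled = wide.interpolate(method="linear")

print(wide.head())
print(filled.head())
```

Calling `filled.plot()` on the result should then draw one line per sample. Rows before a sample's first reading stay NaN, since linear interpolation fills forward by default.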