I need to create a graph from data with Python.
I took inspiration from various websites and wrote this script:
import plotly.express as px
import plotly.graph_objs as go
import statsmodels.api as sm

value = [1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 8]
date = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]

fig = px.scatter(x=date, y=value)
fig.add_trace(go.Scatter(x=date, y=value, mode='lines', name='MB Used'))

trend = sm.OLS(value, sm.add_constant(date)).fit().fittedvalues

fig.add_traces(go.Scatter(x=date, y=trend, mode='lines', name='trendline'))
fig
This script generates this graph:
For the x-axis, I would like to display values like 2020-01-01-06:00,
but when I change my list like this:
date = [ 2020-01-01-06:00, 2020-01-01-12:00, 2020-01-01-18:00, 2020-01-02-06:00, 2020-01-02-12:00, 2020-01-02-18:00, 2020-01-03-06:00, 2020-01-03-12:00, 2020-01-03-18:00, 2020-01-04-06:00, 2020-01-04-12:00 ]
The error is :
File "<ipython-input-13-4958920545c3>", line 6
    date = [ 2020-01-01-06:00, 2020-01-01-12:00, 2020-01-01-18:00, 2020-01-02-06:00, 2020-01-02-12:00, 2020-01-02-18:00, 2020-01-03-06:00, 2020-01-03-12:00, 2020-01-03-18:00, 2020-01-04-06:00, 2020-01-04-12:00 ]
                   ^
SyntaxError: invalid token
If I try this instead:
date = [ '2020-01-01-06:00', '2020-01-01-12:00', '2020-01-01-18:00', '2020-01-02-06:00', '2020-01-02-12:00', '2020-01-02-18:00', '2020-01-03-06:00', '2020-01-03-12:00', '2020-01-03-18:00', '2020-01-04-06:00', '2020-01-04-12:00' ]
The error is :
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-e06e438ca2eb> in <module>
     10 fig.add_trace(go.Scatter(x=date, y=value, mode='lines',name='MB Used' ))
     11
---> 12 trend = sm.OLS(value,sm.add_constant(date)).fit().fittedvalues
     13
     14 fig.add_traces(go.Scatter(x=date, y=trend,mode = 'lines', name='trendline'))

~/.local/lib/python3.6/site-packages/statsmodels/tools/tools.py in add_constant(data, prepend, has_constant)
    303         raise ValueError('Only implementd 2-dimensional arrays')
    304
--> 305     is_nonzero_const = np.ptp(x, axis=0) == 0
    306     is_nonzero_const &= np.all(x != 0.0, axis=0)
    307     if is_nonzero_const.any():

<__array_function__ internals> in ptp(*args, **kwargs)

~/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py in ptp(a, axis, out, keepdims)
   2541         else:
   2542             return ptp(axis=axis, out=out, **kwargs)
-> 2543     return _methods._ptp(a, axis=axis, out=out, **kwargs)
   2544
   2545

~/.local/lib/python3.6/site-packages/numpy/core/_methods.py in _ptp(a, axis, out, keepdims)
    228 def _ptp(a, axis=None, out=None, keepdims=False):
    229     return um.subtract(
--> 230         umr_maximum(a, axis, None, out, keepdims),
    231         umr_minimum(a, axis, None, None, keepdims),
    232         out

TypeError: cannot perform reduce with flexible type
Could you please show me how to fix this?
Answer
The answer:
In the following code snippet I’ve replaced your dates with floats, following this approach to serialize timestamps. This way you can use your dates as input to sm.OLS, and the same floats serve as one of a few more steps needed to display your dates in the figure in your desired format.
The plot:
The details:
There are several reasons why your code snippet does not give you the desired result. First of all, none of your attempts at constructing lists of date and time values are easily recognized by the functions you are applying here. The unquoted version is not valid Python at all, which is why you get a SyntaxError. In date = [ '2020-01-01-06:00', '2020-01-01-12:00',...] you should replace the hyphen between the date and the time with a space to get ['2020-01-01 06:00', '2020-01-01 12:00'...] instead. But even with a more widely recognized list of timestamps, statsmodels will, to my knowledge, not accept those in sm.OLS(). And in the end, applying sensible labels to non-standard x-axis tick marks can be one of the (very few) real challenges in plotly.
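As a small illustration of the point about formats, here is a sketch using only the standard library. It assumes the timestamps are known to follow one of the two patterns discussed above; the names EPOCH and to_serial are made up for this example and simply mirror the serial-date idea used in the answer. It shows that the original hyphenated strings can be parsed when the format is given explicitly, and that the resulting datetime can be turned into a plain float, which is the kind of value a regression routine can work with.

```python
import datetime as dt

# Reference date for the serial number (same convention as the answer's helper).
EPOCH = dt.datetime(1899, 12, 30)

def to_serial(ts):
    """Return days since EPOCH as a float, with the time of day as the fraction."""
    delta = ts - EPOCH
    return delta.days + delta.seconds / 86400

# The format from the question parses fine once the pattern is spelled out.
original = dt.datetime.strptime("2020-01-01-06:00", "%Y-%m-%d-%H:%M")

# The more conventional format with a space between date and time.
corrected = dt.datetime.strptime("2020-01-01 06:00", "%Y-%m-%d %H:%M")

# Both strings describe the same moment.
same_moment = original == corrected

# A float that can be used as a numeric regressor.
serial = to_serial(corrected)
print(same_moment, serial)
```

Parsing first and converting to a number second keeps the two concerns apart: the datetime objects stay available for labelling, while the floats go into the regression.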
Please note that the irregular appearance of the gridlines reflects the structure of your data. You’re missing the midnight observations (timestamps that end with 00:00:00) that would complete a 24-hour cycle.
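The uneven spacing can be checked directly from the timestamps. The following is a minimal sketch using only the standard library; the variable names are chosen for this example. It computes the gap in hours between consecutive observations of the date_h list from the answer.

```python
import datetime as dt

date_h = ['2020-01-01 06:00', '2020-01-01 12:00', '2020-01-01 18:00',
          '2020-01-02 06:00', '2020-01-02 12:00', '2020-01-02 18:00',
          '2020-01-03 06:00', '2020-01-03 12:00', '2020-01-03 18:00',
          '2020-01-04 06:00', '2020-01-04 12:00']

# Parse the strings into datetime objects.
parsed = [dt.datetime.strptime(s, "%Y-%m-%d %H:%M") for s in date_h]

# Gap in hours between each observation and the next one.
gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(parsed, parsed[1:])]
print(gaps_h)
```

Every third gap is twice as long as the others, because no observation exists between 18:00 and 06:00 of the next day. Tick marks placed at the observations inherit that spacing.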
The code:
# imports
import datetime as dt

import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import statsmodels.api as sm

# data
value = [1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 8]
date = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
date_h = ['2020-01-01 06:00', '2020-01-01 12:00', '2020-01-01 18:00',
          '2020-01-02 06:00', '2020-01-02 12:00', '2020-01-02 18:00',
          '2020-01-03 06:00', '2020-01-03 12:00', '2020-01-03 18:00',
          '2020-01-04 06:00', '2020-01-04 12:00']

# organize data in a pandas dataframe
df = pd.DataFrame({'value': value,
                   'date': date,
                   'date_h': pd.to_datetime(date_h)})

# function to serialize irregular timestamps
def serial_date(date1):
    temp = dt.datetime(1899, 12, 30)  # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return float(delta.days) + (float(delta.seconds) / 86400)

df['date_s'] = [serial_date(d) for d in df['date_h']]

# set up base figure
fig = px.scatter(x=df['date_s'], y=df['value'])
fig.add_trace(go.Scatter(x=df['date_s'], y=df['value'],
                         mode='lines', name='MB Used'))

# setup for linear regression using sm.OLS
Y = df['value']
independent = ['date_s']
X = df[independent]
X = sm.add_constant(X)

# estimate trend
trend = sm.OLS(Y, X).fit().fittedvalues

# add trendline to figure
fig.add_traces(go.Scatter(x=df['date_s'], y=trend,
                          mode='lines', name='trendline'))

# specify tick0, tickvals and ticktext to achieve desired x-axis format
fig.update_layout(yaxis=dict(title=''),
                  xaxis=dict(title='',
                             tick0=df['date_s'].iloc[0],
                             tickvals=df['date_s'],
                             ticktext=df['date_h']))
fig.show()
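If the tick labels should read exactly like the format asked for in the question (2020-01-01-06:00), one option is to build the label strings yourself and pass them as ticktext. The sketch below uses only the standard library, and the variable name labels is made up for this example. It assumes the list is then handed to the ticktext argument in place of the datetime column; with the pandas column from the code above, df['date_h'].dt.strftime('%Y-%m-%d-%H:%M') should give the same strings.

```python
import datetime as dt

date_h = ['2020-01-01 06:00', '2020-01-01 12:00', '2020-01-01 18:00',
          '2020-01-02 06:00', '2020-01-02 12:00', '2020-01-02 18:00',
          '2020-01-03 06:00', '2020-01-03 12:00', '2020-01-03 18:00',
          '2020-01-04 06:00', '2020-01-04 12:00']

# Parse each timestamp, then write it back out with a hyphen
# between the date and the time, as requested in the question.
labels = [
    dt.datetime.strptime(s, "%Y-%m-%d %H:%M").strftime("%Y-%m-%d-%H:%M")
    for s in date_h
]
print(labels[0])
```

Since ticktext is only displayed and never parsed, any string format works there; the numeric positions still come from tickvals.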