Goal
Plot subsets of rows in a Pandas DataFrame by selecting a specific value of a column.
Ideally, plot it in a Jupyter notebook.
What I did
I have minimal knowledge of JavaScript, so I managed to plot it by running a Bokeh server with everything written in Python.
However, I couldn’t get it working in a Jupyter notebook with a JavaScript callback. My approach is admittedly a bit clumsy: I split the DataFrame into subsets by the values of a column and put them into a dict, then select a subset based on the active selection of a RadioGroup.
This is my code example:
    import pandas as pd
    import bokeh
    from bokeh.io import output_notebook, show
    import bokeh.plotting as bp
    import bokeh.models as bm
    from bokeh.layouts import column, row

    data = {
        'Datetime': ['2020-04-09T10:23:38Z', '2020-04-09T22:23:38Z', '2020-04-09T23:23:38Z',
                     '2020-01-09T10:23:38Z', '2020-01-09T22:23:38Z', '2020-01-09T23:23:38Z'],
        'Month': ['Apr', 'Apr', 'Apr', 'Jan', 'Jan', 'Jan'],
        'Values': [1.2, 1.3, 1.5, 1.1, 3, 1.3]
    }
    df = pd.DataFrame.from_dict(data)
    month_list = df['Month'].unique().tolist()

    plot_height = 600
    plot_width = 1000
    col2plot = 'Values'

    month_dict = {}
    for m in month_list:
        subset = df[df['Month'] == m].reset_index(drop=True)
        month_dict[m] = subset[['Datetime', col2plot]].to_dict()

    p1 = bp.figure(
        plot_height=plot_height,
        plot_width=plot_width,
        title='Values',
        toolbar_location=None,
        tools="hover",
        tooltips=[("DateTime", "@Datetime")]
    )
    src = bm.ColumnDataSource(df[df['Month'] == 'Jan'].reset_index(drop=True))
    p1.line(x='index', y=col2plot, alpha=0.8, source=src)

    month_selector = bm.widgets.RadioGroup(labels=month_list, active=1)

    jscode = """
        var month = cb_obj.labels[cb_obj.active] //selected month
        const new_data = source[month]
        src.data = new_data
        src.change.omit()
    """
    callback = bm.CustomJS(args=dict(src=src, source=month_dict), code=jscode)
    month_selector.js_on_change('active', callback)

    output_notebook()
    show(row(p1, month_selector))
The code runs, but selecting a month doesn’t update the plot. This is probably due to my handling of the JS callback. Any ideas for fixing this? Thanks a lot for your help!
Answer
Issues with your code:
- In `p1.line`, you’re using the `index` column. But when you call `pd.DataFrame.to_dict()`, that column is not there. This can be fixed by adding yet another `.reset_index()` before `.to_dict()`.
- `to_dict()` returns data in the form of a dict of dicts, but `ColumnDataSource` needs a dict of lists. Replace the call with `to_dict('list')`.
- `src.change.omit()` contains a typo; it should be `emit`. But since you’re replacing the whole `data` attribute instead of just changing some of the data, you can simply remove the line altogether.
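
Putting the three fixes together, the data preparation and the callback body might look roughly like the sketch below. It reuses the names from the question (`df`, `col2plot`, `month_dict`, `jscode`) and covers only the parts that change. The plotting and widget code is assumed to stay as in the question, and the sketch has not been checked against any particular Bokeh version.

```python
import pandas as pd

data = {
    'Datetime': ['2020-04-09T10:23:38Z', '2020-04-09T22:23:38Z', '2020-04-09T23:23:38Z',
                 '2020-01-09T10:23:38Z', '2020-01-09T22:23:38Z', '2020-01-09T23:23:38Z'],
    'Month': ['Apr', 'Apr', 'Apr', 'Jan', 'Jan', 'Jan'],
    'Values': [1.2, 1.3, 1.5, 1.1, 3, 1.3],
}
df = pd.DataFrame.from_dict(data)
col2plot = 'Values'

month_dict = {}
for m in df['Month'].unique().tolist():
    subset = df[df['Month'] == m].reset_index(drop=True)
    # The extra reset_index() turns the fresh 0..n-1 index into a real
    # 'index' column, which the line glyph (x='index') relies on.
    # to_dict('list') produces a dict of lists instead of a dict of dicts.
    month_dict[m] = subset[['Datetime', col2plot]].reset_index().to_dict('list')

# Callback body without the change-signal line: assigning a whole new
# object to src.data is enough to trigger a redraw.
jscode = """
const month = cb_obj.labels[cb_obj.active];  // selected month
src.data = source[month];
"""
```

Each entry of `month_dict` now has the keys `index`, `Datetime` and `Values`, each mapping to a plain list. `month_dict` and `jscode` can then be passed to `bm.CustomJS(args=dict(src=src, source=month_dict), code=jscode)` exactly as in the question.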