I am trying to build a feature in a Bokeh dashboard that lets the user cluster data. I am using the following example as a template: Clustering in Bokeh example
Here is the code from this example:
import numpy as np
from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler

from bokeh.layouts import column, row
from bokeh.plotting import figure, output_file, show

print("\n\n*** This example may take several seconds to run before displaying. ***\n\n")

N = 50000
PLOT_SIZE = 400

# generate datasets.
np.random.seed(0)
noisy_circles = datasets.make_circles(n_samples=N, factor=.5, noise=.04)
noisy_moons = datasets.make_moons(n_samples=N, noise=.05)
centers = [(-2, 3), (2, 3), (-2, -3), (2, -3)]
blobs1 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.4, random_state=8)
blobs2 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.7, random_state=8)

colors = np.array([x for x in ('#00f', '#0f0', '#f00', '#0ff', '#f0f', '#ff0')])
colors = np.hstack([colors] * 20)

# create clustering algorithms
dbscan = cluster.DBSCAN(eps=.2)
birch = cluster.Birch(n_clusters=2)
means = cluster.MiniBatchKMeans(n_clusters=2)
spectral = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity="nearest_neighbors")
affinity = cluster.AffinityPropagation(damping=.9, preference=-200)

# change here, to select clustering algorithm (note: spectral is slow)
algorithm = dbscan  # <- SELECT ALG

plots =[]
for dataset in (noisy_circles, noisy_moons, blobs1, blobs2):
    X, y = dataset
    X = StandardScaler().fit_transform(X)

    # predict cluster memberships
    algorithm.fit(X)
    if hasattr(algorithm, 'labels_'):
        y_pred = algorithm.labels_.astype(int)
    else:
        y_pred = algorithm.predict(X)

    p = figure(output_backend="webgl", title=algorithm.__class__.__name__,
               width=PLOT_SIZE, height=PLOT_SIZE)

    p.circle(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), alpha=0.1,)

    plots.append(p)

# generate layout for the plots
layout = column(row(plots[:2]), row(plots[2:]))

output_file("clustering.html", title="clustering with sklearn")

show(layout)
The example clusters several datasets with a single algorithm, which you choose by editing the code; in the code pasted above, the algorithm is dbscan. I tried to modify the code to add a widget that lets the user choose the algorithm:
from bokeh.models.annotations import Label
import numpy as np
from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler

from bokeh.layouts import column, row
from bokeh.plotting import figure, output_file, show
from bokeh.models import CustomJS, Select

print("\n\n*** This example may take several seconds to run before displaying. ***\n\n")

N = 50000
PLOT_SIZE = 400

# generate datasets.
np.random.seed(0)
noisy_circles = datasets.make_circles(n_samples=N, factor=.5, noise=.04)
noisy_moons = datasets.make_moons(n_samples=N, noise=.05)
centers = [(-2, 3), (2, 3), (-2, -3), (2, -3)]
blobs1 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.4, random_state=8)
blobs2 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.7, random_state=8)

colors = np.array([x for x in ('#00f', '#0f0', '#f00', '#0ff', '#f0f', '#ff0')])
colors = np.hstack([colors] * 20)

# create clustering algorithms
dbscan = cluster.DBSCAN(eps=.2)
birch = cluster.Birch(n_clusters=2)
means = cluster.MiniBatchKMeans(n_clusters=2)
spectral = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity="nearest_neighbors")
affinity = cluster.AffinityPropagation(damping=.9, preference=-200)
kmeans = cluster.KMeans(n_clusters=2)

############################select widget for different clustering algorithms############
menu =[('DBSCAN','dbscan'),('Birch','birch'),('MiniBatchKmeans','means'),('Spectral','spectral'),('Affinity','affinity'),('K-means','kmeans')]
select = Select(title="Option:", value="DBSCAN", options=menu)
select.js_on_change("value", CustomJS(code="""
    console.log('select: value=' + this.value, this.toString())
"""))

# change here, to select clustering algorithm (note: spectral is slow)
algorithm = select.value
############################################################

plots =[]
for dataset in (noisy_circles, noisy_moons, blobs1, blobs2):
    X, y = dataset
    X = StandardScaler().fit_transform(X)

    # predict cluster memberships
    algorithm.fit(X)
    if hasattr(algorithm, 'labels_'):
        y_pred = algorithm.labels_.astype(int)
    else:
        y_pred = algorithm.predict(X)

    p = figure(output_backend="webgl", title=algorithm.__class__.__name__,
               width=PLOT_SIZE, height=PLOT_SIZE)

    p.circle(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), alpha=0.1,)

    plots.append(p)

# generate layout for the plots
layout = column(select,row(plots[:2]), row(plots[2:]))

output_file("clustering.html", title="clustering with sklearn")

show(layout)
However, I get this error when I try to run it:
AttributeError: 'str' object has no attribute 'fit'
Can anyone tell me what I am missing in order to fix this?
Also, if it is not too hard to do, I would like to add a numeric input widget that lets the user choose the number of clusters each algorithm should find. Any suggestions?
Many thanks :)
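On the number-of-clusters question, one possible approach is to build the estimator on demand from the widget values instead of creating every estimator up front. The sketch below is only an illustration under stated assumptions: make_estimator and ALGORITHMS_WITH_K are hypothetical names, the dataset is shrunk so it runs quickly, and DBSCAN is treated separately because it does not take an n_clusters argument.

```python
from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler

# Hypothetical registry: estimator classes that accept an n_clusters argument.
ALGORITHMS_WITH_K = {
    "birch": cluster.Birch,
    "means": cluster.MiniBatchKMeans,
    "kmeans": cluster.KMeans,
}


def make_estimator(name, n_clusters=2):
    """Return a fresh estimator for the given name (hypothetical helper)."""
    if name in ALGORITHMS_WITH_K:
        return ALGORITHMS_WITH_K[name](n_clusters=n_clusters)
    if name == "dbscan":
        # DBSCAN derives the number of clusters from density,
        # so n_clusters is ignored here.
        return cluster.DBSCAN(eps=0.2)
    raise ValueError(f"Unknown algorithm name: {name!r}")


# Small dataset so the sketch runs quickly.
X, _ = datasets.make_blobs(n_samples=300, centers=3, random_state=8)
X = StandardScaler().fit_transform(X)

# Stand-ins for the values a Select and a numeric widget would supply.
labels = make_estimator("kmeans", n_clusters=3).fit(X).labels_
print(len(set(labels)))
```

In a dashboard, the name and the cluster count would come from the widgets (for example a Select and a Spinner). Python callbacks only run when the app is served by the Bokeh server; a standalone HTML file written with output_file cannot re-run scikit-learn when a widget changes.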
EDIT
Here is the current state of the code with @Tony's solution.
'''
Example inspired by an example from the scikit-learn project:
http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
'''
#https://github.com/bokeh/bokeh/blob/branch-2.4/examples/webgl/clustering.py
from bokeh.models.annotations import Label
import numpy as np
from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler

from bokeh.layouts import column, row
from bokeh.plotting import figure, output_file, show
from bokeh.models import CustomJS, Select

print("\n\n*** This example may take several seconds to run before displaying. ***\n\n")

N = 50000
PLOT_SIZE = 400

# generate datasets.
np.random.seed(0)
noisy_circles = datasets.make_circles(n_samples=N, factor=.5, noise=.04)
noisy_moons = datasets.make_moons(n_samples=N, noise=.05)
centers = [(-2, 3), (2, 3), (-2, -3), (2, -3)]
blobs1 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.4, random_state=8)
blobs2 = datasets.make_blobs(centers=centers, n_samples=N, cluster_std=0.7, random_state=8)

colors = np.array([x for x in ('#00f', '#0f0', '#f00', '#0ff', '#f0f', '#ff0')])
colors = np.hstack([colors] * 20)

# create clustering algorithms
dbscan = cluster.DBSCAN(eps=.2)
birch = cluster.Birch(n_clusters=2)
means = cluster.MiniBatchKMeans(n_clusters=2)
spectral = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity="nearest_neighbors")
affinity = cluster.AffinityPropagation(damping=.9, preference=-200)
kmeans = cluster.KMeans(n_clusters=2)

menu =[('DBSCAN','dbscan'),('Birch','birch'),('MiniBatchKmeans','means'),('Spectral','spectral'),('Affinity','affinity'),('K-means','kmeans')]
select = Select(title="Option:", value="DBSCAN", options=menu)
select.js_on_change("value", CustomJS(code="""
    console.log('select: value=' + this.value, this.toString())
"""))

# change here, to select clustering algorithm (note: spectral is slow)
#algorithm = select.value
algorithm = None

if select.value == 'dbscan':
    algorithm = dbscan  # use dbscan algorithm function
elif select.value == 'birch':
    algorithm = birch  # use birch algorithm function
elif select.value == 'means':
    algorithm = means  # use means algorithm function
elif select.value == 'spectral':
    algorithm = spectral
elif select.value == 'affinity':
    algorithm = affinity
elif select.value == 'kmeans':
    algorithm = 'kmeans'

if algorithm is not None:
    plots =[]
    for dataset in (noisy_circles, noisy_moons, blobs1, blobs2):
        X, y = dataset
        X = StandardScaler().fit_transform(X)

        # predict cluster memberships
        algorithm.fit(X)  ######################This is what appears to be the problem######################
        if hasattr(algorithm, 'labels_'):
            y_pred = algorithm.labels_.astype(int)
        else:
            y_pred = algorithm.predict(X)

        p = figure(output_backend="webgl", title=algorithm.__class__.__name__,
                   width=PLOT_SIZE, height=PLOT_SIZE)

        p.circle(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), alpha=0.1,)

        plots.append(p)
else:
    print('Please select an algorithm first')

# generate layout for the plots
layout = column(select,row(plots[:2]), row(plots[2:]))

output_file("clustering.html", title="clustering with sklearn")

show(layout)
See algorithm.fit(X); this is where the error occurs.
Error message:
AttributeError: 'NoneType' object has no attribute 'fit'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
m:bokehdashclusteringbokeh.py in
     67
     68     # predict cluster memberships
---> 69     algorithm.fit(X)
     70     if hasattr(algorithm, 'labels_'):
     71         y_pred = algorithm.labels_.astype(int)

AttributeError: 'NoneType' object has no attribute 'fit'
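A likely reason algorithm is still None here, assuming Bokeh's documented convention that tuple options for Select are (value, label) pairs: with the menu above, select.value holds "DBSCAN", while the if/elif chain compares against the lowercase labels such as 'dbscan', so no branch matches. The plain-Python sketch below mirrors that comparison without needing Bokeh; selected_value is a stand-in for select.value.

```python
# Same option tuples as in the question. Per Bokeh's Select documentation
# (an assumption worth checking for your Bokeh version), each tuple is
# (value, label): the first element is what select.value reports.
menu = [('DBSCAN', 'dbscan'), ('Birch', 'birch'), ('MiniBatchKmeans', 'means'),
        ('Spectral', 'spectral'), ('Affinity', 'affinity'), ('K-means', 'kmeans')]

selected_value = "DBSCAN"  # stand-in for Select(value="DBSCAN", ...).value

values = [value for value, _label in menu]
labels = [label for _value, label in menu]

# The edited code compares select.value against the labels: nothing matches.
matches_edit = selected_value in labels
# Comparing against the first tuple element does match.
matches_values = selected_value in values

print(matches_edit, matches_values)
```

If that assumption holds, either compare against 'DBSCAN', 'Birch', and so on, or swap the tuple order so the lowercase names become the values.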
Answer
I don't know sklearn, but comparing both your examples I can see the following:
- Select is a Bokeh model with a value attribute of type string, so select.value is a string.
- dbscan is an estimator object (an instance of cluster.DBSCAN), which has a fit() method.
So when you do algorithm = dbscan, you assign that estimator object to your algorithm variable. When you do algorithm = select.value in your second example, you assign just a string to it, and that fails because a string has no fit() method. You should do something like this:
algorithm = None

if select.value == 'DBSCAN':
    algorithm = dbscan  # use dbscan algorithm function
elif select.value == 'Birch':
    algorithm = birch  # use birch algorithm function
elif select.value == 'MiniBatchKmeans':
    algorithm = means  # use means algorithm function
etc...

if algorithm is not None:
    plots =[]
    for dataset in (noisy_circles, noisy_moons, blobs1, blobs2):
        ...
else:
    print('Please select an algorithm first')
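The same idea can also be written as a dictionary lookup instead of an if/elif chain. This is a minimal sketch rather than a drop-in fix: selected is a stand-in for select.value, the dictionary keys are assumed to match the strings the widget reports, and the dataset is reduced to 200 points so it runs quickly.

```python
from sklearn import cluster, datasets
from sklearn.preprocessing import StandardScaler

# Map the strings a Select widget would report to estimator objects.
# The keys are an assumption: they must match whatever select.value returns.
algorithms = {
    "DBSCAN": cluster.DBSCAN(eps=0.2),
    "Birch": cluster.Birch(n_clusters=2),
    "MiniBatchKmeans": cluster.MiniBatchKMeans(n_clusters=2),
    "K-means": cluster.KMeans(n_clusters=2),
}

selected = "Birch"  # stand-in for select.value
algorithm = algorithms.get(selected)  # None when the string is not a known key

X, _ = datasets.make_moons(n_samples=200, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

if algorithm is None:
    print("Please select an algorithm first")
else:
    algorithm.fit(X)
    # Same fallback as the original example: use labels_ when the estimator
    # exposes it, otherwise call predict().
    if hasattr(algorithm, "labels_"):
        y_pred = algorithm.labels_.astype(int)
    else:
        y_pred = algorithm.predict(X)
    print(type(algorithm).__name__, y_pred.shape)
```

A lookup like this keeps the string-to-object mapping in one place, and dict.get returning None for an unknown key plays the same role as the algorithm = None default above.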