My Dataset
- In numpy array
- np.shape(data)-> (6989, 4)
- stats.describe(data)-> DescribeResult(nobs=6989, minmax=(array([0., 0., 0., 0.]), array([ 299.99, 86785. , 10997. , 13222. ])), mean=array([ 12.47994992, 3407.00243239, 27.23293747, 109.72370869]), variance=array([1.42652452e+02, 4.71755188e+07, 6.17027586e+04, 2.92787820e+05]), skewness=array([ 4.27783176, 4.50762479, 31.57678605, 15.68962365]), kurtosis=array([ 58.23586935, 27.33838487, 1163.74537023, 302.6384056 ]))
- stats.describe(clusterer.labels_)-> DescribeResult(nobs=6989, minmax=(array([0., 0., 0., 0.]), array([ 299.99, 86785. , 10997. , 13222. ])), mean=array([ 12.47994992, 3407.00243239, 27.23293747, 109.72370869]), variance=array([1.42652452e+02, 4.71755188e+07, 6.17027586e+04, 2.92787820e+05]), skewness=array([ 4.27783176, 4.50762479, 31.57678605, 15.68962365]), kurtosis=array([ 58.23586935, 27.33838487, 1163.74537023, 302.6384056 ]))
- np.shape(clusterer.labels_)-> (6989,)
Original Dataset
- https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
- np.shape(data_original)-> (1797, 64)
- np.shape(clusterer.labels_)-> (1797,)
- stats.describe(clusterer.labels_)-> DescribeResult(nobs=1797, minmax=(-1, 9), mean=1.555370061213133, variance=9.243730890261299, skewness=0.8760784771049832, kurtosis=-0.4263956978117518)
Code (all taken from the original guide I am following)
color_palette = sns.color_palette('Paired', 12)
cluster_colors = [color_palette[x] if x >= 0
                  else (0.5, 0.5, 0.5)
                  for x in clusterer.labels_]
cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
plt.scatter(*projection.T, 
            s=20, 
            linewidth=0, 
            c=cluster_member_colors, 
            alpha=0.25)
ERROR
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-175-64c069b8643a> in <module>
      2 cluster_colors = [color_palette[x] if x >= 0
      3                   else (0.5, 0.5, 0.5)
----> 4                   for x in clusterer.labels_]
      5 cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
      6 plt.scatter(*projection.T, 
<ipython-input-175-64c069b8643a> in <listcomp>(.0)
      2 cluster_colors = [color_palette[x] if x >= 0
      3                   else (0.5, 0.5, 0.5)
----> 4                   for x in clusterer.labels_]
      5 cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
      6 plt.scatter(*projection.T, 
IndexError: list index out of range
Tried Solutions
- I have no NaN values in my dataset: I tried print(np.isnan(np.sum(clusterer.labels_))) and it was False.
- From https://stackoverflow.com/a/1098660/10270590 I understand what the error means programmatically: list indices start at 0, and the code is indexing past the end of a list. What I don't understand is why the same code runs without error on the original dataset but raises the error on mine.
Answer
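The issue was solved by requesting more colors. The palette in the question holds only 12 colors, so any cluster label of 12 or higher indexes past the end of the list. The original digits dataset never triggers this because its labels stop at 9. Example: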
color_palette = sns.color_palette('Paired', 1000)
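As an alternative to guessing a large enough palette size, the lookup itself can be made safe. The following is only a minimal sketch, not the guide's method. The helper names labels_fit_palette and colors_for_labels are hypothetical, and the toy palette and labels stand in for sns.color_palette('Paired', 12) and clusterer.labels_. It first checks whether every label is a valid index, then wraps the index with a modulo so labels beyond the palette length reuse colors instead of raising IndexError.

```python
# Hypothetical stand-ins: in the question these would be
# sns.color_palette('Paired', 12) and clusterer.labels_.
NOISE_COLOR = (0.5, 0.5, 0.5)  # grey for noise points (label -1), as in the question


def labels_fit_palette(labels, palette):
    """Return True if the largest label is a valid index into palette."""
    return max(labels, default=-1) < len(palette)


def colors_for_labels(labels, palette, noise_color=NOISE_COLOR):
    """Map each label to a color, wrapping around when a label exceeds the palette."""
    return [palette[label % len(palette)] if label >= 0 else noise_color
            for label in labels]


# Toy 3-color palette and toy labels; 3 and 7 would raise IndexError
# with plain palette[label] indexing.
palette = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
labels = [-1, 0, 2, 3, 7]

print(labels_fit_palette(labels, palette))  # False: label 7 is past index 2
print(colors_for_labels(labels, palette))
```

The trade-off is that wrapped labels share colors, so two different clusters can look identical in the plot. If distinct colors per cluster matter, sizing the palette from the data (for example, the maximum label plus one) is the other option.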
