I'm studying anomaly detection for speech data. My original code uses an LSTM, but I'm facing an imbalanced dataset, so I'm trying to get some insights from PyOD.
While trying PyOD's sample data example, I copied and pasted their code into Colab, but I get this error: "ValueError: 'c' argument has 1000 elements, which is inconsistent with 'x' and 'y' with size 500."
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pyod.utils.data import generate_data

    contamination = 0.1  # percentage of outliers 10%
    n_train = 500        # number of training points
    n_test = 500         # number of testing points
    n_features = 2       # number of features

    X_train, y_train, X_test, y_test = generate_data(
        n_train=n_train, n_test=n_test,
        n_features=n_features, contamination=contamination)

    # Make the 2d numpy array a pandas dataframe for each manipulation
    X_train_pd = pd.DataFrame(X_train)
    # print(X_train_pd)
    # print(y_train)

    # Plot
    plt.scatter(X_train_pd[0], X_train_pd[1], c=y_train, alpha=0.8)
    plt.title('Scatter plot pythonspot.com')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
Answer
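It seems that c=y_train is the source of the error. The c option sets the marker colors and expects one value per plotted point. The message says c has 1000 elements while x and y have 500, so y_train is not a length-500 label vector here, and you may need to "translate" y_train into a suitable color format. Just to make the program run without the error (though the result may not be what you want), change the call to: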
plt.scatter(X_train_pd[0], X_train_pd[1], c=[(1,0,0)]*len(X_train_pd[0]), alpha=0.8)
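A likely underlying cause, assuming a recent PyOD release: generate_data returns its values in the order X_train, X_test, y_train, y_test, while the code in the question unpacks them as X_train, y_train, X_test, y_test. If so, the name y_train actually holds X_test, a 500 x 2 array with 1000 elements, which would match the error message. The sketch below illustrates that mix-up with NumPy only. fake_generate_data is a hypothetical stand-in that returns random arrays in the assumed order, not the real PyOD function, so check the documentation of your installed version before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)


def fake_generate_data(n_train=500, n_test=500, n_features=2):
    """Hypothetical stand-in for pyod.utils.data.generate_data.

    Returns random arrays in the ASSUMED order
    (X_train, X_test, y_train, y_test); the values are not PyOD output.
    """
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    y_train = rng.integers(0, 2, size=n_train)
    y_test = rng.integers(0, 2, size=n_test)
    return X_train, X_test, y_train, y_test


# Unpacking in the order used in the question: the second name receives X_test.
_, wrong_y, _, _ = fake_generate_data()
print(wrong_y.shape, wrong_y.size)  # (500, 2) 1000

# Unpacking in the assumed return order: labels are 1-D, one per training point.
X_tr, X_te, y_tr, y_te = fake_generate_data()
print(y_tr.shape)  # (500,)

# A quick sanity check before passing labels to plt.scatter(..., c=labels).
assert y_tr.shape == (X_tr.shape[0],), "labels must be 1-D, one per point"
```

If this is what is happening, unpacking the real call as X_train, X_test, y_train, y_test = generate_data(...) should let the original plt.scatter(..., c=y_train, ...) line work unchanged, coloring inliers and outliers by their 0/1 labels.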