When I try to run a FAMD following the instructions at https://pypi.org/project/light-famd/#factor-analysis-of-mixed-data-famd, I keep getting the same error: TypeError: SparseDataFrame() takes no arguments.
How can I fix this? It happens not only with my own data set but also with a basic randomly generated one, created like this:
    X_n = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 2)), columns=list('AB'))
    X_c = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('CDEF'))
    X = pd.concat([X_n, X_c], axis=1)
The code is the following:
    import pandas as pd
    import numpy as np
    import light_famd
    from light_famd import FAMD

    famd = FAMD(n_components=2)
    famd.fit(X)
    print(famd.explained_variance_)
    print(famd.column_correlation(X))
The error is raised already at famd.fit(X).
The same happens not only with Light_FAMD but also with sklearn and prince, both of which I have tried.
Answer
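SparseDataFrame is outdated: pandas deprecated it in version 0.25 and removed it in 1.0, which is why the call fails. Check the project's git repository, where the following changes were made:
- one_hot.py, line 25: SparseDataFrame replaced by DataFrame
- one_hot.py: default_fill_value removed
- one_hot.py: index removed
- one_hot.py: SparseData False
- mfa.py, line 105: to_dense removed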
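To illustrate what replacing SparseDataFrame with DataFrame means in current pandas, here is a minimal sketch. It is not the code from one_hot.py; it assumes pd.get_dummies as a stand-in one-hot encoder and a seeded generator so the data is reproducible. It only shows that sparse data now lives in an ordinary DataFrame with sparse-dtype columns, and that densifying goes through the .sparse accessor.

```python
import numpy as np
import pandas as pd

# Reproducible categorical data shaped like X_c in the question
# (the seed and the use of default_rng are assumptions of this sketch).
rng = np.random.default_rng(0)
X_c = pd.DataFrame(rng.choice(list("abcde"), size=(10, 4)), columns=list("CDEF"))

# pd.SparseDataFrame(...) no longer works in pandas >= 1.0.
# A sparse one-hot table is now a regular DataFrame whose columns have a sparse dtype.
one_hot = pd.get_dummies(X_c, sparse=True)

is_plain_frame = type(one_hot) is pd.DataFrame
all_sparse = all(isinstance(dtype, pd.SparseDtype) for dtype in one_hot.dtypes)

# The old SparseDataFrame.to_dense() is replaced by the .sparse accessor.
dense = one_hot.sparse.to_dense()

print(is_plain_frame, all_sparse, dense.shape)
```

Each row of the dense result contains exactly one active indicator per original column, so every row sums to 4 here.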
You can try to install the latest version from git:
python -m pip install git+https://github.com/Cauchemare/Light_FAMD.git
With that version installed, the following should run fine:
    import pandas as pd
    import numpy as np
    import light_famd
    from light_famd import FAMD

    X_n = pd.DataFrame(data=np.random.randint(0, 100, size=(10, 2)), columns=list('AB'))
    X_c = pd.DataFrame(np.random.choice(list('abcde'), size=(10, 4), replace=True), columns=list('CDEF'))
    X = pd.concat([X_n, X_c], axis=1)

    famd = FAMD(n_components=2)
    famd.fit(X)
    print(famd.explained_variance_)

    [17.77604109 9.92849978]
This last part still throws some warnings:
print(famd.column_correlation(X))
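
If you want to see exactly which warnings that call raises, one option is to record them with the standard warnings module instead of letting them print. The sketch below is only an illustration: collect_warnings and noisy_stand_in are hypothetical names, and the stand-in function takes the place of famd.column_correlation so the example runs without light_famd installed.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, recorded warnings) instead of printing them."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        result = func(*args, **kwargs)
    return result, caught


def noisy_stand_in(x):
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("stand-in deprecation notice", FutureWarning)
    return x * 2


result, caught = collect_warnings(noisy_stand_in, 21)
messages = [f"{w.category.__name__}: {w.message}" for w in caught]
print(result, messages)
```

With the library installed, the same helper could be called as collect_warnings(famd.column_correlation, X) to list the warning categories and messages.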