I have a pandas dataframe like this:
   User-Id  Training-Id  TrainingTaken
0  4327024           25             10
1  6662572            3             10
2  3757520           26             10
and I need to convert it to a matrix, like they do in cell 13 of this notebook: https://github.com/tr1ten/Anime-Recommender-System/blob/main/HybridRecommenderSystem.ipynb
So I did the following:
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pandas_profiling
from scipy.sparse import csr_matrix
from lightfm.evaluation import auc_score
from lightfm.data import Dataset

user_training_interaction = pd.pivot_table(trainingtaken, index='User-Id', columns='Training-Id', values='TrainingTaken')
user_training_interaction.fillna(0,inplace=True)
user_training_csr = csr_matrix(user_training_interaction.values)
But I get this error:
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-96-5a2c7ba28976> in <module>
     10 from lightfm.data import Dataset
     11 
---> 12 user_training_interaction = pd.pivot_table(trainingtaken, index='User-Id', columns='Training-Id', values='TrainingTaken')
     13 user_training_interaction.fillna(0,inplace=True)
     14 user_training_csr = csr_matrix(user_training_interaction.values)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/reshape/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
    110 
    111     grouped = data.groupby(keys, observed=observed)
--> 112     agged = grouped.agg(aggfunc)
    113     if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
    114         agged = agged.dropna(how="all")

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    949             func = maybe_mangle_lambdas(func)
    950 
--> 951         result, how = self._aggregate(func, *args, **kwargs)
    952         if how is None:
    953             return result

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    305 
    306         if isinstance(arg, str):
--> 307             return self._try_aggregate_string_function(arg, *args, **kwargs), None
    308 
    309         if isinstance(arg, dict):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
    261         if f is not None:
    262             if callable(f):
--> 263                 return f(*args, **kwargs)
    264 
    265             # people may try to aggregate on a non-callable attribute

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/groupby.py in mean(self, numeric_only)
   1396             "mean",
   1397             alt=lambda x, axis: Series(x).mean(numeric_only=numeric_only),
-> 1398             numeric_only=numeric_only,
   1399         )
   1400 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/generic.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
   1020     ) -> DataFrame:
   1021         agg_blocks, agg_items = self._cython_agg_blocks(
-> 1022             how, alt=alt, numeric_only=numeric_only, min_count=min_count
   1023         )
   1024         return self._wrap_agged_blocks(agg_blocks, items=agg_items)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   1128 
   1129         if not (agg_blocks or split_frames):
-> 1130             raise DataError("No numeric types to aggregate")
   1131 
   1132         if split_items:

DataError: No numeric types to aggregate
What am I missing?
Answer
The Pandas Documentation states:
While pivot() provides general purpose pivoting with various data types (strings, numerics, etc.), pandas also provides pivot_table() for pivoting with aggregation of numeric data
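As a small illustration of that distinction, here is a minimal sketch with made-up data (the frame `df` and its column names are hypothetical, not taken from the question): pivot() only reshapes, so string values are fine, while pivot_table() aggregates (with the mean by default) and so is given a numeric column.

```python
import pandas as pd

# Hypothetical toy data: 'level' holds strings, 'score' holds numbers.
df = pd.DataFrame({
    "user": ["a", "a", "b"],
    "item": ["x", "y", "x"],
    "level": ["low", "high", "low"],
    "score": [1, 2, 3],
})

# pivot() only reshapes, so non-numeric values are accepted.
reshaped = df.pivot(index="user", columns="item", values="level")

# pivot_table() aggregates (mean by default), so it gets the numeric column.
aggregated = pd.pivot_table(df, index="user", columns="item", values="score")

print(reshaped)
print(aggregated)
```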
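Make sure the column is numeric. The message “No numeric types to aggregate” means that the values column (TrainingTaken) does not have a numeric dtype; most likely it is object, for example numbers stored as strings. Without seeing how you create trainingtaken I can’t provide more specific guidance. However, the following may help: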
- Make sure you handle “empty” values in that column. The Pandas guide is a very good place to start. Pandas points out that “a column of integers with even one missing values is cast to floating-point dtype”.
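- If working with a dataframe, the column can be cast to a specific type via
your_df['your_col'] = your_df['your_col'].astype(int)
or, for your example,
trainingtaken['TrainingTaken'] = trainingtaken['TrainingTaken'].astype(int)
Note that astype returns a new Series, so the result has to be assigned back, and that astype(int) raises an error if the column still contains missing values.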
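Putting the points above together, here is a minimal sketch. It assumes TrainingTaken was read in as strings, which is only a guess since the question does not show how trainingtaken is built. It uses pd.to_numeric with errors="coerce" instead of astype(int), so entries that cannot be parsed become NaN rather than raising. The sample rows are the ones shown in the question.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Sample rows from the question. TrainingTaken is deliberately stored as
# strings to mimic a non-numeric column (an assumption about the real data).
trainingtaken = pd.DataFrame({
    "User-Id": [4327024, 6662572, 3757520],
    "Training-Id": [25, 3, 26],
    "TrainingTaken": ["10", "10", "10"],
})

# Inspect the dtypes first: TrainingTaken is not numeric at this point.
print(trainingtaken.dtypes)

# Convert to numbers; entries that cannot be parsed become NaN.
trainingtaken["TrainingTaken"] = pd.to_numeric(
    trainingtaken["TrainingTaken"], errors="coerce"
)

# With a numeric column, pivot_table can aggregate. fill_value=0 fills the
# user/training combinations that do not occur in the data.
user_training_interaction = pd.pivot_table(
    trainingtaken,
    index="User-Id",
    columns="Training-Id",
    values="TrainingTaken",
    fill_value=0,
)

user_training_csr = csr_matrix(user_training_interaction.values)
print(user_training_csr.shape)
```

If a user took the same training more than once, pivot_table averages the duplicate rows by default; pass aggfunc explicitly (for example "sum" or "max") if a different behaviour is wanted.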