I have been trying to apply z-score normalization to all of the numeric values in combined_data with the following code:
from scipy.stats import zscore

# Calculate the zscores and drop zscores into new column
combined_data['zscore'] = zscore(combined_data['zscore'])
Here, combined_data is a DataFrame that combines the training and testing datasets and has already been one-hot encoded.
I am seeing the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2646, in Index.get_loc(self, key, method, tolerance)
   2645 try:
-> 2646     return self._engine.get_loc(key)
   2647 except KeyError:

File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'zscore'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [29], in <cell line: 2>()
      1 # Calculate the zscores and drop zscores into new column
----> 2 combined_data['zscore'] = zscore(combined_data['zscore'])

File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/frame.py:2800, in DataFrame.__getitem__(self, key)
   2798 if self.columns.nlevels > 1:
   2799     return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
   2801 if is_integer(indexer):
   2802     indexer = [indexer]

File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2648, in Index.get_loc(self, key, method, tolerance)
   2646         return self._engine.get_loc(key)
   2647     except KeyError:
-> 2648         return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650 if indexer.ndim > 1 or indexer.size > 1:

File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'zscore'
The dataset combined_data contains 257,673 rows and 198 columns.
Here is a sample row of combined_data (pandas elides the middle columns with "..."):
          id       dur  spkts  dpkts  sbytes  dbytes       rate  sttl  dttl        sload  ...  state_CLO  state_CON  state_ECO  state_FIN  state_INT  state_PAR  state_REQ  state_RST  state_URN  state_no
60662  60663  1.193334     10     10     608     646  15.921779   254   252  3673.740967  ...          0          0          0          1          0          0          0          0          0         0
I am new to this kind of error. What am I doing wrong?
[UPDATE: The code was trying to create a separate column holding the z-scores, which is not possible, as explained below.]
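A minimal reproduction of the error, using a small hypothetical DataFrame named df in place of combined_data. The right-hand side of the assignment is evaluated first, so df["zscore"] is read before anything is written, and pandas raises KeyError because no such column exists yet. Computing the z-scores from a column that does exist and assigning them to a new name (the hypothetical dur_zscore) works.

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical stand-in for combined_data with two of its numeric columns
df = pd.DataFrame({"dur": [1.0, 2.0, 3.0], "sbytes": [608, 700, 800]})

try:
    # The right-hand side runs first and reads a 'zscore' column that does
    # not exist, so pandas raises KeyError before any assignment happens.
    df["zscore"] = zscore(df["zscore"])
except KeyError as err:
    print("KeyError:", err)

# Reading an existing column works; assigning to a new label creates it.
df["dur_zscore"] = zscore(df["dur"])
print(df)
```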
Answer
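The KeyError is raised by the right-hand side, combined_data['zscore'], which tries to read a 'zscore' column that does not exist yet. Apply the function zscore to the whole DataFrame instead of to a non-existent column: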
result = zscore(combined_data)
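The result is a NumPy array with the same shape as combined_data, so it cannot be stored as a single column of the original DataFrame. You can, however, wrap it in a new DataFrame that keeps the original column names and index: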
import pandas as pd

normalized = pd.DataFrame(result, columns=combined_data.columns, index=combined_data.index)
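If the goal is to standardize the continuous features in place rather than build a separate DataFrame, one option is to pass only those columns to zscore and assign the 2-D result back to the same column labels. This is a minimal sketch under stated assumptions: the miniature DataFrame and its values are hypothetical, 'id' is treated as an identifier, and any numeric column holding only 0/1 values is treated as a one-hot indicator. One-hot columns stored as bool (the default dtype of pd.get_dummies in pandas 2.x) are skipped by select_dtypes(include="number") anyway. zscore uses the population standard deviation (ddof=0), so a hand-written check must pass ddof=0 to pandas' std, whose default is ddof=1. A constant column would come out as NaN, because its standard deviation is zero.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical miniature of combined_data; the column names mirror the
# sample in the question, but the values are made up for illustration.
combined_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "dur": [1.193334, 0.5, 2.0, 0.0],
    "sbytes": [608, 1200, 300, 450],
    "state_CON": [0, 1, 0, 0],
    "state_FIN": [1, 0, 1, 1],
})
raw_sbytes = combined_data["sbytes"].astype(float)  # copy kept for the check below

# Assumption: 'id' is an identifier and 0/1-only columns are one-hot
# indicators, so neither is standardized.
numeric_cols = combined_data.select_dtypes(include="number").columns
binary_cols = [c for c in numeric_cols if combined_data[c].isin([0, 1]).all()]
cols_to_scale = [c for c in numeric_cols if c != "id" and c not in binary_cols]

# zscore works column-wise (axis=0) with the population std (ddof=0);
# assigning the 2-D result to the same labels replaces those columns.
combined_data[cols_to_scale] = zscore(combined_data[cols_to_scale])

# Cross-check one column by hand: z = (x - mean) / std, using ddof=0.
manual = (raw_sbytes - raw_sbytes.mean()) / raw_sbytes.std(ddof=0)
print(np.allclose(manual, combined_data["sbytes"]))
print(combined_data)
```

Because zscore standardizes along axis=0, passing all selected columns in one call gives each column its own mean and standard deviation, just as calling it on each column separately would.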