Getting “KeyError” while applying z-score normalization to a dataset

I have been trying to apply z-score normalization to all of the numeric values in combined_data with the following code:

from scipy.stats import zscore

# Calculate the zscores and drop zscores into new column
combined_data['zscore'] = zscore(combined_data['zscore'])

Here, combined_data is a dataframe that combines the training and testing datasets and has been passed through one-hot encoding.

I am seeing the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2646, in Index.get_loc(self, key, method, tolerance)
   2645 try:
-> 2646     return self._engine.get_loc(key)
   2647 except KeyError:

File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'zscore'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [29], in <cell line: 2>()
      1 # Calculate the zscores and drop zscores into new column
----> 2 combined_data['zscore'] = zscore(combined_data['zscore'])

File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/frame.py:2800, in DataFrame.__getitem__(self, key)
   2798 if self.columns.nlevels > 1:
   2799     return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
   2801 if is_integer(indexer):
   2802     indexer = [indexer]

File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2648, in Index.get_loc(self, key, method, tolerance)
   2646         return self._engine.get_loc(key)
   2647     except KeyError:
-> 2648         return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650 if indexer.ndim > 1 or indexer.size > 1:

File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'zscore'

The dataset combined_data contains 257673 rows and 198 columns.

Here is a sample row of combined_data:

       id     dur       spkts  dpkts  sbytes  dbytes  rate       sttl  dttl  sload        ...  state_CLO  state_CON  state_ECO  state_FIN  state_INT  state_PAR  state_REQ  state_RST  state_URN  state_no
60662  60663  1.193334  10     10     608     646     15.921779  254   252   3673.740967  ...  0          0          0          1          0          0          0          0          0          0

I am new to this kind of error. What am I doing wrong?

[UPDATE: The code was trying to build a new zscore column by reading a zscore column that does not exist yet, which cannot work, as explained in the answer below.]
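
For reference, the failure can be reproduced on a tiny frame. This is only a sketch: the frame df and its values are made up and stand in for combined_data. The point it illustrates is that the right-hand side of the assignment is evaluated first, so the lookup of the missing column fails before anything is assigned.

```python
import pandas as pd

# Hypothetical two-column frame standing in for combined_data.
df = pd.DataFrame({"dur": [1.0, 2.0, 3.0], "spkts": [10, 20, 60]})

# Reading a column that does not exist raises KeyError. In
# df['zscore'] = zscore(df['zscore']) this read happens first,
# so the assignment on the left is never reached.
try:
    df["zscore"]
    raised = False
    missing = None
except KeyError as exc:
    raised = True
    missing = exc.args[0]

# Assigning to a new column name is fine when the right-hand side exists.
df["dur_copy"] = df["dur"]
```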

Answer

You should apply the function zscore to the whole dataframe, not to a non-existent column:

result = zscore(combined_data)

The result is a NumPy array with the same shape as combined_data (one z-score per cell), so it cannot be stored as a single column of the original dataframe. You can, however, wrap it in a new DataFrame:

pd.DataFrame(result, columns=combined_data.columns, index=combined_data.index)
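
Put together, the two steps above might look like the sketch below. The small combined_data frame is a made-up stand-in (column names borrowed from the sample row, values invented), and the names normalized and manual are only illustrative. The np.asarray call is a defensive assumption on my part, in case the return type of zscore differs between SciPy versions when it is given a DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical miniature of combined_data: two numeric features plus one
# one-hot column. The values are invented for illustration.
combined_data = pd.DataFrame(
    {
        "dur": [1.0, 2.0, 3.0, 4.0],
        "sbytes": [600, 650, 700, 850],
        "state_FIN": [1, 0, 0, 1],
    },
    index=[60662, 60663, 60664, 60665],
)

# zscore standardizes each column by default (axis=0, ddof=0).
# np.asarray turns whatever comes back into a plain 2-D array.
result = np.asarray(zscore(combined_data))

# Wrap the array in a new DataFrame that keeps the original labels.
normalized = pd.DataFrame(
    result, columns=combined_data.columns, index=combined_data.index
)

# The same computation written with pandas alone, for comparison.
manual = (combined_data - combined_data.mean()) / combined_data.std(ddof=0)
```

Each column of normalized then has mean 0 and population standard deviation 1. Note that this also rescales the one-hot columns; if those should stay as 0/1, zscore can be applied to a selection of columns instead of the whole frame.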
