Skip to content
Advertisement

add_datepart( ) produces KeyError

I’m trying to split a date column in a pandas data frame using add_datepart( ).

trainingSetFirstCycle = pandas.read_csv(

"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
  low_memory=False, parse_dates=["date"])
trainingSetFirstCycle.rent_count = numpy.log(trainingSetFirstCycle.rent_count)

trainingSetFirstCycle
           date      time  season  ...  casual  registered  rent_count
0     2011-01-01  00:00:00       1  ...       3          13    2.772589
1     2011-01-01  01:00:00       1  ...       8          32    3.688879
2     2011-01-01  02:00:00       1  ...       5          27    3.465736
3     2011-01-01  03:00:00       1  ...       3          10    2.564949
4     2011-01-01  04:00:00       1  ...       0           1    0.000000
         ...       ...     ...  ...     ...         ...         ...
10881 2012-12-19  19:00:00       4  ...       7         329    5.817111
10882 2012-12-19  20:00:00       4  ...      10         231    5.484797
10883 2012-12-19  21:00:00       4  ...       4         164    5.123964
10884 2012-12-19  22:00:00       4  ...      12         117    4.859812
10885 2012-12-19  23:00:00       4  ...       4          84    4.477337
[10886 rows x 13 columns]


trainingSetFirstCycle.dtypes
date          datetime64[ns]

However, running trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True) returns this error message:

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',n ... dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"

I checked the documentation to see what I’d done wrong.

image

In the example shown, the definition of the data frame includes a dictionary consisting of the column name “date” and a list containing its first four values. So I reproduced this in my own data frame:

trainingSetFirstCycle = pandas.read_csv(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
   {'date': ['2011-01-01', '2011-01-01 ', '2011-01-01', '2011-01-01']}, low_memory=False, parse_dates=["date"])

The result was this error message:

AttributeError: 'dict' object has no attribute 'encode'.

So, do you have an idea what it is I’m missing here? Thanks in advance.

Advertisement

Answer

Solved by rewording the parameter “date”

This has been a semantic bug. Python failed to compile trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True) because it didn’t realize that trainingSetFirstCycle.date was supposed to be the column’s name. I changed the parameter to simply "date", which has solved the problem.

Apparently, my misconception was that whenever you give a data frame column as a parameter to a function, you have to refer to it as df.column , because that was the syntax of the function numpy.log(trainingSetFirstCycle.rent_count), where the data frame column “rent_count”, is the parameter of the function numpy.log.

Advertisement