I’m building a model to predict pedestrian casualties on the streets of New York from a dataset of 1.7 million records. To see what predictive power it might provide, I decided to build dummy features out of the ON STREET NAME column. With that, I have approximately 7,500 features.
I tried running that and immediately got an alert that the Jupyter kernel died. I tried again, and the same thing happened. Considering how long the model takes to fit, and how hot the computer runs, when I fit on just 100 features, I can only assume that
LogisticRegression() is not meant to handle a feature set this large.
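Roughly, here is a minimal sketch of what I’m doing (synthetic stand-in data; my real frame has 1.7 million rows and the column names shown are from the crash dataset):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Tiny synthetic stand-in for the real 1.7M-row crash data.
df = pd.DataFrame({
    "ON STREET NAME": ["BROADWAY", "5 AVENUE", "BROADWAY", "ATLANTIC AVENUE"],
    "casualty": [1, 0, 0, 1],
})

# One dummy column per unique street name; on the real data this
# produces ~7,500 columns of mostly zeros, stored densely.
X = pd.get_dummies(df["ON STREET NAME"])
y = df["casualty"]

model = LogisticRegression()
model.fit(X, y)
```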
At a minimum, you should provide a log or a reproducible example, so that other people can diagnose the problem.
Side note: with 7,500 features and 1.7 million rows, assuming a 4-byte float for every element, you have roughly 48 GB of data. RAM will probably be a major issue.
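The back-of-the-envelope arithmetic for a dense design matrix of that shape:

```python
# Memory estimate for a dense 1.7M x 7,500 matrix of 4-byte floats.
rows = 1_700_000
cols = 7_500
bytes_per_float32 = 4

gib = rows * cols * bytes_per_float32 / 2**30
print(f"{gib:.1f} GiB")  # ~47.5 GiB; double that for 8-byte float64
```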
Finally, dimensionality reduction methods like PCA, or some feature selection method, would probably help enough that you won’t need to change the model.
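For instance, a sketch of the PCA route on synthetic data (the shapes and component count here are illustrative, not tuned for your dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 500)).astype(np.float32)  # stand-in for the dummy matrix
y = rng.integers(0, 2, size=200)

# Project 500 features down to 50 components, then fit the same model.
pipe = make_pipeline(PCA(n_components=50), LogisticRegression(max_iter=1000))
pipe.fit(X, y)
```

The same pipeline shape works with any selector in place of PCA (e.g. `SelectKBest`), so the downstream model stays untouched.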