Scikit-learn train_test_split with indices

How do I get the original indices of the data when using train_test_split()?

What I have is the following


But this does not give the indices of the original data. One workaround is to attach the indices to the data (e.g. data = [(i, d) for i, d in enumerate(data)]), pass the pairs through train_test_split, and then unpack them again. Is there a cleaner solution?
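For reference, the workaround described above might look roughly like the sketch below. The toy arrays, the variable names and the test_size/random_state values are assumptions made for illustration; they are not the code from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, binary labels.
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 2))
labels = rng.integers(0, 2, size=10)

# Workaround: pair every row with its original position before splitting.
indexed = [(i, d) for i, d in enumerate(data)]
train, test, y_train, y_test = train_test_split(
    indexed, labels, test_size=0.2, random_state=0
)

# Unpack the pairs again into index arrays and feature matrices.
train_idx = np.array([i for i, _ in train])
test_idx = np.array([i for i, _ in test])
x_train = np.array([d for _, d in train])
x_test = np.array([d for _, d in test])
```

It works, but the pairing and unpacking steps are exactly the bookkeeping the question is trying to avoid.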

Answer

Scikit-learn works well with pandas, so I suggest you use it. Here’s an example:

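A minimal sketch of the pandas approach, assuming a small made-up DataFrame (the column names "a" and "b", the label Series and the split parameters are hypothetical): train_test_split returns pandas objects, and each one keeps the row labels of the original frame in its index.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame; column names and values are made up for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 2)), columns=["a", "b"])
y = pd.Series(rng.integers(0, 2, size=10), name="label")

x_train, x_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

# Each split keeps the row labels of the original frame.
train_idx = x_train.index
test_idx = x_test.index
print(test_idx.tolist())
```

With the default RangeIndex these labels are the original row positions, so df.loc[test_idx] gives back the test rows.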

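You can call most scikit-learn functions directly on a DataFrame or Series. train_test_split returns pandas objects with their original index intact, so the indices of each split are available through its .index attribute.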

Let’s say you want to fit a LogisticRegression. Here’s how you could retrieve the coefficients labeled by feature name:

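One way this could look, as a sketch on made-up data (the feature names, the synthetic target and the default LogisticRegression settings are assumptions): wrap the fitted coef_ array in a Series indexed by the DataFrame's columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: the target depends mostly on feature "a".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["a", "b"])
noise = 0.5 * rng.normal(size=100)
y = pd.Series((X["a"] + noise > 0).astype(int), name="label")

model = LogisticRegression().fit(X, y)

# For a binary problem coef_ has shape (1, n_features); take the single row
# and label each coefficient with its column name.
coefs = pd.Series(model.coef_[0], index=X.columns, name="coefficient")
print(coefs)
```

Because the Series is indexed by column name, a coefficient can be looked up as coefs["a"] instead of by position.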
User contributions licensed under: CC BY-SA