How do I create test and train samples from one dataframe with pandas?

Question

I have a fairly large dataset in the form of a DataFrame, and I was wondering how I could split it into two random samples (80% and 20%) for training and testing. Thanks!

Accepted Answer

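I would just use numpy's rand to build a random boolean mask:

In [11]: df = pd.DataFrame(np.random.randn(100, 2))
In [12]: msk = np.random.rand(len(df)) < 0.8
In [13]: train = df[msk]
In [14]: test = df[~msk]

And just to see this has worked:

In [15]: len(test)
Out[15]: 21
In [16]: len(train)
Out[16]: 79

Each row is assigned to train independently with probability 0.8, so the split is only approximately 80/20 (here 79/21).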
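If an exact 80/20 row count matters, one alternative to the mask approach above is pandas' own DataFrame.sample combined with DataFrame.drop. This is only a sketch of a different technique, not what the answer itself does; the column names, the seed values, and the 100-row example frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Example data (assumed for illustration): 100 rows, 2 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["a", "b"])

# Draw exactly 80% of the rows at random, without replacement.
# random_state makes the split reproducible.
train = df.sample(frac=0.8, random_state=42)

# The test set is every row whose index label was not drawn for train.
# This relies on the index being unique, as it is for a default RangeIndex.
test = df.drop(train.index)

print(len(train), len(test))  # prints "80 20"
```

Unlike the random mask, this always yields the requested proportion (up to rounding), at the cost of requiring a unique index so that drop removes exactly the sampled rows.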

Answer