Shuffle rows of a large csv

I want to shuffle this dataset to get a random sample. It has 1.6 million rows, but the first rows are all class 0 and the last ones are all class 4, so I need to pick samples randomly to get more than one class. My current code prints only class 0 (that is, just one class). I tried advice from this platform, but it doesn't work.


Answer

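Because you read in your data using Pandas, you can also do the randomisation in Pandas itself with DataFrame.sample. Passing frac=1 returns every row in random order, and passing n returns a random subset of that many rows.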
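The answer's original snippet is not available here, so the following is only a minimal sketch of the idea. The column names `target` and `text` and the tiny in-memory frame are hypothetical stand-ins for the real CSV, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the real file: rows sorted by class,
# with class 0 first and class 4 last (column names are assumptions).
df = pd.DataFrame({
    "target": [0] * 8 + [4] * 8,
    "text": [f"row {i}" for i in range(16)],
})

# frac=1 samples 100% of the rows without replacement, i.e. a full shuffle.
# random_state makes the shuffle repeatable; reset_index drops the old order.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Alternatively, draw a fixed-size random subset instead of the whole frame.
subset = df.sample(n=6, random_state=42)

print(shuffled.head())
print(subset["target"].unique())
```

Taking `df.head(n)` on the unshuffled frame would return only class 0 here, which matches the symptom described in the question.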


If this fails, it is worth checking how many unique values the class column has and how frequently each one appears. If the first 1,599,999 rows are 0 and only the last row is 4, then chances are that a random sample won't contain any 4.
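As a rough sketch of that check, the class frequencies can be inspected with `value_counts`. The column name `target` and the toy data are assumptions, not taken from the question. The last step, drawing the same number of rows from each class with `groupby(...).sample` (available in pandas 1.1 and later), is an optional alternative and not part of the original answer.

```python
import pandas as pd

# Hypothetical imbalanced data: twelve rows of class 0, four rows of class 4.
df = pd.DataFrame({"target": [0] * 12 + [4] * 4})

# How many distinct classes are there, and how often does each occur?
counts = df["target"].value_counts()
print(counts)

# Check which classes actually made it into a random sample.
sample = df.sample(n=8, random_state=0)
print(sample["target"].value_counts())

# Optional: draw the same number of rows from every class,
# so each class is guaranteed to be present in the result.
balanced = df.groupby("target").sample(n=3, random_state=0)
print(balanced["target"].value_counts())
```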

User contributions licensed under: CC BY-SA