I have a DataFrame variable called data with about 6 million rows, and I’d like to process it 50 rows at a time. I have the following code:
#Only 4001 for debugging purposes
for i in range(0,4001,50):
  print(str(i) + " - " + str(i+49))
  current_batch = data["text"].loc[i:(i+49)]
  print("Batch size: " + str(len(current_batch.tolist())))
However, it seems the slices obtained are not 50 rows long. In fact, the lengths look random, although they stay consistent every time I re-run the program (the first one is always 34, then always 48, etc.). Here is a sample output:
0 - 49
Batch size: 34
50 - 99
Batch size: 48
...
Is this expected behavior for the DataFrame class?
Answer
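That’s what happens when you use loc: it selects by index label, not by position, and a label slice includes both endpoints. So data["text"].loc[i:(i+49)] returns only the rows whose labels fall between i and i+49. If the index has gaps (for example, because rows were filtered out earlier without resetting the index), fewer than 50 labels fall in that range, which would explain batch sizes that look random but are identical on every run. iloc selects by integer position instead, with an exclusive end, so data["text"].iloc[i:(i+50)] returns 50 rows whenever that many remain.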
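The difference can be reproduced on a small frame. This is only a sketch: the 200-row DataFrame and the filter that drops every third row are made-up stand-ins for the real data, assuming the original index has gaps of a similar kind.

```python
import pandas as pd

# Hypothetical stand-in for `data`: 200 rows, then a filter that drops some
# rows and leaves gaps in the integer index labels.
data = pd.DataFrame({"text": [f"row {n}" for n in range(200)]})
data = data[data.index % 3 != 0]  # labels 0, 3, 6, ... are now missing

batch_size = 50

# loc slices by label, both ends inclusive: only labels that still exist
# between 0 and 49 are returned.
by_label = data["text"].loc[0:49]
print("loc batch size:", len(by_label))      # 33

# iloc slices by position, end exclusive: the first 50 remaining rows.
by_position = data["text"].iloc[0:50]
print("iloc batch size:", len(by_position))  # 50

# Batching by position; the last batch may be shorter than batch_size.
batch_sizes = []
for start in range(0, len(data), batch_size):
    current_batch = data["text"].iloc[start:start + batch_size]
    batch_sizes.append(len(current_batch))
print(batch_sizes)                           # [50, 50, 33]
```

If label-based slicing is preferred, calling data.reset_index(drop=True) first gives the frame a gap-free 0..n-1 index, after which loc[i:(i+49)] also returns 50 rows per batch.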