Dataframe row slicing is not consistent

I have a DataFrame variable called data with ~6 million rows, and I’d like to process it 50 rows at a time. I have the following code:

# Only 4001 for debugging purposes
for i in range(0, 4001, 50):
    print(f"{i} - {i + 49}")
    current_batch = data["text"].loc[i:i + 49]
    print(f"Batch size: {len(current_batch)}")

However, the slices obtained are not 50 rows long. In fact, their lengths seem random, although they stay consistent every time I re-run the program (the first one is always 34, the next always 48, etc.). Here is a sample output:

0 - 49
Batch size: 34
50
Batch size: 48
...

Is this expected behavior for the DataFrame class?

Answer

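That’s what happens when you use loc: it slices by index label and includes both endpoints, so loc[i:(i+49)] returns only the rows whose labels fall between i and i+49. If the index has gaps (for example, because rows were filtered or dropped without resetting the index), fewer than 50 rows come back, and the count is the same on every run because the labels do not change. iloc slices by integer position instead and excludes the stop value, so iloc[i:(i+50)] returns exactly 50 rows whenever that many remain.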
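To make the difference concrete, here is a minimal sketch. The small DataFrame, the dropped labels, and the batches helper are made up for illustration; they are not the asker's data, and the gaps in the asker's index are an assumption based on the symptoms described.

```python
import pandas as pd

# Hypothetical data: 10 rows, then drop a few so the index labels have gaps,
# as can happen after filtering or dropping rows without resetting the index.
data = pd.DataFrame({"text": [f"row {n}" for n in range(10)]})
data = data.drop(index=[1, 2, 7])  # remaining labels: 0, 3, 4, 5, 6, 8, 9

# .loc slices by label and includes both endpoints:
# only the labels between 0 and 4 that still exist are returned.
by_label = data["text"].loc[0:4]

# .iloc slices by position and excludes the stop:
# the first five rows are returned regardless of their labels.
by_position = data["text"].iloc[0:5]


def batches(series, size):
    """Yield consecutive position-based slices of at most `size` rows."""
    for start in range(0, len(series), size):
        yield series.iloc[start:start + size]


# Every batch is full except possibly the last one.
batch_sizes = [len(batch) for batch in batches(data["text"], 3)]

print(len(by_label), len(by_position), batch_sizes)
```

In the original loop, this corresponds to replacing data["text"].loc[i:(i+49)] with data["text"].iloc[i:(i+50)]. Alternatively, calling data.reset_index(drop=True) first renumbers the labels from 0 with no gaps, after which the loc version should also return 50 rows per batch.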
