I have a DataFrame variable called data with ~6 million rows, and I’d like to process it 50 rows at a time. I have the following code:
    # Only 4001 for debugging purposes
    for i in range(0, 4001, 50):
        print(str(i) + " - " + str(i+49))
        current_batch = data["text"].loc[i:(i+49)]
        print("Batch size: " + str(len(current_batch.tolist())))
However, the slices I get are not 50 rows long. In fact, their lengths look random, although they stay consistent every time I re-run the program (the first one is always 34, then always 48, etc.). Here is a sample output:
    0 - 49
    Batch size: 34
    50
    Batch size: 48
    ...
Is this expected behavior for the DataFrame class?
Answer
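That’s what happens when you use loc. loc selects by index label and includes both endpoints, so data["text"].loc[i:(i+49)] returns only the rows whose labels fall between i and i+49. If some of those labels are missing from the index (most likely because rows were dropped or filtered earlier), the batch has fewer than 50 rows. iloc selects by integer position and excludes the end point, so data["text"].iloc[i:(i+50)] returns 50 rows regardless of the labels, and fewer only at the very end of the data.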
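The difference can be reproduced with a small sketch. The DataFrame below is a hypothetical stand-in for data, and the assumption that the index has gaps from earlier filtering is a guess at the cause, not something the question confirms.

```python
import pandas as pd

# Hypothetical stand-in for `data`: 200 rows, then drop every third row so
# the integer index has gaps (labels 0, 3, 6, ... no longer exist).
data = pd.DataFrame({"text": [f"row {n}" for n in range(200)]})
data = data[data.index % 3 != 0]

batch_size = 50

# Label-based slice: keeps only the labels between 0 and 49 that still exist.
loc_batch = data["text"].loc[0:49]
print("loc batch size:", len(loc_batch))    # 33

# Position-based slice: always the first 50 rows, whatever their labels are.
iloc_batch = data["text"].iloc[0:batch_size]
print("iloc batch size:", len(iloc_batch))  # 50

# Batching the whole frame by position; only the last batch can be short.
iloc_sizes = [
    len(data["text"].iloc[start:start + batch_size])
    for start in range(0, len(data), batch_size)
]
print(iloc_sizes)                           # [50, 50, 33]

# Alternative: rebuild a gap-free 0..n-1 index so loc behaves as expected.
reset = data.reset_index(drop=True)
reset_batch = reset["text"].loc[0:49]
print("loc after reset_index:", len(reset_batch))  # 50
```

Either approach works for batching. iloc leaves the original labels untouched, while reset_index(drop=True) discards them, which matters if later code relies on those labels.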