I have a DataFrame variable called data with about 6 million rows, and I’d like to process it 50 rows at a time. I have the following code:
# Only 4001 for debugging purposes
for i in range(0, 4001, 50):
    print(str(i) + " - " + str(i+49))
    current_batch = data["text"].loc[i:(i+49)]
    print("Batch size: " + str(len(current_batch.tolist())))
However, the slices obtained are not 50 rows long. In fact, their lengths seem random, although they stay consistent every time I re-run the program: the first one is always 34, the next always 48, and so on. Here is a sample output:
0 - 49
Batch size: 34
50
Batch size: 48
Is this expected behavior from the DataFrame class?
Answer
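That’s what happens when you use loc, which selects rows by index label rather than by position. The slice loc[i:(i+49)] returns only the rows whose labels lie between i and i+49, both ends included, so if some of those labels are missing from the index (for example, because rows were dropped earlier), the batch comes back shorter than 50. iloc selects by position instead, so data["text"].iloc[i:i+50] returns 50 rows per batch regardless of the labels, with only the final batch possibly shorter.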
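To make the difference concrete, here is a minimal sketch on toy data. The 300-row frame, the dropped rows, and the batch_sizes helper are all made up for illustration; the assumption is that your real data has similar gaps in its index labels, which is the most likely cause of the short batches.

```python
import pandas as pd

# Hypothetical stand-in for `data`: 300 rows, then every third row dropped,
# which leaves gaps in the integer index labels (0, 3, 6, ... are gone).
data = pd.DataFrame({"text": [f"row {n}" for n in range(300)]})
data = data[data.index % 3 != 0]

def batch_sizes(frame, batch_size=50, stop=150):
    """Return (loc_size, iloc_size) for each batch start in range(0, stop, batch_size)."""
    sizes = []
    for i in range(0, stop, batch_size):
        # Label-based: keeps rows whose labels are in [i, i + batch_size - 1].
        by_label = frame["text"].loc[i:(i + batch_size - 1)]
        # Position-based: keeps rows at positions i .. i + batch_size - 1.
        by_position = frame["text"].iloc[i:(i + batch_size)]
        sizes.append((len(by_label), len(by_position)))
    return sizes

for loc_size, iloc_size in batch_sizes(data):
    print(f"loc: {loc_size} rows, iloc: {iloc_size} rows")
```

On this toy data the loc batches come out at 33 or 34 rows, while the iloc batches are always 50. If you would rather keep slicing by label, calling data.reset_index(drop=True) first gives the frame consecutive labels starting at 0.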