I have a large dataframe with 423244 lines. I want to split it into 4. I tried the following code, which gave an error: ValueError: array split does not result in an equal division
    for item in np.split(df, 4):
        print(item)
How can I split this dataframe into 4 groups?
Answer
Use np.array_split:
    Docstring:
    Split an array into multiple sub-arrays.

    Please refer to the ``split`` documentation.  The only difference
    between these functions is that ``array_split`` allows
    `indices_or_sections` to be an integer that does *not* equally
    divide the axis.
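To make the difference described in the docstring concrete, here is a minimal sketch on a plain NumPy array. The array `arr` and the variable names are illustrative only and are not part of the original question.

```python
import numpy as np

arr = np.arange(8)  # 8 elements cannot be divided evenly into 3 parts

# np.split insists on equal-sized pieces and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split accepts the uneven case; the earlier pieces take the extra elements.
parts = np.array_split(arr, 3)
sizes = [len(p) for p in parts]

print(split_failed, sizes)
```

For a length `l` split into `n` sections, `array_split` returns `l % n` pieces of size `l // n + 1` and the remaining pieces of size `l // n`.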
    In [1]: import pandas as pd
       ...: from numpy.random import randn

    In [2]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
       ...:                           'foo', 'bar', 'foo', 'foo'],
       ...:                    'B' : ['one', 'one', 'two', 'three',
       ...:                           'two', 'two', 'one', 'three'],
       ...:                    'C' : randn(8), 'D' : randn(8)})

    In [3]: print(df)
         A      B         C         D
    0  foo    one -0.174067 -0.608579
    1  bar    one -0.860386 -1.210518
    2  foo    two  0.614102  1.689837
    3  bar  three -0.284792 -1.071160
    4  foo    two  0.843610  0.803712
    5  bar    two -1.514722  0.870861
    6  foo    one  0.131529 -0.968151
    7  foo  three -1.002946 -0.257468

    In [4]: import numpy as np

    In [5]: np.array_split(df, 3)
    Out[5]:
    [     A    B         C         D
     0  foo  one -0.174067 -0.608579
     1  bar  one -0.860386 -1.210518
     2  foo  two  0.614102  1.689837,
          A      B         C         D
     3  bar  three -0.284792 -1.071160
     4  foo    two  0.843610  0.803712
     5  bar    two -1.514722  0.870861,
          A      B         C         D
     6  foo    one  0.131529 -0.968151
     7  foo  three -1.002946 -0.257468]
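Applied to the original question (four groups), one possible variant is to split the row positions rather than the frame itself and then slice with `iloc`. This is only a sketch: the small `df` below is a hypothetical stand-in for the real 423244-row dataframe, and `position_groups` and `chunks` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 10 rows, which 4 does not divide evenly.
df = pd.DataFrame({"A": list("abcdefghij"), "C": np.arange(10)})

# Split the integer row positions into 4 groups (sizes may differ by one),
# then select each group of rows by position.
position_groups = np.array_split(np.arange(len(df)), 4)
chunks = [df.iloc[idx] for idx in position_groups]

for chunk in chunks:
    print(len(chunk))
```

Each chunk keeps its original index labels, so concatenating the chunks in order with `pd.concat(chunks)` gives back the original frame.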