I have a large dataframe with 423244 lines. I want to split this in to 4. I tried the following code which gave an error? ValueError: array split does not result in an equal division
for item in np.split(df, 4): print item
How to split this dataframe in to 4 groups?
Advertisement
Answer
Use np.array_split
:
Docstring: Split an array into multiple sub-arrays. Please refer to the ``split`` documentation. The only difference between these functions is that ``array_split`` allows `indices_or_sections` to be an integer that does *not* equally divide the axis.
In [1]: import pandas as pd In [2]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', ...: 'foo', 'bar', 'foo', 'foo'], ...: 'B' : ['one', 'one', 'two', 'three', ...: 'two', 'two', 'one', 'three'], ...: 'C' : randn(8), 'D' : randn(8)}) In [3]: print df A B C D 0 foo one -0.174067 -0.608579 1 bar one -0.860386 -1.210518 2 foo two 0.614102 1.689837 3 bar three -0.284792 -1.071160 4 foo two 0.843610 0.803712 5 bar two -1.514722 0.870861 6 foo one 0.131529 -0.968151 7 foo three -1.002946 -0.257468 In [4]: import numpy as np In [5]: np.array_split(df, 3) Out[5]: [ A B C D 0 foo one -0.174067 -0.608579 1 bar one -0.860386 -1.210518 2 foo two 0.614102 1.689837, A B C D 3 bar three -0.284792 -1.071160 4 foo two 0.843610 0.803712 5 bar two -1.514722 0.870861, A B C D 6 foo one 0.131529 -0.968151 7 foo three -1.002946 -0.257468]