Populating an even distribution of values across multiple axes?

Basic Example:

# Given params such as:
params = {
    'cols': 8,
    'rows': 4, 
    'n': 4
}
# I'd like to produce (or equivalent):
       col0  col1  col2  col3  col4  col5  col6  col7
row_0     0     1     2     3     0     1     2     3
row_1     1     2     3     0     1     2     3     0
row_2     2     3     0     1     2     3     0     1
row_3     3     0     1     2     3     0     1     2

Axis Value Counts:

  • Where every row and every column has an equal distribution of values
df.apply(lambda x: x.value_counts(), axis=1)

       0  1  2  3
row_0  2  2  2  2
row_1  2  2  2  2
row_2  2  2  2  2
row_3  2  2  2  2
df.apply(lambda x: x.value_counts())

   col0  col1  col2  col3  col4  col5  col6  col7
0     1     1     1     1     1     1     1     1
1     1     1     1     1     1     1     1     1
2     1     1     1     1     1     1     1     1
3     1     1     1     1     1     1     1     1
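To test both conditions at once, here is a small sketch. `is_balanced` is a hypothetical helper (not a pandas function) that uses `np.bincount` in place of the two `value_counts` calls; it assumes the cells hold the integers 0..n-1.

```python
import numpy as np
import pandas as pd


def is_balanced(df: pd.DataFrame, n: int) -> bool:
    """Hypothetical helper: True if every row and every column holds each of 0..n-1 equally often.

    Assumes the cells are the integers 0..n-1.
    """
    values = df.to_numpy(dtype=int)  # cast, since the question's frame has object dtype
    rows, cols = values.shape
    if rows % n or cols % n:
        return False  # an exactly equal split is impossible
    # np.bincount(x, minlength=n)[v] is the number of times v occurs in x
    row_ok = all((np.bincount(row, minlength=n) == cols // n).all() for row in values)
    col_ok = all((np.bincount(col, minlength=n) == rows // n).all() for col in values.T)
    return row_ok and col_ok
```

Applied to the attempt below, it returns False: the rows pass, but column 0 of that output holds 0, 0, 0, 1.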

My attempt thus far:

import itertools
import numpy as np
import pandas as pd

def create_df(cols, rows, n):
    x = itertools.cycle(list(itertools.permutations(range(n))))
    df = pd.DataFrame(index=range(rows), columns=range(cols))
    df[:] = np.reshape([next(x) for _ in range((rows*cols)//n)], (rows, cols))
    #df = df.T.add_prefix('row_').T
    #df = df.add_prefix('col_')
    return df 

params = {
    'cols': 8,
    'rows': 4, 
    'n': 4
}
df = create_df(**params)

Output:

   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  3  2
1  0  2  1  3  0  2  3  1
2  0  3  1  2  0  3  2  1
3  1  0  2  3  1  0  3  2

# Correct on this Axis:
>>> df.apply(lambda x: x.value_counts(), axis=1)
   0  1  2  3
0  2  2  2  2
1  2  2  2  2
2  2  2  2  2
3  2  2  2  2

# Incorrect on this Axis:
>>> df.apply(lambda x: x.value_counts())
     0  1    2    3    4  5    6    7
0  3.0  1  NaN  NaN  3.0  1  NaN  NaN
1  1.0  1  2.0  NaN  1.0  1  NaN  2.0
2  NaN  1  2.0  1.0  NaN  1  1.0  2.0
3  NaN  1  NaN  3.0  NaN  1  3.0  NaN

So, I have the conditions I need on one axis, but not on the other.

How can I update my method (or write a new one) so that both conditions hold?


Answer

You can tile your input and then use a custom roll (via advanced indexing) to shift each row by a different amount:

import numpy as np
import pandas as pd

c = params['cols']
r = params['rows']
n = params['n']
a = np.arange(n)  # or any input

b = np.tile(a, (r, c//n))
# array([[0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3]])

idx = np.arange(r)[:, None]  # row numbers as a column vector, shape (r, 1)
shift = (np.tile(np.arange(c), (r, 1)) - idx) % c  # row i reads column (j - i) mod c

df = pd.DataFrame(b[idx, shift])

Output:

   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  2  3
1  3  0  1  2  3  0  1  2
2  2  3  0  1  2  3  0  1
3  1  2  3  0  1  2  3  0
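The `b[idx, shift]` lookup is a vectorized per-row roll: row i of the result is row i of `b` rolled right by i places. A quick cross-check against `np.roll` applied one row at a time, using the example sizes (r=4, c=8, n=4):

```python
import numpy as np

r, c, n = 4, 8, 4
a = np.arange(n)
b = np.tile(a, (r, c // n))          # every row is 0..n-1 repeated c // n times

idx = np.arange(r)[:, None]          # row numbers, shape (r, 1)
shift = (np.arange(c) - idx) % c     # broadcasts to shape (r, c)
fancy = b[idx, shift]                # vectorized per-row roll, as in the answer

# Reference: roll row i to the right by i places, one row at a time
looped = np.stack([np.roll(b[i], i) for i in range(r)])

assert (fancy == looped).all()
print(fancy[1])  # [3 0 1 2 3 0 1 2]
```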

Alternative order (each row shifted left instead of right, which reproduces the layout from the question):

idx = np.arange(r)[:, None]
shift = (np.tile(np.arange(c), (r, 1)) + np.arange(r)[:, None]) % c

df = pd.DataFrame(b[idx, shift])

Output:

   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  2  3
1  1  2  3  0  1  2  3  0
2  2  3  0  1  2  3  0  1
3  3  0  1  2  3  0  1  2
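Because n divides c, `b[i, (i + j) % c]` equals `a[(i + j) % n]`, so this alternative order can also be built with plain broadcasting and no tiling. Below is a sketch written as a drop-in replacement for the question's `create_df` (same parameters; the optional `a` for custom values is an addition), using the `row_`/`col` labels from the desired output. It assumes n divides cols (balanced rows) and n divides rows (balanced columns):

```python
import numpy as np
import pandas as pd


def create_df(cols: int, rows: int, n: int, a=None) -> pd.DataFrame:
    """Sketch: cell (i, j) holds a[(i + j) % n]; with the default a this is (i + j) % n."""
    a = np.arange(n) if a is None else np.asarray(a)
    i = np.arange(rows)[:, None]  # shape (rows, 1)
    j = np.arange(cols)[None, :]  # shape (1, cols)
    grid = a[(i + j) % n]         # broadcasts to shape (rows, cols)
    return pd.DataFrame(
        grid,
        index=[f"row_{k}" for k in range(rows)],
        columns=[f"col{k}" for k in range(cols)],
    )
```

With the question's `params`, `create_df(**params)` returns the table from the question, labels included.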

Other alternative: use a custom strided_indexing_roll function.
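The answer does not include `strided_indexing_roll` itself. Below is a sketch of what such a function could look like, assuming it takes a 2-D array and one shift per row; the body is a reconstruction built on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+), not the original code:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def strided_indexing_roll(a: np.ndarray, shifts) -> np.ndarray:
    """Sketch: roll row i of the 2-D array `a` to the right by shifts[i]."""
    rows, cols = a.shape
    # Append all but the last column so every rotation of a row is a contiguous window
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)      # shape (rows, 2*cols - 1)
    windows = sliding_window_view(a_ext, cols, axis=1)  # shape (rows, cols, cols), a view
    # windows[i, k] == a_ext[i, k:k+cols]; a right roll by s starts at k = (cols - s) % cols
    starts = (cols - np.asarray(shifts)) % cols
    return windows[np.arange(rows), starts]
```

With `b` from the answer, `strided_indexing_roll(b, np.arange(r))` gives the first output, and `strided_indexing_roll(b, -np.arange(r))` gives the alternative order.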

User contributions licensed under: CC BY-SA