Basic Example:
# Given params such as:
params = {
    'cols': 8,
    'rows': 4,
    'n': 4
}

# I'd like to produce (or equivalent):
       col0  col1  col2  col3  col4  col5  col6  col7
row_0     0     1     2     3     0     1     2     3
row_1     1     2     3     0     1     2     3     0
row_2     2     3     0     1     2     3     0     1
row_3     3     0     1     2     3     0     1     2
Axis Value Counts:
- Each axis should have an equal distribution of values, i.e. every value appears the same number of times in each row and in each column:
df.apply(lambda x: x.value_counts(), axis=1)

       0  1  2  3
row_0  2  2  2  2
row_1  2  2  2  2
row_2  2  2  2  2
row_3  2  2  2  2

df.apply(lambda x: x.value_counts())

   col0  col1  col2  col3  col4  col5  col6  col7
0     1     1     1     1     1     1     1     1
1     1     1     1     1     1     1     1     1
2     1     1     1     1     1     1     1     1
3     1     1     1     1     1     1     1     1
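The two conditions above can also be checked programmatically. The following is a minimal sketch; the helper name `has_equal_counts` is hypothetical (it is not part of pandas), and it assumes the values are the integers `0..n-1` and that both dimensions are multiples of `n`.

```python
import pandas as pd


def has_equal_counts(df: pd.DataFrame, n: int) -> bool:
    """Hypothetical helper: True if each value 0..n-1 appears equally
    often in every row and in every column of df."""
    arr = df.to_numpy()
    n_rows, n_cols = arr.shape
    for value in range(n):
        hits = arr == value
        # Each row must contain the value n_cols / n times.
        if not (hits.sum(axis=1) == n_cols / n).all():
            return False
        # Each column must contain the value n_rows / n times.
        if not (hits.sum(axis=0) == n_rows / n).all():
            return False
    return True


# The desired frame from the basic example.
wanted = pd.DataFrame([
    [0, 1, 2, 3, 0, 1, 2, 3],
    [1, 2, 3, 0, 1, 2, 3, 0],
    [2, 3, 0, 1, 2, 3, 0, 1],
    [3, 0, 1, 2, 3, 0, 1, 2],
])

# A frame that is balanced along rows but not along columns.
rows_only = pd.DataFrame([
    [0, 1, 2, 3, 0, 1, 3, 2],
    [0, 2, 1, 3, 0, 2, 3, 1],
    [0, 3, 1, 2, 0, 3, 2, 1],
    [1, 0, 2, 3, 1, 0, 3, 2],
])

balanced_ok = has_equal_counts(wanted, 4)
balanced_bad = has_equal_counts(rows_only, 4)
```

Here `balanced_ok` should be true and `balanced_bad` false, since the second frame has three zeros in its first column.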
My attempt thus far:
import itertools

import numpy as np  # needed for np.reshape below
import pandas as pd

def create_df(cols, rows, n):
    x = itertools.cycle(list(itertools.permutations(range(n))))
    df = pd.DataFrame(index=range(rows), columns=range(cols))
    df[:] = np.reshape([next(x) for _ in range((rows*cols)//n)], (rows, cols))
    #df = df.T.add_prefix('row_').T
    #df = df.add_prefix('col_')
    return df

params = {
    'cols': 8,
    'rows': 4,
    'n': 4
}

df = create_df(**params)
Output:
   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  3  2
1  0  2  1  3  0  2  3  1
2  0  3  1  2  0  3  2  1
3  1  0  2  3  1  0  3  2

# Correct on this Axis:
>>> df.apply(lambda x: x.value_counts(), axis=1)
   0  1  2  3
0  2  2  2  2
1  2  2  2  2
2  2  2  2  2
3  2  2  2  2

# Incorrect on this Axis:
>>> df.apply(lambda x: x.value_counts())
     0  1    2    3    4  5    6    7
0  3.0  1  NaN  NaN  3.0  1  NaN  NaN
1  1.0  1  2.0  NaN  1.0  1  NaN  2.0
2  NaN  1  2.0  1.0  NaN  1  1.0  2.0
3  NaN  1  NaN  3.0  NaN  1  3.0  NaN
So, I have the conditions I need on one axis, but not on the other.
How can I update my method/create a method to meet both conditions?
Answer
You can tile your input and use a custom roll to shift each row independently:
c = params['cols']
r = params['rows']
n = params['n']

a = np.arange(params['n']) # or any input

b = np.tile(a, (r, c//n))
# array([[0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3],
#        [0, 1, 2, 3, 0, 1, 2, 3]])

idx = np.arange(r)[:, None]
shift = (np.tile(np.arange(c), (r, 1)) - np.arange(r)[:, None])

df = pd.DataFrame(b[idx, shift])
Output:
   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  2  3
1  3  0  1  2  3  0  1  2
2  2  3  0  1  2  3  0  1
3  1  2  3  0  1  2  3  0
Alternative order:
idx = np.arange(r)[:, None]
shift = (np.tile(np.arange(c), (r, 1)) + np.arange(r)[:, None]) % c

df = pd.DataFrame(b[idx, shift])
Output:
   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  2  3
1  1  2  3  0  1  2  3  0
2  2  3  0  1  2  3  0  1
3  3  0  1  2  3  0  1  2
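When the input is simply `0..n-1`, as in the question, the same table as the alternative order can be written with plain modular arithmetic instead of tiling and indexing. This is a sketch rather than the method used above: the function name `create_balanced_df` and the `row_`/`col` labels are assumptions taken from the question's desired output, and the counts only balance if `rows` and `cols` are both multiples of `n`.

```python
import numpy as np
import pandas as pd


def create_balanced_df(cols, rows, n):
    """Sketch: the entry at (i, j) is (i + j) mod n, so each row is the
    previous row shifted left by one position."""
    values = (np.arange(rows)[:, None] + np.arange(cols)) % n
    return pd.DataFrame(
        values,
        index=[f'row_{i}' for i in range(rows)],
        columns=[f'col{j}' for j in range(cols)],
    )


params = {'cols': 8, 'rows': 4, 'n': 4}
balanced = create_balanced_df(**params)

# Per-row and per-column counts, as in the question.
row_counts = balanced.apply(lambda x: x.value_counts(), axis=1)
col_counts = balanced.apply(lambda x: x.value_counts())
```

With these parameters every entry of `row_counts` should be 2 and every entry of `col_counts` should be 1.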
Other alternative: use a custom strided_indexing_roll function.
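One possible shape for such a function is sketched below. It is not necessarily the implementation the answer has in mind: the name `roll_rows_strided` is hypothetical, and it relies on `numpy.lib.stride_tricks.sliding_window_view`, which assumes NumPy 1.20 or later. The idea is to extend each row with a copy of itself so that every rotation becomes a contiguous window, then pick one window per row.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def roll_rows_strided(a, shifts):
    """Sketch: roll each row of 2D array `a` to the right by the matching
    entry of `shifts`, like np.roll applied row by row."""
    a = np.asarray(a)
    shifts = np.asarray(shifts)
    width = a.shape[1]
    # Append all but the last column so each rotation is a contiguous slice.
    extended = np.concatenate((a, a[:, :-1]), axis=1)
    # windows[i, s] is extended[i, s:s + width]; shape (rows, width, width).
    windows = sliding_window_view(extended, width, axis=1)
    # A right roll by k starts reading at position (width - k) % width.
    starts = (width - shifts) % width
    return windows[np.arange(a.shape[0]), starts]


b = np.tile(np.arange(4), (4, 2))
rolled = roll_rows_strided(b, np.arange(4))
df = None  # wrap with pd.DataFrame(rolled) if a DataFrame is needed
```

Rolling row `i` right by `i` should reproduce the first output of the answer; passing negative shifts rolls to the left and gives the alternative order.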