Convert 2D dataframe to 3D numpy array based on unique ID

Question

I have a dataframe in this format: ... though my dataframe is much larger, with more than 500 hundred IDs. I want to convert this 2D - dataframe into a 3D array in this format (num_time_samples, value, ID). Essentially I would like to have one 2D-array for every unique ID. I plan on using the value column to build lag

Accepted Answer

Say you havedf = pd.DataFrame( { 'time column': [ '00:00:00', '00:15:00', '00:00:00', '00:15:00', ], 'ID column': [ 1, 1, 2, 2, ], 'Value': [ 10, 0, 6, 2, ], })where df actually is a subset of your dataframe keeping eveything data-type-naive.I want to convert this 2D – dataframe into a 3D array in this format (num_time_samples, value, ID).Why not doa = ( df .set_index(['time column', 'ID column']) .unstack(level=-1) # which leaves 'time column' as first dimension index .to_numpy() .reshape( ( df['ID column'].unique().size, df['time column'].unique().size, 1, ) ))a looks like>>> aarray([[[10], [ 6]], [[ 0], [ 2]]], dtype=int64)>>> a.shape(2, 2, 1)>>> a.ndim3a is structured as time column × ID column × Value (and indexable accordingly). E.g. let’s get individuals’ 00:15:00-data>>> a[1] # <=> a[1, ...] <=> a[1, :, :]array([[0], [2]], dtype=int64)Let’s get the first and second individual’s time series, respectively,>>> a[:, 0] # <=> a[:, 0, :] <=> a[..., 0, :]array([[10], [ 0]], dtype=int64)>>> a[:, 1]array([[6], [2]], dtype=int64)

Advertisement

Answer