How to read a .csv with a compound header into an xarray DataArray (using pandas)

Question

Given a dataset with the following structure: Given as a .csv: Note: some values are missing; not all variables are available for all locations; timestamps are available for every record; and columns may appear out of order, but the timestamp is reliably the first column. I'm not sure all these aspects are relevant to an optimal solution, but there they are. I

Accepted Answer

    import pandas as pd

    # Two header rows (variable, location); the first column becomes the index.
    df = pd.read_csv('tst.csv', header=[0, 1], index_col=0).sort_index(axis=1)

    time  var1       var2      var3
          loc1  loc2 loc1 loc2 loc1
    1     11.0  14.0   12   13   15
    2     21.0   NaN   22   23   25
    3      NaN  34.0   32   33   35

However, to get into a 3-D array, we must project this into a cartesian product of the axes available to us.

    cols = pd.MultiIndex.from_product(df.columns.levels)
    d1 = df.reindex(columns=cols)
    d1

       var1       var2      var3
       loc1  loc2 loc1 loc2 loc1 loc2
    1  11.0  14.0   12   13   15  NaN
    2  21.0   NaN   22   23   25  NaN
    3   NaN  34.0   32   33   35  NaN

Then use numpy.reshape and numpy.transpose:

    # reshape: (time, variable * location) -> (time, variable, location)
    # transpose: move the variable axis to the front
    d1.values.reshape(3, 3, 2).transpose(1, 0, 2)

    array([[[ 11.,  14.],
            [ 21.,  NaN],
            [ NaN,  34.]],

           [[ 12.,  13.],
            [ 22.,  23.],
            [ 32.,  33.]],

           [[ 15.,  NaN],
            [ 25.,  NaN],
            [ 35.,  NaN]]])
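The answer stops at a bare NumPy array, so here is a minimal sketch of the last step: wrapping it in an xarray DataArray with labelled axes. It assumes a few things that are not in the original post. The CSV text is reconstructed from the frame printed in the answer, the helper name csv_to_dataarray is hypothetical, and the dimension names "variable", "time" and "location" are illustrative. The array shape is taken from the column levels rather than hard-coded as (3, 3, 2).

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

# Assumed sample data, reconstructed from the frame printed in the answer:
# header row 1 holds variable names, header row 2 holds locations,
# and the first column is the timestamp.
CSV_TEXT = """time,var1,var1,var2,var2,var3
,loc1,loc2,loc1,loc2,loc1
1,11,14,12,13,15
2,21,,22,23,25
3,,34,32,33,35
"""


def csv_to_dataarray(source):
    """Hypothetical helper: two-row-header CSV -> (variable, time, location) DataArray."""
    df = pd.read_csv(source, header=[0, 1], index_col=0).sort_index(axis=1)

    # Reindex onto the full variable x location grid so that
    # combinations absent from the file become all-NaN columns.
    variables, locations = df.columns.levels
    full_cols = pd.MultiIndex.from_product([variables, locations])
    grid = df.reindex(columns=full_cols)

    # Rows are time; the columns unfold into (variable, location).
    cube = grid.to_numpy(dtype=float).reshape(
        len(grid.index), len(variables), len(locations)
    )

    return xr.DataArray(
        cube.transpose(1, 0, 2),  # (time, variable, location) -> (variable, time, location)
        dims=["variable", "time", "location"],
        coords={
            "variable": list(variables),
            "time": grid.index.to_numpy(),
            "location": list(locations),
        },
    )


da = csv_to_dataarray(io.StringIO(CSV_TEXT))
# Select one variable at one location across all timestamps.
print(da.sel(variable="var2", location="loc1").values)
```

Passing 'tst.csv' instead of the StringIO object should read the file from the answer in the same way. Label-based selection with da.sel then replaces positional indexing into the raw array.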

Answer