
How to read a .csv with a compound header into an xarray DataArray (using pandas)

Given a dataset with the following structure:

time  var1  var2  var2  var1  var3
      loc1  loc1  loc2  loc2  loc1
1     11    12    13    14    15
2     21    22    23          25
3           32    33    34    35

Given as a .csv:

time,var1,var2,var2,var1,var3
,loc1,loc1,loc2,loc2,loc1
1,11,12,13,14,15
2,21,22,23,,25
3,,32,33,34,35

Note: some values are missing, not every variable is available for every location, and every record has a timestamp. Columns may appear in any order, but the timestamp is reliably the first column. I'm not sure all of these aspects are relevant to an optimal solution, but there they are.

I didn't have too much trouble setting up a three-dimensional xarray array that lets me access values by timestamp, location, and variable name. My approach determined the unique location names, then looped through them, filtering the data by location and adding the results one location at a time. But I am wondering what a pythonic and, for lack of a better word, pandastic solution would look like?

Question: Is there some compact and efficient way, likely using pandas and xarray, to load this dataset or any similar one (with different variable and location names) from .csv into a 3d-array like an xarray DataArray?


Answer

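Read both header rows as a column MultiIndex, use the first column as the index, and sort the columns:

import pandas as pd

df = pd.read_csv('tst.csv', header=[0, 1], index_col=0).sort_index(axis=1)
df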

time  var1       var2      var3
      loc1  loc2 loc1 loc2 loc1
1     11.0  14.0   12   13   15
2     21.0   NaN   22   23   25
3      NaN  34.0   32   33   35
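To try the read step without a file on disk, here is a minimal sketch that parses the sample from the question out of a string. The `CSV_TEXT` constant and the two selections at the end are illustrative additions, not part of the original answer; they only show what the two-level column index makes possible.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example needs no file (assumption).
CSV_TEXT = """time,var1,var2,var2,var1,var3
,loc1,loc1,loc2,loc2,loc1
1,11,12,13,14,15
2,21,22,23,,25
3,,32,33,34,35
"""

# header=[0, 1] turns the two header rows into a column MultiIndex;
# index_col=0 uses the timestamp column as the row index.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0).sort_index(axis=1)

# Select every location of one variable (first header level).
var2 = df["var2"]

# Select every variable at one location (second header level).
loc1 = df.xs("loc1", axis=1, level=1)

print(var2)
print(loc1)
```

Sorting the columns matters for the later steps: it groups all locations of a variable together, regardless of the column order in the file.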

However, to get into a 3-D array, we must project this onto the Cartesian product of the axes available to us, so that every (variable, location) pair has a column, even if that column holds only NaN.

cols = pd.MultiIndex.from_product(df.columns.levels)

d1 = df.reindex(columns=cols)
d1

   var1       var2      var3     
   loc1  loc2 loc1 loc2 loc1 loc2
1  11.0  14.0   12   13   15  NaN
2  21.0   NaN   22   23   25  NaN
3   NaN  34.0   32   33   35  NaN
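The reason for the reindex can be made visible by listing which (variable, location) pairs are absent from the file. The sketch below rebuilds the sorted frame by hand (an assumption, to keep it self-contained) and uses `MultiIndex.difference` to report the gap before filling it.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sorted frame from the read step (assumption).
columns = pd.MultiIndex.from_tuples(
    [("var1", "loc1"), ("var1", "loc2"), ("var2", "loc1"), ("var2", "loc2"), ("var3", "loc1")]
)
df = pd.DataFrame(
    [[11, 14, 12, 13, 15], [21, np.nan, 22, 23, 25], [np.nan, 34, 32, 33, 35]],
    index=[1, 2, 3],
    columns=columns,
)

# Every combination of the variable names and the location names.
cols = pd.MultiIndex.from_product(df.columns.levels)

# Pairs that the file never provided.
missing = cols.difference(df.columns)

# Reindexing adds an all-NaN column for each missing pair.
d1 = df.reindex(columns=cols)

print(list(missing))
print(d1.shape)
```

After the reindex the column count equals the number of variables times the number of locations, which is what makes the reshape into a rectangular 3-D block possible.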

Then use numpy.reshape and numpy.transpose to get an array indexed by (variable, time, location):

d1.values.reshape(3, 3, 2).transpose(1, 0, 2)

array([[[ 11.,  14.],
        [ 21.,  NaN],
        [ NaN,  34.]],

       [[ 12.,  13.],
        [ 22.,  23.],
        [ 32.,  33.]],

       [[ 15.,  NaN],
        [ 25.,  NaN],
        [ 35.,  NaN]]])
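The answer stops at a plain NumPy array with a hard-coded shape. A possible way to finish the job, sketched here under assumptions, is to derive the shape from the index levels and wrap the result in an `xarray.DataArray`. The dimension names `time`, `variable`, and `location` are my own labels, and the frame is rebuilt by hand so the block runs on its own. The reshape is only valid because the columns are the complete, sorted product of both levels.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hand-built copy of the reindexed frame d1 (assumption).
columns = pd.MultiIndex.from_product([["var1", "var2", "var3"], ["loc1", "loc2"]])
d1 = pd.DataFrame(
    [
        [11, 14, 12, 13, 15, np.nan],
        [21, np.nan, 22, 23, 25, np.nan],
        [np.nan, 34, 32, 33, 35, np.nan],
    ],
    index=pd.Index([1, 2, 3], name="time"),
    columns=columns,
)

variables, locations = d1.columns.levels

# Each row holds the variables in order, each with its locations in order,
# so it unpacks into a (variable, location) grid.
cube = d1.to_numpy(dtype=float).reshape(len(d1), len(variables), len(locations))

da = xr.DataArray(
    cube,
    dims=("time", "variable", "location"),
    coords={
        "time": d1.index.to_numpy(),
        "variable": list(variables),
        "location": list(locations),
    },
)

# Label-based lookup by timestamp, variable name, and location.
value = da.sel(time=2, variable="var2", location="loc2").item()
print(value)

# Same axis order as the transpose(1, 0, 2) result above.
by_variable = da.transpose("variable", "time", "location")
print(by_variable.shape)
```

Keeping `time` as the first dimension avoids the transpose entirely; the reordering at the end is only there to match the array printed above.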
User contributions licensed under: CC BY-SA