ValueError: cannot convert a DataFrame with a non-unique MultiIndex into xarray

This is the data I want to convert, which is saved in a CSV file. Some of the longitude and latitude values may be repeated; they were originally extracted from a NetCDF file.

lon
Out[56]: 
0       121.25
1       121.75
2       122.25
3       122.75
4       123.25
 
3819    109.75
3820    110.25
3821    108.75
3822    109.25
3823    109.75
Name: E, Length: 3824, dtype: float64
lat
Out[57]: 
0       53.25
1       53.25
2       53.25
3       53.25
4       53.25
 
3819    19.25
3820    19.25
3821    18.75
3822    18.75
3823    18.75
Name: N, Length: 3824, dtype: float64
pr
Out[58]: 
0       136.094444
1        95.242593
2       120.557407
3        92.844444
4       106.596296
   
3819    176.818519
3820    512.942593
3821    271.687037
3822    359.205556
3823    242.946296
Name: annual, Length: 3824, dtype: float64

I want to convert them to xarray because I need 'pr' to be 2D (with no repeated lon or lat), like the following:

<xarray.DataArray 'Temperature_surface' (lat: 153, lon: 257)>
array([[258.67383, 258.57382, 258.87384, ..., 249.67383, 246.57382, 244.97383],
       [258.57382, 258.77383, 258.67383, ..., 245.27383, 246.77383, 251.47383],
       [258.57382, 258.47382, 258.27383, ..., 246.67383, 246.07382, 251.47383],
       ...,
       [300.77383, 300.77383, 300.67383, ..., 302.37384, 302.27383, 302.27383],
       [300.87384, 300.77383, 300.67383, ..., 302.37384, 302.37384, 302.27383],
       [300.87384, 300.97382, 300.97382, ..., 302.37384, 302.37384, 302.27383]],
      dtype=float32)
Coordinates:
  
  * lat       (lat) float32 56.0 55.75 55.5 55.25 55.0 ... 18.75 18.5 18.25 18.0
  * lon       (lon) float32 72.0 72.25 72.5 72.75 ... 135.2 135.5 135.8 136.0

Here is my code:

import pandas as pd


data=pd.read_csv('E:DesktopData ProcessingCorrect NewCSVChina_R95P.csv')
lon=data['E']
lat=data['N']
pr=data['annual']
df=pd.DataFrame({
    'lon':lon,
    'lat':lat,
    'pr':pr
    })
df=df.set_index(['lon','lat'])

df looks like this:

Out[97]: 
                      pr
lon    lat              
121.25 53.25  136.094444
121.75 53.25   95.242593
122.25 53.25  120.557407
122.75 53.25   92.844444
123.25 53.25  106.596296
                 ...
109.75 19.25  176.818519
110.25 19.25  512.942593
108.75 18.75  271.687037
109.25 18.75  359.205556
109.75 18.75  242.946296

[3824 rows x 1 columns]

Then, when I call df.to_xarray(), I get this error: ValueError: cannot convert a DataFrame with a non-unique MultiIndex into xarray

What should I do? Thanks for answering!

Answer

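As the error says, your index is not unique: at least one (lon, lat) pair appears in more than one row. xarray cannot build a grid from that, because it needs exactly one value per grid cell and repeated pairs may carry contradictory values. So you either need to drop the duplicates or average the values for each lon/lat pair. The following will work if the repeated rows are exact duplicates (same lon, lat and pr):

# move lon/lat back into columns so whole rows are compared, then rebuild the index
df=df.reset_index().drop_duplicates().set_index(['lon','lat'])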
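
If the repeated pairs carry different values, the averaging option mentioned above can be sketched as follows. This is a minimal example on made-up numbers, not the asker's CSV; the names dupes, pr_mean and pr_2d are chosen only for illustration, and it assumes xarray is installed alongside pandas.

```python
import pandas as pd

# Toy stand-in for the CSV columns (values are invented for illustration).
df = pd.DataFrame({
    "lon": [121.25, 121.75, 121.25, 121.75, 121.25],
    "lat": [53.25, 53.25, 52.75, 52.75, 53.25],  # last row repeats the first (lon, lat)
    "pr": [136.0, 95.0, 120.0, 92.0, 140.0],
})

# List every row whose (lon, lat) pair occurs more than once.
dupes = df[df.duplicated(subset=["lon", "lat"], keep=False)]
print(dupes)

# Average pr within each (lat, lon) pair; the resulting MultiIndex is unique.
pr_mean = df.groupby(["lat", "lon"])["pr"].mean()

# Series.to_xarray() turns the two index levels into the dimensions lat and lon.
pr_2d = pr_mean.to_xarray()
print(pr_2d)
```

Grouping by lat first and lon second gives a DataArray with dimensions (lat, lon), matching the layout shown in the question. Any (lat, lon) combination that has no row in the data comes out as NaN in the 2D array.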
User contributions licensed under: CC BY-SA