xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex #7139
Comments
Hi @lukasbindreiter, could you add the whole error traceback please?
I can see this type of decoding breaking some assumption in the file reading process. A full traceback would help identify where. I think the real solution is actually #4490, so you could explicitly provide a coder.
Here is the full stacktrace:

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [12], line 7
----> 7 loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=True)
8 print(loaded)
File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:537, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
530 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
531 backend_ds = backend.open_dataset(
532 filename_or_obj,
533 drop_variables=drop_variables,
534 **decoders,
535 **kwargs,
536 )
--> 537 ds = _dataset_from_backend_dataset(
538 backend_ds,
539 filename_or_obj,
540 engine,
541 chunks,
542 cache,
543 overwrite_encoded_chunks,
544 inline_array,
545 drop_variables=drop_variables,
546 **decoders,
547 **kwargs,
548 )
549 return ds
File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:345, in _dataset_from_backend_dataset(backend_ds, filename_or_obj, engine, chunks, cache, overwrite_encoded_chunks, inline_array, **extra_tokens)
340 if not isinstance(chunks, (int, dict)) and chunks not in {None, "auto"}:
341 raise ValueError(
342 f"chunks must be an int, dict, 'auto', or None. Instead found {chunks}."
343 )
--> 345 _protect_dataset_variables_inplace(backend_ds, cache)
346 if chunks is None:
347 ds = backend_ds
File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:239, in _protect_dataset_variables_inplace(dataset, cache)
237 if cache:
238 data = indexing.MemoryCachedArray(data)
--> 239 variable.data = data
File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/core/variable.py:2795, in IndexVariable.data(self, data)
2793 @Variable.data.setter # type: ignore[attr-defined]
2794 def data(self, data):
-> 2795 raise ValueError(
2796 f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
2797 f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
2798 )
ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'measurement'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
```
Looks like the backend logic needs some updates to make it compatible with the new xarray data model with explicit indexes (i.e., indexed coordinates whose name may now differ from their dimension, as for multi-index levels), e.g. here: xarray/backends/api.py, lines 234 to 241 at commit 8eea8bb:
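The referenced snippet is the loop that wraps each backend variable in protective lazy-array wrappers (reconstructed here from the traceback above; the exact code at 8eea8bb may differ slightly):

```python
def _protect_dataset_variables_inplace(dataset, cache):
    for name, variable in dataset.variables.items():
        if name not in variable.dims:
            # no need to protect IndexVariable objects
            data = indexing.CopyOnWriteArray(variable._data)
            if cache:
                data = indexing.MemoryCachedArray(data)
            variable.data = data
```

A multi-index level coordinate is an IndexVariable whose name is not one of its dimensions, so it slips past the `name not in variable.dims` check and hits the `.data` setter, which raises.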
Based on your suggestion above I tried this single-line fix, which resolved my issue: #7150. However, I'm not sure if this is the correct approach, since I'm not all that deeply familiar with the indexing model.
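As I understand it, the idea of the fix is to skip any variable that backs an index rather than only dimension coordinates, roughly like this (see #7150 for the actual diff):

```python
# in _protect_dataset_variables_inplace:
for name, variable in dataset.variables.items():
    if name not in dataset._indexes:  # instead of: name not in variable.dims
        ...
```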
What happened?
As a follow-up to this comment: #6752 (comment), I'm currently trying to implement a custom NetCDF4 backend that also lets me handle multi-indices when loading a NetCDF dataset using xr.open_dataset. I'm using the following two functions to convert the dataset to a NetCDF-compatible version and back again:
#1077 (comment).
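The helper functions themselves are not reproduced in this issue, so here is a minimal sketch of what such a pair can look like. It assumes a single MultiIndex coordinate with no missing values, passed explicitly by name; the helpers in the linked comment are more general:

```python
import numpy as np
import pandas as pd


def encode_multi_index_as_compress(ds, idxname):
    """Replace the MultiIndex coordinate `idxname` with a plain integer
    coordinate, following the CF "compression by gathering" convention:
    one small variable per level, plus an integer coordinate indexing
    into the cartesian product of the level values."""
    midx = ds.indexes[idxname]
    encoded = ds.reset_index(idxname, drop=True)
    shape = [len(level) for level in midx.levels]
    encoded[idxname] = (
        idxname,
        np.ravel_multi_index(midx.codes, shape),
        {"compress": " ".join(midx.names)},
    )
    for name, level in zip(midx.names, midx.levels):
        encoded[name] = (name, np.asarray(level))
    return encoded


def decode_compress_to_multi_index(encoded, idxname):
    """Invert the encoding above: rebuild the pandas MultiIndex from the
    level variables listed in the "compress" attribute."""
    names = encoded[idxname].attrs["compress"].split(" ")
    shape = [encoded.sizes[name] for name in names]
    indices = np.unravel_index(encoded[idxname].values, shape)
    arrays = [encoded[name].values[idx] for name, idx in zip(names, indices)]
    midx = pd.MultiIndex.from_arrays(arrays, names=names)
    return encoded.drop_vars([idxname, *names]).assign_coords({idxname: midx})
```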
Here is a small code example:
Creating the dataset
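The original code block was not preserved here; a plausible reconstruction follows (the level names lat/lon and the data values are placeholder assumptions; the dimension name measurement comes from the traceback above):

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon"))
dataset = xr.Dataset(
    {"data": (("measurement",), np.arange(4.0))},
    coords={"measurement": midx},
)
```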
Saving as NetCDF
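Presumably along these lines, using the encode helper sketched above, since a MultiIndex cannot be written to NetCDF directly:

```python
encoded = encode_multi_index_as_compress(dataset, "measurement")
encoded.to_netcdf("multiindex.nc")
```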
And loading again
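Presumably decoding back to a MultiIndex after a plain open_dataset call:

```python
loaded = xr.open_dataset("multiindex.nc")
loaded = decode_compress_to_multi_index(loaded, "measurement")
```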
Custom Backend
While the manual patching for saving is currently still required, I tried to at least work around the added function call in open_dataset by creating a custom NetCDF backend:
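The backend code itself was not preserved here; presumably it subclasses the stock netCDF4 backend roughly like this (the class name and the hard-coded index name are assumptions), registered under the engine name netcdf4-multiindex via an entry point in the package metadata:

```python
from xarray.backends import NetCDF4BackendEntrypoint


class MultiindexNetCDF4BackendEntrypoint(NetCDF4BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, handle_multiindex=True, **kwargs):
        # read with the normal netCDF4 machinery, then optionally
        # rebuild the MultiIndex before handing the dataset to xarray
        ds = super().open_dataset(filename_or_obj, **kwargs)
        if handle_multiindex:
            ds = decode_compress_to_multi_index(ds, "measurement")
        return ds
```

The error

Calling the backend with decoding enabled fails (this is the call from the traceback above):

```python
loaded = xr.open_dataset(
    "multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=True
)
# ValueError: Cannot assign to the .data attribute of dimension coordinate
# a.k.a IndexVariable 'measurement'. ...
```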
but this works:
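Presumably the working variant defers decoding until after open_dataset returns, along these lines:

```python
loaded = xr.open_dataset(
    "multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=False
)
loaded = decode_compress_to_multi_index(loaded, "measurement")
```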
So I'm guessing xarray performs some operations on the dataset returned by the backend, and one of them fails if the dataset already contains a multi-index.

What did you expect to happen?
I expected that it doesn't matter whether decode_compress_to_multi_index is called inside the backend or afterwards, and that the same dataset would be returned either way.

Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
I'm also open to other suggestions for how I could simplify the usage of multi-indices; maybe there is an approach that doesn't require a custom backend at all?
Environment
xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.6.0
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 4.5.0