GroupBy(multiple groupers) #9372

Merged
merged 12 commits into from
Aug 26, 2024
45 changes: 30 additions & 15 deletions doc/user-guide/groupby.rst
@@ -81,9 +81,11 @@ You can index out a particular group:

ds.groupby("letters")["b"]

Just like in pandas, creating a GroupBy object is cheap: it does not actually
Just like in pandas, creating a ``GroupBy`` object is cheap: it does not actually
split the data until you access particular values.

To group by multiple variables, see the section on `Grouper Objects`_

Binning
~~~~~~~

@@ -180,19 +182,6 @@ This last line is roughly equivalent to the following::
results.append(group - alt.sel(letters=label))
xr.concat(results, dim='x')

Iterating and Squeezing
~~~~~~~~~~~~~~~~~~~~~~~

Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
a GroupBy object. This behaviour is being removed.
You can always squeeze explicitly later with the Dataset or DataArray
:py:meth:`DataArray.squeeze` methods.

.. ipython:: python

next(iter(arr.groupby("x", squeeze=False)))


.. _groupby.multidim:

Multidimensional Grouping
@@ -236,6 +225,8 @@ applying your function, and then unstacking the result:
stacked = da.stack(gridcell=["ny", "nx"])
stacked.groupby("gridcell").sum(...).unstack("gridcell")

Alternatively, you can groupby both `lat` and `lon` at the :ref:`same time <groupby.multiple>`.

.. _groupby.groupers:

Grouper Objects
@@ -276,7 +267,8 @@ is identical to

ds.groupby(x=UniqueGrouper())

and

Similarly,

.. code-block:: python

@@ -303,3 +295,26 @@ is identical to
from xarray.groupers import TimeResampler

ds.resample(time=TimeResampler("ME"))


.. _groupby.multiple:

Grouping by multiple variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use grouper objects to group by multiple dimensions:

.. ipython:: python

from xarray.groupers import UniqueGrouper

da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum()


Different groupers can be combined to construct sophisticated GroupBy operations.

.. ipython:: python

from xarray.groupers import BinGrouper

ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
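
The two documentation examples above group by a combination of labels: every distinct tuple of labels (one per grouping variable) forms one group. The following is a conceptual sketch of that idea in plain Python, not xarray's implementation. The helpers `bin_label` and `grouped_sum` are hypothetical names introduced here, the sample data is made up, and the right-closed bin convention and the dropping of out-of-range values are assumptions modelled on the usual `pandas.cut` defaults; how xarray lays out or fills the result may differ.

```python
from bisect import bisect_left
from collections import defaultdict


def bin_label(x, edges):
    """Return the right-closed interval (lo, hi] containing x, or None if x is outside."""
    i = bisect_left(edges, x) - 1
    if 0 <= i < len(edges) - 1:
        return (edges[i], edges[i + 1])
    return None


def grouped_sum(values, *label_seqs):
    """Sum the values that share the same tuple of labels (hypothetical helper)."""
    totals = defaultdict(float)
    for value, key in zip(values, zip(*label_seqs)):
        if None in key:
            # Assumption: points that fall outside every bin are dropped.
            continue
        totals[key] += value
    return dict(totals)


# Made-up sample data, analogous to grouping by binned `x` and unique `letters`.
x = [10, 20, 30, 12, 22]
letters = ["a", "b", "a", "a", "b"]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
edges = [5, 15, 25]

x_bins = [bin_label(v, edges) for v in x]
result = grouped_sum(values, x_bins, letters)
```

Here each key of `result` pairs a bin with a letter, which mirrors how combining a `BinGrouper` with a `UniqueGrouper` defines groups from two different kinds of labels.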
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,11 @@ New Features
~~~~~~~~~~~~
- Make chunk manager an option in ``set_options`` (:pull:`9362`).
By `Tom White <https://github.com/tomwhite>`_.
- Support for :ref:`grouping by multiple variables <groupby.multiple>`.
This is quite new, so please check your results and report bugs.
Binary operations after grouping by multiple arrays are not supported yet.
(:issue:`1056`, :issue:`9332`, :issue:`324`, :pull:`9372`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Allow data variable specific ``constant_values`` in the dataset ``pad`` function (:pull:`9353``).
By `Tiago Sanona <https://github.com/tsanona>`_.

19 changes: 7 additions & 12 deletions xarray/core/dataarray.py
@@ -6801,27 +6801,22 @@ def groupby(
groupers = either_dict_or_kwargs(group, groupers, "groupby") # type: ignore
group = None

grouper: Grouper
rgroupers: tuple[ResolvedGrouper, ...]
if group is not None:
if groupers:
raise ValueError(
"Providing a combination of `group` and **groupers is not supported."
)
grouper = UniqueGrouper()
rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
else:
if len(groupers) > 1:
raise ValueError("grouping by multiple variables is not supported yet.")
if not groupers:
raise ValueError("Either `group` or `**groupers` must be provided.")
group, grouper = next(iter(groupers.items()))

rgrouper = ResolvedGrouper(grouper, group, self)
rgroupers = tuple(
ResolvedGrouper(grouper, group, self)
for group, grouper in groupers.items()
)

return DataArrayGroupBy(
self,
(rgrouper,),
restore_coord_dims=restore_coord_dims,
)
return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

@_deprecate_positional_args("v2024.07.0")
def groupby_bins(
19 changes: 8 additions & 11 deletions xarray/core/dataset.py
@@ -10397,25 +10397,22 @@ def groupby(
groupers = either_dict_or_kwargs(group, groupers, "groupby") # type: ignore
group = None

rgroupers: tuple[ResolvedGrouper, ...]
if group is not None:
if groupers:
raise ValueError(
"Providing a combination of `group` and **groupers is not supported."
)
rgrouper = ResolvedGrouper(UniqueGrouper(), group, self)
rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
else:
if len(groupers) > 1:
raise ValueError("Grouping by multiple variables is not supported yet.")
elif not groupers:
if not groupers:
raise ValueError("Either `group` or `**groupers` must be provided.")
for group, grouper in groupers.items():
rgrouper = ResolvedGrouper(grouper, group, self)
rgroupers = tuple(
ResolvedGrouper(grouper, group, self)
for group, grouper in groupers.items()
)

return DatasetGroupBy(
self,
(rgrouper,),
restore_coord_dims=restore_coord_dims,
)
return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

@_deprecate_positional_args("v2024.07.0")
def groupby_bins(
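
The change to both `groupby` methods follows the same pattern: a single `group` argument is wrapped in a one-element tuple, while keyword groupers are resolved one by one into a tuple of any length. The sketch below reproduces only that control flow with stand-in classes so it runs without xarray. `resolve_groupers` is a hypothetical function name, the `UniqueGrouper` and `ResolvedGrouper` classes here are simplified placeholders rather than the real ones, and the handling of a mapping passed as `group` (via `either_dict_or_kwargs`) is omitted.

```python
from dataclasses import dataclass


class UniqueGrouper:
    """Placeholder for a grouper object; the real class lives in xarray.groupers."""


@dataclass(frozen=True)
class ResolvedGrouper:
    """Placeholder that only records which grouper applies to which variable."""

    grouper: object
    group: str
    obj: object


def resolve_groupers(obj, group=None, **groupers):
    """Return a tuple of resolved groupers, mirroring the branching in the diff."""
    if group is not None:
        if groupers:
            raise ValueError(
                "Providing a combination of `group` and **groupers is not supported."
            )
        # A plain `group` is treated as grouping by unique values.
        return (ResolvedGrouper(UniqueGrouper(), group, obj),)
    if not groupers:
        raise ValueError("Either `group` or `**groupers` must be provided.")
    # One resolved grouper per keyword, in the order the keywords were given.
    return tuple(
        ResolvedGrouper(grouper, name, obj) for name, grouper in groupers.items()
    )


single = resolve_groupers("data", "letters")
multi = resolve_groupers("data", lat=UniqueGrouper(), lon=UniqueGrouper())
```

Returning a tuple in both branches is what lets the caller pass the result straight to the GroupBy constructor without special-casing the single-grouper path.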