ASV Benchmark warning and timeouts #9890
Regarding "Rolling with dask is very slow now":

import numpy as np
import pandas as pd
import xarray as xr
def randn(shape, frac_nan=None, chunks=None, seed=0):
    rng = np.random.default_rng(seed)
    if chunks is None:
        x = rng.standard_normal(shape)
    else:
        import dask.array as da

        rng = da.random.default_rng(seed)
        x = rng.standard_normal(shape, chunks=chunks)
    if frac_nan is not None:
        inds = rng.choice(range(x.size), int(x.size * frac_nan))
        x.flat[inds] = np.nan
    return x
nx = 3000
long_nx = 30000
ny = 200
nt = 1000
window = 20
randn_xy = randn((nx, ny), frac_nan=0.1)
randn_xt = randn((nx, nt))
randn_t = randn((nt,))
randn_long = randn((long_nx,), frac_nan=0.1)
ds = xr.Dataset(
    {
        "var1": (("x", "y"), randn_xy),
        "var2": (("x", "t"), randn_xt),
        "var3": (("t",), randn_t),
    },
    coords={
        "x": np.arange(nx),
        "y": np.linspace(0, 1, ny),
        "t": pd.date_range("1970-01-01", periods=nt, freq="D"),
        "x_coords": ("x", np.linspace(1.1, 2.1, nx)),
    },
)
window_ = 20
min_periods = 5
use_bottleneck = False
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 601 ms ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
ds = ds.chunk({"x": 100, "y": 50, "t": 50})
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 1min 9s ± 1.31 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Regarding "The warning shows up here":

import numpy as np
import xarray as xr
n = 100
ds = xr.Dataset(
    {
        "a": xr.DataArray(np.r_[np.repeat(1, n), np.repeat(2, n)]),
        "b": xr.DataArray(np.arange(2 * n)),
        "c": xr.DataArray(np.arange(2 * n)),
    }
)
method = "sum"
with xr.set_options(use_flox=False):
    getattr(ds.groupby("b"), method)().compute()
# with xr.set_options(use_flox=True):
#     getattr(ds.groupby("b"), method)().compute()
The Dask slowdown is kind of expected. Historically, rolling would blow up your chunk sizes by a factor equal to the window size, i.e. 20 in your case. This is generally not harmful when chunks are tiny, like they are in your example, but it will crash things if you have 100 MiB chunks and a window of size 20 or more. That's why we now keep chunk sizes consistent. That makes things more scalable but hurts performance for tiny chunks :( Things should run fine again if you use approximately 100 MiB chunks.
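Concretely, a minimal sketch of that suggestion, assuming the ds, window_, and min_periods from the reproducer above; the chunk sizes below are illustrative choices, not numbers from this issue:

# Sketch only: rechunk to fewer, larger chunks before rolling so the
# window no longer dominates the chunk size (-1 means one chunk per dim).
ds_big = ds.chunk({"x": 1000, "y": -1, "t": -1})
result = (
    ds_big.rolling(x=window_, center=False, min_periods=min_periods)
    .reduce(np.nansum)
    .load()
)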
we should be setting
Can you elaborate? Only for this benchmark or generally for .reduce? |
Ah, my bad, I misread it. Yes, your diagnosis seems right to me. We should update the benchmark.
What is your issue?
#9889 (comment)
Quite a few timeouts and warnings nowadays with the benchmarks, but it runs now at least.
Failing after 115m