Disable bottleneck by default? #7344

dcherian · 2022-12-01T17:26:11Z

What is your issue?

Our choice to enable bottleneck by default results in quite a few issues about numerical stability and funny dtype behaviour: #7336, #7128, #2370, #1346 (and probably more)

Shall we disable it by default?

TomNicholas · 2022-12-01T19:24:24Z

I kinda think correctness by default is more important than performance, especially if the default performance isn't prohibitively slow.

max-sixty · 2022-12-01T22:22:02Z

I'd be fine with disabling, since bottleneck doesn't seem to be actively maintained.

Though I would say it's numerically unstable rather than incorrect! I would always want it enabled, but it does make sense to default to the conservative option.

I had dreams of making numbagg into a better bottleneck — it's just as fast, much more flexible, and integrates really well with xarray. But those dreams have not come to pass (yet!).

shoyer · 2022-12-04T01:55:34Z

The case where Bottleneck really makes a difference was for moving window statistics, where it uses a smarter algorithm than our current NumPy implementation, which creating a moving window view.

Otherwise, I agree, it probably isn't worth the trouble.

That said -- we could also switch to smarter NumPy based algorithms to implement most moving window calculations, e.g,. using np.nancumsum for moving window means.

markelg · 2022-12-14T13:54:06Z

I fully agree that correctness is the priority. Note however that some functions now require bottleneck, like ffill and bfill (I am not sure if there are more cases). They may need to be modified so they can run without bottleneck.

shoyer · 2022-12-14T18:24:04Z

I think it's OK to still require bottleneck for ffill and bfill:

There are no numerical concerns: these functions simply repeat numbers forward (or backwards).
There is no good alternative to using a loop, and writing the loop in NumPy would be probitively slow.

headtr1ck · 2023-03-10T20:31:28Z

Maybe it is just a problem of documenting it more clearly?

riley-brady · 2023-05-24T23:12:48Z

I want to add a +1 to disable it by default. It's pretty common to be using float32 precision arrays. I have a rolling mean operation early on in some code and the errors balloon over time in subsequent processes. This was a super obscure bug to track down as well.

bqi343 · 2024-03-26T20:54:58Z

Though I would say it's numerically unstable rather than incorrect! I would always want it enabled, but it does make sense to default to the conservative option.

I recently stumbled across a case where bottleneck gives a completely incorrect answer or segfaults when taking the max of a small array (similar to: pydata/bottleneck#381 - a fix was proposed but never merged).

I think it's OK to still require bottleneck for ffill and bfill:

As of #8389 bottleneck is no longer used for ffill and bfill by default. Given the issues w/ precision and correctness, is there any remaining reason to not disable it by default? If any remaining functions require bottleneck, it would make sense to enable bottleneck by default for those functions only.

shoyer · 2024-03-26T22:25:32Z

I would support disabling bottleneck by default for now, and eventually permanently removing support for bottleneck. Numbagg seems to be a more than capable and much more maintainable replacement.

max-sixty · 2024-03-26T22:26:51Z

As an update — numbagg is in a pretty good place now, and xarray defaults to using numbagg over bottleneck for overlapping functions when both are installed.

(I didn't finish my fuzzing work in numbagg, so I'm not as confident as I'd hope to be, but also no one has reported anything for a long time, so the "fuzzing from the world" has been successful...)

Numbagg doesn't do all of bottlenecks functions — there are functions that aren't practical to write with numba, since they require specialized data structures, such as moving max.

...though I'm not sure if that updates us towards or away from disabling it by default — the substitute is better than it used to be, but also the substitute overrides many of its functions by default when installed...

bqi343 · 2024-03-27T00:22:25Z

...though I'm not sure if that updates us towards or away from disabling it by default — the substitute is better than it used to be, but also the substitute overrides many of its functions by default when installed...

If I don't have numbagg installed but happen to have bottleneck installed, I would prefer for the default behavior to be using np.nanmax instead of the incorrect bottleneck.nanmax from the issue I linked above.

dcherian added the topic-performance label Dec 1, 2022

dcherian mentioned this issue Dec 13, 2022

Improve performance for large loads with dask Ouranosinc/xclim#912

Closed

headtr1ck mentioned this issue Mar 10, 2023

Instability when calculating standard deviation #7336

Closed

4 tasks

dcherian mentioned this issue Apr 13, 2023

Optimize ndrolling nanreduce #4325

Open

maawoo mentioned this issue Dec 11, 2023

Revert #29 Jena-Earth-Observation-School/sdc-tools#32

Merged

riley-brady mentioned this issue Jul 12, 2024

Rewrite TestRPmatrix::test_local_exceedance_imp_pass to not rely on loading data CLIMADA-project/climada_python#915

Open

dcherian added the topic-rolling label Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable bottleneck by default? #7344

Disable bottleneck by default? #7344

dcherian commented Dec 1, 2022

TomNicholas commented Dec 1, 2022

max-sixty commented Dec 1, 2022

shoyer commented Dec 4, 2022

markelg commented Dec 14, 2022

shoyer commented Dec 14, 2022

headtr1ck commented Mar 10, 2023

riley-brady commented May 24, 2023

bqi343 commented Mar 26, 2024 •

edited

Loading

shoyer commented Mar 26, 2024

max-sixty commented Mar 26, 2024

bqi343 commented Mar 27, 2024 •

edited

Loading

Disable bottleneck by default? #7344

Disable bottleneck by default? #7344

Comments

dcherian commented Dec 1, 2022

What is your issue?

TomNicholas commented Dec 1, 2022

max-sixty commented Dec 1, 2022

shoyer commented Dec 4, 2022

markelg commented Dec 14, 2022

shoyer commented Dec 14, 2022

headtr1ck commented Mar 10, 2023

riley-brady commented May 24, 2023

bqi343 commented Mar 26, 2024 • edited Loading

shoyer commented Mar 26, 2024

max-sixty commented Mar 26, 2024

bqi343 commented Mar 27, 2024 • edited Loading

bqi343 commented Mar 26, 2024 •

edited

Loading

bqi343 commented Mar 27, 2024 •

edited

Loading