Instability when calculating standard deviation #7336

Closed
4 tasks
ShihengDuan opened this issue Nov 29, 2022 · 4 comments

Comments

@ShihengDuan

What happened?

I noticed that for moderately large values and a large number of samples, data.std() yields different values than np.std(data). This seems to be related to the magnitude of the data. See the code below:

nino34_tas_picontrol_detrend = nino34_tas_picontrol-298
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)
std_dev = nino34_tas_picontrol.std()
print(std_dev.data)
nino34_tas_picontrol_detrend = nino34_tas_picontrol-10
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)

and the results are:

1.4448999166488647
24.911161422729492
20.054718017578125


So I guess this is related to the magnitude, but I'm not sure. Has anyone had a similar issue?

What did you expect to happen?

Adding or subtracting a constant should not change the standard deviation.
The screenshot below shows what the data look like:
image
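
To illustrate the expectation above, here is a minimal pure-Python sketch (not part of the original report). It assumes one plausible mechanism: a two-pass standard deviation whose running sums are kept in float32 and accumulated one element at a time. The thread does not confirm that this is what happens inside data.std(); the synthetic data, the helper names std_float64 and std_float32_sequential, and the sample count are all illustrative assumptions.

```python
from array import array
import math
import random


def std_float64(values):
    """Two-pass population standard deviation in double precision."""
    n = len(values)
    mean = math.fsum(values) / n
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / n)


def std_float32_sequential(values):
    """Same two-pass formula, but every intermediate is rounded to float32.

    Assumption for illustration only: a single float32 accumulator that is
    updated one element at a time.
    """
    vals = array("f", values)      # round the inputs to float32
    acc = array("f", [0.0])        # float32 running sum
    for v in vals:
        acc[0] = acc[0] + v        # result is rounded to float32 on store
    mean = array("f", [acc[0] / len(vals)])[0]

    acc[0] = 0.0
    tmp = array("f", [0.0])
    for v in vals:
        tmp[0] = v - mean          # float32 deviation from the mean
        tmp[0] = tmp[0] * tmp[0]   # float32 squared deviation
        acc[0] = acc[0] + tmp[0]
    return math.sqrt(acc[0] / len(vals))


# Synthetic stand-in for the reporter's data: values near 298 with a small spread.
random.seed(0)
data = [random.gauss(298.0, 1.5) for _ in range(800_000)]
shifted = [v - 298.0 for v in data]

std64_raw = std_float64(data)
std64_shifted = std_float64(shifted)
std32_raw = std_float32_sequential(data)
std32_shifted = std_float32_sequential(shifted)

print("float64, raw:       ", std64_raw)
print("float64, shifted:   ", std64_shifted)
print("float32 seq, raw:   ", std32_raw)
print("float32 seq, shifted:", std32_shifted)
```

In double precision the shift makes no practical difference. The float32 accumulator only stays accurate when the values are small: once the running sum is large, each addition is rounded to a coarse grid, the computed mean drifts, and the spread around that wrong mean comes out inflated.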

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.71.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.22.3
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.6.0
sphinx: None

ShihengDuan added the bug and needs triage labels Nov 29, 2022
@ShihengDuan
Author

image
When I add or subtract a constant value, the std changes. Any idea why, or where I went wrong?

@max-sixty
Collaborator

max-sixty commented Nov 30, 2022

This is probably bottleneck: #7128

Do these reproduce after xr.set_options(use_bottleneck=False)?
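
For readers who want to try the suggestion above, a minimal sketch follows. The data here is a synthetic float32 stand-in (an assumption, since the original dataset is not included in the thread), and whether the default result differs on a given machine depends on whether bottleneck is installed and on its version. xr.set_options can be called globally as in the comment above, or used as a context manager to limit the change to one block.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the reporter's data: float32 values near 298.
rng = np.random.default_rng(0)
values = (298.0 + 1.5 * rng.standard_normal(1_000_000)).astype(np.float32)
data = xr.DataArray(values, dims="time")

# Result with whatever options are currently active.
print("default options:     ", float(data.std()))

# Disable bottleneck only inside this block.
with xr.set_options(use_bottleneck=False):
    std_without_bottleneck = float(data.std())
print("use_bottleneck=False:", std_without_bottleneck)

# NumPy reference on the same underlying array.
std_numpy = float(np.std(data.values))
print("np.std reference:    ", std_numpy)
```

If the first and second numbers differ while the second matches the NumPy reference, that points at the bottleneck code path rather than at the data.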

@ShihengDuan
Author

ShihengDuan commented Nov 30, 2022

Thanks, it works now.
image

So I guess the default setting is to use bottleneck? It might be helpful to add some hints in the docs.

TomNicholas added the upstream issue label and removed the needs triage label Nov 30, 2022
@headtr1ck
Collaborator

Closing this issue now as completed. Discussion about documentation can be continued in #7344
