-
-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Smart automatic dask wrapping #530
Comments
cc @dcherian : are there other options we should consider? |
Is there a discussion ongoing with Dask devs? I think it should consider |
Not from our end at least. |
I'd be willing to review accept, and guide a PR for this (with the proposed solution), but I don't know enough about load-balancing to create a solution that is good enough. |
Probably somewhere in Dask we look at x.nbytes when instead we should call
nbytes(x) . A fix upstream for that would be welcome.
…On Wed, Nov 10, 2021, 7:05 AM Hameer Abbasi ***@***.***> wrote:
I'd be willing to review accept, and guide a PR for this (with the
proposed solution), but I don't know enough about load-balancing to create
a solution that is good enough.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#530 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTHDEN5XGMUOJTMCPYLULKC2JANCNFSM5HMATQUA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I recommend raising upstream. Pinging me directly is no longer a reliable
way to report issues to Dask.
…On Thu, Nov 11, 2021, 1:34 PM Matthew Rocklin ***@***.***> wrote:
Probably somewhere in Dask we look at x.nbytes when instead we should call
nbytes(x) . A fix upstream for that would be welcome.
On Wed, Nov 10, 2021, 7:05 AM Hameer Abbasi ***@***.***>
wrote:
> I'd be willing to review accept, and guide a PR for this (with the
> proposed solution), but I don't know enough about load-balancing to create
> a solution that is good enough.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#530 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AACKZTHDEN5XGMUOJTMCPYLULKC2JANCNFSM5HMATQUA>
> .
> Triage notifications on the go with GitHub Mobile for iOS
> <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
> or Android
> <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
>
>
|
Is your feature request related to a problem? Please describe.
Creating a dask array from a
SparseArray
, if chunks are not given, dask will automatically rechunk the data based on its size and the config (dask'sarray.chunk-size
). The point of sparse arrays is that they can be enormous but still only hold a few values. Dask doesn't see that. As a result, from a single array many chunks are created, which multiplies the number of tasks in dask's graph and thus affects performance.Evidently, when working directly with
sparse
anddask
, one can explicitly give the chunks of the requested dask array, but this is not possible when using it under the hood.My use case is a
xarray.DataArray
using asparse.COO
array that is included in axarray.apply_ufunc
call together with a dask-backedDataArray
. In that case, xarray sends all inputs todask.array.apply_gufunc
and the wrapping into dask happens indask.array.core.asarray
. Our option is thus to pre-wrap our sparse array to a dask array, before the computation. I think it would be interesting if this was done implicitly.Describe the solution you'd like
The cleanest option I see is to implement
SparseArray.to_dask_array
. It will be detected and used by dask automatically. There we could wrap to a dask array taking into account that the real size of the array is from.nnz
and not.shape
. Optionally, we could read the config of dask to respectarray.chunk-size
.Describe alternatives you've considered
Alternatives are:
But I felt that here was the best place.
Additional context
Raised by issue pangeo-data/xESMF#127.
Example
The text was updated successfully, but these errors were encountered: