Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type of sample of Exponential{T} is not T #1902

Open
wouterwln opened this issue Sep 17, 2024 · 11 comments · May be fixed by #1905
Open

Type of sample of Exponential{T} is not T #1902

wouterwln opened this issue Sep 17, 2024 · 11 comments · May be fixed by #1905

Comments

@wouterwln
Copy link

wouterwln commented Sep 17, 2024

MWE:

θ = 1.0f0
d = Exponential(θ)
typeof(rand(Exponential(θ))) == typeof(θ)

returns false

Because Exponential sampling is also used when sampling Gamma with shape 1, the problem also occurs there. For Gamma this introduces a type instability, since the return type for sampling for a Gamma depends on whether or not the alpha parameter is equal to 1.0.

@bvdmitri
Copy link
Contributor

This is problematic here

julia> Gamma(1f0, 1f0) |> rand
1.1458127367196822

julia> Gamma(2f0, 1f0) |> rand
1.662515f0

@andreasnoack
Copy link
Member

Please search the issue tracker for several similar issues. I think this issue might be redundant.

@wouterwln
Copy link
Author

Unfortunately it is not. It was introduced in 3946acc which was committed last month. We found this in our downstream test pipeline, so unfortunately this is a new issue.

@andreasnoack
Copy link
Member

Which version of Distributions are you using? On master I'm getting

julia> typeof(rand(Exponential(θ))) == typeof(θ)
true

and that is a consequence of the commit that you link to (which I had forgotten).

@wouterwln
Copy link
Author

wouterwln commented Sep 18, 2024

You are right, on master this seems to be fixed. Reading the discussion in #1885 improved my understanding. The issue that is found by our tests is that eltype(Gamma(1.0f0, 1.0f0)) is not Float32 (for reference, for Normal this returns Float32). The story here implies that eltype should return the type of the samples. Is that correct? If this is not the case, the problem is downstream and we have to make an exception downstream.

Edit: in the documentation it says the following:

eltype(::Type{Sampleable})

  The default element type of a sample. This is the type of elements of the
  samples generated by the rand method. However, one can provide an array of
  different element types to store the samples using rand!.

@bvdmitri
Copy link
Contributor

bvdmitri commented Sep 18, 2024

Indeed for Gamma eltype fallbacks to

Base.eltype(::Type{<:Sampleable{F,Continuous}}) where {F} = Float64

and for Normal eltype is defined as

Base.eltype(::Type{Normal{T}}) where {T} = T

in our code base we basically implicitly tested that eltype(distribution) == typeof(rand(distribution)) and it failed. I would suggest to add those tests in Distributions.jl as well because now the implementation for Gamma does not follow the documented behaviour of eltype(::Type{Sampleable}). Specifically this line This is the type of elements of the
samples generated by the rand method.
which is not for Gamma (and maybe other distributions)

julia> eltype(Gamma(1f0, 1f0)) == typeof(rand(Gamma(1f0, 1f0)))
false

Also this behaviour leads to this inconsistency, where semantically similar code returns different container types

julia> rand(Gamma(1f0, 1f0), 1)
1-element Vector{Float64}:
 1.2745195627212524

julia> [ rand(Gamma(1f0, 1f0)) ]
1-element Vector{Float32}:
 3.234097

@andreasnoack
Copy link
Member

in our code base we basically implicitly tested that eltype(distribution) == typeof(rand(distribution))

I think it's best to avoid eltype(::Distribution) completely. Please refer to the many long discussion in the issue tracker. As an example of the issue here consider the case

julia> eltype(Normal(1//1, 1//1))
Rational{Int64}

julia> rand(Normal(1//1, 1//1))
0.3227598879915683

so eltype(::Normal) is just lying to you.

@bvdmitri
Copy link
Contributor

You cannot expect users to refer to long discussion in the issue tracker. The issue tracker is for developers, not for users. Users are looking at the documentation of the package. And the documentation is misleading. If the documentation is lying it must be changed.

@bvdmitri
Copy link
Contributor

As a side note, obtaining the exact sample type without having to call eltype(rand(...)) is indeed a highly useful feature for preallocating containers, as the documentation suggests. However, it seems like this functionality is currently somewhat broken. For example, even with your code:

julia> distribution = Normal(1//1, 1//1)
Normal{Rational{Int64}}=1//1, σ=1//1)

julia> container = zeros(eltype(distribution), 10);

julia> Distributions.rand!(distribution, container)
ERROR: MethodError: no method matching randn(::Random.TaskLocalRNG, ::Type{Rational{Int64}})

This issue makes it challenging to write generic code that preallocates containers for samples from arbitrary distributions, as eltype doesn’t seem to be designed for this purpose and is "lying". Is there an alternative method to handle this?

@andreasnoack
Copy link
Member

andreasnoack commented Sep 18, 2024

My point is that this is an open issue and part of the discussion here is just repeating arguments from other issues.

obtaining the exact sample type without having to call eltype(rand(...)) is indeed a highly useful feature for preallocating containers

This is the eternal challenge. There is no easy way to know this type. You can

  1. Compute the elements and promote the container as you go
  2. Ask Julia's type inference (please don't do this)
  3. Try to guess

When you try to write generic code then 3 almost always ends up being wrong at some point. Part of the issue here is that Distributions was original written just for Float64 and Int64 parameters and random variates and then later made generic which invalidated some of the original assumptions. But now I'm repeating explanations already available in other issues.

@bvdmitri
Copy link
Contributor

I understand your point, but the documentation is a primary source of information for most users (and not GitHub issues) and seems to be incorrect in its current form. If the previous original assumptions were invalidated it should be reflected in the documentation too. Now the documentation states that eltype returns the type of the samples, which isn't accurate. Would it be possible to at the very least just update the documentation? This will certainly prevent confusion and the need for repeated explanations in other issues.

@devmotion devmotion linked a pull request Sep 25, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants