Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Risk of hash collisions with Dry::Core::Memoizable #63

Open
JacobEvelyn opened this issue Oct 21, 2021 · 1 comment
Open

Risk of hash collisions with Dry::Core::Memoizable #63

JacobEvelyn opened this issue Oct 21, 2021 · 1 comment
Labels

Comments

@JacobEvelyn
Copy link

Hello! I'm writing on behalf of the team behind MemoWise, a Ruby memoization gem that aims to provide fast memoized lookups. First of all, I wanted to say we're big fans of Dry::Core::Memoizable—had we known about it years ago we might not have ever written our gem and instead just used yours. 😄

In a recent push to increase performance, we took inspiration from Dry::Core::Memoizable and used Array#hash directly to produce our hash keys, rather than using array keys (which the hash has to compare using both Array#hash and a more expensive Array#eql? call). This is a really clever solution @flash-gordon came up with, and it made our gem much faster.

However, after discussions among our team we've decided we don't feel comfortable with this approach because of its risk of hash collisions: two different sets of method arguments could hash to the same integer, and our gem would return incorrect memoized results. While unlikely, collisions are not impossible (which is why hashes use both #hash and #eql? when comparing keys). This behavior would also be nondeterministic and hard for users to debug because Ruby uses pseudorandom hash functions so collisions will vary between different runs of the Ruby interpreter.

We haven't noticed any hash collisions ourselves, but because the bug would be subtle we can't be confident it hasn't already happened to us, and for the time being we do not consider this optimization safe. We wanted to let you know we plan to update MemoWise to use array hash keys again instead of directly calling .hash to produce integer keys, and give you the opportunity to similarly update Dry::Core::Memoizable if you agree with our conclusion. Please let us know what you decide to do! If you decide to remove Array#hash, we'll update our benchmarks to compare against the latest version of Dry::Core::Memoizable. If you decide not to, that's okay too—we'll just link to this issue explaining the differences between our gems.

@flash-gordon
Copy link
Member

I've never found time to answer this properly but I always felt the problem is significant. From what I can see right now ruby has space of 10^18 (=2^63) variants for .hash results. The chance of collision on 1M numbers is < 10^-6, less than 1 millionth. If I have time for revisiting the current implementation I might think of something more reliable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants