Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(SMT): reverse mutations generation, mutations serialization #355

Merged
merged 20 commits into from
Dec 27, 2024

Conversation

polydez
Copy link
Contributor

@polydez polydez commented Dec 12, 2024

Related to 0xPolygonMiden/miden-node#527

While refactoring account inclusion proofs in-memory storage, I realized that it's very hard to keep persistent state for additional "initial" structure, and it would be easier to keep only reverse mutation sets and the latest state. Thus, we need to store only reverse mutations set, but latest state we can still reconstruct from the database.

In this PR I implemented reverse mutations set generation on mutations applying (using a new method apply_mutations_with_reversion(). Also implemented serialization of mutations set and added benchmarks for mutations calculation/applying.

@polydez polydez requested a review from bobbinth December 12, 2024 11:46
@polydez polydez marked this pull request as ready for review December 12, 2024 11:48
@polydez polydez changed the title feat: revert mutations generation, mutations serialization feat: reverse mutations generation, mutations serialization Dec 12, 2024
@polydez polydez changed the title feat: reverse mutations generation, mutations serialization feat(SMT): reverse mutations generation, mutations serialization Dec 12, 2024
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! Not an in-depth review yet, but I left a few small comments inline.

The only concern I have is that now we produce reverse mutations set on each applying of mutations set. For the node it's fine, because we always need them after changing of the latest accounts SMT. But if someone else uses our SMT, it might be unnecessary to calculate such reversion on each mutations set applying.

Yeah - I was thinking about this as well. Could you run benchmarks to see how this affects things - especially for relatively large trees? (we may already have benchmarks for this).

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set. (but let's run the benchmarks first).

CHANGELOG.md Outdated Show resolved Hide resolved
src/merkle/smt/mod.rs Show resolved Hide resolved
src/merkle/smt/mod.rs Outdated Show resolved Hide resolved
@polydez
Copy link
Contributor Author

polydez commented Dec 16, 2024

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set.

One of solutions would be introducing new feature for that behavior.

@bobbinth
Copy link
Contributor

One of solutions would be introducing new feature for that behavior.

As discussed offline, having 2 separate methods would be a more flexible solution (because there may be situations where the same codebase may want to use different versions of the method). Maybe, under the hood, both methods would use a single method that could look something like:

fn apply_mutations(
    &mut self,
    mutations: MutationSet<DEPTH, Self::Key, Self::Value>,
    return_inverse_mutations: bool,
) -> Result<MutationSet<DEPTH, Self::Key, Self::Value>, MerkleError>

@polydez
Copy link
Contributor Author

polydez commented Dec 17, 2024

Could you run benchmarks to see how this affects things - especially for relatively large trees? (we may already have benchmarks for this).

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set. (but let's run the benchmarks first).

I implemented benchmark for apply_mutations and here are the results on MacBook Pro with Apple M3 Max (unnecessary lines were removed):


With generation of reverse mutations set:

apply_mutations/SimpleSmt: apply_mutations/1000
time: [30.044 ms 30.826 ms 31.697 ms]

apply_mutations/SimpleSmt: apply_mutations/100000
time: [3.7554 s 3.8398 s 3.9335 s]

Without generation of reverse mutations set:

apply_mutations/SimpleSmt: apply_mutations/1000
time: [23.827 ms 24.089 ms 24.289 ms]
change: [-5.3989% -3.4174% -1.7821%] (p = 0.00 < 0.05)
Performance has improved.

apply_mutations/SimpleSmt: apply_mutations/100000
time: [3.2683 s 3.4088 s 3.5618 s]
change: [-15.168% -11.224% -6.7039%] (p = 0.00 < 0.05)
Performance has improved.


Here we can see up to 11% of performance improvement (for 100,000 pairs). The bad news here is that it takes up to 3.6 seconds to apply mutation set (without any other operations, like computations of mutation set). It's higher than current limit per block (65,536 accounts), but I think, it's too slow for fitting in 2 seconds per block limit.

@bobbinth
Copy link
Contributor

Here we can see up to 11% of performance improvement (for 100,000 pairs). The bad news here is that it takes up to 3.6 seconds to apply mutation set (without any other operations, like computations of mutation set). It's higher than current limit per block (65,536 accounts), but I think, it's too slow for fitting in 2 seconds per block limit.

To clarify: how big are the trees in these benchmarks and how many leaves are we updating? For now, we can assume that we probably won't be updating more than 10K leaves per block - so, I'd benchmark this number. Could you benchmark how long this takes on a fairly large tree (e.g., 1M or 10M entries)?

If this still takes a long time, we may need to start thinking about how to optimize the apply_mutations() method.

@polydez
Copy link
Contributor Author

polydez commented Dec 18, 2024

To clarify: how big are the trees in these benchmarks and how many leaves are we updating? For now, we can assume that we probably won't be updating more than 10K leaves per block - so, I'd benchmark this number. Could you benchmark how long this takes on a fairly large tree (e.g., 1M or 10M entries)?

In these benchmarks we set up trees to have 10x more leaves than updates (10K leaves for 1K updates and 1M leaves for 100K updates). I will try separate benchmarks for 1M and 10M trees with 10K updating leaves.

@polydez
Copy link
Contributor Author

polydez commented Dec 18, 2024

If this still takes a long time, we may need to start thinking about how to optimize the apply_mutations() method.

We should also benchmark compute_mutations() method (and other time-consuming phases) and add "time spent" metrics for all block building phases because mutations application is just a part of block building.

@polydez
Copy link
Contributor Author

polydez commented Dec 18, 2024

@bobbinth, I have first numbers.

For tree with 1M leaves, benchmark 10K updates:


apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]


@bobbinth
Copy link
Contributor

apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]

This is pretty slow (probably around 10x slower than I thought it would be). This basically means we spend 0.1 ms per updated leaf. Could you check what's the bottleneck here?

@polydez
Copy link
Contributor Author

polydez commented Dec 19, 2024

apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]

This is pretty slow (probably around 10x slower than I thought it would be). This basically means we spend 0.1 ms per updated leaf. Could you check what's the bottleneck here?

The bottleneck here is BTreeMap itself. Applying mutations set is just inserting/updating values in BTreeMap. There is no duplicates because mutations set contains of two maps with unique keys.

I will try to rewrite SimpleSmt to hashmaps, as proposed in one of issues, and measure performance change.

@polydez
Copy link
Contributor Author

polydez commented Dec 19, 2024

I've rewritten SimpleSmt and Smt to hashbrown hashmaps, as well, as MutationSet. This dramatically improved performance:


apply_mutations/SimpleSmt: apply_mutations/10000
time: [154.52 ms 163.09 ms 174.46 ms]
change: [-93.242% -92.753% -92.142%] (p = 0.00 < 0.05)
Performance has improved.


But overall benchmarking takes more than half of hour for 10 samples. So probably compute_mutations still takes the most of the time, but this time is not included into the benchmark. I will try to measure it.

@bobbinth
Copy link
Contributor

I've rewritten SimpleSmt and Smt to hashbrown hashmaps, as well, as MutationSet. This dramatically improved performance:

Very nice! Could you create a separate PR for this? Not sure if possible, but would be awesome to have this behind a feature flag.

Also, we are mostly interested in Smt (rather than SimpleSmt) as we'll be moving to use Smt for both accounts and nullifiers.

So probably compute_mutations still takes the most of the time, but this time is not included into the benchmark. I will try to measure it.

We are working on parallelizing this - so, hopefully, we should be able to run compute_mutations on multiple cores.

@polydez
Copy link
Contributor Author

polydez commented Dec 19, 2024

Could you create a separate PR for this? Not sure if possible, but would be awesome to have this behind a feature flag.

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

@bobbinth
Copy link
Contributor

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

Hmmm - good question. I don't think we should be, but I'm not 100% sure we are not.

@bobbinth
Copy link
Contributor

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

Hmmm - good question. I don't think we should be, but I'm not 100% sure we are not.

One thing that comes to mind is that order of leaves is important during Smt construction (specifically for the multi-threaded version) and so, the order in which we serialize leaves would also be important. So, maybe we do need to preserve the iteration order. But now that I think of it, I'm not sure the current iterators have the same order as what is assumed by the constructor.

@bobbinth
Copy link
Contributor

btw, I think we already have some benchmarks for computing mutations here.

@polydez
Copy link
Contributor Author

polydez commented Dec 19, 2024

Benchmarking of compute_mutations() method (tree size and mutation count are the same, as before: 1M and 10K):


compute_mutations/SimpleSmt: compute_mutations/10000
time: [2.6533 s 2.6876 s 2.7379 s]


btw, I think we already have some benchmarks for computing mutations here.

Ah, didn't find this, thank you!

@polydez
Copy link
Contributor Author

polydez commented Dec 19, 2024

One thing that comes to mind is that order of leaves is important during Smt construction (specifically for the multi-threaded version) and so, the order in which we serialize leaves would also be important.

If it's important, we can sort leaves before serialization.

@polydez polydez force-pushed the polydez-mutations-revert branch from 90b993e to 70a6383 Compare December 23, 2024 15:05
@polydez polydez changed the base branch from next to main December 23, 2024 16:52
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a couple of small comments inline.

benches/smt-mutations.rs Outdated Show resolved Hide resolved
src/merkle/smt/full/tests.rs Outdated Show resolved Hide resolved
@polydez polydez requested a review from bobbinth December 26, 2024 09:24
@polydez polydez force-pushed the polydez-mutations-revert branch from 868fa4e to 7777e33 Compare December 26, 2024 10:07
@polydez
Copy link
Contributor Author

polydez commented Dec 26, 2024

As we had decided before, I rebased this branch onto v0.12.0 tag. We're going to release this as a patch (v0.12.1, I guess).

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thank you!

@bobbinth bobbinth merged commit 589839f into main Dec 27, 2024
13 checks passed
@bobbinth bobbinth deleted the polydez-mutations-revert branch December 27, 2024 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants