feat(SMT): reverse mutations generation, mutations serialization #355

polydez · 2024-12-12T11:46:07Z

Related to 0xPolygonMiden/miden-node#527

While refactoring account inclusion proofs in-memory storage, I realized that it's very hard to keep persistent state for additional "initial" structure, and it would be easier to keep only reverse mutation sets and the latest state. Thus, we need to store only reverse mutations set, but latest state we can still reconstruct from the database.

In this PR I implemented reverse mutations set generation on mutations applying (using a new method apply_mutations_with_reversion(). Also implemented serialization of mutations set and added benchmarks for mutations calculation/applying.

src/merkle/smt/mod.rs

bobbinth

Looks good! Thank you! Not an in-depth review yet, but I left a few small comments inline.

The only concern I have is that now we produce reverse mutations set on each applying of mutations set. For the node it's fine, because we always need them after changing of the latest accounts SMT. But if someone else uses our SMT, it might be unnecessary to calculate such reversion on each mutations set applying.

Yeah - I was thinking about this as well. Could you run benchmarks to see how this affects things - especially for relatively large trees? (we may already have benchmarks for this).

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set. (but let's run the benchmarks first).

CHANGELOG.md

src/merkle/smt/mod.rs

polydez · 2024-12-16T10:45:50Z

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set.

One of solutions would be introducing new feature for that behavior.

bobbinth · 2024-12-16T18:12:06Z

One of solutions would be introducing new feature for that behavior.

As discussed offline, having 2 separate methods would be a more flexible solution (because there may be situations where the same codebase may want to use different versions of the method). Maybe, under the hood, both methods would use a single method that could look something like:

fn apply_mutations(
    &mut self,
    mutations: MutationSet<DEPTH, Self::Key, Self::Value>,
    return_inverse_mutations: bool,
) -> Result<MutationSet<DEPTH, Self::Key, Self::Value>, MerkleError>

polydez · 2024-12-17T18:33:49Z

Could you run benchmarks to see how this affects things - especially for relatively large trees? (we may already have benchmarks for this).

If the effect is significant, I wonder if there is some clean interface to let the user apply mutations with or without generating the reverse set. (but let's run the benchmarks first).

I implemented benchmark for apply_mutations and here are the results on MacBook Pro with Apple M3 Max (unnecessary lines were removed):

With generation of reverse mutations set:

apply_mutations/SimpleSmt: apply_mutations/1000
time: [30.044 ms 30.826 ms 31.697 ms]

apply_mutations/SimpleSmt: apply_mutations/100000
time: [3.7554 s 3.8398 s 3.9335 s]

Without generation of reverse mutations set:

apply_mutations/SimpleSmt: apply_mutations/1000
time: [23.827 ms 24.089 ms 24.289 ms]
change: [-5.3989% -3.4174% -1.7821%] (p = 0.00 < 0.05)
Performance has improved.

apply_mutations/SimpleSmt: apply_mutations/100000
time: [3.2683 s 3.4088 s 3.5618 s]
change: [-15.168% -11.224% -6.7039%] (p = 0.00 < 0.05)
Performance has improved.

Here we can see up to 11% of performance improvement (for 100,000 pairs). The bad news here is that it takes up to 3.6 seconds to apply mutation set (without any other operations, like computations of mutation set). It's higher than current limit per block (65,536 accounts), but I think, it's too slow for fitting in 2 seconds per block limit.

bobbinth · 2024-12-18T10:33:32Z

Here we can see up to 11% of performance improvement (for 100,000 pairs). The bad news here is that it takes up to 3.6 seconds to apply mutation set (without any other operations, like computations of mutation set). It's higher than current limit per block (65,536 accounts), but I think, it's too slow for fitting in 2 seconds per block limit.

To clarify: how big are the trees in these benchmarks and how many leaves are we updating? For now, we can assume that we probably won't be updating more than 10K leaves per block - so, I'd benchmark this number. Could you benchmark how long this takes on a fairly large tree (e.g., 1M or 10M entries)?

If this still takes a long time, we may need to start thinking about how to optimize the apply_mutations() method.

polydez · 2024-12-18T14:10:30Z

To clarify: how big are the trees in these benchmarks and how many leaves are we updating? For now, we can assume that we probably won't be updating more than 10K leaves per block - so, I'd benchmark this number. Could you benchmark how long this takes on a fairly large tree (e.g., 1M or 10M entries)?

In these benchmarks we set up trees to have 10x more leaves than updates (10K leaves for 1K updates and 1M leaves for 100K updates). I will try separate benchmarks for 1M and 10M trees with 10K updating leaves.

polydez · 2024-12-18T14:19:58Z

If this still takes a long time, we may need to start thinking about how to optimize the apply_mutations() method.

We should also benchmark compute_mutations() method (and other time-consuming phases) and add "time spent" metrics for all block building phases because mutations application is just a part of block building.

polydez · 2024-12-18T15:52:21Z

@bobbinth, I have first numbers.

For tree with 1M leaves, benchmark 10K updates:

apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]

bobbinth · 2024-12-18T17:52:49Z

apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]

This is pretty slow (probably around 10x slower than I thought it would be). This basically means we spend 0.1 ms per updated leaf. Could you check what's the bottleneck here?

polydez · 2024-12-19T05:47:13Z

apply_mutations/SimpleSmt: apply_mutations/10000
time: [2.1603 s 2.2503 s 2.3556 s]

This is pretty slow (probably around 10x slower than I thought it would be). This basically means we spend 0.1 ms per updated leaf. Could you check what's the bottleneck here?

The bottleneck here is BTreeMap itself. Applying mutations set is just inserting/updating values in BTreeMap. There is no duplicates because mutations set contains of two maps with unique keys.

I will try to rewrite SimpleSmt to hashmaps, as proposed in one of issues, and measure performance change.

polydez · 2024-12-19T08:16:39Z

I've rewritten SimpleSmt and Smt to hashbrown hashmaps, as well, as MutationSet. This dramatically improved performance:

apply_mutations/SimpleSmt: apply_mutations/10000
time: [154.52 ms 163.09 ms 174.46 ms]
change: [-93.242% -92.753% -92.142%] (p = 0.00 < 0.05)
Performance has improved.

But overall benchmarking takes more than half of hour for 10 samples. So probably compute_mutations still takes the most of the time, but this time is not included into the benchmark. I will try to measure it.

bobbinth · 2024-12-19T08:36:23Z

I've rewritten SimpleSmt and Smt to hashbrown hashmaps, as well, as MutationSet. This dramatically improved performance:

Very nice! Could you create a separate PR for this? Not sure if possible, but would be awesome to have this behind a feature flag.

Also, we are mostly interested in Smt (rather than SimpleSmt) as we'll be moving to use Smt for both accounts and nullifiers.

So probably compute_mutations still takes the most of the time, but this time is not included into the benchmark. I will try to measure it.

We are working on parallelizing this - so, hopefully, we should be able to run compute_mutations on multiple cores.

polydez · 2024-12-19T08:54:12Z

Could you create a separate PR for this? Not sure if possible, but would be awesome to have this behind a feature flag.

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

bobbinth · 2024-12-19T09:33:56Z

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

Hmmm - good question. I don't think we should be, but I'm not 100% sure we are not.

bobbinth · 2024-12-19T09:37:34Z

Do we rely on ordering of iterators over inner nodes or leaves? This is the only obstacle I can see now for full migration of Smt/SimpleSmt to hashmaps.

Hmmm - good question. I don't think we should be, but I'm not 100% sure we are not.

One thing that comes to mind is that order of leaves is important during Smt construction (specifically for the multi-threaded version) and so, the order in which we serialize leaves would also be important. So, maybe we do need to preserve the iteration order. But now that I think of it, I'm not sure the current iterators have the same order as what is assumed by the constructor.

bobbinth · 2024-12-19T09:51:17Z

btw, I think we already have some benchmarks for computing mutations here.

polydez · 2024-12-19T09:54:45Z

Benchmarking of compute_mutations() method (tree size and mutation count are the same, as before: 1M and 10K):

compute_mutations/SimpleSmt: compute_mutations/10000
time: [2.6533 s 2.6876 s 2.7379 s]

btw, I think we already have some benchmarks for computing mutations here.

Ah, didn't find this, thank you!

polydez · 2024-12-19T09:58:32Z

One thing that comes to mind is that order of leaves is important during Smt construction (specifically for the multi-threaded version) and so, the order in which we serialize leaves would also be important.

If it's important, we can sort leaves before serialization.

bobbinth

Looks good! Thank you! I left a couple of small comments inline.

benches/smt-mutations.rs

src/merkle/smt/full/tests.rs

polydez · 2024-12-26T10:11:45Z

As we had decided before, I rebased this branch onto v0.12.0 tag. We're going to release this as a patch (v0.12.1, I guess).

sonarqubecloud · 2024-12-27T02:10:10Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

bobbinth

All looks good! Thank you!

polydez requested a review from bobbinth December 12, 2024 11:46

polydez marked this pull request as ready for review December 12, 2024 11:48

polydez changed the title ~~feat: revert mutations generation, mutations serialization~~ feat: reverse mutations generation, mutations serialization Dec 12, 2024

polydez changed the title ~~feat: reverse mutations generation, mutations serialization~~ feat(SMT): reverse mutations generation, mutations serialization Dec 12, 2024

polydez commented Dec 12, 2024

View reviewed changes

src/merkle/smt/mod.rs Show resolved Hide resolved

polydez commented Dec 12, 2024

View reviewed changes

src/merkle/smt/mod.rs Show resolved Hide resolved

bobbinth reviewed Dec 16, 2024

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

src/merkle/smt/mod.rs Show resolved Hide resolved

src/merkle/smt/mod.rs Outdated Show resolved Hide resolved

polydez mentioned this pull request Dec 20, 2024

Added support for hashmaps in Smt and SimpleSmt #363

Merged

polydez force-pushed the polydez-mutations-revert branch from 90b993e to 70a6383 Compare December 23, 2024 15:05

polydez changed the base branch from next to main December 23, 2024 16:52

bobbinth reviewed Dec 24, 2024

View reviewed changes

benches/smt-mutations.rs Outdated Show resolved Hide resolved

src/merkle/smt/full/tests.rs Outdated Show resolved Hide resolved

polydez requested a review from bobbinth December 26, 2024 09:24

feat: revert mutations generation, mutations serialization

99149fd

polydez added 14 commits December 26, 2024 14:59

docs: update CHANGELOG.md

b85fd4a

fix: compilation error on using no-std target

24d435f

refactor: address review comments

32f10ea

feat: implement benchmark

c25f59f

docs: fix error

ca2fe00

feat: add benchmark for compute_mutations method

61634bb

refactor: add getters for MutationSet fields

69c2097

fix: clippy warnings

bb31cbe

refactor: make two separate methods for mutations application

1a2eca5

tests: check both apply_mutations and apply_mutations_with_reversion

3cc7003

feat: add num_leaves method for Smt

0596892

refactor: improve ad-hoc benchmarks

6737fc0

refactor: remove criterion benches

533c459

fix: compilation errors

7777e33

polydez force-pushed the polydez-mutations-revert branch from 868fa4e to 7777e33 Compare December 26, 2024 10:07

bobbinth and others added 5 commits December 26, 2024 16:24

Merge branch 'main' into polydez-mutations-revert

01c0ffa

chore: fix build

06596e2

chore: minor smt benchmark improvements

2a2f7c5

chore: minor smt benchmark updates

eeaf3c1

chore: update crate version to v0.13.1

0e3f79f

bobbinth approved these changes Dec 27, 2024

View reviewed changes

bobbinth merged commit 589839f into main Dec 27, 2024
13 checks passed

bobbinth deleted the polydez-mutations-revert branch December 27, 2024 02:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(SMT): reverse mutations generation, mutations serialization #355

feat(SMT): reverse mutations generation, mutations serialization #355

polydez commented Dec 12, 2024 •

edited

Loading

bobbinth left a comment

polydez commented Dec 16, 2024

bobbinth commented Dec 16, 2024

polydez commented Dec 17, 2024

bobbinth commented Dec 18, 2024

polydez commented Dec 18, 2024 •

edited

Loading

polydez commented Dec 18, 2024 •

edited

Loading

polydez commented Dec 18, 2024 •

edited

Loading

bobbinth commented Dec 18, 2024

polydez commented Dec 19, 2024

polydez commented Dec 19, 2024

bobbinth commented Dec 19, 2024

polydez commented Dec 19, 2024 •

edited

Loading

bobbinth commented Dec 19, 2024

bobbinth commented Dec 19, 2024

bobbinth commented Dec 19, 2024

polydez commented Dec 19, 2024

polydez commented Dec 19, 2024

bobbinth left a comment

polydez commented Dec 26, 2024

sonarqubecloud bot commented Dec 27, 2024

bobbinth left a comment

feat(SMT): reverse mutations generation, mutations serialization #355

feat(SMT): reverse mutations generation, mutations serialization #355

Conversation

polydez commented Dec 12, 2024 • edited Loading

bobbinth left a comment

Choose a reason for hiding this comment

polydez commented Dec 16, 2024

bobbinth commented Dec 16, 2024

polydez commented Dec 17, 2024

bobbinth commented Dec 18, 2024

polydez commented Dec 18, 2024 • edited Loading

polydez commented Dec 18, 2024 • edited Loading

polydez commented Dec 18, 2024 • edited Loading

bobbinth commented Dec 18, 2024

polydez commented Dec 19, 2024

polydez commented Dec 19, 2024

bobbinth commented Dec 19, 2024

polydez commented Dec 19, 2024 • edited Loading

bobbinth commented Dec 19, 2024

bobbinth commented Dec 19, 2024

bobbinth commented Dec 19, 2024

polydez commented Dec 19, 2024

polydez commented Dec 19, 2024

bobbinth left a comment

Choose a reason for hiding this comment

polydez commented Dec 26, 2024

sonarqubecloud bot commented Dec 27, 2024

Quality Gate passed

bobbinth left a comment

Choose a reason for hiding this comment

polydez commented Dec 12, 2024 •

edited

Loading

polydez commented Dec 18, 2024 •

edited

Loading

polydez commented Dec 18, 2024 •

edited

Loading

polydez commented Dec 18, 2024 •

edited

Loading

polydez commented Dec 19, 2024 •

edited

Loading