Enable getting a proof for non-current account states #527
I think the same strategy should be applied to the
@igamigo I have a concern, or need additional clarification, regarding this issue. If the account was changed after the requested block, is the obsolete account state still useful for a transaction? That is, should we instead return an error (something like "account was updated") in this case? In my view, the result will be useful for transaction execution only when the latest account update happened before the requested block. If so, this will significantly simplify the overall solution, since we would not need to track account changes, only proofs.
In the context of transaction execution, there are a couple of things to keep in mind:
Additionally, the client might also have extra validations to avoid retrieving stale data, etc. But in general, I think there can be cases where the user retrieves a state that is not the latest and could still be considered valid (but there should probably be a hard limit after which it no longer makes sense to retrieve account proofs). I am also not sure if there are going to be any other use cases for this endpoint (or
To implement this, we actually have to solve two problems:
I've tried different ideas and approaches for doing this, and I think I've ended up with a solution that is the simplest to implement right now, yet still rather effective.

Getting Account State

Currently we store the latest account state and all account state deltas from the very beginning of the account's lifetime. Given the initial account state, we can transition it by sequentially applying deltas to its intermediate states. There are two feasible solutions (without a complete account storage redesign) for getting the account's state prior to the latest account update:
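Replaying the stored deltas forward from the initial state, as described above, can be sketched as follows. This is a minimal Rust sketch under loud assumptions: `AccountState`, `AccountDelta` and `state_at_block` are hypothetical names, the state is reduced to a nonce plus integer storage slots, and vault and storage-map changes are omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified account state: a nonce plus value storage slots.
#[derive(Clone, Debug, PartialEq)]
pub struct AccountState {
    pub nonce: u64,
    pub storage: BTreeMap<u8, u64>,
}

/// Hypothetical forward delta recorded when the account changed in `block_num`.
pub struct AccountDelta {
    pub block_num: u32,
    pub new_nonce: u64,
    pub storage_updates: Vec<(u8, u64)>,
}

/// Rebuilds the state as of `target_block` by replaying, in block order, every
/// delta with `block_num <= target_block` on top of the initial state.
pub fn state_at_block(
    initial: &AccountState,
    deltas: &[AccountDelta],
    target_block: u32,
) -> AccountState {
    let mut state = initial.clone();
    let mut applicable: Vec<&AccountDelta> =
        deltas.iter().filter(|d| d.block_num <= target_block).collect();
    applicable.sort_by_key(|d| d.block_num);
    for delta in applicable {
        state.nonce = delta.new_nonce;
        for &(slot, value) in &delta.storage_updates {
            state.storage.insert(slot, value);
        }
    }
    state
}
```

The cost of this grows with the number of deltas since account creation, which is one motivation for the reversible deltas proposed later in the thread.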
Getting Inclusion Proof

Currently we have an in-memory sparse Merkle tree which stores the latest account hash for each account in the blockchain. It's updated on each new block by first computing an update and then applying it to the current tree. If we try to avoid ending up with a completely different storage structure for account hashes, there are a few solutions I can see:
I think the third solution might be a good compromise for us for now.
There is a bit of pain associated with (3). Or more specifically, the current forward delta is not enough: you need to know the reverse delta, i.e., what value each node started with. It is possible, though I don't think it's super trivial. It's also fairly expensive computationally since you are recomputing hashes (?), unless the deltas are just hash updates. There is a fourth option, if I understand correctly: cache the complete account proofs for all recently updated accounts. The benefit of this is that it doesn't involve the original tree at all. The downside is that it's a bit wasteful in terms of memory usage, at least if implemented naively, e.g.: If account Memory usage can be improved by creating a proper tree, with Effectively we would have the latest tree, and a separate sparse caching tree structure that only has the updated account proofs.
Yes, but since we need this only for applying deltas in reverse order, we can use the same delta format but compute the deltas differently (i.e., in addition to the current
I hadn't thought about it this way, thank you! I'll think about it.
Yes, that's the direction I was exploring when trying to come up with a "perfect" solution for this task. My idea was to build multiple SMTs which reuse the same subtrees whenever they have equal hashes. Each root would be the accounts root for a particular block number. Pruning old roots would cause unused nodes to be pruned as well. But the main difficulty there was how to access account hashes by account ID, and I haven't found an efficient solution for that yet.
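The multi-root idea above, where versions share unchanged subtrees, is essentially a persistent (path-copying) Merkle tree. A toy sketch: each block's root is kept in a `BTreeMap`, an update copies only the leaf-to-root path and shares every other subtree through `Rc`, lookup by account ID walks the index bits down from the root, and dropping old roots frees nodes nothing else references. Assumptions: the hash is std's `DefaultHasher` (not cryptographic), the tree is small and dense rather than a deep sparse tree, and `VersionedTree` and its methods are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Toy, NON-cryptographic stand-in for the real node hash.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

enum Node {
    Leaf(u64),
    Inner { hash: u64, left: Rc<Node>, right: Rc<Node> },
}

impl Node {
    fn digest(&self) -> u64 {
        match self {
            Node::Leaf(value) => *value,
            Node::Inner { hash, .. } => *hash,
        }
    }
}

/// Empty tree of `depth` levels; identical empty subtrees share one allocation.
fn empty_tree(depth: u32) -> Rc<Node> {
    let mut node = Rc::new(Node::Leaf(0));
    for _ in 0..depth {
        let digest = hash_pair(node.digest(), node.digest());
        node = Rc::new(Node::Inner { hash: digest, left: node.clone(), right: node.clone() });
    }
    node
}

/// Path copying: only nodes on the path to `index` are new; siblings are shared.
fn update(node: &Rc<Node>, depth: u32, index: u64, value: u64) -> Rc<Node> {
    match node.as_ref() {
        Node::Leaf(_) => Rc::new(Node::Leaf(value)),
        Node::Inner { left, right, .. } => {
            let (l, r) = if (index >> (depth - 1)) & 1 == 1 {
                (left.clone(), update(right, depth - 1, index, value))
            } else {
                (update(left, depth - 1, index, value), right.clone())
            };
            Rc::new(Node::Inner { hash: hash_pair(l.digest(), r.digest()), left: l, right: r })
        }
    }
}

/// Lookup by account ID: follow the index bits from the root down to the leaf.
fn read_leaf(node: &Rc<Node>, depth: u32, index: u64) -> u64 {
    match node.as_ref() {
        Node::Leaf(value) => *value,
        Node::Inner { left, right, .. } => {
            let child = if (index >> (depth - 1)) & 1 == 1 { right } else { left };
            read_leaf(child, depth - 1, index)
        }
    }
}

pub struct VersionedTree {
    depth: u32,
    roots: BTreeMap<u32, Rc<Node>>, // block number -> root as of that block
}

impl VersionedTree {
    pub fn new(depth: u32) -> Self {
        let mut roots = BTreeMap::new();
        roots.insert(0, empty_tree(depth));
        Self { depth, roots }
    }

    /// Applies a block's leaf updates on top of the latest root.
    pub fn apply_block(&mut self, block_num: u32, updates: &[(u64, u64)]) {
        let mut root = self.roots.values().next_back().expect("genesis root").clone();
        for &(index, value) in updates {
            root = update(&root, self.depth, index, value);
        }
        self.roots.insert(block_num, root);
    }

    /// Reads a leaf as of `block_num` (latest root at or before that block).
    pub fn get_at(&self, block_num: u32, index: u64) -> Option<u64> {
        let (_, root) = self.roots.range(..=block_num).next_back()?;
        Some(read_leaf(root, self.depth, index))
    }

    /// Drops roots no longer needed to answer queries at or after `min_block`;
    /// nodes unreachable from the remaining roots are freed by `Rc`.
    pub fn prune_before(&mut self, min_block: u32) {
        let keep = self.roots.range(..=min_block).next_back().map(|(b, _)| *b);
        self.roots.retain(|&b, _| b >= min_block || Some(b) == keep);
    }
}
```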
I've been thinking about it, and there is an issue with other accounts. When one account changes, the path from its leaf has to be recalculated up to the root, and this affects the proofs of all other accounts. For example, if account 1 was updated in block
Indeed, you'll have to store the account proof per block for an account until it exceeds the limit.
My point was that it's not enough to store proofs only for updated accounts, because the proofs of unchanged accounts are also updated on any blockchain state change. In other words, an update of one account affects the proofs of all accounts.
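The point above can be checked on a toy tree: an account's proof is the list of sibling hashes on its path, and an update to any other leaf changes at least one of those siblings (the child of the two leaves' lowest common ancestor on the other side). A sketch assuming a dense 4-leaf tree and std's non-cryptographic `DefaultHasher` in place of the real sparse Merkle tree; `build_levels` and `proof` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy, NON-cryptographic stand-in for the real node hash.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// All levels of a dense binary Merkle tree: `levels[0]` are the leaves,
/// the last level holds only the root.
fn build_levels(leaves: &[u64]) -> Vec<Vec<u64>> {
    assert!(leaves.len().is_power_of_two(), "dense toy tree needs 2^k leaves");
    let mut levels = vec![leaves.to_vec()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| hash_pair(pair[0], pair[1]))
            .collect();
        levels.push(next);
    }
    levels
}

/// Authentication path for `leaf_index`: sibling hashes from the leaf level up.
pub fn proof(leaves: &[u64], leaf_index: usize) -> Vec<u64> {
    let levels = build_levels(leaves);
    let mut index = leaf_index;
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        path.push(level[index ^ 1]);
        index /= 2;
    }
    path
}
```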
It does, but you only need to cache the recently updated accounts. For every account updated But in general, all solutions are just different ways of expressing the same subgraph. My suggestion is just a denormalized variation where you flatten the graph, i.e., the least efficient in terms of memory usage but also the lowest effort in terms of implementation.
Taking into account ideas from our call, we ended up with the following solutions for the corresponding problems:

Getting Account State

Currently we store the latest account state and all account state deltas from the very beginning of the account's lifetime. We store deltas in a single Let's refactor deltas in order to make them reversible (i.e., store old values along with new ones). We should also store deltas in a set of separate tables, so that it is possible to calculate a single delta from block A to block B using only SQL queries:

```sql
CREATE TABLE account_deltas
(
    account_id INTEGER NOT NULL,
    block_num INTEGER NOT NULL,
    old_nonce INTEGER NOT NULL,
    new_nonce INTEGER NOT NULL,
    PRIMARY KEY (account_id, block_num),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_deltas_nonce_increased CHECK (old_nonce < new_nonce)
) STRICT;

CREATE TABLE account_storage_delta_values
(
    account_id INTEGER NOT NULL,
    block_num INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    old_value BLOB NOT NULL,
    new_value BLOB NOT NULL,
    PRIMARY KEY (account_id, block_num, slot),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_storage_delta_value_updated CHECK (old_value != new_value)
) STRICT, WITHOUT ROWID;

CREATE TABLE account_storage_delta_map_values
(
    account_id INTEGER NOT NULL,
    block_num INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    key BLOB NOT NULL,
    old_value BLOB NOT NULL,
    new_value BLOB NOT NULL,
    PRIMARY KEY (account_id, block_num, slot, key),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num),
    CONSTRAINT account_storage_delta_map_value_updated CHECK (old_value != new_value)
) STRICT;

CREATE TABLE account_fungible_asset_deltas
(
    account_id INTEGER NOT NULL,
    block_num INTEGER NOT NULL,
    faucet_id INTEGER NOT NULL,
    delta INTEGER NOT NULL,
    PRIMARY KEY (account_id, block_num, faucet_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num)
) STRICT, WITHOUT ROWID;

CREATE TABLE account_non_fungible_asset_delta_actions
(
    account_id INTEGER NOT NULL,
    block_num INTEGER NOT NULL,
    asset BLOB NOT NULL,
    is_remove INTEGER NOT NULL, -- 0 - add, 1 - remove
    PRIMARY KEY (account_id, block_num, asset),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (block_num) REFERENCES block_headers(block_num)
) STRICT, WITHOUT ROWID;
```

So, the procedure for getting the account state for a requested block number will look like:
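A minimal in-memory sketch of such a procedure, roughly mirroring what the SQL queries would compute: because every row keeps both the old and the new value, the delta for a range of blocks takes the old value from the earliest row and the new value from the latest one, and rolling the latest state back to block B means restoring the old values of the collapsed range (B, tip]. Assumptions: `StorageDeltaRow`, `collapse`, `storage_at` and `collapse_fungible` are hypothetical, slots are `u8` and values are `u64` instead of real words, and only value slots and fungible assets are covered.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory mirror of one `account_storage_delta_values` row.
#[derive(Clone, Debug)]
pub struct StorageDeltaRow {
    pub block_num: u32,
    pub slot: u8,
    pub old_value: u64,
    pub new_value: u64,
}

/// Collapses rows with `from < block_num <= to` into one (old, new) pair per slot:
/// old from the earliest row, new from the latest. Slots that end up unchanged
/// are dropped, mirroring the `old_value != new_value` CHECK constraint.
pub fn collapse(rows: &[StorageDeltaRow], from: u32, to: u32) -> BTreeMap<u8, (u64, u64)> {
    let mut in_range: Vec<&StorageDeltaRow> =
        rows.iter().filter(|r| r.block_num > from && r.block_num <= to).collect();
    in_range.sort_by_key(|r| r.block_num);
    let mut out: BTreeMap<u8, (u64, u64)> = BTreeMap::new();
    for r in in_range {
        out.entry(r.slot)
            .and_modify(|e| e.1 = r.new_value)
            .or_insert((r.old_value, r.new_value));
    }
    out.retain(|_, v| v.0 != v.1);
    out
}

/// Rolls the latest storage back to its state as of `target` by restoring the
/// old values from the collapsed delta for `(target, tip]`.
pub fn storage_at(
    latest: &BTreeMap<u8, u64>,
    rows: &[StorageDeltaRow],
    target: u32,
    tip: u32,
) -> BTreeMap<u8, u64> {
    let mut state = latest.clone();
    for (slot, (old, _new)) in collapse(rows, target, tip) {
        state.insert(slot, old);
    }
    state
}

/// Fungible rows `(block_num, faucet_id, delta)` are signed amounts, so a range
/// collapses to a per-faucet sum (subtract it to roll back).
pub fn collapse_fungible(rows: &[(u32, u64, i64)], from: u32, to: u32) -> BTreeMap<u64, i64> {
    let mut out = BTreeMap::new();
    for &(_, faucet, delta) in rows.iter().filter(|r| r.0 > from && r.0 <= to) {
        *out.entry(faucet).or_insert(0) += delta;
    }
    out.retain(|_, d| *d != 0);
    out
}
```

Storage map entries would collapse the same way, keyed by `(slot, key)` instead of `slot`.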
Getting Inclusion Proof

Currently we have an in-memory sparse Merkle tree which stores the latest account hash for each account in the blockchain. It's updated on each new block by first computing an update and then applying it to the current tree. When we need an account inclusion proof, we traverse the tree from the account's leaf up to the root, adding the hash of each sibling node along the way to the list. We can keep computed updates for the last So, the algorithm for getting an account inclusion proof for a requested block number will be:
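Keeping the computed updates for recent blocks can be sketched as follows, assuming each kept update stores the value every touched node had before that block was applied (the reverse direction discussed earlier in the thread). To read a node as of block B, take the old value recorded by the earliest later block that touched it, otherwise the current value; the proof is then the sibling at each level. `RecentTree`, the `(depth, index)` node addressing and the `u64` digests are hypothetical simplifications, not the node's actual SMT API.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound::{Excluded, Unbounded};

/// Hypothetical stand-ins: (depth, index at that depth), root at depth 0.
pub type NodeIndex = (u8, u64);
pub type Digest = u64;

pub struct RecentTree {
    /// Current value of every non-empty node.
    latest: HashMap<NodeIndex, Digest>,
    /// block number -> value each touched node had BEFORE that block.
    reverse_updates: BTreeMap<u32, HashMap<NodeIndex, Digest>>,
    /// Oldest block whose tree can still be reconstructed.
    oldest: u32,
}

impl RecentTree {
    pub fn new() -> Self {
        Self { latest: HashMap::new(), reverse_updates: BTreeMap::new(), oldest: 0 }
    }

    /// Applies a block's node changes and records the values they overwrote.
    pub fn apply_block(&mut self, block: u32, changes: &[(NodeIndex, Digest)]) {
        let mut reverse = HashMap::new();
        for &(node, new_value) in changes {
            // Missing nodes are treated as the empty digest 0 (toy simplification).
            let old = self.latest.insert(node, new_value).unwrap_or(0);
            reverse.entry(node).or_insert(old);
        }
        self.reverse_updates.insert(block, reverse);
    }

    /// Value of `node` as of `block`.
    fn node_at(&self, node: NodeIndex, block: u32) -> Option<Digest> {
        for (_, update) in self.reverse_updates.range((Excluded(block), Unbounded)) {
            if let Some(old) = update.get(&node) {
                return Some(*old);
            }
        }
        self.latest.get(&node).copied()
    }

    /// Sibling hashes from the leaf level up to (not including) the root.
    pub fn proof_at(&self, depth: u8, leaf_index: u64, block: u32) -> Option<Vec<Digest>> {
        if block < self.oldest {
            return None;
        }
        let mut path = Vec::with_capacity(depth as usize);
        let mut index = leaf_index;
        for d in (1..=depth).rev() {
            path.push(self.node_at((d, index ^ 1), block)?);
            index >>= 1;
        }
        Some(path)
    }

    /// Keeps only the updates needed to answer queries for blocks >= `min_block`.
    pub fn prune(&mut self, min_block: u32) {
        self.reverse_updates = self.reverse_updates.split_off(&(min_block + 1));
        self.oldest = self.oldest.max(min_block);
    }
}
```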
UPD: I think that if the solution performs well, we won't even need to keep the latest-state SMT; it will be enough to have the "initial" ( IMO, this is the most memory- and computation-efficient solution so far.
I think this works. A couple of comments:
I think this works. There could also be other ways to keep track of this data. For example, we could merge all changesets into a single data structure, something like:

```rust
BTreeMap<NodeIndex, Vec<(u32, Digest)>>
```

Where the internal tuples are

Overall, I'd probably abstract this away behind a new struct, maybe something like this:

```rust
pub struct AccountTree {
    // SMT holding the latest account hashes
    tree: Smt,
    // updates kept for recent blocks, used to answer queries for past blocks
    updates: SmtUpdates,
}
```

In terms of PRs, I think we actually have 2 PRs here:
I'd probably start with the delta structure as we need it for other purposes too. |
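The merged `BTreeMap<NodeIndex, Vec<(u32, Digest)>>` suggestion above can be sketched like this. The meaning of the inner tuples is cut off in the thread, so this assumes each tuple is `(block_num, node value from that block onward)`, kept sorted by block; `NodeHistory`, its methods, and the `u64` stand-ins for `NodeIndex` and `Digest` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-ins for the real node index and digest types.
pub type NodeIndex = (u8, u64);
pub type Digest = u64;

/// ASSUMPTION: per node, (block_num, value from that block on), sorted by block.
#[derive(Default)]
pub struct NodeHistory {
    pub nodes: BTreeMap<NodeIndex, Vec<(u32, Digest)>>,
}

impl NodeHistory {
    /// Records node values produced by `block_num`; blocks must arrive in order.
    pub fn record(&mut self, block_num: u32, changes: &[(NodeIndex, Digest)]) {
        for &(node, value) in changes {
            self.nodes.entry(node).or_default().push((block_num, value));
        }
    }

    /// Value of `node` as of `block_num`: binary search for the last version
    /// at or before that block.
    pub fn value_at(&self, node: NodeIndex, block_num: u32) -> Option<Digest> {
        let versions = self.nodes.get(&node)?;
        let pos = versions.partition_point(|&(b, _)| b <= block_num);
        if pos == 0 {
            None
        } else {
            Some(versions[pos - 1].1)
        }
    }

    /// Drops versions that are no longer observable from `min_block` onward,
    /// keeping the newest version at or before `min_block`. `drain` shifts the
    /// remaining elements, i.e. this is where vector data gets moved.
    pub fn prune(&mut self, min_block: u32) {
        for versions in self.nodes.values_mut() {
            let pos = versions.partition_point(|&(b, _)| b <= min_block);
            if pos > 1 {
                versions.drain(..pos - 1);
            }
        }
    }
}
```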
We can join different tables in a single request, maybe
This is a good idea, and it should work with our current solution, but we should first decide whether we keep all deltas from the very beginning or prune deltas older than some number of blocks.
Thank you! I totally agree; this will require more operations for pruning old data, especially moving data within vectors. I will think about some optimizations here.
Regarding #563 (comment), here are some of the ways we currently use these endpoints and how retrieving old account states would come into play:
Seems like overall we have a need for 3 actions:
For the first point, we should be able to come up with a decent solution, and I know @polydez is currently working on it. For the second point, we kind of already have a solution, though one thing that bothers me there is how to keep the size of the returned deltas reasonable (i.e., if a lot of updates were done to the account between blocks For the third point, the solution we currently have is to return the entire account data, and only for the chain tip. We need to think about how to refactor this, as both of these could be issues (the account may be too big to return in a single request, but if we return the account data over multiple requests, the account state may change between these requests). The biggest issue here is mostly dealing with storage maps and the account vault, as all other data takes up at most 8KB per account, and so, in theory, we could store multiple versions of that for states close to the chain tip. How to store/retrieve account vault and storage map data is still an open question for me.
@igamigo, @bobbinth, thank you for sharing your vision!
We can use the solution from #563, but in order to limit computations, we should also purge obsolete deltas each time a block is applied. This can be achieved by executing queries like "remove all values for blocks older than This solution has several advantages over the rewriting approach we discussed before:
For the proposed solution, the only difficulty I see is that the account state might be too big to return in a single request. We can limit the response size and introduce paging, caching generated responses for a short period of time (e.g., 20 seconds or so after the last request; we would probably also like to share cached results between similar requests from different users). There is also no problem with the account state changing while pages are being fetched: the user requests the account state for a specified block number, and the only thing we take from the latest state is the account's code, which is immutable for now. Once we add support for mutable account code, we will store code updates in the account deltas table.
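The short-lived page cache described above could look roughly like this: responses are keyed by account, block number and page rather than by requester, so identical requests from different users hit the same entry, and each hit extends the entry's lifetime. `PageKey` and `PageCache` are hypothetical names; the 20-second window from the comment would be `Duration::from_secs(20)`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache key: one page of an account's state as of a block.
/// It does not include the requester, so results are shared between users.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct PageKey {
    pub account_id: u64,
    pub block_num: u32,
    pub page: u32,
}

pub struct PageCache {
    ttl: Duration,
    /// key -> (time of last request, serialized page)
    entries: HashMap<PageKey, (Instant, Vec<u8>)>,
}

impl PageCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns a cached page if it was requested within the TTL, and
    /// refreshes its timer ("N seconds after the last request").
    pub fn get(&mut self, key: &PageKey) -> Option<&[u8]> {
        let now = Instant::now();
        let entry = self.entries.get_mut(key)?;
        if now.duration_since(entry.0) >= self.ttl {
            return None;
        }
        entry.0 = now;
        Some(entry.1.as_slice())
    }

    pub fn insert(&mut self, key: PageKey, page: Vec<u8>) {
        self.entries.insert(key, (Instant::now(), page));
    }

    /// Drops entries not requested within the TTL; call periodically.
    pub fn evict_expired(&mut self) {
        let now = Instant::now();
        let ttl = self.ttl;
        self.entries.retain(|_, (last_access, _)| now.duration_since(*last_access) < ttl);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A real server would also need a size bound on the cache and synchronization (e.g. a mutex) around it, since requests are handled concurrently.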
Right, this is my biggest concern: how do we deal with accounts whose state is big (e.g., 100 MB or even 1 GB)? We could use pagination, but what would we paginate over, and how much overhead would supporting pagination add? If we break the account down, we have several components:
We could design an endpoint which always returns the first 3 items in this list and maybe some number of entries for the other items. Then, we'd need endpoints to retrieve additional items for storage maps and asset vault and depending on how we'd want to do that, we may need to refactor how we store account data in the database.
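One possible answer to "what would we paginate over" for storage maps and the asset vault is keyset (cursor) pagination over the sorted keys: each response carries the last key it returned, and the next request asks for keys strictly after it. A minimal sketch over an in-memory `BTreeMap` with `u64` keys and values (hypothetical stand-ins for map keys and words, or asset IDs); in the database this could become an ordered range query on the key column. `Page` and `page_after` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// One page of entries plus the cursor for the next request.
#[derive(Debug, PartialEq)]
pub struct Page {
    pub entries: Vec<(u64, u64)>,
    /// Last key returned; `None` means there is nothing more to fetch.
    pub next_cursor: Option<u64>,
}

/// Returns up to `limit` entries whose keys are strictly greater than `after`.
pub fn page_after(map: &BTreeMap<u64, u64>, after: Option<u64>, limit: usize) -> Page {
    assert!(limit > 0, "page size must be positive");
    let start = match after {
        Some(key) => Excluded(key),
        None => Unbounded,
    };
    let entries: Vec<(u64, u64)> = map
        .range((start, Unbounded))
        .take(limit)
        .map(|(k, v)| (*k, *v))
        .collect();
    // Only hand out a cursor if something actually follows the last returned key.
    let next_cursor = match entries.last() {
        Some(&(last, _)) if map.range((Excluded(last), Unbounded)).next().is_some() => Some(last),
        _ => None,
    };
    Page { entries, next_cursor }
}
```

Unlike offset-based paging, a key cursor stays valid if entries are added or removed, and pinning every page request to the same block number keeps all pages consistent with one account state.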
I think whether we go with this solution or something else will depend on the answers to the above questions. For example, we may decide to store account assets in a table which has the following fields: It is also possible I'm overthinking this, as dealing with accounts which require 100MB or 1GB of storage may require a completely different approach.
I doubt that we really need to support such big accounts. I think we should limit account storage size, and large account storage should also require higher fees. A few examples from different blockchains:
And speaking of the asset vault, how many assets would we see in practice? For a DEX, there might be plenty of fungible tokens, say, a thousand. For an NFT marketplace, there could be a great many: millions of generated non-fungible assets. But we should probably force (or incentivize) developers to spread such large numbers across many smaller accounts.
I can't find it now, but I think there are quite a few contracts on Ethereum that take up more than 100MB of storage. I agree that storage should be expensive, but I am not sure we should limit it beyond that. The same thinking applies to the asset vault. We could imagine contracts with millions of NFTs where each NFT is 32 bytes. So, we are again in the realm of dozens of megabytes and potentially more. So, we might want to refactor our
In the future, we may even put public account data into a separate database so that it doesn't slow down the main database, but for now it's simpler to keep them together.
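For scale, the NFT figures above work out as follows (assuming 32 bytes per asset and decimal megabytes, and ignoring any tree or index overhead):

```latex
\begin{aligned}
10^{6} \times 32\,\text{B} &= 3.2 \times 10^{7}\,\text{B} \approx 32\,\text{MB},\\
\frac{100\,\text{MB}}{32\,\text{B}} &= \frac{10^{8}}{32} \approx 3.1 \times 10^{6}\ \text{NFTs}.
\end{aligned}
```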
Currently, the GetAccountProofs RPC method provides a way to retrieve public account data, alongside a Merkle proof of the account being included in a block. Specifically, you can get the account's header representing the account's state, and an account storage header, which contains all top-level elements (meaning either values or roots of maps). This proof is currently generated exclusively for the chain tip. However, realistically, a user will not always have the latest block when executing a transaction that uses FPIs. For this reason, the current endpoint could be updated to return proofs for an arbitrary block (close to the chain tip), making transaction execution easier to set up on the user side.
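Supporting an arbitrary block "close to the chain tip" implies validating the requested block against the window for which history is retained, which the thread also suggests bounding with a hard limit. A sketch, assuming an optional block number in the request (absent meaning the chain tip, as today) and a configurable `max_lag`; `resolve_proof_block` and `ProofRequestError` are hypothetical names, not part of the existing GetAccountProofs API.

```rust
/// Hypothetical reasons for rejecting a proof request for a past block.
#[derive(Debug, PartialEq, Eq)]
pub enum ProofRequestError {
    /// The requested block is not yet known to the node.
    FutureBlock { requested: u32, chain_tip: u32 },
    /// The requested block is older than the retained history window.
    TooOld { requested: u32, oldest_supported: u32 },
}

/// Accepts blocks in `[chain_tip - max_lag, chain_tip]`; `None` means the chain tip.
pub fn resolve_proof_block(
    requested: Option<u32>,
    chain_tip: u32,
    max_lag: u32,
) -> Result<u32, ProofRequestError> {
    let block = requested.unwrap_or(chain_tip);
    if block > chain_tip {
        return Err(ProofRequestError::FutureBlock { requested: block, chain_tip });
    }
    let oldest_supported = chain_tip.saturating_sub(max_lag);
    if block < oldest_supported {
        return Err(ProofRequestError::TooOld { requested: block, oldest_supported });
    }
    Ok(block)
}
```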