Deserializable ast nodes #3079
-
Interesting, we already have 2 directions that are touching the AST, so there may be things you can leverage. Given that we are going to have 3 PRs touching the AST, and I'm the anti-macro guy on this project, I wonder whether we should start writing codegen scripts for this kind of thing, e.g. https://github.com/facebook/hermes/tree/main/unsupported/hermes/crates/hermes_estree_codegen. I don't mind exploring new things, but I'm kind of busy until the end of May, so I may be slow to respond. If you have questions, you can ask them on Discord at https://discord.gg/9uXCAwqQZW for quicker feedback. For deserialization, @overlookmotel may have thought about it, but he is busy as well 😅
-
Hi @branchseer. You are bringing up some really interesting issues (this and #2943). I like your style! I do think this is worth looking into.
I looked into speeding up serializing/deserializing when I made my first attempt at AST transfer (#2457) a couple of years ago on SWC. I found that even rkyv wasn't fast enough for my liking, and ended up building my own serialization library, ser_raw, which is gloriously simple and therefore also extremely fast. I abandoned the effort on SWC in the end as the maintainers weren't interested, and so I never completed
Oxc also has a major advantage over SWC for this kind of thing, in that all the AST data is already clustered together in the arena allocator's buffers. So to write to cache, you could just dump the entire contents of the arena to disk without any serialization step. Restoring from cache then doesn't need deserialization in the usual sense. All you need to do is pull the data back from disk into memory, "rehydrate" the arena, and offset all the pointers in the data (
Correcting the pointers is easier said than done, though. I imagine it would need either:
abomination (which
But either way, if a way of doing it can be found, it would blow the pants off every other ser/deser scheme that's out there. Assuming the read speed of the disk containing the cache is fast, then no matter how stupendous the speed of Oxc + Rolldown, I imagine this would still be a significant speed-up, because it'd be doing practically no work.
I hope these thoughts are of some interest to you. Sorry they're a little unfocused. I'm afraid that, like Boshen, I am completely tied up for a while, so I wouldn't be able to give much time to this for a few months, but I'm very happy to trade thoughts if you want to go further with it.
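To make the "offset all the pointers" step concrete, here is a minimal sketch in Rust. It is a toy model and not Oxc's allocator: the `ArenaDump` type, the explicit `pointer_slots` table, the 8-byte little-endian pointer layout, and the use of 0 as null are all assumptions made for illustration. The hard part discussed above, finding where the pointers are, is simply assumed to be solved by that table.

```rust
/// Hypothetical snapshot of an arena: its raw bytes, the base address it
/// lived at when dumped, and the byte offsets of every pointer-sized slot.
struct ArenaDump {
    bytes: Vec<u8>,
    base: u64,
    pointer_slots: Vec<usize>,
}

/// Read an 8-byte little-endian value at byte offset `at`.
fn read_ptr(bytes: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(raw)
}

/// Write an 8-byte little-endian value at byte offset `at`.
fn write_ptr(bytes: &mut [u8], at: usize, value: u64) {
    bytes[at..at + 8].copy_from_slice(&value.to_le_bytes());
}

/// Rebase every recorded pointer so the dump is valid at `new_base`.
/// Null pointers (0 in this toy layout) are left untouched.
fn relocate(dump: &mut ArenaDump, new_base: u64) {
    let delta = new_base.wrapping_sub(dump.base);
    for &slot in &dump.pointer_slots {
        let old = read_ptr(&dump.bytes, slot);
        if old != 0 {
            write_ptr(&mut dump.bytes, slot, old.wrapping_add(delta));
        }
    }
    dump.base = new_base;
}

/// Two 16-byte nodes laid out as [value: u64][next: u64].
/// Node 0 points at node 1; node 1's `next` is null.
fn sample_dump() -> ArenaDump {
    let base: u64 = 0x1000;
    let mut bytes = vec![0u8; 32];
    write_ptr(&mut bytes, 0, 7); // node 0 value
    write_ptr(&mut bytes, 8, base + 16); // node 0 next -> node 1
    write_ptr(&mut bytes, 16, 9); // node 1 value
    write_ptr(&mut bytes, 24, 0); // node 1 next = null
    ArenaDump { bytes, base, pointer_slots: vec![8, 24] }
}
```

Calling `relocate(&mut dump, new_base)` on a freshly loaded dump rewrites only the recorded slots, so the cost is proportional to the number of pointers rather than to the size of the AST. The fixed pointer width and byte order in this sketch also hint at why such a dump would not be portable across architectures without extra care.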
-
Thank you both for your insights! I understand you are busy. I'll leave my random thoughts here. No pressure to respond ;)
This reminds me of my little experiment: https://branchseer.github.io/tser. Maybe it could be somewhat helpful to the codegen approach?
Very intriguing concept! I wonder how much unused data is in the arena (like when a
Is this portable across different architectures/OSes? I mean, we don't need to think about portability when transferring the AST in memory, but it's a good feature for disk caching.
-
Deserializable AST nodes (and types in oxc_semantic) open up possibilities for a persistent cache in oxc-based bundlers (rolldown/rolldown#802).
Since AST nodes borrow data from the allocator, deserialization is a bit trickier to implement than serialization (it requires a stateful deserializer). Here are my current findings:
serde has DeserializeSeed for such cases, but it's quite limited and doesn't support nested borrowing types: Collections compatibility with Serde fitzgen/bumpalo#63 (comment).
bincode2 currently doesn't support stateful deserializers. I managed to implement one in Decode context bincode-org/bincode#710, but I'm not sure if it'll get accepted.
rkyv supports a custom deserializer as a type param D in its Deserialize trait. I think this is doable with something like Deserialize<SomeAstNode<'alloc>, &'alloc Allocator>, although I haven't actually tried it.
Before digging further, I want to ask whether contributions on this feature are welcome. If so, I suggest first introducing a central proc macro for (de)serialization attributes, like what swc does, so that whatever deserializing attributes we end up using, we'd only need to add them in one place.
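To illustrate what "stateful deserializer" means here, the following is a rough std-only sketch in Rust. Everything in it is hypothetical: the `DeserializeIn` trait, the toy `Arena`, the `Ident` and `Assign` node types, and the length-prefixed byte format are invented for the example and are not the API of serde, bincode, rkyv, or oxc. The point is only that the allocator is threaded through every nested `deserialize_in` call, so nested borrowing types work naturally.

```rust
use std::cell::OnceCell;

/// Toy append-only arena, a stand-in for a real bump allocator.
/// It is a linked list of write-once cells, so it needs no unsafe code.
struct Arena {
    head: OnceCell<Box<Slot>>,
}

struct Slot {
    text: String,
    next: OnceCell<Box<Slot>>,
}

impl Arena {
    fn new() -> Self {
        Arena { head: OnceCell::new() }
    }

    /// Copy `s` into the arena and return a reference that lives as long
    /// as the arena borrow.
    fn alloc_str(&self, s: &str) -> &str {
        let mut cell = &self.head;
        loop {
            match cell.get() {
                Some(slot) => cell = &slot.next,
                None => {
                    let slot = cell.get_or_init(|| {
                        Box::new(Slot { text: s.to_owned(), next: OnceCell::new() })
                    });
                    return &slot.text;
                }
            }
        }
    }
}

/// Hypothetical stateful-deserialization trait: the arena is the state.
trait DeserializeIn<'alloc>: Sized {
    fn deserialize_in(input: &mut &[u8], arena: &'alloc Arena) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
struct Ident<'alloc> {
    name: &'alloc str,
}

#[derive(Debug, PartialEq)]
struct Assign<'alloc> {
    target: Ident<'alloc>,
    value: Ident<'alloc>,
}

impl<'alloc> DeserializeIn<'alloc> for Ident<'alloc> {
    /// Wire format assumed here: one length byte, then that many UTF-8 bytes.
    fn deserialize_in(input: &mut &[u8], arena: &'alloc Arena) -> Option<Self> {
        let (&len, rest) = input.split_first()?;
        if rest.len() < len as usize {
            return None;
        }
        let (raw, rest) = rest.split_at(len as usize);
        let text = std::str::from_utf8(raw).ok()?;
        *input = rest; // advance the cursor only on success
        Some(Ident { name: arena.alloc_str(text) })
    }
}

impl<'alloc> DeserializeIn<'alloc> for Assign<'alloc> {
    /// A nested node just forwards the same arena to its children.
    fn deserialize_in(input: &mut &[u8], arena: &'alloc Arena) -> Option<Self> {
        let target = Ident::deserialize_in(input, arena)?;
        let value = Ident::deserialize_in(input, arena)?;
        Some(Assign { target, value })
    }
}
```

With this shape, `Assign::deserialize_in(&mut input, &arena)` yields a node whose strings all live in `arena`. A derive or codegen step could presumably generate the per-node impls, which is where a central macro or script for (de)serialization attributes would come in.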