Caching for Repeated Parsing #40

joshcho · 2023-09-02T09:56:22Z

I'm working on a performance-critical project that requires repeated parsing with parseclj. I'm interested in knowing whether there's built-in support for caching parsed results. If not, would memoization be an effective way to add this feature? Additionally, are there any caveats or issues to be aware of when implementing caching in conjunction with parseclj? If there are any places I should look, let me know.

Thanks!

plexus · 2023-09-02T10:18:11Z

That's hard to answer in general. Unless you have a lot of identical inputs, I don't see how caching will help you much. It might also not be trivial to memoize, since parseclj works on temp buffers: you'd have to convert the buffer contents to a string and use that as your cache key. Parsing EDN is always somewhat slow; it's not a format designed for fast parsing. Parseclj is hand-written to be reasonably fast, but it's also Emacs Lisp, so... maybe look at Transit or CBOR.
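A minimal sketch of the string-keyed cache described above, written in Python purely for illustration rather than Emacs Lisp. `ContentKeyedCache` and `toy_parse` are hypothetical names, and `toy_parse` is only a stand-in for a real parser; nothing here is parseclj's API. It shows why this kind of cache only pays off for identical inputs: a one-character change produces a different key and a full re-parse.

```python
from typing import Any, Callable, Dict


class ContentKeyedCache:
    """Cache parse results keyed by the exact input text (hypothetical helper)."""

    def __init__(self, parse_fn: Callable[[str], Any]) -> None:
        self._parse_fn = parse_fn
        self._results: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def parse(self, text: str) -> Any:
        # The full text is the key, so only byte-for-byte identical
        # inputs can ever be served from the cache.
        if text in self._results:
            self.hits += 1
            return self._results[text]
        self.misses += 1
        result = self._parse_fn(text)
        self._results[text] = result
        return result


def toy_parse(text: str) -> list:
    """Stand-in for a real parser: split into whitespace-separated tokens."""
    return text.split()


cache = ContentKeyedCache(toy_parse)
first = cache.parse("{:a 1}")
second = cache.parse("{:a 1}")  # identical text: served from the cache
third = cache.parse("{:a 2}")   # one character differs: full re-parse
```

Note that the cache in this sketch is unbounded; a real version would need some eviction policy, and hashing a large string key has its own cost.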


joshcho · 2023-09-02T10:36:34Z

I am essentially re-parsing a buffer continually as it is edited, so most of the input is identical between parses. I am wondering which part or function of the package would be the best place for caching/memoization. Naively, I could probably do something like caching only input strings that are large enough.

plexus · 2023-09-02T13:31:24Z

Parseclj is a shift-reduce parser rather than a recursive descent parser. The only place I can imagine you might be able to do something is at the reduce step, based on the top few elements of the stack, but it's not going to be as easy as wrapping some function in "memoize"; it will require a deep understanding of what the parser is doing.

plexus · 2023-09-02T13:32:51Z

I would see what you can do to avoid parsing the whole buffer. Maybe rerun the parser only on the current top-level form.

Alternatively, look into the tree-sitter parser; that's built more for this kind of use case.
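The idea of rerunning the parser only on the current top-level form can be sketched roughly as follows, again in Python for illustration only. All names (`split_top_level_forms`, `FormCache`, `toy_parse_form`) are hypothetical, and the splitter is deliberately naive: it counts brackets and ignores strings, comments, character literals, and bare top-level atoms, which a real implementation would have to handle.

```python
from typing import Any, Callable, Dict, List


def split_top_level_forms(text: str) -> List[str]:
    """Naively split text into top-level bracketed forms by tracking depth."""
    forms: List[str] = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch in "([{":
            if depth == 0:
                start = i
            depth += 1
        elif ch in ")]}" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                forms.append(text[start:i + 1])
                start = None
    return forms


class FormCache:
    """Re-parse only the top-level forms whose text changed since the last pass."""

    def __init__(self, parse_fn: Callable[[str], Any]) -> None:
        self._parse_fn = parse_fn
        self._by_text: Dict[str, Any] = {}
        self.parse_calls = 0

    def parse_buffer(self, text: str) -> List[Any]:
        current: Dict[str, Any] = {}
        results: List[Any] = []
        for form in split_top_level_forms(text):
            if form not in current:
                if form in self._by_text:
                    current[form] = self._by_text[form]  # unchanged form: reuse
                else:
                    self.parse_calls += 1
                    current[form] = self._parse_fn(form)  # new or edited form
            results.append(current[form])
        # Keep only forms still present, so stale entries do not accumulate.
        self._by_text = current
        return results


def toy_parse_form(form: str) -> List[str]:
    """Stand-in for a real parser: whitespace-separated tokens."""
    return form.split()


before = "(defn a [] 1)\n(defn b [] 2)\n"
after = "(defn a [] 1)\n(defn b [] 3)\n"

form_cache = FormCache(toy_parse_form)
form_cache.parse_buffer(before)
calls_after_first_pass = form_cache.parse_calls
form_cache.parse_buffer(after)
calls_after_edit = form_cache.parse_calls
```

In this sketch the second pass only parses the edited form, so the cost of an edit scales with the size of that form rather than the whole buffer. The split itself still scans the full text, which an editor integration would likely replace with the editor's own notion of the form around point.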

joshcho changed the title ~~How to Implement Caching for Repeated Parsing Operations to Improve Performance?~~ Caching for Repeated Parsing Sep 2, 2023
