Fast native toplevel using JIT #15
Conversation
Signed-off-by: Jeremie Dimino <[email protected]>
To summarize, your proposal is as follows:
This sounds like a very reasonable approach to me. (I had this in mind when I replied to your earlier emails but never formulated it clearly, sorry.) Minor comment: The way this RFC references earlier work by Marcell Fischbach and Benedikt Meurer is slightly confusing; I'm not sure you would reuse much of their work (except the parts that have already been upstreamed, typically the linear-scan register allocator). In particular their suggestion to have a
Indeed. I guess the only code part of Marcell Fischbach and Benedikt Meurer's work we would reuse is the C code, which I'm assuming is independent of how the assembly is generated.
To clarify: what we have is a way to generate machine code (+ relocation information) from the "x86 assembly AST" (introduced to share code between the two supported assembly syntaxes). Currently, we dump this machine code with a COFF emitter to produce .obj files, but for the use case discussed here, we'd need to write a dynamic code loader that works directly from the generated machine code, relocations and symbol tables (i.e. put the code in executable pages and apply the relocations). This should be rather simple I think (and is perhaps covered by the "C code" from Marcell Fischbach and Benedikt Meurer's work).
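As a purely illustrative sketch (the module type, field names and relocation kinds below are hypothetical, not part of the RFC or of any existing library), such a loader could expose an interface along these lines:

```ocaml
(* Hypothetical interface for an in-memory code loader; all names here
   are illustrative only. *)
module type DYNAMIC_CODE_LOADER = sig
  (* A relocation to patch once the final address of the code is known. *)
  type relocation = {
    offset : int;                 (* position in the code buffer to patch *)
    target : string;              (* name of the referenced symbol *)
    kind : [ `Abs64 | `Rel32 ];   (* absolute vs. PC-relative reference *)
  }

  (* Copy [code] into freshly allocated executable pages, patch it
     according to [relocations] (resolving external symbols with
     [lookup]), and return the final address of each defined symbol. *)
  val load :
    code:bytes ->
    relocations:relocation list ->
    defined:(string * int) list ->
    lookup:(string -> nativeint) ->
    (string * nativeint) list
end
```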
Co-Authored-By: Nicolás Ojeda Bär <[email protected]>
We discussed this quickly at the last OCaml developer meeting. There are a few questions around the portability of writing to executable memory. We are now going to build a prototype using the LexiFi binary code emitter and test it on various platforms (Linux, OSX, BSD and Windows) in order to get a clearer picture of the difficulties. Once this is done, we will discuss this proposal further with the rest of the dev team.
To the best of my knowledge, the "LexiFi binary code emitter" was, in large part, written by me at OCamlPro, for LexiFi. It extends the COFF linker written by Alain with an x86/amd64 in-memory assembler (i.e. Intel symbolic assembly, 32/64-bit, to binary code) and an ELF linker for Linux. The code emitter was also included in
Hi Fabrice, happy to discuss. I'm going to follow up by email to find a time.
Hello, are there any updates on the progress?
@entrust1234 hi, please don't post 'any update?' comments on issues, it spams everyone who is subscribed. You can subscribe to the issue to receive updates. Thanks!
FTR, I'm no longer driving the project. My colleague @mshinwell took over. I'll let him and/or @NathanReb comment, but what I heard from them about the JIT was positive :)
A quick update on the JIT for the native toplevel: we have a working prototype, implemented as a library outside of the compiler. It requires a couple of simple hooks to be added to Opttoploop (soon to be the unified Toploop) and to expose some of the existing types and functions defined there, but it is all fairly minimal. The library provides a

We're now working on a branch of MDX using the JIT so we can test it on real-world use cases such as Real World OCaml or Jane Street's internal code base, making sure it works as intended and that the performance gain is what we expect. If that goes well, we'll move on and start upstreaming the changes we need in the native toplevel, hopefully making the JIT available for OCaml 4.13!
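To give a rough idea of what such hooks could look like, here is a hedged sketch; the record fields and function names are hypothetical and need not match the API that was actually proposed for Opttoploop:

```ocaml
(* Hypothetical sketch of a pluggable phrase loader for the native
   toplevel; the real hooks added to Opttoploop/Toploop may differ. *)
type phrase_loader = {
  (* Load the code generated for one toplevel phrase. *)
  load : phrase_code:string -> bool;
  (* Look up a global defined by a previously loaded phrase. *)
  lookup : string -> Obj.t option;
}

(* The toplevel would keep the current loader in a reference: by default
   it shells out to the assembler and linker, and an out-of-tree JIT
   library can install its own loader at startup. *)
let loader : phrase_loader option ref = ref None

let set_loader l = loader := Some l
```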
Thanks for the change! I still think this is a very nice project and I'm glad to get the update. In the interest of starting the bikeshedding early: I'm not sure about the "Jit" name because (1) today people associate JITs with dynamic-recompiling implementations, and not just on-demand code emission, so it comes with a lot of assumptions/associations that are not realized here, and (2) the previous toplevel was already "just in time" in the same sense as your prototype; the main difference is whether you go through external tools or emit binary (encoded assembly) directly. I don't think we should debate this right now, but maybe in the next few weeks/months you may have ideas for alternate names.
Does this mean the native toplevel will be as usable as the bytecode one?
@NathanReb is code unloading part of the current JIT implementation?
It won't be, although I've had some thoughts as to how to do it.
@mshinwell if you have time, please share. I'm interested in it for Tezos and I got an example working, but only if the code has no data references (I can ensure this by validating the Cmm).
I haven't thought about this for literally years, so my memory is hazy. However the general idea was the following. The most difficult problem is probably that, before unloading, you need to make sure there aren't any left-over code pointers into the dynamically-loaded/generated code. I think the problematic places these could occur are on the stack, in live physical registers or in the OCaml heap. The stack (and all thread stacks) could be scanned to ensure there are no references into the relevant (i.e. dynamically-loaded/generated) code areas before unloading; if a reference is found, the unloading could be tried later. I think there are various different cases here:
Live physical registers could be scanned in a similar way, using the existing liveness information. For the heap, the places the code pointers might occur (assuming no

The other area of difficulty concerns statically-allocated data, as you mentioned. Maybe we could just not statically allocate anything for these dynamically-loaded modules. I tend to think there is probably a more general solution involving specific GC regions for each dynamically loaded/generated compilation unit, though the GC doesn't support anything like this at present.
P.S. In fact the code pointers scheme above relies on all closures in the dynamically-loaded/generated units being dynamically allocated, otherwise the finaliser will never be called.
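As a minimal illustration of the check described above (and nothing more), the decision of whether a dynamically loaded code region can be unloaded might look like the following; the enumeration of live code pointers is assumed to exist and is not something the runtime provides today:

```ocaml
(* Illustrative only: assumes some [live_code_pointers] enumeration
   gathered by scanning the stacks, live registers and the heap.  The
   OCaml runtime does not currently expose such a primitive. *)
type code_region = { start_addr : nativeint; length : int }

let points_into { start_addr; length } (p : nativeint) =
  p >= start_addr
  && Nativeint.to_int (Nativeint.sub p start_addr) < length

(* A region can only be unloaded if no live code pointer targets it;
   otherwise the unload should be retried later. *)
let can_unload ~live_code_pointers region =
  not (List.exists (points_into region) live_code_pointers)
```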
@NathanReb would you by chance have some information on the current status of the native-toplevel revival? The "unify the toploop implementations" part was done (in large part) in #10124. Were people able to test the native toplevel inside
We indeed tested it. The work is available on GitHub and is briefly documented here.
I tried to provide clear information on how to set all this up in the various repos so you should be able to try it fairly easily. Please reach out to me if anything needs to be clarified!

While working on this we also spotted differences between the native and bytecode toplevels that need to be fixed on the native toplevel side. These are showcased in the

There also seems to be an issue with how

Next steps from here are a few patches to the compiler and toplevel libraries
I'll be working on those very shortly as we'd very much like to get this into 4.14!
I'm curious about the current status of the project. Any news?
The compiler work for this was released with 4.14:
Additionally there were two fixes to bring the behaviour of
All this work, I believe, is being used internally at Jane Street (rebased onto OCaml 4.12) with a customised version of the mdx tool. I believe @NathanReb and @Leonidas-from-XIV are aiming to release a version of mdx using ocaml-jit (and so using native mode to interpret mdx documents) in the next couple of months.
Thanks for the news. One side benefit I hoped for from this project is to get a usable
Sorry to jump in this late, but why has this PR been closed?
The proposal looked interesting, and no justification was given for
closing the PR.
Could we either re-open it, or explain clearly why it has been closed?
I think it's because Jérémie's GitHub account has been deleted.
Mark Shinwell (2022/08/29 06:50 -0700):
Reopened #15.
Okay, thanks. And thanks a lot for having re-opened it; the topic seems
interesting and worth exploring to me.
My understanding is that currently the project is on hold. @dra27 took care of upstreaming the necessary hooks to be able to implement a JIT outside the compiler, which is enough for the

Personally I hope that we will eventually get native binary emission in the compiler upstream (or maybe in a well-identified external library), as we discussed when the RFC was originally written, for example reusing the LexiFi code -- the discussion of this is a large part of the RFC. I think this would be especially useful in combination with MetaOCaml, and in general an excellent contribution for the whole ecosystem, not just mdx. (It also comes with delicate questions of code maintenance etc.) But Jérémie is not working on this anymore, and I don't know if the remaining people are interested in doing the extra work to make the project more widely useful.
Fast native toplevel using JIT
Overview
We (Jane Street + OCL/Tarides) would like to make the native toplevel faster and more self-contained.
At the moment, the native toplevel works by calling the assembler and linker for each phrase. This makes it slow and dependent on an external toolchain, which is not great for deployment.
To reach this goal, we would simply like to bring the earlier JIT work on ocamlnat by Marcell Fischbach and Benedikt Meurer up to date and merge it into the compiler.
Motivation
This work would provide a simple way to compile and execute OCaml code at runtime, which would open up many possibilities for new tools.
Coupled with the fact that we can already embed cmi files into an executable, this work would make it possible to distribute a self-contained binary that can evaluate OCaml code at runtime. This would make it simple and straightforward to use OCaml as an extension language.
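For illustration, this embedding pattern already exists with the bytecode toplevel API from compiler-libs; the point of this work is to make the same pattern fast and self-contained for native code. The snippet below uses the existing Toploop module and is only meant to show the shape of such an embedding:

```ocaml
(* Uses the existing compiler-libs bytecode toplevel (link with
   compiler-libs.toplevel); shown only to illustrate the embedding
   pattern this RFC would make practical in native code. *)
let eval_phrase source =
  let lexbuf = Lexing.from_string source in
  let phrase = !Toploop.parse_toplevel_phrase lexbuf in
  (* Prints the result on the given formatter; returns [true] on success. *)
  Toploop.execute_phrase true Format.std_formatter phrase

let () =
  Toploop.initialize_toplevel_env ();
  ignore (eval_phrase "let greeting = \"hello from embedded OCaml\";;")
```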
Verified examples in documentation comments
We are particularly interested in this feature for the mdx tool. More precisely, we are currently working on a feature allowing verified toplevel snippets in mli files. For instance:
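The snippet below is an illustrative sketch of what such a documented mli could look like (the value and its documentation are made up for the example):

```ocaml
(** Successor of an integer.

    {[
      # succ 1;;
      - : int = 2
    ]} *)
val succ : int -> int
```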
In the above example, the {[ ... ]} block would be kept up to date by mdx to ensure that the document stays in sync when the code changes. In fact, the user would initially only write the # lines and mdx would insert the results, just as with expectation tests.
The change in detail
This change would add JIT code generation for x86 architectures as described in the paper. For other architectures, we would still rely on the portable method of calling the assembler and linker. The main additions to the compiler code base would be:
The paper mentions that it adds 2300 lines of OCaml+C code to the compiler code base.
One detail to mention: IIUC, the JIT ocamlnat from the paper goes directly from the linear form to binary assembly. Now that we have a symbolic representation of the assembly, we could instead start from the symbolic assembly in order to share more logic between normal compilation and the JIT.
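As a hedged sketch of that second option (the module type and field names below are hypothetical; only X86_ast, the compiler's symbolic x86 assembly representation, is an existing module), an in-memory emitter could look like:

```ocaml
(* Hypothetical interface for an in-memory assembler starting from the
   compiler's symbolic x86 assembly (X86_ast, from the x86 backend);
   the names below are illustrative. *)
module type JIT_EMITTER = sig
  type relocation   (* reference to resolve when the code is loaded *)

  type compiled_phrase = {
    code : bytes;                      (* encoded machine code *)
    relocations : relocation list;     (* unresolved references *)
    defined : (string * int) list;     (* symbol name -> offset in [code] *)
  }

  (* Assemble the symbolic assembly for one phrase directly in memory,
     instead of printing it and invoking the external assembler. *)
  val assemble : X86_ast.asm_program -> compiled_phrase
end
```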
We discussed this with @alainfrisch and @nobj, since LexiFi has been using an in-memory assembler in production for a while. They mentioned that they would be happy to open-source the code if they can, which means that we could be using code that has been running in production for a long time and is likely to be well tested and correct.
LexiFi's binary emitter is about 1800 lines of code including comments and newlines. This looks a bit smaller than the JIT part of the JIT ocamlnat, so we would still be adding approximately the same amount of code if we went this way.
Drawback
This is one more feature to maintain in the compiler and it comes with a non-negligible amount of code. However, and especially if we can reuse LexiFi's in-memory assembler, most of the additions would come from well-tested code. @alainfrisch and @nobj also mentioned that this code was very low-maintenance and had pretty much not changed in 5 years.
Alternatives
For the mdx case, we considered a few alternatives.
Using a bytecode toplevel
Mdx currently uses a bytecode toplevel where everything is compiled and executed in bytecode. This includes:
As a result, mdx is currently very slow and the round-trip time between the user saving a file and seeing the result easily climbs into the tens of seconds.
In the case of Jane Street, we have one more difficulty with this method: a lot of our code doesn't work at all in bytecode because we never use bytecode programs.
Staging the build
Given that mdx is a build tool, one alternative is to redesign the interaction between mdx and the build system. For instance, it could be done in stages, with a first step where mdx generates some code that is then compiled and executed normally by the build system. This is how the cinaps tool works, for example.
However, it is difficult to faithfully reproduce the behavior of the toplevel with this method. What is more, such a design is tedious and requires complex collaboration between the tool and the build system.
Going through this amount of complexity for every build tool that wants to compile OCaml code on the fly doesn't feel right.
Using a mixed native/bytecode mode
One idea we considered is using a mixed mode where a native application can execute bytecode. This would work well for us as the snippets of code we evaluate on the fly are always small and fast.
However, it is completely new work, while the native JIT has already been done. What is more, while it would work for us, it might not work for people who care about the performance of the code evaluated on the fly.
A native JIT would likely benefit more people.