-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: constructor unboxing #14
base: master
Are you sure you want to change the base?
Conversation
I like this proposal, and these are definitely optimisations I want to be able to express. However, I think that there are two separate optimisations being proposed here which I'll call inlining and disjointness, and I think it's worth considering the two independently, rather than conflating them with the immediate/block distinction as this proposal currently does. Inlining and disjointnessInlining combines the constructors of different types into a single sum representation. Given: type ab = A | B
type cd = C | D
type u = Foo of ab | Bar of cd [@@unboxed] this proposal inlines the definition of type u = Foo_A | Foo_B | Bar_C | Bar_D Inlining amounts to reasoning about sum types modulo associativity, transforming Disjointness represents certain constructors of different types as the identity, when this introduces no ambiguity. Given:
this proposal represents the
Disjointness amounts to transforming This proposalIf I understand correctly, this proposal currently operates on type definitions
I don't think that this
type ab' = A of int | B of int
type cd' = C of int | D of int
type u' = Foo of ab | Bar of cd [@@unboxed] The inlining optimisation is just as applicable and just as useful as before. I would like to represent type u' = Foo_A of int | Foo_B of int | Bar_C of int | Bar_D of int However, upon seeing block constructors, the currently proposal switches from inlining to disjointness, which fails as the types (The converse, applying disjointness to immediates, does not appear to be useful. It is very difficult to construct two immediate types that are disjoint, as all immediate types tend to assign a meaning to
by representing it as though it were
(This amounts to reasoning about distributivity of products over sums, as well as associativity of sums)
So, with this in mind, might it not make more sense to have separate annotations for inlining (per-field) and disjointness (per-constructor), rather than a single per-type AbstractionThe distinction between inlining and disjointness reappears when considering abstraction. Inlining requires no notion of "separability" - whether something can be inlined depends only on its layout, and is exactly as hard as inlining records in the manner proposed in #10. (Well, I imagine the implementation will be more work, as there's the pattern-matching compiler to worry about) Separability arises only for disjointness optimisations. While a precise analysis of disjointness does require a binary relation between types (which as mentioned above is annoying to abstract), a simple approximation seems to get most of the value of this relation. We can introduce a new layout (i.e. kind) describing those types whose block values do not lie in the tag space used for datatype constructors. This layout includes types This is a very coarse approximation. However, it suffices to accept all of the motivating examples in this proposal. (Finer approximations are also possible. For instance, having separate layouts for string and float tags would allow both a string and a float constructor to be simultaneously unboxed via disjointness) |
The way I think of it, constructors Single-constructor unboxing uses exactly identity (of low-level representations) as the embedding functions. This is also what's going on with what Stephen calls "disjointness", but not with "unboxing": with unboxing, the values are changed by the embedding: in the example of Jeremy where booleans are inlined, Embeddings are restricted by the fact that they must preserve identity of mutable state: "inlining" may not be possible if the argument of the constructor to inline is a record with mutable fields. |
Here's another example where this would be useful: type t = Small of int | Large of mpz_t [@@unboxed] |
The FFI story is not obvious here. We might want the compiler to be allowed to be as clever as possible as soon as the user ask for unboxing (or any other kind of change in representation). Maybe we should just say that the representation is not fixed in that case and this type is not fit for FFI (or can only be manipulated as an abstract type on the C side). A warning triggering whenever that kind of type can reach a C function would be nice. Such a warning would probably be neither correct nor complete but for obvious cases it could still probably help. |
There is also another question that arise with that kind of performance related annotations. How do user discover about that kind of features ? Should there be a mode where the compiler suggest what transformation could be possible ? Something like that would require an non local analysis (you need to annotate some other mlis than the one you are looking at to know that something is possible) |
As a last comment, the notion of separability might not be as stable as one could assume. There is some work going on to try to compile OCaml to webassembly. Such a backend has less control on the shape of values and in particular might not have tags for float or string values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments coming from my partial understanding of this proposal.
My impression is that indeed there are two different problems, separability and compression of "constant" cases into a word. Separability is clear enough, eventhough its is hard to make it prescriptive. The compression part could potentially do more. In both cases, my impression is that the behavior should be either specified by the programmer explicitly (using multiple annotations), or seen as a black box (even if the algorithm is public).
Only a subset of variant definitions support `[@@unboxed]`. In particular, it must be possible to distinguish the arguments of unary constructors from each other (and from constant constructors in the same definition) at run-time. For example, the following definition is not allowed, since the arguments of `X` and `Y` have the same representation: | ||
|
||
```ocaml | ||
type strings = X of string | Y of string [@@unboxed] (* Invalid! *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the original specification "should not involve an additional block", and what it is supposed to reject.
For instance, for the following
type 'a or_int = Ret of 'a | Err of int [@@unboxed]
it is possible to unbox Err
but clearly not Ret
, so will it be accepted?
In general, should [@@unboxed]
be seen as prescriptive or suggestive.
If it is prescriptive, wouldn't it be better to annotate each constructor individually, meaning that both
type strings = X of string [@@unboxed] | Y of string
and
type strings = X of string | Y of string [@@unboxed]
would be valid, but not
type strings = X of string [@@unboxed] | Y of string [@@unboxed]
Otherwise, it could be just suggestive, meaning "attempt to flatten the type as much as possible".
We might still want a warning when nothing is done.
|
||
### Extension: partial unboxing | ||
|
||
The current proposal requires all unary constructor arguments to have distinguishable representations. It might also be useful to support the case where only some of the arguments are distinguishable by allowing per-constructor unboxing specifications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I suggested above is already proposed here.
Actually, I find the special handling of unary constructors confusing.
All the more when you think of types such as
type t = A of {x: int}
One could also want to flatten n-ary constructors:
type variance = V of bool * bool * bool [@@unboxed]
Thinking more about this as well:
|
For example I would not be in favor of either
but I would be happy with either of those clearly-disjoint declarations:
|
@chambart : two answer two of your questions
|
I fully agree with @garrigue and @gasche : the annotation should be per constructor so that the user knows exactly which constructors are omitted, and the compiler never chooses which of two constructors should be omitted. |
Note that this is not entirely the case because the representation needs to be taken into account when checking implementations against interfaces. I think this is still all fine -- it just means that your code needs to obey the rules of the native OCaml representation even when you are actually compiling it to JavaScript. |
Sure, but we might have a problem with js_of_ocaml if it needs to have a different interpretation of the attributes than the bytecode compiler (since, well, it is "just" a postprocessing on its output). |
In a sense the point is that if a given compiler pass makes choices based on assumptions on runtime value representations, then only OCaml implementations that satisfy those assumptions can safely reuse this compiler pass. Adding low-level features to the surface language that are based on those assumptions means that those assumptions are now made earlier in the pipeline, and they could in general invalidate design choices of some implementations branching from the main compiler after some passes. In general if the problem arise we always have the option of having a flag to not perform the representation change during compilationoptimization. For this optimization for |
One should ask people who knows better about js_of_ocaml (ping @hhugo), but I believe floats are indeed unboxed. And one could imagine (if it's not already the case) that int32 would also be represented with Javascript numbers (so, cannot be distinguished from int and floats, contrary to OCaml). It's certainly possible to tell ocamlc to not perform the representation optimization when the result is intended to be fed to js_of_ocaml, but then we lose the ability to reuse existing .cma libraries compiled for "normal" use. I'm not saying this is an argument against the optimization (which I like very much), but we need to take that into account (perhaps the conclusion is that js_of_ocaml should introduce its own file suffixes, not reuse .cmo/.cma, and explicitly requires re-compilation; a bit more like Bucklescript, I guess). |
This sounds pretty inconvenient for users that have single codebase targeting both native and js (via jsoo), especially when depending on heavy packages like core-kernel. Switch rebuild takes considerable amount of time, and above suggestion will effectively double it. |
My understanding is that the part of the build that would need to be duplicated (parsing, type-checking and bytecode compilation) takes only a fraction of the time, compared to either native-code production or jsso optimizations and javascript production. I picked a single module (tool/ocamlprof.ml from the compiler distribution), producing the .cmo takes 0.070s on my machine (0.060s from typing), producing the .cmx takes 0.120s on my machine (again 0.060s from typing), calling jsoo on the .cmo takes 0.140s. If you want a switch that installs jsoo modules for all packages, today building this module would take 0.330s (0.070s + 0.120s + 0.140s), tomorrow the .cmo-production step would be duplicated so it would take 0.400s. This is far from a doubling of compilation time. |
Of course this could be reduced further by, say, having an intermediate output for the typedtree, so that all three builds (.cmx, .cmo, .js) could reuse it. Then the build time would go down from 0.330s today to 0.280s. |
I would just like to point out that this is not at all accurate on Jane Street's code base. Type checking is comfortably the largest cost in compilation. I don't think it is particularly relevant to this discussion though, because I think that needing to use different front-end options for js_of_ocaml is both unnecessary and not really a viable suggestion. A better option is to keep the constructors and destructors of these unboxed constructors until later in the compilation pipeline so that exotic back-ends can decide whether to unbox or not based on whether the representations are actually separated on that platform. |
js_of_ocaml currently parses .cmo files; do you mean that we should propagate explicitly the "constructors/destructors" down to the bytecode level? That doesn't sound right. Keeping the information up, to, say, the lambda level would make sense, but then one would need to dump the lambda code and have js_of_ocaml starts from there (as Bucklescript does, IIUC). |
This is a bit tangent to the discussion but regarding
I think that for |
Regarding jsoo: When compiling to JS, Strings will always have some way to be distinguished at runtime. Even if jsoo's string implementation removes the boxing around strings by default (JS engines always provide a type tag to check strings). Even some of the proposed optimizations around partitioning an integer range (bool/char etc) across a set of variant constructors seems like it wouldn't cause problems (so long as the bytecode includes all of the information for renormalizing their values when they leave their "constructors"). On the other hand, if all the other optimizations mentioned are compelling enough, maybe it would be worthwhile to add boxing around floats in jsoo in order to get them. It would solve some of the compatibility issues that jsoo currently has with floats (unmarshaling). For perf sensitive applications, applications could explicitly use an unboxed Either way, it would be nice if jsoo could take advantage of all of the optimizations that do easily apply to jsoo's compilation approach. Does it need to be all or nothing? |
Regarding jsoo-specific compiler artifacts: I think this has the potential to cause compatibility, reliability, or even fragmentation issues within the ocaml ecosystem. jsoo's strength is that it is ocaml ecosystem compatible, and unless you use native C bindings, your packages more or less work well when compiled with jsoo. Importantly, packages that you depend on don't need to anticipate being compiled with jsoo, or do anything special. They don't even need test the jsoo workflow. |
Note: currently choices based on the representation (or not) of constructors are made during the translation from Typedtree to Lambda, in particular during pattern-matching compilation. If we wanted to keep the Lambda representation-agnostic, we would need to use higher-level |
@gasche and me discussed about coq/coq#12733 and found that this RFC would be an elegant solution to this problem. Essentially, they would need an ability to discriminate over the closure tag in an OCaml pattern matching (Obj.tag is too slow for this application, so this is not an option here). (A small extension of) This RFC makes this possible by allowing to unbox functions in an ADT. (A subtlety is that if we allow to unbox functions, then the corresponding constructor would correspond to two tags --the closure tag and the infix tag--, but that does not seem to be really complicated to handle.) The way the Coq folks solved this problem in the past is by changing the tag of the closure to 0 so that this special kind of closures (accumulators) had the same tag as a special ADT constructor which was easy to discriminate using a pattern matching. However, this is not compatible with the new closure representation in no naked pointers mode, hence they need a new solution. |
As far as I understand, for this proposal to be useful for Coq, the Currently: type ind_foo =
| Accu_foo of t (* this constructor is a lie, just to make sure that tag 0 is free for storing a closure *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1 With type ind_foo =
| Accu_foo of t -> t [@@unboxed] (* the actual type of the closure is infinite: t -> t -> t -> ... *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1 With type ind_foo =
| Accu_foo of t [@@tag 247] (* the type is still a lie, but the intent is clear *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1 |
Indeed, you would need some way to do a fine-grained control of which constructor is unboxed. Perhaps this could be done by choosing well the content of the other constructors?
@silene : if such a tag attribute were implemented, then it is very unlikely that the closure tag will be allowed at this place, since the GC expects some particular memory layout when it sees the closure tag.
If |
None of the other constructors takes a function as an argument, so I guess this is fine. In the end, the compiler can unbox any constructor it wants, as long as
That is the whole point, isn't? Coq is being mentioned in this discussion precisely because OCaml's GC will soon expect some specific layout for (non-)closuresclosures, so Coq can no longer create closures with tag 0. But if you are concerned with safety, the solution would be for the compiler to forbid the use of
No, I just wanted to make it clear for the readers that |
To summarize my comment at coq/coq#12733 (comment) : perhaps a fast |
Yes, that's the whole point of this RFC, but, in this RFC, in addition to providing more control over memory layout, we don't want to loose any safety guarantee. If the feature is unsafe or requires using |
Hi, I do not know if it perfectly fits in this RFC, but I think it is at least related. I would appreciate being able to specify that a constructor argument should be "mixed/inlined" within the constructor itself. So for instance, in the case of simple expression: type op = Add | Mul
type t = Cst of int | Neg of t | Op of (op [@product]) * t * t The annotation type t = Cst of int | Neg of t | OpAdd of t * t | OpMul of t * t Thus, matching on "high level" I think it could also work if the mixed type contains non constant constructor but I am not sure if it would be interesting (extracting the value back will need a fresh allocation with copy). May be should it apply only on the constant constructors of the type. |
Hi, We (@nchataing as intern, @gasche as advisor) implemented a variant of @yallop's constructor-unboxing specification as an experimental branch that we would now like to discuss and consider for upstreaming (you can find the original file for this specification at HEAD_SHAPE.spec.md) Our intent was to implement the simplest possible form of unboxing in presence of several constructors, and leave more advanced aspects -- anything that could be left off -- to further work. We support a per-constructor For example: type bignum =
| Short of int [@unboxed] (* represented directly by an integer *)
| Long of Gmp.t (* Block of tag 0 (first non-unboxed constructor) *) Precise specificationWe define the head of an OCaml value as follows:
(In other words, the head tracks whether a value is immediate or a block, and for blocks only keeps the tag.) The "head shape" of a type is a (slight over-approximation of) the set of heads of all possible values of this type. Now consider a variant type declaration containing one or several constructors annotated with type ('a, 'b, ...) t =
| Const0 (* some constant constructors *)
| Const1
| ...
| Const{m}
| NonConst0 of t00 * t01 * ...
| Nonconst1 of t10 * t11 * ...
| ...
| NonConst{n} of t{n}0 * t{n}1 * ...
| Unboxed1 of u0 [@unboxed]
| Unboxed2 of u1 [@unboxed]
| ...
| Unboxed{l} of u{l} [@unboxed] (For simplicity we wrote above all constant constructors first, then all non-constant constructors then all unboxed constructors. But in fact we support arbitrary interleaving of these categories, and the representation is exactly the same as long as the ordering within constant constructors and within non-constant constructors is preserved.) The compiled representation of this type is as follows:
This definition is rejected statically if the unboxed constructors overlap with the other values of the type, in the following precise sense:
Unknown/abstract types are assumed to have a "top" shape with containing potentially all heads. (This should be refined when the abstract type is used to represent an FFI type with a precise shape implemented in C; supporting head shape assertions on abstract types is future work.) Examples(* rejected *)
type t =
| Int of int [@unboxed] (* shape: (Imm, _) *)
| Unit (* shape: (Imm, 0), conflicts with Int above *)
(* accepted *)
type t =
| Int of int [@unboxed] (* shape: (Imm, _) *)
| Box of t (* shape: (Block, 0), as the first non-constant non-unboxed constructor *)
(* shape(t): (Imm, _) ∪ {(Block, 0)} *)
(* accepted *)
type prod = t * t
and t =
| Int of int [@unboxed] (* shape: (Imm, _): any immediate *)
| String of string [@unboxed] (* shape: (Block, Obj.string_tag) (Obj.string_tag is 252) *)
| Prod of prod [@unboxed] (* shape: (Block, 0) *)
(* shape(t): (Imm, _) ∪ {(Block, 0), (Block, Obj.string_tag)} *)
(** With abstract types *)
type abstract
(* accepted *)
type t =
| Int of int [@unboxed] (* shape: (Imm, _) *)
| Abs of abstract (* shape: (Block, 0) *)
(* shape(t): (Imm, _) ∪ {(Block, 0)} *)
(* rejected *)
type t =
| Int of int (* shape: (Block, 0) *)
| Abs of abstract [@unboxed] (* any shape, conflicts with Int *)
(** Nested unboxing *)
type t1 =
| Int of int [@unboxed]
| Block of unit
(* shape(t1): (Imm, _) ∪ {(Block, 0)} *)
(* rejected *)
type t2 =
| T1 of t1 [@unboxed] (* shape: (Imm, _) ∪ {(Block, 0)} *)
| S of string (* shape: (Block, 0), conflicts with T1 *)
(* accepted *)
type t3 =
| T1 of t1 [@unboxed] (* shape: {(Imm, _), (Block, 0)} *)
| S of string [@unboxed] (* shape: (Block, Obj.stringₜag) *)
(* shape(t3): (Imm, _) ∪ {(Block, 0)} ∪ {(Block, Obj.string_tag)} *) Comparison with Yallop's proposal RFC#14Jeremy Yallop's proposal uses a global annotation A major difference is that the RFC#14 specification suggests renumbering constructors in some cases, where the representation of
(Note: @stedolan calls this aspect of RFC#14 "conflating inlining and disjointness". We only deal with disjointness.) separabilityWhen the compiler is in We can track separatedness as part of the head-shape computation for unboxed type declaration, by adding to head-shape data a "separated" bit (see the details in HEAD_SHAPE.impl.md). We reject type declarations whose head-shape is not separated (when in It may be that this tracking is precise enough to entirely replace the pre-existing "separability analysis" of the type-checker. We have not implemented it yet, and have not evaluated this possibility. Leftover question: how close to the compiler-distribution runtime should the specification be?We define static accept/reject decisions for partially-unboxed types using "head shapes", which are defined in terms of the value-representation strategy of the main OCaml implementation. Should we have a more abstract definition, that leaves more room to other representations in alternative implementations? We have not studied this question yet and we believe it is a pressing question. In particular, any choice that would end up being merged in the language probably MUST support the js_of_ocaml value representation. (Do you know of a reference document that describes the js_of_ocaml value representation? Pointers are welcome are we are not jsoo experts ourselves. cc @hhugo.) Our intuition is that we could fine a "weakening" of our current specification that distinguishes less different sort of shapes -- thus rejects more definitions -- and gives good flexibility for alternative implementations. Here are some things we could do:
In other words: what amount of runtime type information should we require from OCaml implementations? At the limit, one extreme choice would be to only reason on the tag of variant constructors (constant or not), which are distinguishable from each other in any OCaml implementation, and not make any other assumption about head shapes (map all types except variants to the "top" shape). This would reject most unboxing definitions, leave maximal freedom for language implementations. Unfortunately this would also prevent the actually-interesting uses of the feature we know about, which mostly resolve around unboxing an This is an aspect of our design on which we need external feedback from people with a good taste for these matters. (cc @xavierleroy, @damiendoligez, @yallop, @stedolan, @lpw25, @let-def, etc.). |
Two things come to mind immediately
|
An extension that might be useful to consider early as it might be related to the question of how tightly to tie the unboxability criterion to the value representation is e.g. type t = Even of int [@unboxed] | Odd of int [@unboxed] | Box of t or type t = Zero | Positive of int [@unboxed] | Negative of int [@unboxed] | Box of t These cases are similar to the normal case of unboxing an I don't know if the added complication of supporting such cases would be worthwhile, but it seems that it would involve considering more than representation of the carried values when determining the shapes. This notion might be productively unifiable with being able to support different backends with different representations. |
In this mode, wouldn't it be enough to simply reject unboxed constructors whose argument can contain floats, checking this property using the head-shape? (No need to keep track of separability in the head-shape itself.) |
Being able to have an unboxed disjoint sum between, say, int and float (e.g. as part of the "value representation" in the interpreter of a dynamically typed language) seems very useful (for those lucky enough to use the no-flat-float-array mode :-)). I can see several approaches:
|
@jberdine: In my mind, your examples (eg. I believe that this use-case would be better served by pattern views, replacing your Maybe I misunderstood. In any case, I would welcome "important examples" of why we would want to go in this direction.
It might be a corner case but your proposal rejects
None of the three options you propose look terribly enticing to me. My preference would go for a variant of (2), where we store assumptions in the build artifacts, and also offer an explicit option to ignore some unboxing (to be discussed) to produce jsoo-compatible artifacts. |
Congrats for the prototype! Regarding the opportunity to fix Coq's native_compute in no-naked-pointers mode, would the implementation of closure unboxing be simple enough to fit a "getting things done" strategy, or would Coq devs be better off pushing for an efficient (For the curiosity of readers here is what I get:
The exception does not arise if I replace |
@gadmm I would prefer to discuss our prototype somewhere else, and reserve this issue to discuss the specification. We hope to open a PR soon-ish, and in the meantime opening an issue on my fork or @nchataing's would be fine. (Currently the prototype does not support unboxing function types, but it could certainly be taught that functions have tag Closure_tag or Infix_tag. The fact that it does let you proceed with a definition that it believes to be incorrect/conflicting looks like a glitch we have to fix. You may want to use the |
While we are in meta-land: please note that our proposal is different from the initial proposal of @yallop. (Maybe we should open another RFC?) We would welcome feedback on whether people find the variant we propose better or worse than the (corresponding subset of) the original proposal. |
It's good to see progress on this. Thank you for picking it up and creating an implementation, @nchataing and @gasche! I agree that per-constructor annotations are the wisest approach for the initial change: their behaviour is straightforward and they handle the most compelling examples (e.g. the Coq accumulator issue, ropes and bignums, although that last also needs support for exposing abstract type representations in order to work optimally).
That's my preferred solution for the js_of_ocaml issue, too. |
Re. Coq, the specification we propose with @nchataing would accept the type type ind_foo =
| Accu_foo of t -> t [@unboxed] (* the actual type of the closure is infinite: t -> t -> t -> ... *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1 (Minor note: per-constructor attributes take a single This is assuming that we can dynamically distinguish functions from variant-constructors and immediates on all backends we decide to care about, in particular js_of_ocaml. @hhugo gave information about the numeric representation earlier (thanks!), but not about closures. Is it valid to assume that |
I was wondering what has to be done for people who overwrite tags of values when the tag was non-meaningful, for instance in order to declare that an I think it is enough to ask the programmer who changes non-meaningful tags to make such types abstract, to prevent unboxing. In this case the fix is just to document that the assumptions made on the representation of values according to their type are strengthened when this feature lands. In addition, the no-scan-tag-array use-case sounds reasonable, there is the opportunity to make it official by adding it to the list of recognised tags for types whose tag is non-meaningful. |
We discussed the jsoo-interop problem at a maintainer meeting today. My understanding of the consensus is as follows:
|
@hhugo where can I read documentation about jsoo's value representation? In particular, how are constructors represented? Are they distinguishable from numeric values? |
constant constructor are not distinguishable from numerical value. I'm adding some documentation: |
Thanks! I'm glad I asked :-) |
@gasche : generally speaking, since js_of_ocaml starts from the ocaml bytecode, it cannot use a representation that would allow distinguishing more values than bytecode programs. |
This POPL 2024 paper appears to be related:
|
Indeed, we wrote a paper based on Nicolas' internship results and some additional contributions following, including the line of thought suggested by Stephen on the relation to The current draft is available at http://gallium.inria.fr/~scherer/research/constructor-unboxing/constructor-unboxing-popl-2024.pdf . Anyone is welcome to provide feedback as I have a window of a few days/weeks to make changes before the camera-ready version. (The Acknowledgments of the paper, currently sitting on page 30, mention a few people here by name, notably @yallop for authorship of the proposal, @jhjourdan, @silene and @gadmm for their discussion of vm_compute, @stedolan, Nicolas and myself.) (Note: we could probably have notified the community of this presentation work earlier, but POPL used an unusually strict interpretaion of double-blind this year that made this difficult. From the CFP: "authors should not take steps that would almost certainly reveal their identities to members of the Program Committee, e.g., directly contacting PC members or publicizing the work on widely-visible social media or major mailing lists used by the community." KC, Stephen and favonia were on the PC for example.) |
[I've replicated the proposal in this comment for ease of reading.]
Constructor unboxing
Motivating example: compact rope representation
Data structures defined in OCaml are often less compact than they might be, because of boxing.
For example, here is a type for representing ropes:
With this definition the value
Branch {llen=3; l=Leaf "abc"; r=Leaf "def"}
has the following representation:In the general case each part of this representation serves a purpose. For example, in order to distinguish
Leaf
nodes fromBranch
nodes at run-time, each constructor is represented by a tagged block. However, for this particular data type the block representingLeaf
nodes ([--L|-∘-]
) is unnecessary; since strings are already distinguishable from other blocks, the value could in principle be represented more compactly:The basic idea
Adding an
[@@unboxed]
annotation to a variant definition indicates that the representation of unary constructors should not involve an additional block:Only a subset of variant definitions support
[@@unboxed]
. In particular, it must be possible to distinguish the arguments of unary constructors from each other (and from constant constructors in the same definition) at run-time. For example, the following definition is not allowed, since the arguments ofX
andY
have the same representation:Performance improvements
Since the
unboxed
variant representation uses less allocation and less indirection, it improves performance in some cases.For example, here is a simple benchmark for the rope data type. The benchmark creates rope representations of size
n
, converting the ropes to strings in a final step:Measurements show substantial improvements for the unboxed representation, especially for larger values of
n
:(These measurements were taken by building the unboxed representation explicitly using
Obj
rather than actually implementing this proposal in the OCaml compiler.)Which types are distinguishable?
Eliminating the block associated with a constructor is safe only when:
This proposal currently focuses on concrete types; it may be extended to abstract type constructors and existential variables by building on the notion of separability introduced in Unboxing Mutually Recursive Type Definitions in OCaml (Colin, Lepigre, Scherer, JFLA 2019).
Consider the definition of a data type
t
with unary argument typest1
...tn
and constant constructorsC1
...Cn
:The simplest case where constructor unboxing is clearly safe is where
t1
...tn
all have non-immediate and distinguishable representations, and none of them is eitherfloat
ort
.However, constructor unboxing is also possible if some of the
t1
...tn
have immediate representations. For example, consider the following definition:In OCaml
C1
,C2
,bool
andchar
are all represented as immediates, and a single immediate value (i.e. anint
) can easily represent all of these values. We can therefore adopt the following representation:C1
0
C2
1
C3 false
2
C3 true
3
C4 c
4 + Char.code c
More generally, we can unbox the definition
t
if:the block-representations of
t1
...tn
are all disjoint (and distinct fromfloat
and fromt
for non-unary constructors)the immediate-representations of
t1
...tn
together withC1
...Cn
cover less than the immediate space.(Here block-representation indicates the set of tags that non-immediate values of a type can have, and immediate-representation indicates the number of distinct immediate values of a given type. Covering less than the immediate space means that the sum of the immediate representations of
t1
...tn
is less thanmax_int
.)How does this relate to the existing [@@unboxed] annotation?
This proposal is a conservative extension of the existing
[@@unboxed]
annotation (PRs #606, #2188). It has no effect on existing code, and the extended meaning of[@@unboxed]
is compatible with the existing meaning.How does this relate to the existing proposal for unboxed types?
Another open proposal involves unboxing field types into parent types --- for example, producing flat representations of
int32
pairs:The two proposals are distinct, but complementary. The field-unboxing proposal does not support the compact rope representation, the current proposal does not support pair unboxing, and the combination of the two proposals can support representations that are more compact than either proposal supports in isolation. In the following example, the
int32
arguments are unboxed into a single block using field unboxing, and the block associated with thePair
constructor is eliminated using constructor unboxing:Without any support for unboxing, the value
Pair {x=0l; y=0l}
involves 4 blocks; with field unboxing alone it involves 2 blocks; with constructor unboxing alone it involves 3 blocks; with both field and constructor unboxing it involves 1 block.(There are some additional connections and distinctions between the two proposals. The current proposal can be seen as a generalization of the nullable types mentioned in the field unboxing proposal. And the field unboxing proposal is more ambitious: it additionally introduces compound unboxed types and changes to the type system to combine unboxing with abstraction.)
Extension: partial unboxing
The current proposal requires all unary constructor arguments to have distinguishable representations. It might also be useful to support the case where only some of the arguments are distinguishable by allowing per-constructor unboxing specifications.
Comment: abstraction
Unboxability (for records) is a property of a single type, and is fairly straightforward to abstract.
However, separability is a relation between types, and is not so easily abstractable.
One possibility is to optionally expose the immediate-space and the tag-space of abstract types by extending the
[@@immediate]
attribute. For example, we might promise that a type has an immediate representation with no more than 256 distinct values:We might additionally support setting the tag value associated with particular constructors explicitly to avoid clashes: