-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track the purity of expressions to avoid the value restriction #5
base: master
Are you sure you want to change the base?
Conversation
I haven't read this in detail yet, but I think it might be interesting to compare it to: |
Actually, this is the other way round: this PR tracks the part or purity relevant to polymorphism, in order to get more polymorphism. The types are unchanged. |
Planned extensions after discussion with @diremy :
|
Added a few features to the experimental implementation
This allows to compile the furl combinator library, with purity annotations to allow sufficient polymorphism in the |
IIUC this suggests that you might want to find a different word than "pure" for these annotation, since you'll be putting it on functions that are not pure in any of the traditional senses. |
Yes, I've been looking for a better name for a while, but couldn't find one yet. |
All I can think of so far is |
Thanks for the suggestion. |
On 11/11/2019 22:39, Jacques Garrigue wrote:
Thanks for the suggestion.
Another idea is |[@@logical]|, to emphasize that quantifiers follow the
laws of logic. But we now also need a negation to switch off signature
wide qualifiers. Or just |[@@@logical false]|.
Hum, [@@logical] seems a little pompous here and does not carry a good
intuition.
I still prefer [@@pure] or [@@value].
I think [@@valueform] would be better than [@@value], but a bit verbose.
As Jacques said, his proposal is actually an abstraction
(i.e. approximation) of the weak vs stong type variables of SML. Hence, the
good intuition when an indentifier is [@pure], e.g.
val id : 'a -> 'a [@@pure]
is that the implicit "all 'a." in front of "'a -> 'a" is a strong quantifier
while when all unannotated OCaml quantifiers have to be treated as weak.
So perhaps,
val id : 'a -> 'a [@@Strong]
would even be better: closer to the underlying intuition, and does not
assert purity when it's only talking about a certain notion of purity.
Didier
|
Current state (2019-11-12)As mentioned above, the current branch allows one to play with the Furl library, avoiding clumsy thunks.
Discussion
|
Discussion
* The need for a special handling of ground types makes me think that the
propagation approach is only a hack: having strong type variables in the
type-checker would be both more powerful and principal, even if we
choose to show them through these annotations
I support the strong type variables approach, which is closer to what's
happening underneath. It might still be lightweight---as long as there are
just weak or strong and do not use different levels of weakness as in SML-nj.
* The problem with polymorphic recursive definitions stays with strong
type variables, except if we require explicit purity annotations on the
definition.
With strong type variables, a polymorphic recursive definition should be
annotated so the purity annotation would naturally come with the type
annotation: since polymorphic variables must be listed, they should also
come with their kind information (weak/strong) whatever the syntax.
Didier
|
I didn't look at the proposal until today. A few comments. I certainly agree with Didier that it is odd/unpleasant to classify polymorphic fields as impure. Taking a step back, the fine-grained mode would be to annotate variable-introduction sites (in types, universal quantifiers; does it make sense to annotate existential variables in GADTs?), so the coarse-grained approach would be to annotate not really a type, but a telescope / set of variable-introducing quantifiers. In the HM fragment, there is exactly one such telescope per type, but in presence of explicit polymorphism in recursive definitions or polymorphic fields we should allow to annotate all places that allow universal quantification (implicit or explicit). So I should be able to write I'm not personally convinced that we need to restrict ourselves to the coarse-grained approach (could this be a mistake we would regret in the future, similarly to the decision to not name row variables?). It would be nice to be able to annotate specific variables, and we have an excellent default, sufficient in most cases, that annotating a type scheme means annotating all quantified variables. As @lpw25 I strongly believe that being able to annotate
There are known bad-interactions between non-delimited continuations and generalization (actually the best document I found to summarize the state of the art in the ML design world was Xavier Leroy's 1992 PhD thesis manuscript... in French). I suppose that such complications can be avoided by marking control-operator types with weak/unsafe/escaping/leaking variables similarly (this was the SML solution), but it's not clear enough to me how the "non escape" intuition can be extended to that setting. I would like to feel confident that however the restriction is implemented is not going to become inadequate in presence of a richer effect system (possibly relying on delimited continuations); we have results (see No value restriction is needed for algebraic effects and handlers, Kammar and Pretnar, 2016) showing that simple effect handlers are not generative enough to run into polymorphism issues, but still I would welcome a coherent intuition for the whole picture. Semantically what does the notion of strong/safe/local variables stands for? |
* As far as references are concerned, the distinctive property of strong/safe type variables is that values at their types cannot "escape" to the ambient execution context through the program execution (by being stored in the mutable heap, or passed to a possibly-unsafe function), except as part of the result behavior matching the parts of the return type that mention this variable. This idea of "not escaping" may be useful also at the value level (a function argument or a temporary variable that does not escape the dynamic extent of a function), and I sort of idly wondered if a coherent naming would be possible.
I think your description is unnecessarily complicated. The intuition, which
is quite closed to the argument used in the proof is simpler:
When an indentier/expression is assigned a type scheme, its weakly
quantified type variables may be instantiated by a type that can be used
as the type of a subexpression that will end up in the store when the
expression will be applied and evaluated (which cannot happen for its
strongly quantified type variables).
Otherwise, I don't think safe/unsafe is much better than pure/impure unless
it is preceded by the idea of allocation. The intuition is not that of
reading or writing into the store, just that of allocating a piece of store.
(A polymorphic funtion that may write in the store is pefectly fine!)
Hence, @@noalloc/@@alloc would be much closer to the truth and not
misleading, but this does not sound so nice... Perhaps a little nicer is
@@AllocFree sounds nicer than @@noalloc---and more positive for something
that should be prefered to than @@alloc. An alternative is to replace
"alloc" by "leak".
To sum up:
pure impure
safe unsafe
noalloc alloc
allocfree allocable
leakfree leakable
unleakable leakable
Didier
|
I prefer "safe" over "pure" because while users understand that there are many distinct notions of "safety" (so they will wait to see which one you mean before assuming a meaning), most people work with a very restricted set of meanings for "pure" in the context of programming languages: an expression is "pure" if it does not perform any side-effect (then we argue about what side-effects mean). Indeed I find "leak" more tempting (it is not misleading like "pure" is and it carries some intuition, unlike "safe"), but "noleak" or "unleakable" are a bit ugly; I think that "safe" or "stable" or "local" may work as the opposite of "leak". Your description is simpler, but:
The Harper&Lillibridge example for continuations, as repeated in Xavier's thesis, is as follows: let later : 'a . ('a -> 'a) * (('a -> 'a) -> unit) =
call/cc (fun k ->
(λx. x),
(λf. throw k (f, λx. ())
)
in
print_string(first(later)("Hello !"));
second(later)(λx. x + 1) This example also relies on the fact that two incompatible instantiations of the declaration are made in the program and interact in bad ways, but I don't know how to formulate an intuition for it in terms of leaking into a "store". Maybe we could say that the variable leaks into the continuation environment, binding continuation names to (typed) continuations? (This environment must be global to the program to capture an undelimited continuation.) |
How about |
Your description is simpler, but:
I should have said somethinge even simpler:
- A type variable is weak when it is used in the type if a subexpression
whose evaluation may allocate a piece of store containing a value of
that type.
- Besides, a weak type variable may only be generalized in front of a
value form.
* It does not completely suffice to explain why |ref []| is safe, |fun x
-> ref []| is safe, but |thunk (ref [])| is not.
I think it does. Notice that weak/strong applies to type variables, not
to expressions.
BEWARE! ref [] is unsafe (in your sense):
ref [] : '_a list ref
Hence, '_a which is weak cannot be generalized.
Why so? because a value of type 'a may enter the store during
the evaluation of ref [].
Did you mean (fun x -> []) () ?
- (fun x -> []) () : 'a. 'a list
Here, nothing can enter the store, so 'a is strong and can be generalized.
If you meant Jacques' extended value restriction, see below.
Continuing with your examples:
- thunk (ref []) : '_a list ref
same argument as for ref []
- fun x -> ref [] : '_a . '_a -> '_a list ref
'_a is weak when typing "ref []" as it may enter the store.
it stays weak in the type of fun x -> ref [], indeed.
However, as this is a value, it can be generalized (as weak variable).
Hence it you instantiate it (e.g. '_b -> '_b) it has to be with
weak type variables as well.
- (fun x -> x), ref [] : 'a. ('a -> 'a) * '_b list ref
'a is strong, '_b is weak, so only 'a can be generalized.
----------------
Jacques' extended value restriction?
let _ = ref [] in []
One answer is that we may not need it anymore because useful cases such as
pseudo constructors would become pure and generalizable.
This is still fine and can be explained with my criteria.
let x = ref [] in !x
I argue that x has type "bot list" and !x has type bot list which is coerced
into 'a list. Hence 'a is strong as it has not been put in the store.
* Noe of those descriptions gives an intuition that is generic enough to
be extended to the issue with first-class continuations.
Right, I haven't considered continuations.
But I asume that replacing "may allocate a piece of store" should be
completed with "or may be simultaneously input to and output of a
continuation". But it needs to be check---I don't have the details in mind.
Didier
… The Harper&Lillibridge example for continuations, as repeated in Xavier's thesis, is as follows:
let later : 'a . ('a -> 'a) * (('a -> 'a) -> unit) =
call/cc (fun k ->
(λx. x),
(λf. throw k (f, λx. ())
)
in
print_string(first(later)("Hello !"));
second(later)(λx. x + 1)
This example also relies on the fact that two incompatible instantiations of the declaration are made in the program and interact in bad ways, but I don't know how to formulate an intuition for it in terms of leaking into a "store". Maybe we could say that the variable leaks into the continuation environment, binding continuation names to (typed) continuations? (This environment must be global to the program to capture an undelimited continuation.)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#5?email_source=notifications&email_token=AAKLQ2UO6FVNRT7DWHMZOPDQUVOD7A5CNFSM4JJWRUHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEESQWQQ#issuecomment-556075842>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAKLQ2WSWQFTTI4QEMNB2RLQUVOD7ANCNFSM4JJWRUHA>.
|
Yes, indeed, I thought about nonexpansive as well, but it's a bit longer
with a negative connotation and (I think, but I am not sure) this originally
refered to expressions that are referentially transparent, and in
particular that cannot produce side effects.
Perhaps, I am wrong, but I've check and found occurrences where "being
expansive" is defined as or associated with "being able to produce side
effects".
This is not the correct intuition since assignment can produce side effects
but can be safely typed with strong variables:
ref : '_a. '_a -> '_a ref
(!) : 'a. 'a ref -> 'a
(:=) : 'a. 'a ref -> 'a -> unit
We may still use the word nonexpansive/expansive with _our definition_
if you don't find it too long, and ...
It also is bothers me (just a little) that the _positive_ thing is expressed
by a negative statement such as _non_ expansive (or _no_ alloc, _un_
leakable, etc.)
Didier
…On 20/11/2019 17:43, Stephen Dolan wrote:
How about |nonexpansive|? That's the term used in the definition of SML, and it's not as overloaded as "safe", "pure", etc. (The intuition is that these are the expressions whose evaluation does not expand the store)
|
It is also my feeling. The point being that those variables behave normally (logically, as I was trying to express with my
There are of course many more, but they have more connotations. |
A typical user of OCaml interacts with the language through the abstraction of the OCaml type checker, which do not leak such details as expansive or nonexpansive or pure, or anything like this. In fact, it is not even referencing value restriction. What the type checker is saying is that the type variable could not be generalized. Using the existing terminology we can express the notion of a generalizable using a, sort of, imperative val some : 'a -> 'a option [@@generalize 'a] As example shows, it will also leave the space for more fine-granular annotations, when not all type variables in a scheme require or could be generalized. I believe this naming will capture nicely the typical interaction between a language user and the typechecker, without introducing any new concepts. |
I like @ivg's proposal. |
Why not---that may be the best of the proposals so far.
"generalize 'a" itself is not quite the intuition, since 'a is generalized
anyway as it is the type of a value, but the idea that such an 'a should
_hereditarily_ be generalized in all contexts, as long as it is not unified
with a type and in a context that say otherwise, is the idea.
Do we also need a keyword for the negation, to say "do not generalize 'a",
e.g. if the context says that the default is "generalize"? What should it be
then? "do-not-generalize" or "generalize=false"? Do we have other examples
of annotations that come in pairs of positive/negative flags? Then, we
should have a regular pattern for the naming the negative flag.
Didier
…On 25/11/2019 18:51, Ivan Gotovchits wrote:
A typical user of OCaml interacts with the language through the abstraction of the OCaml type checker, which do not leak such details as expansive or nonexpansive or pure, or anything like this. In fact, it is not even referencing value restriction. What the type checker is saying is that the type variable could not be generalized. Using the existing terminology we can express the notion of a generalizable using a, sort of, imperative |generalize|.
val some : 'a -> 'a option [@@generalize 'a]
As example shows, it will also leave the space for more fine-granular annotations. I believe this naming will capture nicely the typical interaction between a language user and the typechecker, without introducing any new concepts.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#5?email_source=notifications&email_token=AAKLQ2SIKAD4Z4SI7W5UTRDQVQGALA5CNFSM4JJWRUHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFDHVOI#issuecomment-558267065>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAKLQ2W2B7732SKMOOQF63LQVQGALANCNFSM4JJWRUHA>.
|
We could use My understanding of the use-case is that (Have we considered using |
Indeed, the problem with SML was that it was considering the problem from the wrong side. [@@@generalize true]
[@@@generalize false] as switches. |
(I don't really agree that we should not provide a negative statement. When there are two options, I think that we should always have a syntax for both, even if one is the implicit default. This avoids blind spots such as Personally I find |
So, if I'm not mistaken, we've now had 3 weeks of serious discussion about the name of the attribute. However, one thing I cannot really see (neither in the PDF, nor on this page) is a discussion about drawbacks and alternatives (cf. https://github.com/ocaml/RFCs#making-an-rfc ). |
Let me try to express my opinion in the classical review style, let's see if it works :) Review AOverall MeritWeak Accept. I'm fine with this change and won't object to its inclusion in the language. However, I believe that this change touches the fundamental notions of the language and affects the user experience and requires more careful consideration. Otherwise, it will create yet another dark corner in the language, that may avert new and existing users. Submission SummaryThe proposal implements a must analysis of a program property to enable generalization of type variables in expressions that hold this property. The property is defined structurally by a set of rules and over-approximates the strength of an expression. Informally, an expression loses its strength when it accesses a mutable or polymorphic field of a record and the rules define how the impurity (taint) gained from this access is propagated. If an expression is not strong then we can say that during the evaluation of this expression it may create a storage for a value with a type that escapes the scope of this expression. Therefore, generalization of all variables is conservatively disallowed in the type scheme of this expression. Submission strengthsThe proposed analysis lets type more programs without compromising soundness. And is especially useful in cases when functional (smart) constructors are used instead of plain data constructors. It is a relatively lightweight change that doesn't significantly increase the support burden. Submission weaknessesThe interaction with variance is not clearIt looks like that this proposal is parallel with the existing mechanism of relaxing value restriction based on variances of type variables. I have the impression that it extends the variance analysis from data constructors to functional constructors. Could they be unified, at least on the conceptual level? The property is not familiar to casual programmersFirst of all, the problem with finding a good name for the property indicates that the property is not well-known or maybe still waiting for a proper mathematical abstraction to be fit in. So far, it is defined purely structurally as a boolean lattice and a set of propagation rules that define the transfer function. The change complicates the typing rules exposed to the programmersThe proposal extends the typing rules that a casual programmer has to learn and thus increases the cognition burden and makes OCaml a little bit more complex language. Of course, the rules are already extended by the attributes semantics, but this extension is probably the first serious step that affects the inferred types of expressions, i.e., it allows attributes to affect the typing rules. I think that this is an important change and since this is one of the first moves in this direction (and it looks like that this direction will be pursued no matter the objections) we have to think carefully and to frame it in some repeatable pattern. Maybe we should develop a common framework (at least from the user perspective) for the annotated type system (as described in Type and Effect Systems), so that annotations like this could be attached to typing schemes, e.g., val return : 'a -> 'a t [@@annotated no(storages)] The change may clash with the upcoming effectsThe annotations from the previous paragraph could be captured with the notions of effects or, at least, have a lot of intersections with effects. It could be a good idea to reach out to the designers of the upcoming effect type system to sync the opinions. As [1] shows, effects could be framed into the annotated type system, though I'm not sure which is more general -- annotations or effects. E.g., this also makes sense: val return : 'a -> 'a t [@@effects no(stores 'a)] SuggestionsIf the annotated type system is a too far going goal, an alternative approach could be investigated with a simpler notion of purity (at least as exposed to the end-user). From the examples, I can conclude that the main use-case is to prove to the compiler that a functional constructor is behaving as a covariant data constructor, e.g., that |
1d2b14b
to
79480f8
Compare
The original rules were unsound (they were actually unintentionally stronger than SML 90). |
We discussed this briefly again, and a central question seems to be how the proposed feature would interact with effect typing. @lpw25 should write an RFC on effect typing (when he's done upstreaming his short-paths implementation), but we can already ask: how would this feature fit in a language with row-based effect inference, such as Koka ? |
for another perspective, To stick with existing typing annotation semantics it could be put in noun form I was surprised to read this error the first time, first day :
and then :
This is very pedagogically explained in the ocaml documentation 👍, many kudos there. I naively thought to myself, the typer types type variables in the most general way. Then it gives me the least general solution, a monomorphic type. To cap it all, I can manually/mentally check there is a trivial polymorphic typing to it. The reason is that we all implicitly take OCaml to be functional+espilon where epsilon is the perceived addition of mutation to the language. The decision problem is hard so the typer can't decide and conservatively weakens the I think the problem is badly errored out/presented due to an unclear message. A better message may be « I monomorphized that I get an unsettling vibe with I tried these : I believe the naming should capture the separation of the language with respect to paradigm. That is what is tracked. Therefore, What do you think of To summarize, as a user, I am convinced |
cc @goldfirere: I wonder if you would be interested in co-shepherding this RFC. |
I find this proposal intriguing, but I do not feel expert enough to co-shepherd. Here are some remaining points of confusion for me:
|
The value restriction, even in its relaxed form, makes us lose
polymorphism even in cases where all functions are pure.
This is especially galling for combinator libraries, where code is
built through function applications.
This proposal tries to avoid it in those situations, by tracking the purity of expressions in a very light weight way. Typability is a subset of SML90, which ensures the soundness of this approach.
The proposal is in PDF form.
An experimental implementation is available as a branch purity-attribute. He are some examples.