new RFC: short error identifiers #33
Co-Authored-By: Florian Angeletti <[email protected]>
I like it. I like that Rust also gives you a link that you can directly click on to get more information. |
(Why the group prefix then? Actually it simplifies the implementation
to give per-module identifiers instead of trying to pick global
numbers and avoiding conflicts between two errors trying to use the
same number. It might also help users identify classes of errors.)
I'm not really convinced by the format nor the rationale.
I think the fact that it's an abbreviation that maps to the module is likely to be interesting only to compiler devs. And the day your code moves to another module, what do you do with the uid? Is it that complicated to have a single module Error_id where you define errors and which you draw from?
Why not keep it to EXXX/WXXX, which is short and actually tells the user whether it's an error or a warning?
The module name in the example was mostly a proxy for the class of errors. For instance, extending the examples
synt : syntax errors
tyco : core language type errors
tymo : module language
tymi : module inclusion errors
tyex : type expression errors
link : linking error
cpmo : module compilation error
If we ever change the location/classification of an error, it seems fine to change the identifiers of the errors.
One advantage that I see with those identifiers is that thematically adjacent errors will share similar identifiers even after many renamings and moves (which don't happen that often). By contrast, with a dense set of numerical identifiers, a single split of errors might move two sibling errors far away from each other.
However, I kind of agree that the "themes" themselves might be only partially understandable at first glance for non-compiler developers, but I think that the similarity of the errors will make sense for users.
Defining all error datatypes in a single module does not really work that well with the current architecture: error constructors often carry complex payloads that are strongly tied to the concepts of the modules that raise them.
We could define the identifier part centrally, and dispatch the error constructors themselves to those identifiers, but that creates two distinct sources of truth and probably makes a coherency-checking tool a little more mandatory.
Overall, we could definitely have a single set of numerical identifiers, but the idea of having groups of errors seemed slightly attractive both in terms of user experience and compiler development. But it could totally be that my vision of user experience is far too biased in this specific instance.
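To make the grouped scheme discussed above concrete, here is a minimal sketch of an identifier made of a group prefix and a per-group number. The type names, the zero-padded three-digit rendering, and the number 42 are all assumptions made for illustration; only the seven group abbreviations come from the list above.

```ocaml
(* Hypothetical sketch: none of these names come from the compiler sources. *)
type group = Synt | Tyco | Tymo | Tymi | Tyex | Link | Cpmo

(* An identifier is a group plus a number that is unique within that group. *)
type error_id = { group : group; number : int }

let group_prefix = function
  | Synt -> "synt"
  | Tyco -> "tyco"
  | Tymo -> "tymo"
  | Tymi -> "tymi"
  | Tyex -> "tyex"
  | Link -> "link"
  | Cpmo -> "cpmo"

(* Render the identifier as the group prefix followed by a zero-padded
   number. The padding width is an arbitrary choice for this sketch. *)
let to_string { group; number } =
  Printf.sprintf "%s%03d" (group_prefix group) number

let () = print_endline (to_string { group = Tyco; number = 42 })
```

With this shape, two independent changes that add errors to different groups cannot pick the same identifier, which is the conflict-avoidance argument made for per-group numbering.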
My 2 cents
synt : syntax errors
tyco : core language type errors
tymo : module languages
tymi : module inclusion errors
tyex : type expression errors
link : linking error
cpmo : module compilation error
This map of knowledge only makes sense when you look at it in its entirety. When those abbreviations appear on your console, out of context, they don't give any valuable information.
For example, by the time you have seen "tyex" in your errors enough times to realise that it has some meaning (and that there are other codes for other kinds of errors), it's already too late: you would understand the map as a whole, and the context isn't needed anymore.
I like the proposal: it adds searchability to errors, and given the many places to ask for and receive help online, that's huge. I would either make it indexable by a number or give the entire context in the error. Instead of synt -> SYNTAX_ERROR_X
PS: Sorry If I jumped into an issue where no one called me out
@davesnx this is a reasonable point, but we also want to balance it against the size of the identifier, which should not become too long, because it is less interesting to humans than the actual human-readable explanation of the error, and still displayed first. We could decide to move the error identifier to the end of the error message, which would un-constrain this aspect of the design space.
@dbuenzli my gut feeling is that having per-class numbers, instead of whole-compiler numbers, is going to make our life easier. For example, it is going to greatly decrease the number of situations where two independent compiler changes conflict (logically and/or textually) because they want to introduce the same error number.
This was also my first thought. Nice and simple.
This also sounds like a good idea (though it is orthogonal to the issue of error numbers). |
Having links to examples and explanations in the manual is part of the longer plan. In particular, it would be easier to write examples and explanations for error messages once we are able to explicitly refer to an error. |
I don't mind having a short code for classifications of errors; it can certainly help to give context. But it should be one that makes sense to end users (which I would roughly say is: syntax, type, link); it seems to me that it could fit in one or two chars e.g. What I don't understand is that you seem ok with changing the error code. That defeats the whole idea of having stable unique identifiers for errors. Once used, an error code should never change and never be reused. Also, to devise the scheme I think it would be useful to give an idea of the number of codes we are actually talking about. |
I like the idea, but I think it'd be better with mnemonic names, such as We recently (ocaml/ocaml#9657) added the ability to write |
I like @yallop's suggestion. Mnemonic names allow for quick error characterization without having to parse the whole error message or maintain arbitrary numbers in one's brain. It certainly makes it easier for people helping other people (including oneself :-). If that is done, I would just suggest to keep the warning naming convention rather than add a new one, i.e. ( |
I am reminded of IBM's xlc compiler, which adds codes like What's wrong with just googling the full textual error message? It's not like we change them often. |
I have another idea! If the error codes are 256-bit wide, they could be addresses of smart contracts in a friendly blockchain. Each contract would, then, print the detailed explanation of the error. For a fee, of course. Voilà! self-financing for the OCaml project! |
@dbuenzli The set of errors might evolve, for instance one error ( @xavierleroy Identifying the skeleton part of the error message one should google is not completely straightforward. Even when using If we go toward a human-readable identifier, it will give users a clear title for their error messages and avoid the dystopian feeling. |
This doesn't work so well because error messages are context dependent (also somehow google search with arbitrary text is much less effective than it used to be). This means you have to trim them of your specifics to search them. A well defined token makes the whole process easier. |
Another aspect to have in mind is possible internationalization of error
messages in the future.
|
For another data point, it seems that Haskell is also moving in this direction: https://errors.haskell.org . |
I've been heavily involved with the Haskell errors process. I can share a few notes from our thoughts. (These are more our thoughts than our experiences, as it's all too fresh to have useful user feedback, say.)
|
After a little wandering, the #Outreachy contribution phase led me here. I just got exposed to OCaml and I certainly love its speed and expressiveness. I've played around with the basic features and there is certainly a need for simpler error identification. I think Rust's approach to error identification is pretty impressive (one of the major reasons I stuck around), something that we could perhaps borrow if the compiler allows. An identification structure could perhaps include a severity level enum (Error/Warning/Help, etc.), a code [prefix+int], a concise message clause, and perhaps a diagnostic window showcasing relevant primary and secondary spans. |
You could do worse than follow the example of IBM - the mainframe stuff, not the c compilers. |
For the record: @Octachron and myself intended this RFC to be the basis for an Outreachy internship, but we ended up without an intern to work on this. If someone is interested in contributing in this area, feel free to let us know -- no one is actively working on it but @Octachron is still interested in supervising. |
People have (rightly) suggested that the error identifier could/should be a clickable link going to a webpage with documentation about the error. I wasn't sure how to include clickable links in terminal output. The following page has good information on this: https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda (This is related to work on structured error output: many people consume error messages not through a terminal but rather through Merlin/LSP and their editor integration, and for those we probably need structured errors to enable proper hyperlinking.) |
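For reference, the terminal hyperlink mechanism described in that gist is the OSC 8 escape sequence, and emitting it from OCaml takes only a few lines. The sketch below is an illustration, not a proposal for the compiler's printing code: the identifier `tyco042`, the `example.org` URL and the message layout are all made up.

```ocaml
(* Wrap [text] in an OSC 8 hyperlink escape sequence pointing at [url].
   "\027" is the ESC character and "\027\\" is the string terminator (ST).
   The sequence is: ESC ] 8 ; ; url ST text ESC ] 8 ; ; ST *)
let hyperlink ~url text =
  Printf.sprintf "\027]8;;%s\027\\%s\027]8;;\027\\" url text

let () =
  (* Both the identifier and the URL are hypothetical. *)
  let id = "tyco042" in
  let url = "https://example.org/errors/" ^ id in
  Printf.printf "Error [%s]: this expression has type int\n"
    (hyperlink ~url id)
```

The visible text stays as short as the bare identifier; whether a given terminal makes it clickable depends on its OSC 8 support, so output that is consumed by tools rather than a terminal would presumably need a plain mode as well.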
That gist is an interesting overview but perhaps we are overthinking things here? As far as I can tell nowadays most terminals support either clicking or right-clicking on plain links that start with |
There is a noise to displaying URLs all the time, though - the error identifier is the information, not the URL to find out more. I played around with this very briefly in opam ages ago (see ocaml/opam#4568), but it wasn't as compelling there because we do also want to display the URL in |
To me, the 'URL to find out more' is the most important part of the workflow when you show the user an error message and short identifier. You just got an error (or otherwise) message and an ID, what are you going to do with it? You can Google it, if it has enough SEO juice you might get lucky and find it. Or you can search for it on the OCaml Discuss, or website and again rely on the vagaries of search engines. If we want to make the 'find out more' part less frustrating and more reliable, we have to think about the pages that actually provide the detailed information about the error IDs. What are their URLs? What are their contents? How are they maintained? How do we ensure people can find them easily? The last question is what I am trying to answer with e.g. ocaml/ocaml.org#916 . If we directly print the hyperlinks, then it's super easy to find out more. E.g.,
Which would just 301 redirect to the actual page where the error message is explained. |
Who is going to maintain those links? I think it is virtually guaranteed that they will break and then you end up with an even worse situation, dangling URLs. The compiler should not depend on external resources it does not control. Using unique identifiers (and a consistent formally defined syntax for messages) offloads the task of illuminating the error meaning to third-party tools, which is where it belongs, IMO. In other words, the compiler should not be in the business of designing workflows; rather it should provide the resources that tooling can use to design them. |
In terms of dangling pointers, an interesting data point is that some error messages already point towards sections of the manual. However, since the manual is part of the compiler source tree, the consistency of those links is already checked in the compiler CI. This consistency check seems to work empirically, since I didn't have to remind people to update those links the last time new sections were added to the manual. I could potentially extend this test to cover the error IDs if the error explanations were in the manual. At the same time, if the error message explanations are in a sufficiently discoverable section of the manual, do we still need URL links? |
@mobileink yes these are all great questions. How to ensure that links to pages which explain the error IDs continue to work in the future? I understand that historically the compiler hasn't done this and has left it up to tooling creators. I am saying that if we treat it as a holistic issue of user experience, it would be a big win for users if error messages pointed to instantly accessible pages for detailed information. @Octachron pointing to links published from sources in the compiler source tree is a fine relatively safe alternative, in my opinion. E.g.
It's not as short, but the concision is not the point, the ease of access is. The most important point is that the links not break, and for that some coordination would be required with the If links are totally unacceptable, then at least we can point to specific sections of the manual using some recognizable citation format. We would probably need to add appendices to the manual for error messages, warnings, etc. |
I have seen two big projects from the frontend/JavaScript world using this approach: React and Next.js. They use the same technique: a shortened link that points to a URL is printed when an error/warning happens. Maintaining a list of errors that points to a website isn't that big of a deal, as long as the list is append-only and any deletion adds a redirect. It might be helpful to have a shortener that allows this. To add more weight to this, it's very useful not only for finding more information, but also for sharing the error publicly while helping others or asking for help. |
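The append-only list with redirects mentioned above could look roughly like the following. This is a sketch under assumptions: the identifiers, page paths and the `status` type are invented, and a real registry would presumably live in a data file rather than in code.

```ocaml
(* Hypothetical append-only registry: an identifier is either live or
   retired with a pointer to its replacement. Entries are never deleted. *)
type status =
  | Active of string    (* path of the documentation page *)
  | Redirect of string  (* identifier that superseded this one *)

let registry = [
  "tyco001", Active "errors/tyco001.html";
  "tyco002", Redirect "tyco007";
  "tyco007", Active "errors/tyco007.html";
]

(* Follow redirects until a live page is found. The hop budget bounds the
   walk so that an accidental redirect cycle cannot loop forever. *)
let resolve id =
  let rec go fuel id =
    if fuel = 0 then None
    else
      match List.assoc_opt id registry with
      | None -> None
      | Some (Active page) -> Some page
      | Some (Redirect target) -> go (fuel - 1) target
  in
  go 10 id
```

Under this scheme an old identifier found in a forum post keeps resolving to a page even after the error it named has been merged into another one.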
My impression is that "showing the error code, with the appropriate escape codes to have a clickable link" and "showing a URL" are two reasonable approaches with pros and cons, that are generally okay. From the point of view of users and projects, they are in fact rather close. (The URL idea adds slightly more convenience and a new piece to maintain.) Maybe we could postpone the final discussion on that particular aspect of the design until we have had more discussions on the rest and, in particular, some visibility on who may implement the RFC? |
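Sure. If we postpone the URL/clickable link idea, then to my eyes there are two main points left:
1. What messages will be assigned IDs and what will the IDs be
2. Where will the detailed info corresponding to each ID be published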
Of course this also leaves all the implementation details and actual people to be decided. |
… On Tue, Feb 21, 2023 at 12:06 PM Yawar Amin ***@***.***> wrote:
Sure. If we postpone the URL/clickable link idea, then to my eyes there
are two main points left:
1. What messages will be assigned IDs and what will the IDs be
2. Where will the detailed info corresponding to each ID be published
3. What is the formal syntax of messages.
|
Could you give an example of what you mean by 'formal syntax of messages'? E.g. the PR description shows a possible message:
Are you thinking of a (BNF?) syntax that just formalizes the above message, or something different? |
I honestly don't like methods that rely on search engines, or anything that is useless in an offline environment. So have you thought about an offline-first, online-ready hybrid reference, using the concepts that have worked well in odoc? I even thought we could take advantage of the offline reference and work with LSP and other tools to do something interesting quickly. However, this is just an idea. |
I don't have anything specific in mind. It just has to be formally specified, to ensure that all messages are easily and unambiguously parseable, so tools don't have to guess. |
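As an illustration of what "unambiguously parseable" buys tooling, here is a small parser for one possible header syntax. The grammar is purely an assumption made for this sketch (it is not taken from the RFC), and so are the identifier and the message text.

```ocaml
(* Hypothetical grammar, not taken from the RFC:
     header   ::= severity " [" id "]:" text
     severity ::= "Error" | "Warning" *)
type severity = Err | Warn

(* Return [Some (severity, id, text)] if [line] matches the grammar above,
   and [None] otherwise. *)
let parse_header line =
  match String.index_opt line '[', String.index_opt line ']' with
  | Some l, Some r when 0 < l && l < r ->
    let severity =
      match String.trim (String.sub line 0 l) with
      | "Error" -> Some Err
      | "Warning" -> Some Warn
      | _ -> None
    in
    let id = String.sub line (l + 1) (r - l - 1) in
    let n = String.length line in
    let text =
      (* The closing bracket must be followed directly by a colon. *)
      if r + 1 < n && line.[r + 1] = ':' then
        Some (String.trim (String.sub line (r + 2) (n - r - 2)))
      else None
    in
    (match severity, text with
     | Some s, Some t when id <> "" -> Some (s, id, t)
     | _ -> None)
  | _ -> None
```

A tool can then extract the identifier from a call such as `parse_header "Error [tyco042]: This expression has type int"` without guessing where the context-dependent part of the message starts.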
Apologies if this has been mentioned, I haven't read the entire thread. If the manual is installed as manpages along with the compiler, what about just printing a |
That's not a bad idea but:
EDIT: on second thought, it seems kinda clever:
(* stdlib/errors.mli *)
module Type_error : sig end
(** Type mismatch error,... *)
module Syntax_error : sig end
(** Usually because an OCaml keyword was used as an identifier,... *)
We could get the compiler to check the error identifiers are unique. And we could just tell people to look up the documentation for E.g.
|
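The uniqueness check mentioned in the comment above can also be sketched as a standalone function, for the case where identifiers are kept in a plain list rather than as module names. Where such a check would run (in the compiler, in CI, or elsewhere) is not settled in this thread, and the function name is an assumption.

```ocaml
(* Return the identifiers that occur more than once in [ids], in sorted
   order and without repetition. An empty result means all are unique. *)
let duplicates ids =
  let sorted = List.sort compare ids in
  let rec go acc = function
    | a :: (b :: _ as rest) ->
      (* After sorting, equal identifiers are adjacent. *)
      if a = b && not (List.mem a acc) then go (a :: acc) rest
      else go acc rest
    | _ -> List.rev acc
  in
  go [] sorted
```

For example, `duplicates ["Type_error"; "Syntax_error"; "Type_error"]` reports `Type_error`, which is the kind of clash a CI step could turn into a build failure.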
This RFC is more about a tooling change than a language change, but the RFC mechanism of agreeing on a spec before implementing feels appropriate here as well.
The idea is to include a stable error identifier in all error messages generated by the OCaml compiler, to make it easier to look online for help on this error for example. (We learned of this idea from the Rust compiler, rustc.)
Rendered version of the RFC: https://github.com/gasche/RFCs/blob/error-identifiers/rfcs/error-identifiers.md
Current behavior:
Proposed behavior: