Model card support #449
Comments
Thank you for this rich report @isinyaaa About
Is the link in the quote the expected one, please?

In general, my view is that there is no single "standard" Model Card format, and we should avoid overfitting to one existing solution (i.e. HF only).

Having a link to the original Model Card via DocArtifact when importing from HF could make it easier to preserve a way to display the "original" one. The same applies when importing from other registries: was there a Model Card in the original model registry that can be linked? If yes, provide a link to it when importing into Model Registry.

When the model is indexed directly on (our) Model Registry, I believe the solution should remain flexible while guiding the user about which information could be helpful to collect. This is what I like a lot about the template proposal. From the UI and model card generation point of view, we should not rely on hardcoded fields or naming conventions for fields/subfields; I concur that the potential solution "naming conventions for fields" is brittle in this regard, and I would not pursue it. I also believe the potential solution "templated metadata forms" would support enterprises/users that have their own model card format.

I'm not sure I fully understand why new type definition(s) would be needed in MLMD for templates; my intuition based on the report was that we could more simply dedicate a custom artifact to hold a YAML representation of the model card, for those cases where the user doesn't want to simply re-use the already existing UI blocks as a default Model Card rendering, i.e. something ~like (mockup) |
In the survey/assessment, we shall also consider:
|
@tarilabs updated the HF user studies link, please refer to: huggingface.co/docs/hub/model-cards-user-studies#user-studies |
thanks, that and the remainder of the linked resources will be helpful to discuss with UXD |
Agreed!
Definitely sounds like a good idea for HF imports :)
Yes, that's precisely the point I wanted to raise with the "templating" capabilities. My analogy to MLMD type definitions is this: just as MLMD enables users to define their own types, we should allow users to define their own "Model metadata forms", which end up being very similar in usage. Does that clarify your concern? Otherwise, I'm not sure I follow your last point about using the special YAML artifact type. |
I would probably include the ModelKit spec (KitOps) as an option to consider.
+1 Personally, I like the idea of having some templating or even "types" to keep flexibility and support different formats without any strict format requirement/validation. We should also consider metadata about input/output parameters. As far as I can see, this is not covered by any of these specs, probably because it is runtime specific and usually OpenAI API "compatible", but we should consider extending the model card to cover this metadata (this has been done for predictive AI by KServe with the Open-Inference-Protocol). |
From Hugging Face take a look https://huggingface.co/spaces/huggingface/Model_Cards_Writing_Tool |
this looks nice, but it often gives me back 🤔 🤷: |
Model card support proposal
Model cards are widely used document summaries; for example, huggingface.co shows them as the default model view.
While this style of model card can be rich in metadata, it's up to the authors of the document to include optional header fields in the repo README.md file.
There are, however, more generic formats which utilize metadata to build the model card, such as Google model cards and IBM fact sheets.
Some Model Card guidelines also suggest using templates for Hugging Face model cards.
There are also reported user studies (about Model Cards from Hugging Face) suggesting how to structure one, which further points in the direction of constructing a model card from metadata templates.
Possible implementations
To support model cards on Model Registry, it would be necessary to gather meta-metadata, so that parsing becomes possible. Possible approaches include:
naming conventions for fields
Pros:
Cons:
This could work if we split custom fields from parsed fields, but that creates a divide within otherwise useful metadata and complicates importing from "unknown" types that could be supported later.
restructuring metadata types to include required meta-metadata information, such as field category, or section
Pros:
Cons:
This might not solve the problem entirely, as there can be rules for the meta-metadata types, which would have to be validated dynamically and possibly "discovered" by clients at runtime. For example, if a user wants to register a vision model artifact under a multimodal model version, how do they know which fields are valid for that model before sending the request?
templated metadata forms
Having standard forms is feasible as long as they can still have arbitrary fields per "section". However, this does not account for field validation rules, e.g. for different model types like vision and text-to-speech. To address that, Model Registry could support metadata template registration (similar to MLMD type definitions).
Pros:
Cons:
For the Python client, it would be possible to provide template building blocks that clients can use to specify their own types.
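As a rough sketch of what template registration could look like from the client side, the snippet below defines a form template with named sections, a few required fields per section, and free-form extra fields. Everything in it (`SectionTemplate`, `CardTemplate`, `register_template`, the `vision` template and its field names) is hypothetical and is not part of the Model Registry Python client or MLMD; the in-memory dictionary only stands in for whatever registration mechanism would actually be chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionTemplate:
    """One section of a model card form (hypothetical structure)."""
    name: str
    required: frozenset = frozenset()  # field names that must be present


@dataclass(frozen=True)
class CardTemplate:
    """A named form made of sections; extra fields are always allowed."""
    name: str
    sections: tuple

    def validate(self, card: dict) -> list:
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        for section in self.sections:
            present = set(card.get(section.name, {}))
            for missing in sorted(section.required - present):
                errors.append(f"{section.name}.{missing} is required")
        return errors


# In-memory stand-in for server-side template registration (assumption).
TEMPLATES = {}


def register_template(template: CardTemplate) -> None:
    TEMPLATES[template.name] = template


register_template(
    CardTemplate(
        name="vision",
        sections=(
            SectionTemplate("overview", frozenset({"description"})),
            SectionTemplate("inputs", frozenset({"image_size", "channels"})),
        ),
    )
)

card = {
    "overview": {"description": "Image classifier", "owner": "team-a"},
    "inputs": {"image_size": "224x224"},
}
problems = TEMPLATES["vision"].validate(card)
print(problems)  # ['inputs.channels is required']
```

Note that the extra `owner` field passes validation untouched: only the required fields of each section are checked, which keeps the "arbitrary fields per section" property while still allowing per-model-type rules.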
Proposed solution
These ideas could be implemented progressively, starting with the simplest one.
First, MR would parse only a single JSON-serialized field, generated by the client that wants a model card.
The client parses its own metadata and sends it in a custom format, which could be iterated upon.
As a second step, MR would need either to completely abstract MLMD custom properties or to switch the metadata store entirely, so this discussion will be left as WIP.
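The first step can be illustrated with a minimal sketch, assuming the card travels as one string-valued custom property. The property name `model_card`, the `format` tag, and the helper functions are all hypothetical; the actual field name and card layout are not decided in this proposal.

```python
import json

MODEL_CARD_KEY = "model_card"  # hypothetical custom property name


def pack_model_card(card: dict) -> dict:
    """Client side: serialize the whole card into one string-valued property."""
    return {MODEL_CARD_KEY: json.dumps(card, sort_keys=True)}


def unpack_model_card(custom_properties: dict):
    """Reader side: return the parsed card, or None if no card was attached."""
    raw = custom_properties.get(MODEL_CARD_KEY)
    return None if raw is None else json.loads(raw)


card = {
    "format": "example/v0",  # hypothetical version tag so the format can evolve
    "sections": {"overview": {"description": "Image classifier"}},
}
properties = pack_model_card(card)
restored = unpack_model_card(properties)
```

Carrying a version tag inside the serialized payload is one way to let the custom format be iterated upon without changing how the field itself is stored.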