Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from_file class method to the Prompt object #1355

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yvan-sraka
Copy link
Contributor

@yvan-sraka yvan-sraka commented Jan 2, 2025

Fix #1345: For now, I'm just trying to make sure I'm editing the right part of the codebase. I haven't managed to run the tests yet…

EDIT: I opened a separate issue about the falling tests 🙂

I noticed that the project use pytest, so I assumed that running pip install ".[test]" would install all the dependencies needed to run the full test suite. However, it seems that's not the case (it's currently complaining that I don't have vllm). Should I open an issue (I can add the missing dependency under the test section of the pyproject.toml)?

#1356

@yvan-sraka yvan-sraka requested review from torymur and rlouf January 2, 2025 13:54
@yvan-sraka yvan-sraka self-assigned this Jan 2, 2025
@yvan-sraka yvan-sraka linked an issue Jan 2, 2025 that may be closed by this pull request
@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch 3 times, most recently from b2605b3 to df0eb2c Compare January 2, 2025 14:36
@rlouf
Copy link
Member

rlouf commented Jan 2, 2025

Yes, please open an issue 😄

@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch from f11f10c to 98cef53 Compare January 2, 2025 17:01
outlines/prompts.py Outdated Show resolved Hide resolved
tests/test_prompts.py Outdated Show resolved Hide resolved
@rlouf
Copy link
Member

rlouf commented Jan 2, 2025

Added a couple comments, although since this is a draft PR you are probably addressing them already.

@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch 6 times, most recently from f75319f to 2954197 Compare January 6, 2025 09:51
"""Return the schema of a Pydantic model."""
if not type(model) == type(BaseModel):
if not isinstance(model, type(pydantic.BaseModel)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are unrelated lint fixes, I can eventually move them to another PR!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep them in this PR as long as they have their own commit.

@yvan-sraka yvan-sraka requested a review from rlouf January 6, 2025 10:00
@yvan-sraka yvan-sraka marked this pull request as ready for review January 6, 2025 10:00
@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch from 2954197 to ee5c4ef Compare January 6, 2025 10:45
@yvan-sraka
Copy link
Contributor Author

Working on this PR made me wonder: why do we integrate Jinja2 (or any specific templating engine) into Outlines? I mean, we could have just provided a Prompt interface that takes an already rendered str (and eventually keep a function decorator if that sounds fancy and helps better infer the template arguments), leaving the templating up to the user. That way, users could choose a different templating engine if they wanted, and e.g., how to use Outlines in combination with Jinja2 could be rather provided as an example in the documentation. WDYT?

@rlouf
Copy link
Member

rlouf commented Jan 6, 2025

I am not sure what you mean by "taking an already rendered str", could you provide an example?

@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Jan 6, 2025

I am not sure what you mean by "taking an already rendered str", could you provide an example?

TL;DR: I mean passing the output/result of the templating process to the Prompt constructor, thereby externalizing the templating from Outlines. IMHO, this would lead to better separation of concerns and make Outlines lighter and simpler. One idea could be to maintain tighter integration with templating only as an optional feature, if we believe it simplifies the workflow and improves the developer experience.

import outlines

# Using Jinja2 ...
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
template = env.get_template("prompt.jinja2")

# ... or, Using Mako ...
import Mako

template = mako.template.Template(filename="prompt.mako")

@outlines.prompt
def fancy(examples, question):
    return template.render(examples=examples, question=question)

examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2 + 2?", "answer": "4"},
]
question = "What is the Earth's diameter?"

prompt = fancy(examples, question)

# ... but why not just?
rendered = template.render(examples=examples, question=question)
prompt = outline.Prompt(rendered)
# EDIT: Nevermind, while writing this down, I realized that this would
# essentially mean that arguments are evaluated too early, leaving the prompt
# without any arguments and simply storing a `str`...

@@ -183,17 +196,16 @@ def render(template: str, **values: Optional[Dict[str, Any]]) -> str:
A string that contains the rendered template.
"""
jinja_template = create_jinja_template(template)
return jinja_template.render(**values)
return template.render(**values)
Copy link
Contributor

@torymur torymur Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From interface point of view render functionality has changed too if defined this way. Before it would accept str and create jinja template (what's now done by template_from_str), why was it redefined to accept jinja template right away?

Because I'm not sure of the overall usability of render if it accepts the jinja template. By reading the test cases and docs with examples it seems this function was here to confirm how the transformation from str to jinja template looked like. So, by calling it, user would make sure that it was transformed as expected + then rendered.

Copy link
Contributor

@torymur torymur Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, since now there are 2 separate flows to jinja template:

  1. from str
  2. from file

Then it's expected to call render with jinja template, yes, but this old kind of render more looks like Prompt.from_str, which will allow to confirm the str to jinja transformation, and then the actual call will render any of it.

Copy link
Contributor Author

@yvan-sraka yvan-sraka Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old design was slightly inefficient because a jinja2.template should not be constructed more than once per template. Previously, it was being reconstructed for each call to render...

I understand this could be a potential breaking change. Thinking about it, I had to modify render-related tests, but I don't think users should be able to call render without a Prompt object, WDYT? And yes, it would make sense to expose a from_str method for parity with from_file!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the render code so it no longer breaks previous usage, but instead displays a deprecation warning. WDYT?

Copy link
Contributor

@torymur torymur Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I checked new code, from_str looks good!

Though I wonder about two things:

  1. since this will be in major update, maybe we can get away even without deprecation warning for whatever we'll decide for render just this time
  2. this render function still creates more questions of why it exists to me, you both in your examples used interface like prompt.template.render(params), which is much clearer anyway and with from_str available, especially. Also, there is similar logic in __call__, so maybe this standalone render could be dropped all together? It seems it was something historical as well, considering graph mentioning: Parse a Jinaj2 template and translate it into an Outlines graph.

So, I would even remove render function and integrate and refactor its docs for template_from_str instead.
@yvan-sraka @rlouf what's your take?

Copy link
Contributor Author

@yvan-sraka yvan-sraka Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, and tbh, I think that's a bit overkill. I'd be fine with not exposing it to the user, as well as other methods. However (and maybe this is where my dusty understanding of Python module visibility falls short), AFAIU, those methods aren't exposed atm, e.g.:

def render(
    template: Union[jinja2.Template, str], **values: Optional[Dict[str, Any]]
) -> str:
    r"""
    Examples
    --------

    Outlines follow Jinja2's syntax

    >>> import outlines
    >>> outline = outlines.render("I like {{food}} and {{sport}}", food="tomatoes", sport="tennis")
    I like tomatoes and tennis
    
    ... snipped"""
% python3
Python 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:27:15) [Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import outlines
>>> outlines.render("I like {{food}} and {{sport}}", food="tomatoes", sport="tennis")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'outlines' has no attribute 'render'

So, I guess it's the documentation that's rather misleading, right?

EDIT: Never mind, the correct import was outlines.prompts.render!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about keeping the function, displaying a clear deprecation warning, and only removing it in a future release? d318ac2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with either way: removing now or later, this version of deprecation seems better for something we don't want to support 👍

Though, I'd suggest to make sure that we're refactoring all this old docs and tests of render somehow meaningfully into new template_from_str since we're doing all that there anyway.

Copy link
Contributor Author

@yvan-sraka yvan-sraka Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that template_from_str should be user-facing. Why would the user use it, and how could that be more ergonomic than Prompt.from_str? What about renaming them to _template_from_str and _template_from_file, so it's more obvious that they are internals? Where is the documentation that we should not forget to update?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking about integration of these docs/tests of render into tests of template_from_str, jjust to preserve these assumptions, but not to make template_from_* user-facing. I agree, that these are internal, we can make it explicit, yes.

I was also searching for docs, seems they're generated based on docstrings: https://dottxt-ai.github.io/outlines/latest/api/prompts/

outlines/prompts.py Outdated Show resolved Hide resolved
@rlouf
Copy link
Member

rlouf commented Jan 6, 2025

You can already use any templating engine you want with Outlines, by rendering the template and passing the returned string to a generator:

import outlines

model = outlines.models.openai("gpt-4o")

# Input

examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2 + 2?", "answer": "4"},
]
question = "What is the Earth's diameter?"


# Using Jinja2 ...
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
template = env.get_template("prompt.jinja2")
rendered = template.render(examples=examples, question=question)

generator = outlines.generate.text(model)
results = generator(rendered)


# ... or, Using Mako ...
import Mako

template = mako.template.Template(filename="prompt.mako")
rendered = template.render(examples=examples, question=question)

generator = outlines.generate.text(model)
results = generator(rendered)

The role of the Prompt object is to wrap a Jinja2 template so that it returns a string when called with the template's variable values. I chose Jinja2 to make the common path easier; users of outlines are not necessarily familiar with templating engines and Jinja2 seemed like a reasonable default. As I demonstrated above you can always use another templating engine, at the cost of extra complexity.

The confusion may come from poor naming on my part; this object should probably be called Template instead.

@@ -135,7 +158,6 @@ def test_prompt_kwargs():
def test_kwarg_tpl(var, other_var="other"):
"""{{var}} and {{other_var}}"""

assert test_kwarg_tpl.template == "{{var}} and {{other_var}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep these two .template checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because template is no longer a str but a jinja2.template, I could try adapting them instead of suppressing them, however, they seemed a bit redundant to me, WDYT?

Copy link
Contributor

@torymur torymur Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably type checks will be enough, though I would suggest adding assertions to verify .signature: check it to be present from decorator call / from_str and None otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few more assert statements to check that the signature. It's not None only when the Prompt is constructed using the decorator (otherwise, it can't infer the signature argument since there is no function).

@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Jan 6, 2025

Thank you very much @rlouf for the clarification. I'm not sure about the naming either, but if it's extensible, that already sounds like a great design! Apologies for sharing ideas maybe a bit too quickly as they pop from my mind, I probably should have checked the documentation first!

@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch 2 times, most recently from 46e150d to b0677b7 Compare January 6, 2025 17:06
@yvan-sraka yvan-sraka requested a review from torymur January 6, 2025 17:08
@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch from b0677b7 to 8b27da3 Compare January 6, 2025 17:14
@rlouf
Copy link
Member

rlouf commented Jan 6, 2025

Thank you very much @rlouf for the clarification. I’m not sure about the naming either, but if it's extensible, that already sounds like a great design! Apologies for sharing ideas maybe a bit too quickly as they pop from my mind, I probably should have checked the documentation first!

No worries! Questions are always welcome

@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch 2 times, most recently from 5aad80d to d318ac2 Compare January 7, 2025 14:35
@yvan-sraka yvan-sraka marked this pull request as draft January 7, 2025 14:38
@yvan-sraka
Copy link
Contributor Author

(Do not merge this PR yet, I will rewrite the commit history a bit once we've settled on deprecating or removing outlines.prompts.render!)

@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch 4 times, most recently from 5a3771a to 2b87ecb Compare January 7, 2025 15:58
@yvan-sraka yvan-sraka force-pushed the 1345-add-from_file-class-method-to-the-prompt-object branch from 2b87ecb to 32587ee Compare January 7, 2025 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add from_file class method to the Prompt object
3 participants