-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamically add table aliases without an LLM + de-duplicate columns in pandas #155
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the deduplicating fix and the updates! 2 small comments
utils/gen_prompt.py
Outdated
) | ||
|
||
if "cot_instructions": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shall we add a separate argument in generate_prompt
, say cot_pregen
, to tack on the table aliases? I was thinking cot_instructions
would be more of the "instructions" of what to do, while cot_pregen
would be the actual output (eg table aliases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye sounds good!
prompts/prompt_cot.md
Outdated
DDL statements: | ||
{table_metadata_string} | ||
|
||
{cot_instructions}Generate a valid SQL query that answers the question `{user_question}`, and only references the tables and columns in the DDL statements.<|eot_id|><|start_header_id|>assistant<|end_header_id|> | ||
Generate a valid SQL query that answers the question `{user_question}`, and only references the tables and columns in the DDL statements.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we keep the cot_instructions
field here? This is more for letting the model know that it should be generating the aliases followed by the SQL.
Currently in the data we use the instruction to let the model know when it should generate the table aliases, and when it should just directly generate the SQL (to facilitate the mixing of both types of data).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good – making a fix now!
… other is called `pregen`
… other is called `pregen`
Fixed! We can now use |
This automatically generates relevant table aliases and appends it to a prompt. Doing so transfers the onus of creating table aliases away from the LLM. We may have to retrain our LLM to expect this kind of prompting, so that it expects a more varied source of inputs.
Here's an example of how to run this.