I was reading "Building an eval" and its categories, which include over-refusals and safety. I will need to read it again to understand how to give my eval a metric, but here is my idea: since the categories include over-refusals and safety, I was thinking of an eval made up of prompt injection examples, where adversarial prompt engineering is used to design prompts that cause the model to generate a response that was not intended. Please let me know whether this would be accepted and how many examples I would need to include. Thanks.