Eval on Jailbreaks #350

pocket909 · 2023-03-19T16:32:08Z

I was reading the "Building an eval" guide and its list of categories, which includes over-refusals and safety. I'll need to read it again to work out how to give my eval a metric, but here is my idea. Because the categories cover over-refusals and safety, I was thinking of an eval made up of prompt-injection examples: prompts designed with adversarial prompt engineering to make the model generate a response it is not intended to give. Please let me know whether this is something that would be accepted, and how many examples I would need to include. Thanks.

Replies: 0 comments