Actions: openai/evals

Showing runs from all workflows
390 workflow runs

Add In-Context RL eval
Run new evals #2242: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 21s james-aung:incontext-rl
Add In-Context RL eval
Run unit tests #1681: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 10s james-aung:incontext-rl
Add Function Deduction eval (#1492)
Run unit tests #1680: Commit dfeaac4 pushed by JunShern
March 19, 2024 14:25 2m 34s main
Identifying Variables Eval (#1488)
Run unit tests #1679: Commit c207dba pushed by JunShern
March 19, 2024 14:21 2m 39s main
Track the Stat Eval (#1489)
Run unit tests #1678: Commit 99bfada pushed by JunShern
March 19, 2024 14:09 2m 19s main
Add Function Deduction eval
Run unit tests #1677: Pull request #1492 synchronize by james-aung
March 19, 2024 14:09 2m 28s james-aung:function-deduction
Add Function Deduction eval
Run new evals #2241: Pull request #1492 synchronize by james-aung
March 19, 2024 14:09 2m 7s james-aung:function-deduction
Already Said That Eval (#1490)
Run unit tests #1676: Commit baa12d0 pushed by JunShern
March 19, 2024 14:03 2m 42s main
Add 20 questions eval (#1499)
Run unit tests #1675: Commit bd1736e pushed by JunShern
March 19, 2024 13:57 5m 35s main
Add skill acquisition eval (#1497)
Run unit tests #1674: Commit 76a9f4e pushed by JunShern
March 19, 2024 13:53 2m 20s main
Add 20 questions eval
Run unit tests #1673: Pull request #1499 opened by inwaves
March 19, 2024 11:13 2m 14s inwaves:andrei/add-20-questions
Add 20 questions eval
Run new evals #2240: Pull request #1499 opened by inwaves
March 19, 2024 11:13 2m 14s inwaves:andrei/add-20-questions
AnthropicSolver
Run unit tests #1672: Pull request #1498 opened by thesofakillers
March 19, 2024 10:26 2m 16s thesofakillers:anthropic-solver
AnthropicSolver
Run new evals #2239: Pull request #1498 opened by thesofakillers
March 19, 2024 10:26 2m 9s thesofakillers:anthropic-solver
Identifying Variables Eval
Run new evals #2238: Pull request #1488 synchronize by thesofakillers
March 19, 2024 09:58 2m 23s thesofakillers:idvars
Identifying Variables Eval
Run unit tests #1671: Pull request #1488 synchronize by thesofakillers
March 19, 2024 09:58 2m 44s thesofakillers:idvars
Track the Stat Eval
Run unit tests #1670: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:38 2m 17s thesofakillers:tts
Track the Stat Eval
Run new evals #2237: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:38 2m 17s thesofakillers:tts
Track the Stat Eval
Run unit tests #1669: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:33 2m 25s thesofakillers:tts
Track the Stat Eval
Run new evals #2236: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:33 2m 31s thesofakillers:tts
Already Said That Eval
Run new evals #2235: Pull request #1490 synchronize by thesofakillers
March 19, 2024 09:32 3m 18s thesofakillers:ast
Already Said That Eval
Run unit tests #1668: Pull request #1490 synchronize by thesofakillers
March 19, 2024 09:32 2m 15s thesofakillers:ast
Add Human-Relative MLAgentBench
Run new evals #2234: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:20 3m 36s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1667: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:20 3m 40s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1666: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:12 6m 28s danesherbs:dane/add-mlab-v2