Actions: openai/evals

Run unit tests

260 workflow runs

AnthropicSolver (#1498)
Run unit tests #1688: Commit e30e141 pushed by JunShern
March 21, 2024 04:15 3m 48s main
Add Human-Relative MLAgentBench (#1496)
Run unit tests #1687: Commit 4f97ce6 pushed by JunShern
March 21, 2024 03:47 4m 56s main
Add Human-Relative MLAgentBench
Run unit tests #1686: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 37s danesherbs:dane/add-mlab-v2
Add Multi-Step Web Tasks (#1500)
Run unit tests #1685: Commit 5b84993 pushed by JunShern
March 21, 2024 03:35 2m 27s main
Add Multi-Step Web Tasks
Run unit tests #1684: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:40 2m 21s danesherbs:dane/add-multi-step-web-tasks
Add Multi-Step Web Tasks
Run unit tests #1683: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:21 2m 18s danesherbs:dane/add-multi-step-web-tasks
Add In-Context RL eval (#1491)
Run unit tests #1682: Commit ff994b5 pushed by JunShern
March 19, 2024 14:59 5m 56s main
Add In-Context RL eval
Run unit tests #1681: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 10s james-aung:incontext-rl
Add Function Deduction eval (#1492)
Run unit tests #1680: Commit dfeaac4 pushed by JunShern
March 19, 2024 14:25 2m 34s main
Identifying Variables Eval (#1488)
Run unit tests #1679: Commit c207dba pushed by JunShern
March 19, 2024 14:21 2m 39s main
Track the Stat Eval (#1489)
Run unit tests #1678: Commit 99bfada pushed by JunShern
March 19, 2024 14:09 2m 19s main
Add Function Deduction eval
Run unit tests #1677: Pull request #1492 synchronize by james-aung
March 19, 2024 14:09 2m 28s james-aung:function-deduction
Already Said That Eval (#1490)
Run unit tests #1676: Commit baa12d0 pushed by JunShern
March 19, 2024 14:03 2m 42s main
Add 20 questions eval (#1499)
Run unit tests #1675: Commit bd1736e pushed by JunShern
March 19, 2024 13:57 5m 35s main
Add skill acquisition eval (#1497)
Run unit tests #1674: Commit 76a9f4e pushed by JunShern
March 19, 2024 13:53 2m 20s main
Add 20 questions eval
Run unit tests #1673: Pull request #1499 opened by inwaves
March 19, 2024 11:13 2m 14s inwaves:andrei/add-20-questions
AnthropicSolver
Run unit tests #1672: Pull request #1498 opened by thesofakillers
March 19, 2024 10:26 2m 16s thesofakillers:anthropic-solver
Identifying Variables Eval
Run unit tests #1671: Pull request #1488 synchronize by thesofakillers
March 19, 2024 09:58 2m 44s thesofakillers:idvars
Track the Stat Eval
Run unit tests #1670: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:38 2m 17s thesofakillers:tts
Track the Stat Eval
Run unit tests #1669: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:33 2m 25s thesofakillers:tts
Already Said That Eval
Run unit tests #1668: Pull request #1490 synchronize by thesofakillers
March 19, 2024 09:32 2m 15s thesofakillers:ast
Add Human-Relative MLAgentBench
Run unit tests #1667: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:20 3m 40s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1666: Pull request #1496 synchronize by danesherbs
March 19, 2024 09:12 6m 28s danesherbs:dane/add-mlab-v2
Error Recovery Eval (#1485)
Run unit tests #1665: Commit 80ac60d pushed by JunShern
March 19, 2024 08:26 3m 3s main
Add skill acquisition eval
Run unit tests #1664: Pull request #1497 opened by inwaves
March 19, 2024 08:25 2m 26s inwaves:andrei/updates-20240319