Actions: openai/evals

Run unit tests

260 workflow runs

Error Recovery Eval
Run unit tests #1663: Pull request #1485 synchronize by ojaffe
March 19, 2024 08:15 2m 19s ojaffe:ollie/error_recovery
Add Human-Relative MLAgentBench
Run unit tests #1662: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:31 9m 4s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1661: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:13 3m 43s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1660: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:32 3m 41s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1659: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:25 4m 44s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1658: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:02 3m 30s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1657: Pull request #1496 opened by danesherbs
March 19, 2024 05:57 2m 24s danesherbs:dane/add-mlab-v2
Can't Do That Anymore Eval (#1487)
Run unit tests #1656: Commit f72afb9 pushed by JunShern
March 19, 2024 04:04 2m 38s main
Bugged Tools Eval (#1486)
Run unit tests #1655: Commit ad377e4 pushed by JunShern
March 19, 2024 04:00 2m 35s main
Add Function Deduction eval
Run unit tests #1652: Pull request #1492 opened by james-aung
March 15, 2024 18:25 2m 19s james-aung:function-deduction
Add In-Context RL eval
Run unit tests #1651: Pull request #1491 opened by james-aung
March 15, 2024 18:24 2m 5s james-aung:incontext-rl
Already Said That Eval
Run unit tests #1650: Pull request #1490 synchronize by thesofakillers
March 15, 2024 14:22 2m 23s thesofakillers:ast
Track the Stat Eval
Run unit tests #1649: Pull request #1489 opened by thesofakillers
March 15, 2024 14:06 2m 51s thesofakillers:tts
Identifying Variables Eval
Run unit tests #1648: Pull request #1488 synchronize by thesofakillers
March 15, 2024 13:46 2m 35s thesofakillers:idvars
Identifying Variables Eval
Run unit tests #1647: Pull request #1488 synchronize by thesofakillers
March 15, 2024 13:45 2m 26s thesofakillers:idvars
Identifying Variables Eval
Run unit tests #1646: Pull request #1488 opened by thesofakillers
March 15, 2024 13:38 2m 39s thesofakillers:idvars
Can't Do That Anymore Eval
Run unit tests #1645: Pull request #1487 opened by ojaffe
March 15, 2024 10:54 2m 9s ojaffe:ollie/cant_do_that_anymore
Bugged Tools Eval
Run unit tests #1644: Pull request #1486 opened by ojaffe
March 15, 2024 10:37 2m 6s ojaffe:ollie/bugged_tools
Error Recovery Eval
Run unit tests #1643: Pull request #1485 synchronize by ojaffe
March 15, 2024 10:32 2m 10s ojaffe:ollie/error_recovery
Error Recovery Eval
Run unit tests #1642: Pull request #1485 opened by ojaffe
March 15, 2024 10:25 2m 47s ojaffe:ollie/error_recovery
Updates on existing evals; readmes; solvers (#1483)
Run unit tests #1641: Commit 11c30b2 pushed by JunShern
March 13, 2024 10:20 2m 28s main
Updates on existing evals; readmes; solvers
Run unit tests #1640: Pull request #1483 opened by ojaffe
March 13, 2024 09:45 2m 25s ojaffe:ollie/updates-20240313
Log model and usage stats in record.sampling
Run unit tests #1638: Pull request #1449 synchronize by JunShern
March 13, 2024 07:48 2m 29s jun/log-token-counts
Drop two datasets from steganography (#1481)
Run unit tests #1637: Commit 7e958fe pushed by JunShern
March 12, 2024 09:23 2m 50s main