test failed in CI: test_mgs_metrics #7251

Closed
smklein opened this issue Dec 13, 2024 · 3 comments
Labels
Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken.

Comments


smklein commented Dec 13, 2024

This test failed on a CI run on #7236

https://github.com/oxidecomputer/omicron/pull/7236/checks?check_run_id=34396932885

Log showing the specific test failure:

https://buildomat.eng.oxide.computer/wg/0/details/01JF0W46RCCHYRBT939BZNX9N1/lzD91cQIcXjBCvXQ7t9oaI2PQhD9bmE4RhLxI5qWH1JlfFFq/01JF0W51WYHSSG6WTF71FH0539

Excerpt from the log showing the failure:

2024-12-13T21:18:28.172Z
2024-12-13T21:18:28.173Z  === checking timeseries for hardware_component:temperature ===
2024-12-13T21:18:28.173Z
2024-12-13T21:18:28.173Z  thread 'integration_tests::metrics::test_mgs_metrics' panicked at nexus/tests/integration_tests/metrics.rs:413:17:
2024-12-13T21:18:28.173Z  could not parse timeseries query response: parsing response body
2024-12-13T21:18:28.173Z
2024-12-13T21:18:28.173Z  Caused by:
2024-12-13T21:18:28.173Z      missing field `tables` at line 5 column 1

hawkw commented Dec 13, 2024

oh no, not again...

@jgallagher

I think this is a revival of #7084, which we thought was fixed in #7156.

@bnaecker

It looks like there is a mismatch between the error message the retry loop checks for and the one we actually get back. The error we're failing on is `Timeseries not found for: hardware_component:temperature`, but the retry code is looking for a message like `Schema for timeseries hardware_component:temperature not found`. It sure looks like I just flubbed the check itself, though I'm not sure how. Looking back at #7084, we see exactly the same error message as we see today.
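To make the mismatch concrete, here is a minimal sketch of a retry predicate that accepts both message shapes quoted above. It is not the actual code in `metrics.rs`; the function name `is_not_found_yet` and its signature are hypothetical, and only the two message strings come from this thread.

```rust
/// Hypothetical helper (not from the omicron source): returns true when an
/// error message suggests the timeseries simply has not been recorded yet,
/// so the query is worth retrying rather than failing the test.
fn is_not_found_yet(message: &str, timeseries: &str) -> bool {
    // Message shape the retry loop was reportedly looking for.
    let schema_missing =
        format!("Schema for timeseries {} not found", timeseries);
    // Message shape actually reported in this issue.
    let timeseries_missing =
        format!("Timeseries not found for: {}", timeseries);

    // Substring matching, because the message may be wrapped in extra context.
    message.contains(&schema_missing) || message.contains(&timeseries_missing)
}
```

A check that only looks for the first string would return false for the error seen here and fall through to the panic. Matching on message text is fragile in general, so a structured error code would likely be a sturdier signal if one is available.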
