-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MVP improvements to automated archive runs #357
Conversation
For more information, see https://pre-commit.ci
.github/workflows/run-archiver.yml
Outdated
- nrelatb | ||
- phmsagas | ||
|
||
dataset: ${{ fromJSON(format('[{0}]', inputs.small_runner || '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"')) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If scheduled, should default to the full list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: What do you think of defining one list of "all the damn datasets" in env
so we can access that everywhere we need to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit tricky here because there should in fact be two variables - "small runner datasets" and "large runner datasets". The alternative is one variable with some kind of filtering, but I couldn't figure out how to do that neatly. But I can make a "small" and "large" dataset list.
.github/workflows/run-archiver.yml
Outdated
@@ -78,6 +73,7 @@ jobs: | |||
path: ${{ matrix.dataset }}_run_summary.json | |||
|
|||
archive-run-large: | |||
if: inputs.large_runner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If set as true in workflow dispatch, or triggered by scheduled run this should run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, if triggered by scheduled run, I would expect inputs.large_runner
be empty and thus archive-run-large
to get skipped - am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption was that an empty string would actually get evaluated as true, but I could be totally off-base here. I think your suggestion re: incorporating the type of run below is great and I'll incorporate it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an unset variable here would get treated as a "falsey" value: https://docs.github.com/en/actions/learn-github-actions/expressions#literals
Note that in conditionals, falsy values (false, 0, -0, "", '', null) are coerced to false and truthy (true and other non-falsy values) are coerced to true.
And actually I bet unset variable is actually null
instead of ''
, now that I look at those docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, making this more explicit seems wise.
.github/workflows/run-archiver.yml
Outdated
@@ -160,3 +155,19 @@ jobs: | |||
payload: ${{ steps.all_summaries.outputs.SLACK_PAYLOAD }} | |||
env: | |||
SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }} | |||
|
|||
make-github-issue: | |||
if: always() && inputs.create_github_issue != false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion here for use of always()
actions/runner#491
Completely unhinged GHA behavior, but what can we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whee! hey, if it works it works.
Is the idea behind inputs.create_github_issue != false
that "" != false
and then we will get the github issue in scheduled runs as well as the specified manual runs?
If so, what do you think of using github.event_name
to differentiate between workflow_dispatch
and scheduled
runs? That is more explicitly "do X step if it's scheduled or if there's some specific workflow_dispatch input."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a great idea, can incorporate it here. But yes, that was my original idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly this looks good! I have a few small questions that I'd love to see addressed before we merge. If you think that the implied changes aren't useful, feel free to say so and re-request review :)
In terms of testing - pretty tricky to do more than just "trigger manual runs and see if things work the way you expect." I think that level of testing is fine, and if the scheduled runs fail we can always handle that manually.
``` | ||
|
||
# Relevant logs | ||
[Link to logs from GHA run]( PLEASE FIND THE ACTUAL LINK AND FILL IN HERE ) | ||
[Link to logs from GHA run]({{ env.RUN_URL }}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking: we could maybe strip this whole section if there's that summary of results
section above.
.github/workflows/run-archiver.yml
Outdated
@@ -78,6 +73,7 @@ jobs: | |||
path: ${{ matrix.dataset }}_run_summary.json | |||
|
|||
archive-run-large: | |||
if: inputs.large_runner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, if triggered by scheduled run, I would expect inputs.large_runner
be empty and thus archive-run-large
to get skipped - am I missing something here?
.github/workflows/run-archiver.yml
Outdated
- nrelatb | ||
- phmsagas | ||
|
||
dataset: ${{ fromJSON(format('[{0}]', inputs.small_runner || '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"')) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: What do you think of defining one list of "all the damn datasets" in env
so we can access that everywhere we need to?
.github/workflows/run-archiver.yml
Outdated
@@ -160,3 +155,19 @@ jobs: | |||
payload: ${{ steps.all_summaries.outputs.SLACK_PAYLOAD }} | |||
env: | |||
SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }} | |||
|
|||
make-github-issue: | |||
if: always() && inputs.create_github_issue != false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whee! hey, if it works it works.
Is the idea behind inputs.create_github_issue != false
that "" != false
and then we will get the github issue in scheduled runs as well as the specified manual runs?
If so, what do you think of using github.event_name
to differentiate between workflow_dispatch
and scheduled
runs? That is more explicitly "do X step if it's scheduled or if there's some specific workflow_dispatch input."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 - no need to futz around more in the tangled web of GHA if this works 😄 .
Sure there's some things that "could" be cleaner, but is the time saved down the line worth the upfront investment now? I don't think so at this point.
Overview
Addresses subset of #346 and #347, focusing on the lowest hanging improvements.
What problem does this address?
There are two types of GHA failures we want to keep track of: run failures and validation test failures. These should both be reported in the Slackbot notifications. The archive run now also automatically opens an issue using the archive issue template, noting the date of the run and linking to the URL for the action. Finally, we can now easily configure what datasets we run on a manual workflow dispatch, whether to kick off the large runner for EPACEMS (and eventually maybe some other datasets), and whether to make a Github issue following the run.
What did you change in this PR?
make_slack_notification.py
to incorporate validation test failures.Out of scope:
Testing
How did you make sure this worked? How can a reviewer verify this?
Action testing one dataset manually provided, Github issue creation and skipping large runners:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9569277213
#360
Action testing large runners, 2 datasets manually provided:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9568539141
Action testing skipping of Github issue creation:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9569300760
To-do list
Tasks