Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MVP improvements to automated archive runs #357

Merged
merged 35 commits into from
Jun 19, 2024
Merged

Conversation

e-belfer
Copy link
Member

@e-belfer e-belfer commented Jun 17, 2024

Overview

Addresses subset of #346 and #347, focusing on the lowest hanging improvements.

What problem does this address?
There are two types of GHA failures we want to keep track of: run failures and validation test failures. These should both be reported in the Slackbot notifications. The archive run now also automatically opens an issue using the archive issue template, noting the date of the run and linking to the URL for the action. Finally, we can now easily configure what datasets we run on a manual workflow dispatch, whether to kick off the large runner for EPACEMS (and eventually maybe some other datasets), and whether to make a Github issue following the run.

What did you change in this PR?

  • Update the make_slack_notification.py to incorporate validation test failures.
  • Auto-create Github issue monthly with link to run results and date
  • Add options for manual dispatch: to only run workflow on certain datasets, kick off large runner, make Github issue

Out of scope:

  • Figure out how to add other run failures to Slack message with enough info from the error to avoid having to luck at the run logs (e.g., "archive deleted")
  • Figure out how to add run summary to Github issue (e.g., these datasets passed, these failed validation, these are differently broken)
  • Figure out how to auto-populate tasklist with this run summary

Testing

How did you make sure this worked? How can a reviewer verify this?

Action testing one dataset manually provided, Github issue creation and skipping large runners:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9569277213
#360

Action testing large runners, 2 datasets manually provided:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9568539141

Action testing skipping of Github issue creation:
https://github.com/catalyst-cooperative/pudl-archiver/actions/runs/9569300760

To-do list

Tasks

Preview Give feedback

@e-belfer e-belfer changed the title Add failures to slackbot When GHA jobs fail or validation tests fail, report to Slackbot Jun 17, 2024
@e-belfer e-belfer self-assigned this Jun 17, 2024
@e-belfer e-belfer changed the title When GHA jobs fail or validation tests fail, report to Slackbot MVP improvements to automated archive runs Jun 18, 2024
@e-belfer e-belfer marked this pull request as ready for review June 18, 2024 16:16
- nrelatb
- phmsagas

dataset: ${{ fromJSON(format('[{0}]', inputs.small_runner || '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"')) }}
Copy link
Member Author

@e-belfer e-belfer Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If scheduled, should default to the full list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: What do you think of defining one list of "all the damn datasets" in env so we can access that everywhere we need to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky here because there should in fact be two variables - "small runner datasets" and "large runner datasets". The alternative is one variable with some kind of filtering, but I couldn't figure out how to do that neatly. But I can make a "small" and "large" dataset list.

@@ -78,6 +73,7 @@ jobs:
path: ${{ matrix.dataset }}_run_summary.json

archive-run-large:
if: inputs.large_runner
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If set as true in workflow dispatch, or triggered by scheduled run this should run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if triggered by scheduled run, I would expect inputs.large_runner be empty and thus archive-run-large to get skipped - am I missing something here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption was that an empty string would actually get evaluated as true, but I could be totally off-base here. I think your suggestion re: incorporating the type of run below is great and I'll incorporate it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an unset variable here would get treated as a "falsey" value: https://docs.github.com/en/actions/learn-github-actions/expressions#literals

Note that in conditionals, falsy values (false, 0, -0, "", '', null) are coerced to false and truthy (true and other non-falsy values) are coerced to true.

And actually I bet unset variable is actually null instead of '', now that I look at those docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, making this more explicit seems wise.

@@ -160,3 +155,19 @@ jobs:
payload: ${{ steps.all_summaries.outputs.SLACK_PAYLOAD }}
env:
SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}

make-github-issue:
if: always() && inputs.create_github_issue != false
Copy link
Member Author

@e-belfer e-belfer Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion here for use of always() actions/runner#491
Completely unhinged GHA behavior, but what can we do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whee! hey, if it works it works.

Is the idea behind inputs.create_github_issue != false that "" != false and then we will get the github issue in scheduled runs as well as the specified manual runs?

If so, what do you think of using github.event_name to differentiate between workflow_dispatch and scheduled runs? That is more explicitly "do X step if it's scheduled or if there's some specific workflow_dispatch input."

Copy link
Member Author

@e-belfer e-belfer Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a great idea, can incorporate it here. But yes, that was my original idea.

@e-belfer e-belfer added github_actions Pull requests that update GitHub Actions code automation Issues relating to automated archiver runs labels Jun 18, 2024
@e-belfer e-belfer requested a review from jdangerx June 18, 2024 19:31
Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly this looks good! I have a few small questions that I'd love to see addressed before we merge. If you think that the implied changes aren't useful, feel free to say so and re-request review :)

In terms of testing - pretty tricky to do more than just "trigger manual runs and see if things work the way you expect." I think that level of testing is fine, and if the scheduled runs fail we can always handle that manually.

```

# Relevant logs
[Link to logs from GHA run]( PLEASE FIND THE ACTUAL LINK AND FILL IN HERE )
[Link to logs from GHA run]({{ env.RUN_URL }})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: we could maybe strip this whole section if there's that summary of results section above.

.github/workflows/run-archiver.yml Show resolved Hide resolved
@@ -78,6 +73,7 @@ jobs:
path: ${{ matrix.dataset }}_run_summary.json

archive-run-large:
if: inputs.large_runner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if triggered by scheduled run, I would expect inputs.large_runner be empty and thus archive-run-large to get skipped - am I missing something here?

- nrelatb
- phmsagas

dataset: ${{ fromJSON(format('[{0}]', inputs.small_runner || '"eia176","eia191","eia757a","eia860","eia860m","eia861","eia923","eia930","eiaaeo","eiawater","eia_bulk_elec","epacamd_eia","ferc1","ferc2","ferc6","ferc60","ferc714","mshamines","nrelatb","phmsagas"')) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: What do you think of defining one list of "all the damn datasets" in env so we can access that everywhere we need to?

@@ -160,3 +155,19 @@ jobs:
payload: ${{ steps.all_summaries.outputs.SLACK_PAYLOAD }}
env:
SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}

make-github-issue:
if: always() && inputs.create_github_issue != false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whee! hey, if it works it works.

Is the idea behind inputs.create_github_issue != false that "" != false and then we will get the github issue in scheduled runs as well as the specified manual runs?

If so, what do you think of using github.event_name to differentiate between workflow_dispatch and scheduled runs? That is more explicitly "do X step if it's scheduled or if there's some specific workflow_dispatch input."

scripts/make_slack_notification_message.py Show resolved Hide resolved
@e-belfer
Copy link
Member Author

Tested the large runner and github issue creation here, all looks good. Issue is here. @jdangerx you're right that I haven't come up with a way to test the scheduled run behavior, we could either kick one off or debug this at the start of the month as needed.

@e-belfer e-belfer requested a review from jdangerx June 19, 2024 14:55
Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 - no need to futz around more in the tangled web of GHA if this works 😄 .

Sure there's some things that "could" be cleaner, but is the time saved down the line worth the upfront investment now? I don't think so at this point.

.github/workflows/run-archiver.yml Show resolved Hide resolved
@e-belfer e-belfer merged commit e1d78b5 into main Jun 19, 2024
6 of 7 checks passed
@e-belfer e-belfer deleted the add-failures-to-slackbot branch June 19, 2024 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automation Issues relating to automated archiver runs github_actions Pull requests that update GitHub Actions code
Projects
Archived in project
2 participants