Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3: Improve error messages and handling #10125

Merged

Conversation

quodlibetor
Copy link
Contributor

  • Include more information in all error messages so that users can understand
    what caused the problem
  • ListBucketsError is a fatal error, but we were just sending two messages
    instead of making it a hard error.

Motivation

  • This PR fixes a previously unreported bug:

    We were providing useless error messages for S3 errors.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • [n/a] This PR adds a release note for any user-facing behavior changes.

* Include more information in all error messages so that users can understand
  what caused the problem
* ListBucketsError is a fatal error, but we were just sending two messages
  instead of making it a hard error.
@quodlibetor quodlibetor force-pushed the s3-detailed-error-messages branch from fa1be10 to 5f11c57 Compare January 19, 2022 22:19
@quodlibetor quodlibetor requested review from a team and guswynn January 19, 2022 22:24
Copy link
Contributor

@guswynn guswynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems good!

is the reason a list error is fatal but a get error isnt basically that we expect get's to sometimes fail as we try to obtain all objects in a bucket, but we expect the bucket to be available?

err,
}))
.await
.unwrap_or_else(|e| tracing::debug!("Source queue has been shut down: {}", e));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this only happen during shutdown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IoError {
bucket: String,
err: std::io::Error,
},
}

impl std::error::Error for S3Error {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to implement source on the error impl, for the cases that have inner errors (which right now is all of them), maybe a good TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, currently we print all causes in the display impl instead of implementing the causes method. There is a rust-general discussion about what the best thing to is, but this is generally what we do in mz.

For now I think it's more confusing to both impl display of causes and the cause() method, but I do think it would be better in general if we impl'd cause() everywhere and used anyhow's {:#?} format for final display of errors.

S3Error::GetObjectError { .. } | S3Error::ListObjectsFailed(_) => {
Ok(NextMessage::Pending)
}
Some(Some(Err(e))) => match e {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the double-Option mean? another good TODO to change that to an explicit enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the double-option is an an artifact of using now_or_never() on an async queue -- recv() yields an Option, and now_or_never() yields an Option of the thing it's called on.

Some(Some(Err(e))) => match e {
S3Error::GetObjectError { .. } => {
tracing::warn!(
"when reading source '{}' ({}): {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"when reading source '{}' ({}): {}",
"error when reading source '{}' ({}): {}",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that adding error to warn! logs is helpful, it just makes grep error (and especially case-insensitive search in e.g. less) less useful/more confusing. This will currently show up as WARN <mod> when reading source....

I agree that this is a sentence fragment, though.

@quodlibetor
Copy link
Contributor Author

is the reason a list error is fatal but a get error isnt basically that we expect get's to sometimes fail as we try to obtain all objects in a bucket, but we expect the bucket to be available?

Yes, I think that that's the original motivation. I'm not 100% convinced that it's the right decision, but right now there's no way to get a dataflow out of an error state, so I'm hesitant to error one just because any given single item was missing. If we implemented MaterializeInc/database-issues#2060 it would be possible for people to be more precise about what they want to ingest than currently, which would make it more reasonable to error on anything invalid.

@quodlibetor quodlibetor merged commit 0465c65 into MaterializeInc:main Jan 20, 2022
@quodlibetor quodlibetor deleted the s3-detailed-error-messages branch January 20, 2022 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants