Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce project event webhook #1113

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

window9u
Copy link
Contributor

@window9u window9u commented Dec 26, 2024

What this PR does / why we need it:

This PR implements document creation and document removal events for a new project event webhook. Currently, users who uses Yorkie as a SaaS are not always aware of document status changes. By introducing project-level events—rather than relying solely on doc-level events like attach or push-pull—we can more clearly represent the lifecycle of documents and make those events available to external systems.

Which issue(s) this PR fixes: Fixes #1002

Special notes for your reviewer:

  1. Project event webhook
  • The original issue suggested a DocEvent, but instead, I'm introducing a Project Event. We already have DocEvent events (DocumentChanged DocumentWatched, DocumentUnwatched, broadcast), and using the same naming might be confusing.
  • Additionally, doc-level events(e.g., attach, document-changed, watched, unwatched) don't clearly convey CRUD semantics. For instances:
    • Create: A document is automatically created when a user attaches to a non-existent document. documentAttached alone doesn't explicitly indicate creation.
    • Read/Update: push-pull may imply either reading or updating a document, and it's not always obvious which. (A Change can contain both presence and operations; if it only has presence, it might not mean a real update of document's content.)
    • Delete: Yorkie's removeDocument function only marks a document as "removed" without fully deleting it. In contrast, the removeDocumentByAdmin function permanently deletes the document. When a removeDocument action occurs, Yorkie sends webhook events, allowing the customer's server to decide the next steps—such as moving the document to a recycle bin or directly deleting it using removeDocumentByAdmin.
  1. Add HMAC signature in request header
  • For the authentication webhook, we provide a token so the receiver can validate the request. However, for the project event webhook, the endpoint may not know which service is calling.
  • To address this, I introduced an HMAC signature in the X-Signature header, generated using the project's secret key. The receiving server can verify this signature to ensure the request is from Yorkie.
  • In another approach, we could set our public key in projectKey in our request attributes.
type WebhookAttribute struct {
        ProjectKey string `json:"projectKey"`
	DocumentKey string `json:"documentKey"`
	ClientKey   string `json:"clientKey"`
	IssuedAt    string `json:"issuedAt"`
}
  1. Handling timeouts and unreachable events

    • Asynchronous work: Authentication must be verified before proceeding, so the auth webhook is synchronous. In contrast, a project event occurs after some action is completed, so it can be handled asynchronously without impacting user response times.
    • Unreachable events and replay: If an endpoint is down or encounters an error, we need a way to track failed events. In this PR, we only log these failures; in the future, the Yorkie dashboard could handle or additional recovery steps.
    • More relaxed timeouts: Currently, the auth webhook uses exponential backoff with a base of 200ms and a maximum wait of 3 seconds. This PR applies the same settings for consistency, although we might later adopt a more extended retry schedule(e.g., Liveblocks' approach: [immediately, 5s, 5m, 30m, 2h, 5h, 10h], etc.).
  2. Add an HTTP client with timeout

  • Since http.POST does not allow specifying a request timeout directly, using an HTTP client with a configurable timeout is more predictable. Although the auth webhook still uses http.POST, we may consider migrating it to an HTTP client for consistency.

In this PR

This design proposal differs somewhat from the original issue’s requirements. If the idea seems sound, I’ll proceed with the remaining tasks:

  • Add config validation and tests
  • Add event_webhook tests
  • Update the admin server’s UpdateProjectRequest API spec
  • Integrate the auth webhook’s exponential backoff logic into pkg/webhook

In a subsequent PR

  • Introduce a document-changed event (with caching)

Does this PR introduce a user-facing change?


Additional documentation

There are four configuration options related to timeouts, which can be confusing. Below is a brief explanation:

  • EventWebhookRequestTimeout: How long to wait for a response before timing out
  • EventWebhookMaxRetries: Number of retry attempts
  • EventWebhookBaseWaitInterval: Base interval for exponential backoff
  • EventWebhookMaxWaitInterval: Maximum interval between retries

image

N/A

Checklist

  • Added relevant tests or not required
  • Addressed and resolved all CodeRabbit review comments
  • Ensured no breaking changes

Summary by CodeRabbit

  • New Features

    • Added event webhook functionality for project documents.
    • Introduced support for tracking document creation and removal events.
    • Implemented HMAC-based authentication for webhook requests.
    • Added configurable webhook retry and timeout mechanisms.
  • Improvements

    • Enhanced server configuration with new webhook-related parameters.
    • Implemented exponential backoff for webhook retries.
    • Improved error handling and logging for webhook events.

Copy link

coderabbitai bot commented Dec 26, 2024

Walkthrough

This pull request introduces an event webhook system for Yorkie, enabling notifications to external services about document creation and removal events. It includes modifications across multiple files, adding new fields and methods for managing webhooks in the Project struct, defining project event types, and implementing HMAC-based authentication for webhook requests. Configuration options for event webhooks are also added, along with retry mechanisms for handling webhook calls.

Changes

File Change Summary
api/types/project.go Added EventWebhookURL and EventWebhookMethods fields to Project struct, with a new RequireEventWebhook method.
api/types/project_event.go Introduced new types for project events: ProjectEvent, WebhookAttribute, and WebhookRequest.
cmd/yorkie/server.go Added configuration variables and command-line flags for event webhook parameters.
pkg/webhook/client.go Created HMAC-authenticated HTTP client for webhook requests.
pkg/webhook/retry.go Implemented exponential backoff retry mechanism for webhook calls.
server/backend/config.go Added event webhook configuration fields and parsing methods.
server/config.go Defined default values for event webhook configuration.
server/rpc/projectevent/webhook.go Implemented webhook event sending for document creation and removal.
server/rpc/yorkie_server.go Added event triggers for document creation and removal.

Assessment against linked issues

Objective Addressed Explanation
Provide DocEvent Webhook [#1002]
Notify external services on document events
Support Webhook registration per project
HMAC-based authentication
Retry mechanism for webhook calls

Possibly related PRs

Suggested reviewers

  • hackerwins

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@window9u
Copy link
Contributor Author

window9u commented Dec 28, 2024

I notice that the RemoveDocument feature has more context discussed in issue #484. In that PR, three tasks were outlined:

  1. Implement a basic document deletion API that sets the RemovedAt date in docInfo.
  2. Decide whether to include removed documents in the ListDocuments API, which is used by the Dashboard and CLI (Admin).
  3. Create a housekeeping background process to physically delete documents (is_removed = true) and their related data.

The first task has been completed, and the second task has been worked on #563. but third task is ongoing in PR #718.

I'm currently considering when we should publish the documentRemoved webhook. This depends on whether we plan to support document revival (i.e., setting is_removed = false before physical deletion).

Two possible approaches:

  1. Trigger the webhook during soft deletion

    • If we don't plan to support document revival, this makes sense as it allows for faster updates.
  2. Trigger the webhook during hard (physical) deletion

    • If we do plan to support document revival, this approach is more logical, as triggering the webhook during soft deletion could cause confusion.

@hackerwins hackerwins marked this pull request as ready for review December 29, 2024 01:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (19)
api/types/project_event.go (3)

20-21: Clarify the docstring or remove the duplication
You've declared the type ProjectEvent at line 21 and also have a comment at line 20 that says "ProjectEvent represents the event of the project." The same comment repeats at line 23. Consider consolidating the comment sections to avoid duplicate docstring for the same entity.


25-27: Extend or future-proof event enums
Currently, only two constants DocumentCreated and DocumentRemoved are declared. Before merging, consider whether future events (e.g., DocumentUpdated, DocumentAccessed) might soon be needed. It may be beneficial to outline how to add additional ProjectEvent constants in the future.


29-34: Consider adding validation logic
WebhookAttribute includes DocumentKey, ClientKey, and IssuedAt, which are all strings. Depending on your use cases, it might be beneficial to apply constraints or validations (for example, ensuring IssuedAt is a valid timestamp).

pkg/webhook/client.go (4)

31-35: Lock down or rotate private key
HMACTransport holds a PrivateKey as a simple string. In production, consider stricter key management practices, such as using environment variables or credential managers (e.g. Vault).


37-43: Ensure constant-time comparison
GenerateHMACSignature uses a standard HMAC approach. For improved security, ensure you’re also using a constant-time string comparison function (like hmac.Equal) whenever you verify signatures. This is less about generation and more about how you will compare the header on the receiver side.


49-55: Handle large request bodies carefully
Reading the full request body (io.ReadAll) into memory can be risky for large payloads. Consider setting a maximum limit or using a streaming approach if you anticipate very large payloads to avoid potential memory pressure.


72-78: Allow specifying a custom transport
NewClient always wraps http.DefaultTransport inside HMACTransport. If a user wants to customize TLS settings, proxies, or other details, they can’t easily plug in their own transport. Consider adding a parameter or exposing a method to inject a custom underlying transport.

api/types/project.go (1)

83-100: Consider more expressive naming for method
RequireEventWebhook returns a boolean but doesn't convey the specific reason for requiring the webhook. If expanded to handle multiple event types or statuses, a more descriptive name like ShouldTriggerEventWebhook could make the intent clearer.

pkg/webhook/retry.go (2)

37-64: Distinguish between both HTTP and transport errors
WithExponentialBackoff merges status code checks with error checks, returning either ErrUnexpectedStatusCode or the underlying error. This is reasonable but consider a structured approach to error handling (e.g., typed errors), which would help when diagnosing failures (like DNS resolution errors, timeouts, etc.) vs. unexpected status codes.


66-74: Add jitter to exponential backoff
Currently, waitInterval doubles the wait time. For production systems, especially if many clients or goroutines use the same retry logic, consider adding jitter to reduce the risk of thundering herd problems (when many clients retry simultaneously).

server/rpc/projectevent/webhook.go (2)

36-62: Log response body on failures
While errors are logged when JSON marshaling or sending the webhook fails, capturing and logging at least a portion of the response body in cases of non-2xx responses may help operators debug webhook-related issues more easily.

- logging.From(ctx).Error(err)
+ body, _ := io.ReadAll(resp.Body)
+ logging.From(ctx).Errorf("webhook error: %v, response: %s", err, string(body))

91-126: Improve error message consistency
Your error check treats statuses other than OK, Unauthorized, and Forbidden the same. Depending on your system’s needs, you might treat “Service Unavailable (503)” or “Too Many Requests (429)” distinctly. If you prefer generic handling, consider clarifying that these unexpected statuses are retried.

server/config.go (2)

63-71: No input validation for negative durations
Although defaults are applied, if a user manually provides negative intervals or durations, it may introduce undefined behavior. Consider adding checks to ensure that EventWebhookMaxRetries ≥ 0 and intervals ≥ 0.


201-215: Align naming consistency
Currently, some fields reference “EventWebhook,” others “ProjectWebhook” (for example, in the parsing functions). Ensure that the naming is consistent to avoid confusion when reading or configuring the code.

server/backend/config.go (1)

76-87: Clarify in-field documentation
The docstring states “is the time that waits time for response” which is a bit unclear. Consider rephrasing to “is the timeout duration for waiting for a webhook response.”

cmd/yorkie/server.go (3)

50-56: Utilize consistent naming
Variables introduced here reference “eventWebhook...”, but the user-facing flags reference “project-event-webhook...”. Aligning these might provide better clarity and ease of maintenance.


336-341: Ensure no integer overflow
eventWebhookMaxRetries is a uint64. Although unlikely, verify that excessively large values won’t cause issues (e.g., indefinite retry). A safeguard or maximum recommended value might be beneficial.


342-359: Validate non-zero intervals
Ensure that zero or negative durations for project-event-webhook-base-wait-interval, project-event-webhook-max-wait-interval, or project-event-webhook-request-timeout are handled gracefully instead of leading to potential indefinite blocking or error states.

server/rpc/yorkie_server.go (1)

161-164: Consider adding error handling for DocumentCreated.
If projectevent.DocumentCreated fails, it would be useful to capture or log the error to ensure that the system is aware of potential failures delivering creation events.

if docInfo.ServerSeq == 0 {
-    projectevent.DocumentCreated(ctx, s.backend, docInfo.Key.String(), clientInfo.Key)
+    if err := projectevent.DocumentCreated(ctx, s.backend, docInfo.Key.String(), clientInfo.Key); err != nil {
+        logging.From(ctx).Errorf("failed to create document event: %v", err)
+    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88a8a64 and 8f8d142.

📒 Files selected for processing (9)
  • api/types/project.go (2 hunks)
  • api/types/project_event.go (1 hunks)
  • cmd/yorkie/server.go (3 hunks)
  • pkg/webhook/client.go (1 hunks)
  • pkg/webhook/retry.go (1 hunks)
  • server/backend/config.go (2 hunks)
  • server/config.go (2 hunks)
  • server/rpc/projectevent/webhook.go (1 hunks)
  • server/rpc/yorkie_server.go (3 hunks)
🔇 Additional comments (8)
api/types/project_event.go (1)

36-40: Check for optional vs required fields
WebhookRequest includes the Type and Attributes. Confirm if all these fields should always be present in the webhook payload. If any are optional, consider using pointers or adding default values.

api/types/project.go (1)

41-45: Maintain naming consistency
EventWebhookURL and event_webhook_url, similarly EventWebhookMethods and event_webhook_methods follow the same pattern as AuthWebhookURL and auth_webhook_url. This consistency looks good. Keep an eye on naming consistency in future expansions as well.

server/rpc/projectevent/webhook.go (2)

20-33: Consider tracking context lifespan when launching goroutines
Since you're dispatching these webhook events in a background goroutine, ensure that the passed ctx won't be canceled prematurely once the original request completes. If the ctx is tied to a client request, triggering this goroutine after the request returns could lead to unexpected cancellation.

Would you like me to generate a shell script to check goroutine usage in the codebase for potential context misuses?


64-89: Confirm the timing of "DocumentRemoved"
Given the discussion about soft vs. hard deletion in the PR objectives, please confirm that calling DocumentRemoved at this stage aligns with your intended approach. If documents can be revived, you may wish to defer this call to ensure the correct event timing.

cmd/yorkie/server.go (1)

73-75: Confirm max-wait intervals
Double-check that eventWebhookMaxWaitInterval and eventWebhookBaseWaitInterval make sense together (e.g., base <= max). It might be worth validating them or logging an error if the base is larger than the max.

server/rpc/yorkie_server.go (3)

39-39: New import is well-introduced and consistent with the rest of the file.
The import statement aligns with the newly introduced event-based architecture and seems properly scoped.


161-164: Confirm the event firing strictly on ServerSeq == 0.
This condition implies that the creation event is only triggered the first time a document is attached. If future usage requires re-creations or re-attachments, you may need an alternative condition.


566-567: Provide clarity around soft vs. hard deletion for document removal.
Currently, DocumentRemoved is triggered on any remove operation. If you plan to support soft deletions, ensure that the event’s name and timing align with actual removal semantics to avoid user confusion.

Comment on lines +76 to +89
// shouldRetry returns true if the given error should be retried.
// Refer to https://github.com/kubernetes/kubernetes/search?q=DefaultShouldRetry
func shouldRetry(statusCode int, err error) bool {
// If the connection is reset, we should retry.
var errno syscall.Errno
if errors.As(err, &errno) {
return errno == syscall.ECONNRESET
}

return statusCode == http.StatusInternalServerError ||
statusCode == http.StatusServiceUnavailable ||
statusCode == http.StatusGatewayTimeout ||
statusCode == http.StatusTooManyRequests
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add 4xx handling
shouldRetry currently returns true for only high-level server errors like 500 or 503. In certain scenarios (e.g., 429 Too Many Requests), the function also returns true, which is fine. However, you may need logic for other 4xx status codes if your service wants clients to back off, especially if a rate-limit or temporary ban is in place.

Comment on lines 190 to 221
// ParseProjectWebhookMaxWaitInterval returns max wait interval.
func (c *Config) ParseProjectWebhookMaxWaitInterval() time.Duration {
result, err := time.ParseDuration(c.EventWebhookMaxWaitInterval)
if err != nil {
fmt.Fprintln(os.Stderr, "parse auth webhook max wait interval: %w", err)
os.Exit(1)
}

return result
}

// ParseProjectWebhookBaseWaitInterval returns base wait interval.
func (c *Config) ParseProjectWebhookBaseWaitInterval() time.Duration {
result, err := time.ParseDuration(c.EventWebhookBaseWaitInterval)
if err != nil {
fmt.Fprintln(os.Stderr, "parse auth webhook max wait interval: %w", err)
os.Exit(1)
}

return result
}

// ParseProjectWebhookTimeout returns timeout for request.
func (c *Config) ParseProjectWebhookTimeout() time.Duration {
result, err := time.ParseDuration(c.EventWebhookRequestTimeout)
if err != nil {
fmt.Fprintln(os.Stderr, "parse auth webhook max wait interval: %w", err)
os.Exit(1)
}

return result
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix error message references
Lines 194, 205, and 216 incorrectly mention “parse auth webhook” in error logs—it should reference “event” or “project” webhook for clarity.

- fmt.Fprintln(os.Stderr, "parse auth webhook max wait interval: %w", err)
+ fmt.Fprintln(os.Stderr, "parse project webhook max wait interval: %w", err)

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/webhook/retry.go (2)

37-44: Add input validation for WithExponentialBackoff parameters.

The function should validate its input parameters to ensure proper operation.

Consider adding these checks:

 func WithExponentialBackoff(
 	ctx context.Context,
 	maxRetries uint64,
 	baseInterval,
 	maxInterval gotime.Duration,
 	webhookFn func() (int, error),
 ) error {
+	if ctx == nil {
+		return fmt.Errorf("context cannot be nil")
+	}
+	if webhookFn == nil {
+		return fmt.Errorf("webhook function cannot be nil")
+	}
+	if baseInterval <= 0 {
+		return fmt.Errorf("base interval must be positive")
+	}
+	if maxInterval < baseInterval {
+		return fmt.Errorf("max interval must be greater than or equal to base interval")
+	}

71-72: Incomplete function comment.

The comment for waitInterval is incomplete and doesn't fully describe the function's behavior.

Consider this improvement:

-// waitInterval returns the interval of given retries. it returns maxWaitInterval
+// waitInterval calculates the wait interval using exponential backoff based on the number
+// of retries. It returns the calculated interval or maxWaitInterval if the calculated
+// interval would exceed it.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f8d142 and 34fa918.

📒 Files selected for processing (2)
  • pkg/webhook/retry.go (1 hunks)
  • server/backend/config.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/backend/config.go
🔇 Additional comments (3)
pkg/webhook/retry.go (3)

1-27: LGTM! Clean package structure and imports.

The file structure follows good practices with proper license headers and well-organized imports. The gotime alias is a good choice to avoid confusion in time-related operations.


29-35: LGTM! Well-defined error types.

The error variables are properly defined with clear, descriptive messages that align with their intended use cases in the retry logic.


81-94: Enhance error handling for 4xx status codes.

While the current implementation handles common server errors and rate limiting (429), it could benefit from handling additional 4xx status codes based on your webhook service's requirements.

Consider this enhancement:

 func shouldRetry(statusCode int, err error) bool {
 	var errno syscall.Errno
 	if errors.As(err, &errno) {
 		return errno == syscall.ECONNRESET
 	}
 
+	// 4xx client errors that indicate temporary conditions
+	if statusCode == http.StatusTooManyRequests ||
+		statusCode == http.StatusRequestTimeout ||
+		statusCode == http.StatusConflict {
+		return true
+	}
+
+	// 5xx server errors
 	return statusCode == http.StatusInternalServerError ||
 		statusCode == http.StatusServiceUnavailable ||
-		statusCode == http.StatusGatewayTimeout ||
-		statusCode == http.StatusTooManyRequests
+		statusCode == http.StatusGatewayTimeout
 }

Also, consider making the Kubernetes reference more specific:

-// Refer to https://github.com/kubernetes/kubernetes/search?q=DefaultShouldRetry
+// Retry logic inspired by Kubernetes' DefaultShouldRetry:
+// https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/util/retry/util.go

retries++
}

return fmt.Errorf("unexpected status code from webhook %d: %w", statusCode, ErrWebhookTimeout)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect error message for max retries exhaustion.

The error message suggests an unexpected status code but wraps ErrWebhookTimeout. This is misleading as the actual scenario is that max retries were exhausted.

Consider this alternative:

-return fmt.Errorf("unexpected status code from webhook %d: %w", statusCode, ErrWebhookTimeout)
+return fmt.Errorf("max retries (%d) exceeded with status code %d: %w", maxRetries, statusCode, ErrWebhookTimeout)

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/webhook/retry.go (1)

93-106: 🛠️ Refactor suggestion

Add comprehensive 4xx handling.

The shouldRetry function currently handles only specific status codes. Consider adding handling for other relevant 4xx status codes and documenting the retry policy for each status code.

Consider this enhanced implementation:

 func shouldRetry(statusCode int, err error) bool {
 	var errno syscall.Errno
 	if errors.As(err, &errno) {
 		return errno == syscall.ECONNRESET
 	}
 
+	// Retry on server errors
 	return statusCode == http.StatusInternalServerError ||
 		statusCode == http.StatusServiceUnavailable ||
 		statusCode == http.StatusGatewayTimeout ||
-		statusCode == http.StatusTooManyRequests
+		// Retry on specific client errors
+		statusCode == http.StatusTooManyRequests ||
+		statusCode == http.StatusRequestTimeout ||
+		// Consider adding other relevant status codes based on your webhook service's behavior
+		false
 }
🧹 Nitpick comments (2)
pkg/webhook/retry.go (2)

29-36: Consider grouping related errors under a custom error type.

The error declarations are well-documented and follow Go conventions. However, for better error handling by consumers, consider creating a custom error type that wraps these errors and provides type assertions.

Example implementation:

type WebhookError struct {
    Err error
    StatusCode int
}

func (e *WebhookError) Error() string {
    return e.Err.Error()
}

func (e *WebhookError) Unwrap() error {
    return e.Err
}

83-91: Improve documentation for waitInterval function.

The function comment is incomplete. It should explain the exponential backoff calculation and document the parameters.

Suggested documentation:

// waitInterval calculates the wait duration using exponential backoff.
// It doubles the wait interval for each retry attempt, starting from baseInterval.
// The returned duration is capped at maxWaitInterval.
// Parameters:
//   - retries: number of retry attempts so far
//   - baseInterval: initial wait duration
//   - maxWaitInterval: maximum wait duration
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34fa918 and b7add32.

📒 Files selected for processing (1)
  • pkg/webhook/retry.go (1 hunks)
🔇 Additional comments (2)
pkg/webhook/retry.go (2)

1-27: LGTM! Clean package structure and imports.

The package structure follows Go conventions with proper license header and well-organized imports.


38-45: Verify webhook timeout configuration alignment.

The implementation supports the PR's goal of handling timeouts through the retry mechanism. However, let's verify that the timeout values align with the requirements specified in the PR objectives.

Run this script to check timeout configurations across the codebase:

✅ Verification successful

Let me generate another verification script to check the webhook timeout configurations more specifically:

The previous results show some timeout configurations but we need to verify specifically how they are used with the retry mechanism in the webhook package.


Let me check the full implementation of the retry mechanism:


Let me check how the retry mechanism is used in the project event webhook:


Webhook timeout configuration is properly implemented and aligned with requirements

The implementation properly handles timeouts and unreachable events through:

  1. A robust exponential backoff retry mechanism in pkg/webhook/retry.go that:

    • Takes configurable base and max intervals
    • Supports maximum retry attempts
    • Handles transient errors and specific HTTP status codes (500, 503, 504, 429)
    • Includes connection reset detection
  2. Proper webhook client configuration in pkg/webhook/client.go with:

    • Configurable request timeout
    • HMAC authentication
    • Error handling
  3. Well-structured webhook event sending in server/rpc/projectevent/webhook.go that:

    • Uses the retry mechanism with configurable parameters
    • Runs in background to avoid blocking
    • Logs failures for potential recovery
    • Handles authentication errors separately from retryable errors

The implementation aligns with the PR objectives by providing a comprehensive solution for handling timeouts and unreachable events with configurable retry parameters.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistency in timeout configurations

# Look for timeout-related configurations
echo "Searching for timeout configurations..."
rg -i "timeout|interval|retry" -g "*.{go,yaml,json}"

# Look for webhook client configurations
echo -e "\nSearching for webhook client configurations..."
ast-grep --pattern 'type $_ struct {
  $$$
  timeout $_ 
  $$$
}'

Length of output: 77576


Script:

#!/bin/bash
# Search for webhook timeout and retry related code
echo "Searching for webhook timeout configurations..."
rg -A 5 "EventWebhookRequestTimeout|EventWebhookBaseWaitInterval|EventWebhookMaxWaitInterval|AuthWebhookMaxWaitInterval" pkg/webhook/

# Search for retry mechanism implementation
echo -e "\nSearching for retry mechanism implementation..."
rg -A 10 "WithExponentialBackoff" pkg/webhook/

Length of output: 939


Script:

#!/bin/bash
# Get the full implementation of retry mechanism
echo "Full retry mechanism implementation:"
cat pkg/webhook/retry.go

# Check where this retry mechanism is used
echo -e "\nUsage of retry mechanism:"
rg "WithExponentialBackoff\(" --type go

Length of output: 3236


Script:

#!/bin/bash
# Check project event webhook implementation
echo "Project event webhook implementation:"
cat server/rpc/projectevent/webhook.go

# Check webhook client implementation
echo -e "\nWebhook client implementation:"
cat pkg/webhook/client.go

Length of output: 6504

pkg/webhook/retry.go Outdated Show resolved Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/webhook/retry.go (1)

38-81: Add input validation and improve error handling.

While the implementation is solid, consider these improvements:

  1. Add input validation for the parameters:

    • baseInterval should be > 0
    • maxInterval should be >= baseInterval
    • maxRetries should have a reasonable upper limit
  2. Consider adding more context to error wrapping for better debugging.

Here's a suggested improvement:

 func WithExponentialBackoff(
 	ctx context.Context,
 	maxRetries uint64,
 	baseInterval,
 	maxInterval gotime.Duration,
 	webhookFn func() (int, error),
 ) error {
+	if baseInterval <= 0 {
+		return fmt.Errorf("baseInterval must be positive: %v", baseInterval)
+	}
+	if maxInterval < baseInterval {
+		return fmt.Errorf("maxInterval (%v) must be >= baseInterval (%v)", 
+			maxInterval, baseInterval)
+	}
+	if maxRetries > 100 {
+		return fmt.Errorf("maxRetries (%d) exceeds maximum allowed (100)", maxRetries)
+	}
+
 	start := gotime.Now()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7add32 and a1a0c5d.

📒 Files selected for processing (1)
  • pkg/webhook/retry.go (1 hunks)
🔇 Additional comments (4)
pkg/webhook/retry.go (4)

17-27: LGTM! Clean package structure and imports.

The package structure is well-organized with clear naming and necessary imports. Good practice using the gotime alias to avoid confusion with the time package.


29-36: LGTM! Well-defined error types.

The error variables are well-documented and follow Go conventions. They provide clear distinction between different error scenarios.


83-91: LGTM! Correct implementation of exponential backoff calculation.

The waitInterval function correctly implements exponential backoff with proper boundary checking.


93-106: Enhance error handling in shouldRetry function.

While the current implementation covers basic scenarios, consider enhancing it with:

  1. Additional network-related errors:

    • syscall.ETIMEDOUT
    • syscall.ECONNREFUSED
    • syscall.ENETUNREACH
  2. Common HTTP client errors that might warrant retries:

    • http.StatusBadGateway (502)
    • http.StatusNotExtended (510)

Here's a suggested improvement:

 func shouldRetry(statusCode int, err error) bool {
 	var errno syscall.Errno
 	if errors.As(err, &errno) {
-		return errno == syscall.ECONNRESET
+		switch errno {
+		case syscall.ECONNRESET,
+			 syscall.ETIMEDOUT,
+			 syscall.ECONNREFUSED,
+			 syscall.ENETUNREACH:
+			return true
+		}
 	}
 
 	return statusCode == http.StatusInternalServerError ||
 		statusCode == http.StatusServiceUnavailable ||
 		statusCode == http.StatusGatewayTimeout ||
-		statusCode == http.StatusTooManyRequests
+		statusCode == http.StatusTooManyRequests ||
+		statusCode == http.StatusBadGateway ||
+		statusCode == http.StatusNotExtended
 }

Note: This comment partially addresses the previous review comment about 4xx handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide DocEvent Webhook
1 participant