Prevent leaking file descriptor during snapshotting and provide better logging of errors #19093

Open: wants to merge 1 commit into base: main

serathius (Member)

cc @ahrtr @fuweid @ivanvc

Found a case where a leaked file descriptor prevents data from being deleted from disk, which then fills up the disk.

I would like to backport this to both v3.5 and v3.4.

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 47.61905% with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.78%. Comparing base (9fa35e5) to head (a1d5008).

Files with missing lines: client/v3/snapshot/v3_snapshot.go (patch coverage 47.61%; 7 missing and 4 partial lines) ⚠️
Additional details and impacted files
Files with missing lines: client/v3/snapshot/v3_snapshot.go, coverage Δ 52.72% <47.61%> (-5.42%) ⬇️

... and 24 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19093      +/-   ##
==========================================
- Coverage   68.85%   68.78%   -0.08%     
==========================================
  Files         420      420              
  Lines       35642    35654      +12     
==========================================
- Hits        24541    24523      -18     
- Misses       9676     9701      +25     
- Partials     1425     1430       +5     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9fa35e5...a1d5008.

ahrtr (Member) commented Dec 20, 2024

We need to close all previous similar PRs, one of which is #18200.

if err != nil {
	return "", fmt.Errorf("could not open %s (%w)", partpath, err)
}
defer func() {
	err = f.Close()
	if err != nil && !errors.Is(err, os.ErrClosed) {

Member

It works, but I think a better and more explicit way is to use sync.Once to ensure f.Close is executed only once. That makes it clear that we need to close the file before renaming, and also that it must still be closed on any early return due to an error.

Member

Checked https://pkg.go.dev/os#File.Close: calling Close a second time is safe and only returns an error, so the file is effectively closed once and sync.Once is not necessary here.

Member

Not a big deal; please feel free to merge this PR and backport it to 3.5 and 3.4.

k8s-ci-robot

@serathius: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Failed test: pull-etcd-integration-1-cpu-amd64 (commit a1d5008, required: true). Rerun command: /test pull-etcd-integration-1-cpu-amd64

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

fuweid (Member) left a comment

LGTM

@@ -98,5 +119,5 @@ func SaveWithVersion(ctx context.Context, lg *zap.Logger, cfg clientv3.Config, d
 		return resp.Version, fmt.Errorf("could not rename %s to %s (%w)", partpath, dbPath, err)
 	}
 	lg.Info("saved", zap.String("path", dbPath))
-	return resp.Version, nil
+	return resp.Version, err

Member

Just curious about using err here; nil is more accurate.

k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fuweid, serathius

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

4 participants