-
Notifications
You must be signed in to change notification settings - Fork 523
ship nfs-ganesha V6 in main and squid #2246
Comments
So what exactly is the failure? Log messages? |
I see the log message in rook/rook#14771 |
The suggested fix, as I understand it, was to add a configuration setting to ignore failure of the system call that could be used by users of nfs-ganesha. Again, as I understand it, this means adding something to the nfs-ganesha deploy for ceph. I proposed another possible fix for the issue, which was to add parameters to the invocation of the nfs-ganesha container to allow the system call to penetrate the seccomp jail. Both of these seem to require further changes to Ceph nfs-ganesha deployment before we can integrate v6.x. This leads me to further believe that there's been no testing of v6.x on containerized Ceph (by far the majority deployment strategy for Ceph upstream and the only one for downstream). |
It has been years since Ganesha has got meaningful testing from various adopters prior to a release tag, so it is not at all surprising that V6.0 has problems. This issue has been identified. One fix went into downstream that was accepted for upstream, but before it was merged upstream, another adopter also hit the problem and proposed a different fix. What has ultimately gone upstream into V6.1 is a fix agreed on by participants in a Ganesha community call last week. This fix does require a config change when operating in a non-privileged container. That config change allows Ganesha to try the prctl, and if it fails with EPERM, to issue a warning message but proceed to start. The prctl call was added in Ganesha V6-dev.19. The prctl call is necessary to prevent a kernel deadlock if there is a local NFS mount using Ganesha as the server. The deadlock can occur if pages need to be flushed via NFS to allow Ganesha to allocate pages to proceed. Since Ganesha is blocked waiting for free pages, it can not serve the request, via the NFS client, to flush the pages necessary to be able to allocate pages. One thing that isn't clear is if Ganesha running in a container while some other container on the host has an NFS mount, or even a non-containerized NFS mount could also trigger the problem. That may well be the case. If the prctl is required for containerized Ganesha, then either prctl needs to be allowed for that container, or maybe there's some way to do the prctl for the container as a whole. In any case, setting the parameter to true will allow Ganesha to proceed without prctl success, and it will be in the same boat as it was pre-V6-dev.19. |
Yep. So the new version will require a change in ceph deployment code before it can even run, and it's untested-in-Ceph code beyond that deployment change. (To be clear I'm talking about at least cephadm/ceph orchestrator; I don't even know if we currently support deploying on non-cephadm deploys.) This PR cannot be accepted until those changes, and perhaps at least some kind of testing, are complete. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
Hi @guits
|
@ktdreyer As far as I know it's still the case that we have to change something to add allow_set_io_flusher_fail=true to the nfs-ganesha config /etc/ganesha/ganesha.conf. I'm not exactly sure what agent is going to have to change to set this, probably cephadm, which already knows about ganesha.conf, but without it, the service will fail to start in a container because it can't do a prctl(PR_SET_IO_FLUSHER). |
...and yes, of course, for CI builds we'll also need to change ceph.git's container/Containerfile |
Yes, nothing is changed there.
…On Fri, Oct 25, 2024 at 1:15 PM Dan Mick ***@***.***> wrote:
@ktdreyer <https://github.com/ktdreyer> As far as I know it's still the
case that we have to change something to add allow_set_io_flusher_fail=true
to the nfs-ganesha config /etc/ganesha/ganesha.conf. I'm not exactly sure
what agent is going to have to change to set this, probably cephadm, which
already knows about ganesha.conf, but without it, the service will fail to
start in a container because it can't do a prctl(PR_SET_IO_FLUSHER).
—
Reply to this email directly, view it on GitHub
<#2246 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADBQU536TLUQXRSTOXITNDZ5KRGNAVCNFSM6AAAAABPHZE4SGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZYG4YTIOJQGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
@ffilz , would you please open a ticket at https://tracker.ceph.com/ so that the cephadm developers understand what changes to make in Ceph's installer? (Here's an example ticket you can use for inspiration: https://tracker.ceph.com/issues/65144 , it was under "Project: Orchestrator") |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
@kalebskeithley and @adk3798 met with me today about this. I learned that ceph/ceph#60077 has been in For the |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
nfs-ganesha V6.0 had issues in the squid container, referenced at #2244
Would you please ship the latest V6 version that has the fixes we need?
The text was updated successfully, but these errors were encountered: