On our ZYRE production server, we occasionally observe sockets that are never closed.
Sometimes this grows to as many as 200 sockets to the same remote ZYRE node.
Environment
libzmq version (commit hash if unreleased): 3.4
OS: reproduced on
Linux CentOS (32 & 64 bits - x86 and ARM),
Rocky (64 bits) (x86)
Minimal test code / Steps to reproduce the issue
Start ZYRE node A
Start ZYRE node B
On node A, 2 TCP sockets are seen with Node B:
Node A connected to Node B (used to send data to B).
Node B connected to Node A (used to receive data from B).
Node B goes offline (out of Wi-Fi coverage, Ethernet cable unplugged, Windows hibernation, ...)
On node A, after some time, the ZYRE layer detects that node B is no longer present, and peer B is destroyed along with the socket to it (node A to B).
What's the actual result? (include assertion message & call stack if applicable)
The socket from node B to node A is never closed, even if
node B application is restarted or
node B is rebooted.
Note:
This is not visible if the application on node B is stopped properly (thanks to the TCP layer sending a TCP RESET).
What's the expected result?
Sockets from remote nodes should be closed automatically when the remote disappears:
Either the ZYRE peer destruction should take care of it,
or the use of TCP KEEPALIVE from the ZYRE application should.
I failed to get a working implementation in either of these two cases.
Possible solution
I dug into LIBZMQ and ZYRE for quite some time.
I tried different approaches, but I always failed to get access to the ACCEPT()ed socket
in this particular scenario.
Finally, I have a 'draft' possible workaround that enables TCP KEEPALIVE right after a particular ACCEPT() in tcp_listener.cpp.
Basically, the idea is:
sock = accept(s_);
...
tune_tcp_keepalives(sock, x, y, y);
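For illustration only, here is a minimal sketch of what enabling keepalive on an accepted socket could look like with plain socket options. It assumes Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific option names), and the helper name enable_tcp_keepalive and its parameters are hypothetical: this is not libzmq's tune_tcp_keepalives and not the draft patch itself.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of libzmq or ZYRE): enable TCP keepalive
 * on an already connected or accepted socket, using Linux option names.
 *   idle_s    - seconds of idle time before the first probe is sent
 *   intvl_s   - seconds between successive probes
 *   probe_cnt - unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 if any setsockopt() call fails (errno is set). */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int probe_cnt)
{
    int on = 1;

    /* Turn keepalive probing on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    /* Override the system-wide defaults for this socket only. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probe_cnt, sizeof probe_cnt) != 0)
        return -1;
    return 0;
}
```

With example values such as enable_tcp_keepalive(sock, 60, 10, 5), a peer that vanished without sending FIN or RESET would be detected after roughly 60 + 5 * 10 seconds of silence, at which point the kernel reports an error on the socket and it can be closed.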
@keith-dev I can provide a draft PR this afternoon, with what we have been using for a few months.
It works for us, but might require more attention/suggestions.