NEW NATS API Drain for durable consumer #1673
Comments
I'm dealing with the exact same issue.
Hello, we are dealing with the same problem in production. It's causing problems
Thanks for the detailed report! Please note that you could not have had the same behavior in the old API, as it did not support Consume for pull consumers. It only had Fetch. The behavior you describe was indeed a bug in the server that was fixed in one of the latest versions of the server. @piotrpio can provide details and PR links. Please try the 2.10.17 server and the 1.36.0 client. @JakubSchneller @krizacekcz what version of the server are you running? Can you provide code samples?
Hello @jamm3e3333, thank you for creating the issue. In addition to what @Jarema mentioned, However, what we can and should add and it should solve your issue, is a method on It will look something like this (method name may be different, not sure yet):
go func(cctx jetstream.ConsumeContext) {
	<-ctx.Done()
	fmt.Println("DRAIN")
	cctx.Drain()
	<-cctx.Closed()
}(cctx)
This will also allow you to add a timeout with simple Would that solution satisfy your use case?
@piotrpio you answered in the wrong thread my boy
@piotrpio will this functionality, which would notify over the channel about the drain state, be part of the next minor/patch release?
Oh, sorry about that, I started answering on the other issue but wanted to mention @Jarema's comment 🤦 Yes, that would be a part of the next minor release.
In production we run server 2.10.12. Is it possible that it was fixed in a patch release, @piotrpio? However, we tried locally with updated versions and the outcome was the same. We published 3 messages, manually forced a graceful shutdown (with Drain attached to ctx.Done), and after restarting the app we either have duplicate messages or messages are lost completely. Could you give us some support please?
@piotrpio it is a bit misleading because the documentation for that function states that
Which can be understood as the
@Jarema I tried with the 2.10.17 server and the 1.36.0 client, but I observed much the same behaviour, and eventually messages were lost after a couple of app restarts. As you said:
That's true, but draining the subscription worked properly and now it behaves kind of randomly. And regarding the comparison of new and old NATS API why when using Anyways that's what I observed from logs:
14:01:53 INF NATS has been connected!
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_HANDLE_START sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
DRAIN
MSG_NAK_OK sub.e3798536-af06-4952-a81f-6e26f75be290
14:02:02 INF shutdown signaled
MSG_NAK_OK sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_NAK_OK sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_HANDLE_START sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_NAK_OK sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_NAK_OK sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_NAK_OK sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
14:02:02 INF shutdown completed!
Process Exit with Code 0
building...
running...
14:02:07 INF NATS has been connected!
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_HANDLE_START sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
14:02:07 INF start listening GRPC on port 50051
DRAIN
14:02:17 INF shutdown signaled
MSG_NAK_OK sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_NAK_OK sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_NAK_OK sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
14:02:17 INF shutdown completed!
building...
running...
14:02:21 INF NATS has been connected!
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_HANDLE_START sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
DRAIN
MSG_NAK_OK sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_NAK_OK sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_NAK_OK sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
14:02:31 INF shutdown signaled
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_NAK_OK sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
MSG_NAK_OK sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_HANDLE_START sub.82e0ce01-08d1-4a5e-9413-df331b0c5726
MSG_HANDLE_START sub.e3798536-af06-4952-a81f-6e26f75be290
MSG_HANDLE_START sub.2d229b4e-d3bf-4097-a11b-2bdd08af2589
14:02:31 INF shutdown completed!
building...
running...
14:02:35 INF NATS has been connected!
DRAIN
14:02:45 INF shutdown signaled
14:02:45 INF shutdown completed!
After the context is canceled (app shut down) messages are or maybe we can contribute and try to fix it ourselves 🤷
@jamm3e3333 v2.10.17 fixed some issues with
It is not random, it's non-blocking. It is non-blocking in the old API as well, you simply do not have a callback-based pull consumer so you end up calling In the new API you do not have @thomas-maurice I will try to rephrase the documentation to make it clearer (although I do not think the doc implies that To summarize:
Would you have some time for a quick call? Because its behavior is really weird. I think either you come and see the problem, or you will get some feedback for the next release to fix something.
I see, but that way, with the new NATS API you are not able to handle graceful shutdown with Fetch, because when you call Fetch you get a batch of messages, then the app is shut down and you're not able to
I'm interested to see if this is correct. I'm not sure, but it looks like Consume just does a normal Subscribe on the NATS connection. Line 256 in 97e6a52
So doesn't that mean all the messages that were published in response to the internal pull request are just handled by a normal subscriber? Which then means nc.Drain would in fact wait for the received messages to be handled before moving to the closed state? Of course you would need to use wg.Done inside the nats.ClosedHandler.
Yes, that is something you could do (as you said, in conjunction with We're thinking of adding a In the meantime, I created a PR adding a
Observed behavior
I'm using a durable consumer according to the docs and I want to gracefully shut down the subscription to the durable consumer. That means putting the subscription into the drain state, Nak all the messages that are in the pull batch, and stop accepting new messages that are coming to the stream and stop sending them to the consumer. The behaviour that I'm observing is strange, because every time I try to "restart" the application (sending the SIGINT signal) it really does Nak all the messages for the subscription to the durable consumer, but when I start the application again not all the Nak'ed messages are consumed again, some are duplicated, and overall the behaviour is pretty strange.
I use the consumer from the new NATS API in the jetstream package, and the only time I can drain the subscription is when I have a jetstream.ConsumeContext or jetstream.MessageContext, that is, only when calling the Consume method or the Messages method on the consumer. But maybe I'm doing something wrong, not sure exactly.
Strange thing is that with the old NATS API it worked perfectly fine.
This is my code and what I've tried
and the way I tested it is that on app start I published 3 unique messages:
then after 10s the app was shut down and ctx done was propagated, then I restarted the app, observed the logs, and did this a couple of times. These are the logs from the app:
I thought that maybe sometimes messages are Nak'ed before the subscription is put into the "drain" state, and that this might be caused by handling ctx.Done() in 2 places, so I made a small change to handle ctx.Done() in just one place:
but without luck, and the behaviour was the same as before.
It seems to me that the subscription is not really in the drain state, because after Nak the message is being handled again. After the 1st restart it seems that all 3 messages were consumed again, but after a few more restarts messages are being duplicated or lost.
Not sure if I'm doing something wrong, but I'm unable to find anything related to Drain and the new NATS API.
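To make "duplicated or lost" easier to check across restarts, the subjects seen in one run's logs can be compared against the subjects that were published. This is a minimal sketch assuming the published subjects are unique and that the handled subjects have already been extracted from the log lines; `checkDeliveries` and `deliveryReport` are hypothetical helper names.

```go
package main

import "sort"

// deliveryReport lists published subjects that were never handled in a run
// and subjects that were handled more than once in that run.
type deliveryReport struct {
	Lost       []string
	Duplicated []string
}

// checkDeliveries compares the published subjects with the subjects handled
// during one run. Published subjects are assumed to be unique.
func checkDeliveries(published, handled []string) deliveryReport {
	counts := make(map[string]int, len(published))
	for _, s := range handled {
		counts[s]++
	}
	var r deliveryReport
	for _, s := range published {
		switch n := counts[s]; {
		case n == 0:
			r.Lost = append(r.Lost, s)
		case n > 1:
			r.Duplicated = append(r.Duplicated, s)
		}
	}
	// Sort for stable output regardless of publish order.
	sort.Strings(r.Lost)
	sort.Strings(r.Duplicated)
	return r
}
```

Note that a subject handled twice within one run is not necessarily a bug: a message that was Nak'ed is expected to be delivered again, so the report is a starting point for reading the logs rather than a verdict.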
I even tried with the Messages method:
but I observed the same behaviour. Basically I have just 2 places where I can call the Drain method: when I have a ConsumeContext and when I have a MessageContext. Can you please check that 🙏
I did the same implementation with the old NATS API and it worked as expected.
Expected behavior
On app shutdown, when the subscription for the durable consumer is in the Drain state and I Nak the messages, I expect the messages to be Nak'ed once, no new messages to be handled for the subscription, and after app restart all messages to be consumed again with no messages lost for the subscription. That's what I really wish for.
Server and client version
server: 2.9.15
client: v1.35.0
IMO the server version shouldn't matter that much because I observe different behaviour for the new NATS API and the old NATS API with the same server version.
Host environment
No response
Steps to reproduce
No response