Update docs and mention explicit close required change in v1.8.11 #445
Comments
Oh no :( Can you post a goroutine trace of the leaked goroutines?
Maybe you can give me some pointers on how I could do it most easily, and in a form most useful to you?
If you send your program a SIGQUIT, it should dump the stacks of all active goroutines. That would help lots. If you cannot do that, you can use delve to attach to the running Go program and get a trace of all goroutines. net/http/pprof also supports a debug endpoint that prints a trace of all goroutines. All equally useful options, so please pick whichever is most convenient for you and get back to me as soon as possible.
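For reference, the net/http/pprof route mentioned above only needs the package imported for its side effects and an HTTP listener for the /debug/pprof/* handlers. This is a minimal sketch, assuming the affected program doesn't already serve them; the localhost:6060 address is an arbitrary choice, not anything from the reporter's setup. The SIGQUIT option is even simpler: sending the signal (for example `kill -QUIT <pid>`) makes the Go runtime dump all goroutine stacks to stderr and exit.

```go
// Minimal sketch: expose the pprof endpoints from the running program so
// goroutine stacks can be inspected over HTTP. The blank import registers
// the /debug/pprof/* handlers on http.DefaultServeMux.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// Serve pprof on a separate, internal-only port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the application (websocket server, etc.) would run here.
	select {}
}
```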
Also, just to note: we do ensure there are no hanging goroutines after running all tests. So whatever it is, it's something that doesn't come up in the tests. How do you know it's actually a goroutine leak? It could just be some memory being held somewhere indefinitely.
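As an aside for anyone reproducing this, one common way to enforce that kind of "no hanging goroutines after tests" check in Go is a leak detector run from TestMain. The sketch below uses go.uber.org/goleak purely to illustrate the technique; it is not necessarily how this repository's tests implement the check.

```go
// Illustration only: fail the test binary if any unexpected goroutines are
// still running after all tests complete. This may differ from the library's
// actual test harness.
package ws_test

import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}
```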
Actually, I don't know how that could be either. Definitely need a trace or further info to debug this.
Thanks, I'll try to look into this. I know this from the Prometheus/Grafana metrics we track: with the newest patch version, memory did not drop when connections went down the way it should have; it was pretty much flatlined.
EDIT: I let it run in the test environment for a little bit and I already see a pattern of the goroutine count going up. I profiled it at one point and added part of the goroutine profile output. I'll let it run more and see whether that number keeps going up. On Monday I'll have newer numbers.
What kind of goroutine profile is that? What do the percentages mean? It would help tons if you could get the stacks of those timeoutLoop goroutines. I looked over the code and it isn't clear how it could be getting stuck.
I'm using https://pkg.go.dev/net/http/pprof. I took a new profile and now it's at 123 with no actual connections open, so it's steadily growing in the test environment. I added the latest snapshot, so you can check it out via a browser. I can also send snapshots of the other profile types like heap, mutex, etc., everything that's supported by pprof. Let me know 🙏 Also, next week I will debug this further.
It's confusing, but there are multiple goroutine profiles. Can you go to the one that dumps the full goroutine stacks and post that?
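For context on the "multiple goroutine profiles" point: net/http/pprof serves the goroutine profile at /debug/pprof/goroutine, and the debug query parameter picks the format. debug=1 groups goroutines by identical stacks with counts, while debug=2 prints every goroutine's full stack, which is what's needed to see where the timeoutLoop goroutines are blocked. The sketch below fetches the full dump, assuming the pprof listener address from the earlier sketch:

```go
// Fetch the full goroutine stack dump (debug=2) from a process exposing
// net/http/pprof. The address is the one assumed in the earlier sketch.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:6060/debug/pprof/goroutine?debug=2")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print every goroutine with its full stack trace to stdout.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```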
One possible scenario in which I can see this behavior arising is if you're not calling Close explicitly. Though the docs have always read:
So this behavior was never to be relied on. Though I can see how that's confusing with the next line:
I'll remove this line from the docs.
The quoted docs are from: https://godocs.io/nhooyr.io/websocket#Conn
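To make the takeaway concrete, here is a minimal sketch of the server-side pattern being discussed: close the connection explicitly when the handler returns, rather than relying on the read loop's error to release resources. The handler shape, route, and status code are illustrative, not taken from the reporter's code.

```go
package main

import (
	"log"
	"net/http"

	"nhooyr.io/websocket"
)

// handleWS accepts a websocket connection and reads until the peer goes away.
// The deferred Close is the important part: it releases the connection's
// resources even when the client disconnects first.
func handleWS(w http.ResponseWriter, r *http.Request) {
	c, err := websocket.Accept(w, r, nil)
	if err != nil {
		return
	}
	// Explicit close is required; don't rely on the read loop failing to
	// clean everything up on its own.
	defer c.Close(websocket.StatusInternalError, "handler exited")

	ctx := r.Context()
	for {
		if _, _, err := c.Read(ctx); err != nil {
			return // the deferred Close still runs here
		}
	}
}

func main() {
	http.HandleFunc("/ws", handleWS)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```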
You're right. When the connection was closed from the client side and the main read loop ended because of that, we did not explicitly close on the server side, since even without it the connection released its resources on its own. Also, since I'm probably not the only one relying on this, and the previous behavior, even if unintentional, was still there, would it be wise to mention this in the release notes as well?
Awesome! Yup, will add it to the release notes, thanks @marekmartins.
Just letting you know that it seems adding an explicit Close call fixed the leak for us.
Awesome, good to hear!
Problem
In our production systems we upgraded to the latest patch version and noticed excessive memory usage; as far as I checked, it seemed to be a goroutine leak. Memory went up, but when the connection count went down, memory stayed roughly at the same level.
Probably on close etc. it was not cleaned up properly, but I did not dig any further. For now we rolled back to version v1.8.10 and all is back to normal.
Our use-case:
We have roughly 50k websockets open at peak time during the day, 500 messages pushed per second, 30 new websockets opened per second, and 30 old websockets closed per second.
Also, for each websocket connection we run a PING-PONGer in a separate goroutine to prevent websockets from closing for clients under some circumstances (a sketch of what such a pinger might look like is included below).
Maybe it's some edge case with our usage, but to us it looks like a regression was introduced.
If I can help in any other way, let me know 🙏
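For reference, a stripped-down sketch of the per-connection pinger described above, assuming the nhooyr.io/websocket API; the interval, naming, and error handling are placeholders rather than the reporter's actual code.

```go
package ws

import (
	"context"
	"time"

	"nhooyr.io/websocket"
)

// keepAlive pings the peer at a fixed interval so idle connections aren't
// dropped by intermediaries. It returns when the context is cancelled or a
// ping fails; the caller remains responsible for closing the connection.
func keepAlive(ctx context.Context, c *websocket.Conn, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			// Ping blocks until the corresponding pong arrives or ctx ends.
			if err := c.Ping(ctx); err != nil {
				return
			}
		}
	}
}
```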