"Error 110 while writing to socket. Connection timed out." With kombu 4.4.0/4.5.0 and redis 3.2.0/3.2.1 #1019

Closed
alexandre-paroissien opened this issue Mar 4, 2019 · 53 comments · Fixed by #1113 or #1122

@alexandre-paroissien

alexandre-paroissien commented Mar 4, 2019

I was previously on kombu 4.3.0 and redis 2.10.6, and after upgrading to kombu 4.4.0/4.5.0 and redis 3.2.0/3.2.1 I noticed a new issue on my Django REST endpoints:

Error 110 while writing to socket. Connection timed out.

These endpoints never had any issues before. The error has popped up only once since I upgraded, but it happened on both endpoints simultaneously.

Edit: The bug is still present in the most recent versions of the libraries (celery 4.3.0, kombu 4.5.0, redis 3.2.1).
(I also had the bug when upgrading only kombu/redis and keeping celery at 4.2.2.)
The bug is not present with celery 4.2.2, kombu 4.3.0, redis 2.10.6.

2019-05-21T07:51:02.022118+00:00 app[worker.1]: [2019-05-21 07:51:02,021: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (0/20) now.
2019-05-21T07:51:02.024750+00:00 app[worker.1]: [2019-05-21 07:51:02,024: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (1/20) in 1.00 second.
2019-05-21T07:51:03.028332+00:00 app[worker.1]: [2019-05-21 07:51:03,028: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (2/20) in 1.00 second.
2019-05-21T07:51:04.032513+00:00 app[worker.1]: [2019-05-21 07:51:04,032: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (3/20) in 1.00 second.
2019-05-21T07:51:05.037741+00:00 app[worker.1]: [2019-05-21 07:51:05,037: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (4/20) in 1.00 second.
2019-05-21T07:51:06.041513+00:00 app[worker.1]: [2019-05-21 07:51:06,041: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (5/20) in 1.00 second.
2019-05-21T07:51:07.045367+00:00 app[worker.1]: [2019-05-21 07:51:07,045: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (6/20) in 1.00 second.
2019-05-21T07:51:08.048339+00:00 app[worker.1]: [2019-05-21 07:51:08,048: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (7/20) in 1.00 second.
2019-05-21T07:51:09.052390+00:00 app[worker.1]: [2019-05-21 07:51:09,052: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (8/20) in 1.00 second.
@nyhobbit

I can confirm that I encountered the same issue, as well as Error 32 (broken pipe) as per #1018, after upgrading to the combination of:

  • celery 4.3.0 RC2
  • kombu 4.4.0
  • redis 3.2.0

The errors are intermittent: certain transactions work out and write into redis just fine, others break down. The stack traces when errors occur are identical to the ones from the OP.
I have not been able to uncover any specifics about which transactions are likely to fail and which are likely to work. The entries I write to Redis are all very similar, and it looks like the same entry that failed earlier sometimes gets written just fine later.

The most up-to-date combination of libraries that seems to guarantee the disappearance of these issues is to my knowledge as follows:

  • celery 4.2.1
  • kombu 4.3.0
  • redis 2.10.6

However, I would really like to be able to upgrade to redis >= 3.0 without losing this guarantee. If you could publish a 4.2.x celery version that supports redis 3.x and is stable with respect to #1018 and #1019 it would be greatly appreciated.

@alexandre-paroissien
Author

Here is my environment. The exceptions only happen when I switch from kombu 4.3.0 and redis 2.10.6 to kombu 4.4.0 and redis 3.2.0 (the environment and the other libraries remain unchanged).

Stack:
Ubuntu 18 (Heroku-18)
Python 3.6.8
Redis 3.2.12 (Redis To Go)

Libraries:
Django==2.1.7
djangorestframework==3.9.1
celery==4.2.1
kombu==4.3.0 ok / 4.4.0 not ok
billiard==3.5.0.5
amqp==2.4.2
vine==1.2.0
django-celery-beat==1.4.0
redis==2.10.6 ok / redis 3.2.0 not ok

I hope this helps in understanding the exceptions. Is there any other information I could collect, or any tests I could run?
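For reports like this, the exact installed versions of celery, kombu, redis-py and their dependencies are usually the first thing to collect. A minimal sketch for gathering them, assuming Python 3.8+ (for importlib.metadata); collect_versions is a hypothetical helper, not part of Celery or kombu. It reports the redis-py client version only, not the Redis server version (redis-cli INFO server shows that). Celery's own celery report command also prints version information useful for bug reports.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions mattered throughout this thread.
PACKAGES = ("celery", "kombu", "redis", "billiard", "amqp", "vine")


def collect_versions(packages=PACKAGES):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Paste the printed lines into the issue alongside the OS, Python and Redis server versions.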

@alexandre-paroissien alexandre-paroissien changed the title "Error 110 while writing to socket. Connection timed out." After upgrading to kombu 4.4.0 and redis 3.0 "Error 110 while writing to socket. Connection timed out." After upgrading to kombu 4.4.0 and redis 3.2.0 Mar 13, 2019
@alexandre-paroissien
Author

alexandre-paroissien commented Mar 13, 2019

@thedrow OK, I will check the logs.

By upstream, do you mean the python-redis library?

I just checked and it seems you're right! Found the following issues in python-redis:

Sporadic "Connection not ready" exceptions with BlockingConnectionPool since 3.2.0 (22 days ago)
redis/redis-py#1136

3.2.0 Error while reading from socket: ('Connection closed by server.',) (15 days ago)
redis/redis-py#1140

@alexandre-paroissien
Author

alexandre-paroissien commented Mar 13, 2019

Both of these issues are present in python-redis 3.2.0.

Reverting to 3.1.0 would reintroduce this issue, which was fixed in 3.2.0:
redis/redis-py#1127

So maybe the earlier python-redis 3.0.1 would be fine?

But it seems that kombu currently requires redis >= 3.2.0.

@alexandre-paroissien
Author

I am not using eventlet.

I just posted this comment on redis-py: redis/redis-py#1140 (comment)

@dejlek

dejlek commented Mar 22, 2019

I've noticed similar behaviour as well. I had to work around it by catching the error and reconnecting, which is really annoying. It reminds me of another problem, with connection timeouts, that started a few months ago. I've reported that to Celery, but who knows what is really causing it...
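The "catch the error and reconnect" workaround can be sketched as a small retry decorator. This is a hypothetical helper, not something provided by Celery, kombu or redis-py, and it assumes the wrapped call is safe to repeat. It relies on redis-py discarding a connection that raised, so the next attempt typically gets a fresh connection from the pool.

```python
import functools
import time


def retry_on(exceptions, attempts=3, delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when it raises one of `exceptions`.

    After `attempts` failed calls the last exception is re-raised.
    Only use this for calls that are safe to repeat.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    sleep(delay)  # wait before the next attempt reconnects
        return wrapper

    return decorator
```

With redis-py you would pass a tuple such as (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError); these do not inherit from Python's built-in ConnectionError, so they have to be listed explicitly.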

@alexandre-paroissien
Author

Hi @nyhobbit and @dejlek, I still have the issue. How about you? Any update?
Maybe you could also report your situation there: redis/redis-py#1140

@auvipy
Member

auvipy commented May 7, 2019

Celery 4.3 and the latest kombu?

@harrybiddle

harrybiddle commented May 7, 2019

Hey @alexandre-paroissien, I'm sorry, I gave up and downgraded to Redis 2.10.6 / Kombu 4.3.0 / a forked Celery 4.2 with Python 3.7 support...!

I also considered doing what @dejlek did; I think if we have to upgrade and the problem is still there that's what I'll have to do too :(.

@auvipy
Member

auvipy commented May 7, 2019

Did you try kombu 4.5 and celery 4.3?

@harrybiddle

I saw this on two stacks: one with

  • celery 4.3.0rc1
  • kombu 4.3.0
  • redis 3.2.0

and another with

  • celery 4.3.0rc1
  • kombu 4.4.0
  • redis 3.2.1

In both cases the downgrade to Celery 4.2 / kombu 4.3.0 / redis 2.10.6 fixed it. I don't know which library was causing the issue!

Unfortunately we don't have a great testing environment for me to try other combinations. It only appeared intermittently on production, and not at all on staging, I guess because the staging traffic was too low. We have a plan to run some load testing on staging. I very much doubt we'll do it any time soon, but if we do get to that and this issue is still open I'll try newer versions of the libraries!

@auvipy auvipy added this to the 4.5.x Maintenance milestone May 9, 2019
@alexandre-paroissien
Author

alexandre-paroissien commented May 21, 2019

I can confirm I still encounter this issue with the most recent versions of the libraries:
celery 4.3.0, kombu 4.5.0, redis 3.2.1

(I also had the bug when upgrading only kombu/redis and keeping celery at 4.2.1.)

I tested in a test app with no traffic apart from my own. I launched a simple task manually: the first run worked, and the second gave the following output (but eventually succeeded):


2019-05-21T07:51:02.022118+00:00 app[worker.1]: [2019-05-21 07:51:02,021: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (0/20) now.
2019-05-21T07:51:02.024750+00:00 app[worker.1]: [2019-05-21 07:51:02,024: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (1/20) in 1.00 second.
2019-05-21T07:51:03.028332+00:00 app[worker.1]: [2019-05-21 07:51:03,028: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (2/20) in 1.00 second.
2019-05-21T07:51:04.032513+00:00 app[worker.1]: [2019-05-21 07:51:04,032: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (3/20) in 1.00 second.
2019-05-21T07:51:05.037741+00:00 app[worker.1]: [2019-05-21 07:51:05,037: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (4/20) in 1.00 second.
2019-05-21T07:51:06.041513+00:00 app[worker.1]: [2019-05-21 07:51:06,041: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (5/20) in 1.00 second.
2019-05-21T07:51:07.045367+00:00 app[worker.1]: [2019-05-21 07:51:07,045: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (6/20) in 1.00 second.
2019-05-21T07:51:08.048339+00:00 app[worker.1]: [2019-05-21 07:51:08,048: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (7/20) in 1.00 second.
2019-05-21T07:51:09.052390+00:00 app[worker.1]: [2019-05-21 07:51:09,052: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (8/20) in 1.00 second.
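The log shows the retry schedule: an immediate first retry, then one retry per second, out of a maximum of 20. A minimal sketch that reproduces this pattern, assuming a policy of max_retries=20, interval_start=0, interval_step=1, interval_max=1 (these appear to match Celery 4.x's default result-backend retry policy, but check your version). The function names are hypothetical and this is not Celery's actual implementation.

```python
def retry_intervals(start=0.0, step=1.0, maximum=1.0):
    """Yield wait times: `start`, then grow by `step`, capped at `maximum`."""
    current = start
    while True:
        yield current
        current = min(current + step, maximum)


def describe(seconds):
    # Roughly mirrors the "now" / "in 1.00 second" wording in the log.
    if seconds == 0:
        return "now"
    unit = "second" if seconds == 1 else "seconds"
    return f"in {seconds:.2f} {unit}"


def retry_messages(max_retries=20, **policy):
    """Yield one log-style line per retry, up to `max_retries` lines."""
    intervals = retry_intervals(**policy)
    for retries in range(max_retries):
        wait = next(intervals)
        yield f"Connection to Redis lost: Retry ({retries}/{max_retries}) {describe(wait)}."
```

Under this policy a worker keeps retrying for roughly 19 seconds before giving up, which is why a transient drop shows up as a burst of these lines followed by a successful task.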

@alexandre-paroissien alexandre-paroissien changed the title "Error 110 while writing to socket. Connection timed out." After upgrading to kombu 4.4.0 and redis 3.2.0 "Error 110 while writing to socket. Connection timed out." With kombu 4.4.0/4.5.0 and redis 3.2.0/3.2.1 May 22, 2019
@alexandre-paroissien
Author

Could it be related to this issue? celery/celery#3932

@alexandre-paroissien
Author

For anyone reading this: it seems the issue was indeed introduced in the 3.x branch of redis-py (which is required by celery 4.3+ / kombu 4.4+), and its status is being tracked there:

redis/redis-py#1140

@auvipy auvipy closed this as completed Jun 24, 2019
@auvipy auvipy pinned this issue Jun 24, 2019
@auvipy auvipy reopened this Jun 25, 2019
@mlissner

mlissner commented Aug 5, 2019

I'm unfortunately in no position to fix this, but I thought I'd at least make sure that the dots were connected between redis-py and here. Over on their upstream issue, they just did a release (3.3.0) with a new feature, and they say:

For Celery users, this change won't automatically fix ConnectionErrors encountered by Celery. Celery uses PubSub in a non-standard way which can not take advantage of the automatic health checks at this time. Once this code is released, we should be able to create a PR for Celery to regularly call pubsub.check_health().

(redis/redis-py#1140 (comment))

It'd be great to have Celery start using this functionality.
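The idea behind these health checks: if a connection has been idle longer than a configured interval, send a PING before trusting it, so that a connection silently dropped by the server or a proxy is detected and replaced. A minimal sketch of that timing logic, with hypothetical names (HealthCheckTimer, maybe_ping) and an injectable clock so it can be tested; this is not redis-py's implementation. Because a PubSub consumer mostly sits blocked waiting for messages, something has to run a check like this periodically, which is why the quote says Celery would need to call pubsub.check_health() itself.

```python
import time


class HealthCheckTimer:
    """Decide when an idle connection is due for a PING (illustrative only)."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.next_check = clock() + interval

    def record_activity(self):
        # Any successful read/write proves the link is alive; push the deadline out.
        self.next_check = self.clock() + self.interval

    def due(self):
        return self.interval > 0 and self.clock() >= self.next_check


def maybe_ping(timer, ping):
    """Call `ping()` only when a check is due; an exception from ping() propagates."""
    if timer.due():
        ping()  # e.g. send PING and expect PONG; a failure here means reconnect
        timer.record_activity()
        return True
    return False
```

Keeping the interval shorter than any idle timeout on the server or in between (load balancers, hosted Redis proxies) is the point of the design: the client notices a dead connection before it tries to use it for real work.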

@bwilliams18

Is there currently a bodge to prevent this from happening while a more permanent fix is in the works?

@Fiftyseventheory

I'm experiencing this issue as well.
celery==4.3.0
kombu==4.6.4
redis==3.3.8

@rotten

rotten commented Nov 27, 2019 via email

@auvipy
Member

auvipy commented Nov 28, 2019

Thanks for the report. There is an open PR related to this; please check #1122.

@wisefool769

I am still seeing this with:
celery==4.4.0
django-redis==4.10.0
kombu==4.6.7
redis==3.3.11

Is there any setting I have to change to pick up the fixes in #1122?

@jingtt0704

Replying to @wisefool769's question above: I am also still having the issue with the same library versions (celery==4.4.0, django-redis==4.10.0, kombu==4.6.7, redis==3.3.11).

@Ashish-Bansal
Contributor

@wisefool769 @jingtt0704 This issue will still persist if Celery's event loop is not running.

Ref - #1019 (comment)

@sklarsa

sklarsa commented Feb 21, 2020

@auvipy, can this issue be reopened? I don't think it is fully resolved yet, as I'm experiencing errors similar to those of many recent commenters on this thread.

@sklarsa

sklarsa commented Feb 21, 2020

@Ashish-Bansal Is there any way that I can help fix this in the case you described where celery's event loop is not running?

For clarification, I'm running

celery = "=4.3.0"
kombu = "=4.4.0"
redis = "=2.10.5"

@auvipy
Member

auvipy commented Feb 26, 2020

Try the latest celery with the latest kombu.

@gtalarico

gtalarico commented Mar 24, 2020

Sorry to pile on; I just wanted to report that I am having the same error, and I am on the latest releases of celery, kombu and redis.

Traceback

[2020-03-24 16:28:44,787: INFO/MainProcess] mingle: all alone
[2020-03-24 16:30:23,873: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 599, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.7/site-packages/celery/worker/loops.py", line 83, in asynloop
    next(loop)
  File "/usr/local/lib/python3.7/site-packages/kombu/asynchronous/hub.py", line 364, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 1088, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 359, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 693, in _receive
    ret.append(self._receive_one(c))
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 703, in _receive_one
    response = c.parse_response()
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3453, in parse_response
    response = self._execute(conn, conn.read_response)
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3427, in _execute
    return command(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 734, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 316, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 248, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 193, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
[2020-03-24 16:30:23,911: INFO/MainProcess] Connected to redis://redis:11535//

celery==4.4.1
kombu==4.6.8
redis==3.4.1

@adi-

adi- commented Apr 21, 2020

Same here. Any fixes on that?

@yashu-seth

Same here.

@PaiHsuehChung

celery==4.4.0
kombu==4.6.11
redis==3.5.3

I encounter the same issue.

@hampsterx

What version of Redis (the server) are you running? We had a version prior to 5.x and had this issue with the following libraries at the time:

redis 3.2.0
celery 4.3.0

After upgrading we haven't seen the connection time out issue since.

It's worth ruling out Redis in any case.

@PaiHsuehChung

My Redis version is 6.2.
So should I downgrade Redis or not?

@hampsterx

Nope, it was just an idea :(

@westonplatter

@hampsterx, you mentioned that after upgrading you haven't seen the connection timeout issue since. What package and version did you upgrade to?

@seanquinn

What's the solution for this? Seen recently on:

celery==4.4.0
kombu==4.6.11
redis==3.2.0

@matusvalo
Member

matusvalo commented Sep 11, 2021

Let me reopen the issue since it seems to be occurring.

@matusvalo matusvalo reopened this Sep 11, 2021
@KFoxder
Contributor

KFoxder commented Sep 11, 2021

@matusvalo @seanquinn We were able to fix this by doing the following.

You should be able to resolve this by setting the health_check_interval option, which was added in kombu 4.6.8. You will need to specify it via Celery's broker_transport_options setting.

broker_transport_options = {
    # Check the health of connections every 5 seconds
    'health_check_interval': 5,
}  
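Many commenters here run Celery inside Django, where the setting name differs. A sketch of where the same option goes in both cases, assuming the Django project follows the usual pattern of loading settings with app.config_from_object("django.conf:settings", namespace="CELERY"), which makes Celery read every setting with a CELERY_ prefix. Only one of the two forms is needed, depending on how your app is configured.

```python
# Plain Celery config module (e.g. celeryconfig.py): lowercase setting name.
broker_transport_options = {
    # Check the health of connections every 5 seconds
    "health_check_interval": 5,
}

# Django settings.py variant, assuming the Celery app was configured with
#   app.config_from_object("django.conf:settings", namespace="CELERY")
# so Celery looks up each setting with a "CELERY_" prefix.
CELERY_BROKER_TRANSPORT_OPTIONS = {
    "health_check_interval": 5,
}
```

If the option seems to have no effect, double-check the prefix: a lowercase broker_transport_options in a Django settings module loaded with the CELERY namespace is silently ignored.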

@auvipy
Member

auvipy commented Sep 12, 2021

Regarding @KFoxder's health_check_interval suggestion above: is this mentioned in the docs already? If so, we can close this.

alexnsu added a commit to alexnsu/lemur that referenced this issue Mar 29, 2022
this might solve one of the issues we see when our redis pod restarts celery/kombu#1019
@auvipy auvipy closed this as completed Apr 23, 2022
@harshita01398

Still facing the same issue on

Celery: 4.3.1
Redis: 4.1.2
Kombu: 4.6.11

Even after adding health_check_interval, I am seeing these errors in large numbers.

I have already added the setting below:
broker_transport_options = {
    # Check the health of connections every 5 seconds
    'health_check_interval': 5,
}

Is there a fix in any of the latest releases?

@westonplatter

Fixed after we upgraded Celery

nezdolik pushed a commit to spotify/lemur that referenced this issue Mar 23, 2023
* Add a health check from celery to redis

this solved connection problems between celery and redis for other peeps celery/kombu#1019

* oops had a dev thingy still present in the conf