memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver #534

harmv · 2024-10-24T12:50:57Z

As requested (Carlton Gibson) in Django ticket https://code.djangoproject.com/ticket/35757#comment:21 I open an issue in this repo.

Problem

When using 'daphne' + 'channels' in a Django polls project, then runserver has reentrancy issues, which becomes visible when enabling for example memcached.PyMemcacheCache.

Steps to reproduce

Start an project mysite, and add application polls
Add the following contents to polls/views.py

from django.http import HttpResponse
from django.core.cache import cache

def incr():
    key = 'example:key'
    try:
        val = cache.incr(key, 1)
    except ValueError:
        cache.set(key, 1)
        val = 1

    return val

def index(request):
    v = incr()
    return HttpResponse("Hello, world. You're at the polls index. %d" % v)

Add the following to polls/urls.py

from django.urls import path

from . import views

urlpatterns = [
    path("", views.index, name="index"),
]

Add the following settings to setting.py

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        # 'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',           # using LocMemCache 'fixes'(=hides?) the problem
        'LOCATION': "localhost:11211"
    },
}

INSTALLED_APPS = [
    'daphne',     #   removing this fixes the problem (with channels 4)
    'channels',   

    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

ASGI_APPLICATION = "mysite.asgi.application"                # required for channels

run a memcached instance: memcached -vv
Fire up runserver ./manage.py runserver
Browse to http://localhost:8000/polls/ and see everything working. An incremeting counter shows.
Now to reproduce the bug do many requests simultaneously, by running a script get_parallel.sh

#!/bin/bash
#
# do requests in parallel
# https://code.djangoproject.com/ticket/35757
url='http://localhost:8000/polls/'

curl $url&
curl $url&
curl $url&
curl $url&
curl $url&
curl $url&
curl $url

Result

within a few seconds exceptions occur like


  File "..../views/api.py", line 2512, in example_url
    cache.incr("some key", 1)
  File "./venv/lib/python3.12/site-packages/django/core/cache/backends/memcached.py", line 110, in incr
    val = self._cache.incr(key, delta)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 350, in incr
    return self._run_cmd("incr", key, False, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd
    return self._safely_run_func(client, func, default_val, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 211, in _safely_run_func
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 827, in incr
    results = self._misc_cmd([cmd], b"incr", noreply)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1291, in _misc_cmd
    buf, line = _reader(self.sock, buf)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1658, in _readline
    buf = _recv(sock, RECV_SIZE)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1750, in _recv
    return sock.recv(size)

 OSError: [Errno 9] Bad file descriptor

After a few runs runserver stops serving requests completely, all requests, even sequential (slow ones) ones from a browser, get a 500 with this traceback.

Traceback (most recent call last):
  File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler
    raise exc_info[1]
  File "venv/lib/python3.12/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = await get_response(request)
  File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler
    raise exc_info[1]
  File "/venv/lib/python3.12/site-packages/django/core/handlers/base.py", line 253, in _get_response_async
    response = await wrapped_callback(
  File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 468, in __call__
    ret = await asyncio.shield(exec_coro)
  File ".../Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/venv/lib/python3.12/site-packages/asgiref/sync.py", line 522, in thread_handler
    return func(*args, **kwargs)
  File "mysite/polls/views.py", line 20, in index
    incr()
  File "mysite/polls/views.py", line 11, in incr
    val = cache.incr(key, 1)
  File "venv/lib/python3.12/site-packages/django/core/cache/backends/memcached.py", line 110, in incr
    val = self._cache.incr(key, delta)
  File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 350, in incr
    return self._run_cmd("incr", key, False, *args, **kwargs)
  File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 314, in _run_cmd
    client = self._get_client(key)
  File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 182, in _get_client
    raise MemcacheError("All servers seem to be down right now")

pymemcache.exceptions.MemcacheError: All servers seem to be down right now

Versions

channels==4.1.0
daphne==4.1.2
Django==5.1.1
python 3.12.6

Reproducibility

The problem appears within seconds when running the .sh script.
I'm also able to reproduce the problem with channels 3
The problem only occurs with runserver, on production we use daphne <ourappname>.asgi:application -b 0.0.0.0 -p 8000 and that seems to work fine.

When removing daphne from INSTALLED_APPS, the problem disappears. (but then ofcourse channels are not working properly)

The text was updated successfully, but these errors were encountered:

carltongibson · 2024-10-24T14:17:59Z

Thanks for the follow up @harmv. Very clear. 👍

adiroiban · 2024-10-29T11:43:41Z

From what I can see , pymemcache implement a custom threading pool ... https://github.com/pinterest/pymemcache/blob/master/pymemcache/pool.py

For Twisted based application, any code that is expected to run in a thread should use the "official" Twisted threading api

see the docs here https://docs.twisted.org/en/stable/core/howto/threading.html

From what I can see, the code from pymemcache/client/hash.py is not compatible with Twisted.

As a general rule, you can't mix Twisted code with random threading code.

Twisted exists to avoid the need for using threads and the majority of Twisted code is not thread safe.

I think that the fix here, is to implement a Twisted aware memcache backend

carltongibson · 2024-10-29T12:05:51Z

Thanks for the inout @adiroiban! That makes sense.

harmv · 2024-11-03T11:34:46Z

@adiroiban , I doubt that that is the problem. I actually can't find a threadpool in pymemcache.

When I look in https://github.com/pinterest/pymemcache/blob/master/pymemcache/pool.py (or in any code of pymemcache) I don't see it using threads at all.

I see threading.Lock() beeing used, but only that. Nowhere I see an invocation of threading.Thread().

$ find . -name "*.py" | xargs grep threading
./pymemcache/pymemcache/pool.py:import threading
./pymemcache/pymemcache/pool.py:            self._lock = threading.Lock()

or am I missing something?

adiroiban · 2024-11-03T12:00:17Z

I am not familiar with how the django server ./manage.py runserver handles each HTTP request.

Is each request execute inside the Twisted loop or inside Django's own threads ?

I see this, which hint that the pymemcache is executed from a thread.
pymemcache might not create a thread by itself, buy maybe the code that uses pymemcache is called from a thread.

  File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler

esbozos · 2024-11-10T17:59:57Z

This issue has been ongoing for a couple of months now: any concurrent requests cause the server to crash. Although removing Daphne temporarily solved the problem during development.
Using uvicorn for asgi meanwhile.

Application instance <Task cancelling name='Task-563' coro=<ASGIStaticFilesHandler.__call__() running at /home/dev/.envs/parka/lib/python3.12/site-packages/django/contrib/staticfiles/handlers.py:101> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]>> for connection <WebRequest at 0x7f4450791280 method=GET uri=/api/config/site/?version=0 clientproto=HTTP/1.1> took too long to shut down and was killed.

greezybacon · 2024-12-28T05:37:10Z

I can confirm this is an issue with the runserver feature of Django when using Daphne. It consistently (albeit randomly and racily) corrupted either the data in memcached or corrupted the socket protocol in the connection and resulted in "timeout" errors.

I was able to resolve the issue using the ThreadMappedPool feature of pylibmc like this:

from django.core.cache.backends.memcached import PyLibMCCache
from functools import cached_property
from pylibmc.pools import ThreadMappedPool

class DeferredLookup:
    def __init__(self, pool):
        self._pool = pool

    def __getattr__(self, name):
        def deferred(*args, **kwargs):
            with self._pool.reserve() as conn:
                return getattr(conn, name)(*args, **kwargs)

        return deferred

class ThreadedPyLibMCCache(PyLibMCCache):
    """
    Slight extension to the pylibmc cache for Django which uses the
    ThreadMappedPool feature of pylibmc to ensure that each thread has a
    corresponding connection to the memcached server. This provides both a
    performance benefit that no thread needs to wait for a connection to be
    available and also a safety benefit that the connection protocol and data is
    not corrupted from thread preemption.

    References:
    https://sendapatch.se/projects/pylibmc/pooling.html#thread-mapped-pooling
    """

    @cached_property
    def _cache(self):
        # Return an object which, when an attribute is looked up (eg. such as
        # `.get`), returns a callable closure. Then, when the closure is called,
        # acquire a connection from the pool and use it to perform the cache
        # operation
        pool = ThreadMappedPool(super()._cache)
        return DeferredLookup(pool)

And then use it for the Django CACHES

CACHES = {
    'default': {
        'BACKEND': 'path.to.ThreadedPyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
        'OPTIONS': {
            'binary': True
        }
    }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver #534

memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver #534

harmv commented Oct 24, 2024 •

edited

Loading

carltongibson commented Oct 24, 2024

adiroiban commented Oct 29, 2024

carltongibson commented Oct 29, 2024

harmv commented Nov 3, 2024

adiroiban commented Nov 3, 2024

esbozos commented Nov 10, 2024

greezybacon commented Dec 28, 2024

memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver #534

memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver #534

Comments

harmv commented Oct 24, 2024 • edited Loading

Problem

Steps to reproduce

Result

Versions

Reproducibility

carltongibson commented Oct 24, 2024

adiroiban commented Oct 29, 2024

carltongibson commented Oct 29, 2024

harmv commented Nov 3, 2024

adiroiban commented Nov 3, 2024

esbozos commented Nov 10, 2024

greezybacon commented Dec 28, 2024

harmv commented Oct 24, 2024 •

edited

Loading