Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add conditional to avoid loading unexisting apps. Fixes #58 #59

Closed
wants to merge 4 commits into from

Conversation

jlariza
Copy link
Contributor

@jlariza jlariza commented Apr 2, 2024

Validates that the app label is in the available configs before retrieving the model instance to index. Fixes #58

@JordanHyatt
Copy link
Contributor

JordanHyatt commented Apr 2, 2024

I think this fixes one of the problems....but I don't think it addresses the TaskResult issue. I think the issue there is that the instance is in a transaction. So when the task try's to re-pull the object it gets a DB error. Maybe just a try/catch on that error handles this?

@jlariza
Copy link
Contributor Author

jlariza commented Apr 3, 2024

I think this fixes one of the problems....but I don't think it addresses the TaskResult issue. I think the issue there is that the instance is in a transaction. So when the task try's to re-pull the object it gets a DB error. Maybe just a try/catch on that error handles this?

I just added the try/catch. I guess that if the instance does not exists, the task should do nothing, right?

@JordanHyatt
Copy link
Contributor

I think this fixes one of the problems....but I don't think it addresses the TaskResult issue. I think the issue there is that the instance is in a transaction. So when the task try's to re-pull the object it gets a DB error. Maybe just a try/catch on that error handles this?

I just added the try/catch. I guess that if the instance does not exists, the task should do nothing, right?

Yeah, the only other thing I can think of is involving transaction.on_commit somehow but not sure if that would even work. I'd say the current solution is good enough to get this patched.

@qcoumes
Copy link
Member

qcoumes commented Apr 3, 2024

Thank you for your reports and the PR!

I get the problem with transactions raising a DoesNotExists exception, but I'm not sure I understand where the LookupError: No installed app with label '<app>' comes from.

Regardless, I think the following comment made by @JordanHyatt makes sense and we should avoid sending "useless" tasks to Celery if possible:

#58 (comment)
It seems the signal handler is passing though every object that saves to the celery task then letting the registry do the "vetting". The problem is if that object is in transaction the task is failing to retrieve the object. Seems like the fix maybe to "vet" the object first before invoking the celery task. Edit* looks like you are working on the fix!

So in my opinion the if app_label in apps.app_configs: part should be done in CelerySignalProcessor.handle_save.

In the meantime I'll check why docker-compose is suddenly unavailable in actions 🙃.

@qcoumes
Copy link
Member

qcoumes commented Apr 3, 2024

I also think we should mention the transaction edge case in the documentation, maybe below django_opensearch_dsl.signals.CelerySignalProcessor in the OPENSEARCH_DSL_SIGNAL_PROCESSOR section.

@qcoumes qcoumes added the bug Something isn't working label Apr 3, 2024
@JordanHyatt
Copy link
Contributor

JordanHyatt commented Apr 3, 2024

So in my opinion the if app_label in apps.app_configs: part should be done in CelerySignalProcessor.handle_save.

I agree. I guess I didn't realize the PR was doing it in the actual signal function. Honestly, thinking about it more, invoking handle_save_task with transaction.on_commit may solve the mid transaction problem

@JordanHyatt
Copy link
Contributor

Also while we are on the subject....shouldn't those task calls be using the handle_save_task.delay method ? Otherwise they just get run in the same thread

@qcoumes
Copy link
Member

qcoumes commented Apr 3, 2024

invoking handle_save_task with transaction.on_commit may solve the mid transaction problem

In that case, checking if we're in a transaction with transaction.get_autocommit() in CelerySignalProcessor.handle_save and raising an explicit exception may be a better solution.

This way the exception will happen on the main thread, which can be better since an exception in Celery might be overlooked.

Also while we are on the subject....shouldn't those task calls be using the handle_save_task.delay method ? Otherwise they just get run in the same thread

Uh, that's a major oversight on my part 😓, thanks for catching it. This make the processor kinda pointless. @jlariza Can you add this to your MR ?

@qcoumes
Copy link
Member

qcoumes commented Apr 3, 2024

By the way, I fixed the actions, rebasing on master will allow them to run.

@JordanHyatt
Copy link
Contributor

JordanHyatt commented Apr 3, 2024

invoking handle_save_task with transaction.on_commit may solve the mid transaction problem

In that case, checking if we're in a transaction with transaction.get_autocommit() in CelerySignalProcessor.handle_save and raising an explicit exception may be a better solution.

This way the exception will happen on the main thread, which can be better since an exception in Celery might be overlooked.

Also while we are on the subject....shouldn't those task calls be using the handle_save_task.delay method ? Otherwise they just get run in the same thread

Uh, that's a major oversight on my part 😓, thanks for catching it. This make the processor kinda pointless. @jlariza Can you add this to your MR ?

Also I think we should add a method (instance_requires_update ?) to the registry that does a logic check on the instance to see if it even has a document attached to it that way we could avoid unnecessary task calls and the signal handler would look something like.....

def handle_save(self, sender, instance, **kwargs):
    """Update the instance in model and associated model indices."""
    required = registry.instance_requires_update(instance)
    if required:
        # do the additional checks here we have been discussing
        handle_save_task(instance._meta.app_label, instance.__class__.__name__, instance.pk)

the method would basically do the same logic update has....
if instance.__class__ in self._models

I'm happy to write the code, but I don't want too many cooks in the kitchen.....just let me know!

@jlariza jlariza force-pushed the celerysignal-proccesor-fix branch 2 times, most recently from 3763999 to 48460d2 Compare April 5, 2024 13:54
@jlariza
Copy link
Contributor Author

jlariza commented Apr 5, 2024

@qcoumes @JordanHyatt code updated. Please, check. Thank you

@jlariza jlariza force-pushed the celerysignal-proccesor-fix branch from 48460d2 to 08e1e3c Compare April 5, 2024 13:56
@JordanHyatt
Copy link
Contributor

JordanHyatt commented Apr 5, 2024

@qcoumes @JordanHyatt code updated. Please, check. Thank you

We are getting there! I think we need to improve the logic of whether to invoke the task. what you currently have will not send if its a related instance, which we still want. Also I think this check should be applied to delete as well. this is what I have, let me know what you think! We are currently using this custom Processor in our code base and it seems to be working well, obviously needs unit tests though.

class CelerySignalProcessor(CelerySignalProcessor):

    def instance_requires_update(self, instance):
        model = instance._meta.model
        m1 = model in registry._models
        m2 = instance.__class__.__base__ in registry._models
        m3 = bool(list(registry._get_related_doc(instance)))
        if m1 or m2 or m3:
            return True
        return False
        
    def handle_save(self, sender, instance, **kwargs):
        """Update the instance in model and associated model indices."""
        if self.instance_requires_update(instance):
            transaction.on_commit(
                partial(
                    handle_save_task.delay,
                    app_label=instance._meta.app_label,
                    model=instance.__class__.__name__,
                    pk=instance.pk,
                )
            )

    def handle_pre_delete(self, sender, instance, **kwargs):
        """Delete the instance from model and associated model indices."""
        if self.instance_requires_update(instance):
            handle_pre_delete_task.delay(
                serialize(
                    "json",
                    [instance],
                    cls=DODConfig.signal_processor_serializer_class(),
                )
            )

@qcoumes
Copy link
Member

qcoumes commented Apr 5, 2024

We indeed also need to check for related instances. I like the idea of the instance_requires_update. I think it should be added to the registry, or at least BaseSignalProcessor so that it could also be used by future processors.

@JordanHyatt
Copy link
Contributor

Hey team, I submitted a new PR in an attempt to get this important patch pushed through.

@qcoumes
Copy link
Member

qcoumes commented Apr 13, 2024

Merged in #63, I added you as co-authored @jlariza.

@qcoumes qcoumes closed this Apr 13, 2024
@jlariza jlariza deleted the celerysignal-proccesor-fix branch April 16, 2024 02:30
@jlariza
Copy link
Contributor Author

jlariza commented Apr 16, 2024

sorry @JordanHyatt @qcoumes I was a little busy last week. Thanks for the new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CelerySignalProcessor causes DoesNotExist error when used with django_celery_results
3 participants