Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify retrying of sync tasks to use resumesync command #9493

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

bjester
Copy link
Member

@bjester bjester commented Jun 7, 2022

Summary

  • Refactors task restart behavior to utilize JobValidator for customization on a per-task basis
  • Hooks into retry/restart behavior of sync tasks to attempt a resumesync if possible

References

Addresses #7270

Reviewer guidance

  1. Initiate a sync that will transfer a significant amount of data
  2. Interrupt the sync during the transferring stage, like breaking connectivity (disconnecting from wifi)
  3. Wait for the task to fail
  4. Restablish connectivity
  5. Click RETRY
  6. The sync task should pick up where it left off

Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2022

@bjester bjester force-pushed the v2-retry-resumesync branch from 0aee570 to ea5f630 Compare June 9, 2022 00:45
@bjester bjester added TODO: needs review Waiting for review python Pull requests that update Python code labels Jun 9, 2022
@bjester bjester added this to the 0.16.0 milestone Jun 9, 2022
@bjester bjester marked this pull request as ready for review June 9, 2022 13:34
@bjester bjester requested review from rtibbles and nucleogenesis June 9, 2022 13:35
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am following - but I think I'd prefer the invocation of the retry validation to be a little more explicit.

@@ -182,18 +182,6 @@ def from_json(cls, json_string):

return Job(func, **working_dictionary)

@classmethod
def from_job(cls, job, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to leave this method here and reference it elsewhere - so that it can continue to be updated in lockstep with updates to the Job constructor.

For example in this PR where I've added a new parameter to the Job object: https://github.com/learningequality/kolibri/pull/9503/files#diff-09ac69ff66caa119977d3ae21fe8dc8bf59f0eaa27b34fbb002e8446f09ca98bR151

Seeing how you're now using it below, maybe we change it from a class method to a regular method and it becomes to_constructor_args or some such?

"facility_id": job.facility_id,
}

def to_representation(self, instance):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a little while to work through in my head why this was doing something in the case of a restart, but not in any other instance.

I wonder if it might be better to not override to_representation here, and just explicitly call validate_for_restart in the API endpoint restart method to get the final data that we want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hacking of a REST serializer for job validation is what makes this awkward, in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for validating data coming from the API endpoint, so I think it's a reasonable use case - any suggestions for how to update the way we're using it to make it less awkward? I was not a biggest fan of having to return and update the dict of the job args, but I couldn't think of another way while implementing it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fairly uncoupled from the API endpoints, so its implementation doesn't seem to enforce use solely through API requests. If that's the intent, it seems that the logic in Task.validate_job_data would be better placed within the TasksViewSet.validate_create_req_data method.

Otherwise, if we keep validate_job_data, then I think removing inheritance of serializers.Serializer and defining only the functionality we intend, purely validation related logic, will clarify the implementation. The to_representation method is an unimplemented method in BaseSerializer but has an implementation in the Serializer which seems we don't intend to utilize, per your comments. That would remove the validator's use case of accepting an instance, as opposed to a data object, which is the functionality that I hacked given the current structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fairly uncoupled from the API endpoints, so its implementation doesn't seem to enforce use solely through API requests.

This is also a feature of DRF Serializers in general - they can be used for deserialization/validation outside of an API endpoint.

If that's the intent, it seems that the logic in Task.validate_job_data would be better placed within the TasksViewSet.validate_create_req_data method.

There is one place it is used outside of an API endpoint - in migrating the validate_sync_task function that it replaced, it is used not to process data from an API endpoint within the single user syncing functions.

I think removing inheritance of serializers.Serializer and defining only the functionality we intend, purely validation related logic, will clarify the implementation

This seems fine - I had not done this previously to avoid vendoring/duplicating code from DRF, but I think as long as it is using the serializer MetaClass there shouldn't be any gotchas here.

I can file a follow up issue for cleanup to avoid this being blocking here!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they can be used for deserialization/validation outside of an API endpoint.

And serialization, like I've done here :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed here: #9513

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And serialization, like I've done here :)

Hah - true - maybe just some code comments to sign post it would be enough, I just had to think longer than I thought I should have to realize.

@Pritimayasahoo
Copy link

I want to work on this issue please assign me this issue

@bjester
Copy link
Member Author

bjester commented Jul 16, 2024

I think I'd rather start fresh with this. I could pull bits of code from this to pull in a new PR. Just some of the Facility task panel updates look rather complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Pull requests that update Python code TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants