Fixed TaskGroup and CancelScope exit issues on asyncio #774
Conversation
This one has been melting my mind a bit, sorry it has taken a while.
- Consider adding a regression test for the case written in #698 ("Different cancel scope behaviour on asyncio vs Trio"). cac85cc (`test_cancelled_raises_beyond_origin`).
- Another suggested case to add: 56cb8b2.
- Consider adding a regression test for a case closely related to the case written in #698. 73f1ab7 (`test_deadline_based_checkpoint`). Note that the failure mode of `test_deadline_based_checkpoint` on master is different from the failure mode of `test_cancelled_raises_beyond_origin` on master. One reason that I like this test case is that I found Trio's behavior in the case written in #698 ("a level cancellation exception can keep propagating beyond the level that spawned the exception") to be surprising. (It violated my assumptions about Trio's level cancellation, and as pointed out in https://trio.discourse.group/t/do-cancelled-exceptions-know-which-block-they-belong-to/508 it seems to violate the first paragraph of Trio's "cancellation semantics" docs too.) The nice thing about this test, IMO, is that if you accept the premise that `with CancelScope(deadline=-math.inf) as inner_scope: await sleep_forever()` is a valid way to implement `checkpoint()`, then the strange behavior follows directly from that (mostly). (I say "mostly" because in the `test_deadline_based_checkpoint` case, the `Cancelled` was legitimately triggered by either scope; I think it is undefined which scope is responsible for triggering it. The `test_cancelled_raises_beyond_origin` case is stranger, because at the time of `coro.throw(Cancelled())`, it is unambiguous which scope is responsible for throwing the `Cancelled`, but the exception still propagates up beyond that scope.)
- Consider adding a regression test for a case closely related to #698. c1a4a85 (`test_cancelled_scope_based_checkpoint`). This is very similar to `test_deadline_based_checkpoint`, except (IIUC) `deadline_based_checkpoint` was a valid `checkpoint()` implementation prior to python-trio/trio#860 ("Delay deciding which cancel scope a Cancelled exception belongs to"), but `cancelled_scope_based_checkpoint` only became equivalent to `checkpoint()` in python-trio/trio#860.
- A bug: `CancelScope.cancelled_caught` is wrong in some cases. (I did not notice this until a lot later, but for the sake of the review being more organized, I've retroactively added tests for this to the four tests suggested above, which fail on this branch right now.)
A longer comment regarding #698:

AnyIO currently uses `task.cancel(msg)` to associate a level-`CancelledError` with the cancel scope that made the corresponding `task.cancel(msg="cancelled by scope {origin}")` call (the level-`CancelledError`'s "origin" scope). These associations are what #698 is all about. Trio does not associate each `Cancelled` with a particular "origin" scope (python-trio/trio#860) [1]. The asyncio backend does make such associations, however. Thinking about it from Trio's perspective, this seems strange. Why would `AsyncIOBackend` associate each level-`CancelledError` with a particular cancel scope? `AsyncIOBackend` needs to behave like Trio, so it needs to behave like it doesn't make such associations. On asyncio, is making these associations for some reason necessary in order to produce the correct, association-free behavior?
Something you mentioned: When Python 3.8 was supported, AnyIO could not rely on cancellation messages. It could do so in `sys.version_info` blocks, but so long as 3.8 was supported, the overall strategy of the `CancelScope` and `TaskGroup` code was restricted to not rely much on cancellation messages being available. IIUC, you dropped 3.8 ahead of EoL specifically so that AnyIO can always rely on cancellation messages in `_uncancel`.

I had the thought that `_uncancel` may not be the only avenue that the `msg` support opens up. AnyIO now has the (new?) ability to differentiate between a level-`CancelledError` and a native `CancelledError`. That could possibly be useful beyond `_uncancel`.
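To make the "differentiate level vs. native cancellation" idea concrete, here is a minimal sketch; the tag format and helper name are my own for illustration, not AnyIO's actual API. A `CancelledError` whose message carries a known scope tag is treated as level cancellation, anything else as native:

```python
import asyncio

# Hypothetical tag prefix; AnyIO's real message format may differ.
SCOPE_TAG = "cancelled by scope "

def is_level_cancellation(exc: BaseException) -> bool:
    """Return True if exc looks like a scope-originated ("level")
    cancellation, i.e. a CancelledError carrying the tag as its message."""
    return (
        isinstance(exc, asyncio.CancelledError)
        and bool(exc.args)
        and isinstance(exc.args[0], str)
        and exc.args[0].startswith(SCOPE_TAG)
    )
```

A native cancellation (e.g. a bare `task.cancel()` from user code) produces a `CancelledError` with no tagged message, so it is classified differently from a scope-driven one.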
So, one way to state #698 is: AnyIO needs to behave as if it's not "tagging" a `get_cancelled_exc_class()` at `.throw`-time with an origin scope. Maybe a good way to do that would be to stop tagging each level-`CancelledError` with an origin scope in the first place? Is that even possible? (AnyIO does have this new(?) ability mentioned above, so perhaps it could help?)

During the process of trying to learn/understand both python-trio/trio#860 and this PR I ended up playing with the code a lot, and at some point I realized I had basically attempted to do the above. It seems to work. (At least, it passes all tests, but it could easily still be broken around uncancellation or something.) Link: c1a4a85...d802a81. I did not find the `cancelled_caught` misbehavior until later, but this implementation passes those four suggested tests as well.

Do you think that the idea of relying on `msg` like this could be useful? (It may be that this particular implementation is better to use only as a toy model for the sake of building my understanding, but the `msg` idea may still be independently useful.)

(There are also a few things I found valuable about this implementation of `CancelScope.__exit__` & `TaskGroup.__aexit__` that I think are worth pointing out. One is that it is very feasible to cross-reference with the equivalent code in the Trio project. While I was trying to get the behavior right, being able to compare line-by-line with the equivalent code in Trio was very helpful in finding and fixing cases that I missed/misunderstood. My mental model of cancel scopes also maps much more directly onto this implementation, line-by-line, so I find it much easier to reason about. I would be curious if it is this way for others as well. However, this implementation is more verbose (is this good or bad? it depends on the use case), and the processing cost during `__(a)exit__` is likely a bit higher because of the `.split` calls (which Trio does too) and the `_parent_cancelled()` call.

I point out these details of the implementation because I think the `CancelScope.__exit__` & `TaskGroup.__aexit__` implementations on master and in this PR are very difficult to read through in order to audit for correctness, and to make changes to when a bug is found [2]. I would be curious to know if you and others feel the same or not. My thought is that perhaps these things could be made less difficult if the implementation aligned itself in structure more directly with the mental model it is implementing.)
Footnotes

1. I added two commits to my toy implementation to fix uncancellation: c1a4a85...cbc8c2c (new changes only: d802a81...cbc8c2c)

2. My takeaways here:
What I have some difficulties understanding is this:

```python
async def test_cancelled_raises_beyond_origin() -> None:
    with CancelScope() as outer_scope:
        with CancelScope() as inner_scope:
            inner_scope.cancel()
            try:
                await checkpoint()
            finally:
                outer_scope.cancel()

            pytest.fail("checkpoint should have raised")

    assert not inner_scope.cancelled_caught
    assert outer_scope.cancelled_caught
```

By what logic should `cancelled_caught` be set this way here?
Here, the inner scope should not have it set because it doesn't catch the cancellation exception. Perhaps the logic is that, given that both cancel scopes were cancelled, it's the outer scope that's wholly responsible for the cancellation? And if the inner scope was shielded, it would be doing its own cancellation? But in that case, what would be the logic for that?

I made two separate versions of this test case for clarity, and added comments reflecting how I think Trio's logic goes:

```python
async def test_cancelled_raises_beyond_origin_unshielded() -> None:
    with CancelScope() as outer_scope:
        with CancelScope() as inner_scope:
            inner_scope.cancel()
            try:
                await checkpoint()
            finally:
                outer_scope.cancel()

            pytest.fail("checkpoint should have raised")

    # Here, the outer scope is responsible for the cancellation, so the inner scope
    # won't catch the cancellation exception, but the outer scope will
    assert not inner_scope.cancelled_caught
    assert outer_scope.cancelled_caught


async def test_cancelled_raises_beyond_origin_shielded() -> None:
    with CancelScope() as outer_scope:
        with CancelScope(shield=True) as inner_scope:
            inner_scope.cancel()
            try:
                await checkpoint()
            finally:
                outer_scope.cancel()

            pytest.fail("checkpoint should have raised")

    # Here, the inner scope is the one responsible for cancellation, and given that the
    # outer scope was also cancelled, it is not considered to have "caught" the
    # cancellation, even though it swallows it, because the inner scope triggered it
    assert inner_scope.cancelled_caught
    assert not outer_scope.cancelled_caught
```

Could you validate or correct my assumptions here?
After a day of debugging, I realized that I had gotten one thing wrong with my asyncio cancel scope fix: apparently when exiting a cancel scope, it should only reraise a cancellation exception if the inner scope was not shielded. I don't know how I could've deduced that from anything...

It would seem that my solution of raising a new

I've refactored the asyncio
The pre-release version now works properly on 3.13.
I've still got a long review backlog after vacation + illness, sorry. Skimming this briefly, it might be easier to review if we separated out the drop-py-38 changes from the taskgroup and cancelscope changes.
@gschaffner is reviewing this now, and I trust his insights in the matter. I wasn't going to drop 3.8 support, but it was a hard requirement to fix these issues. |
Trio's documentation is slightly wrong here. As Arthur pointed out on discourse, Trio's docs weren't updated (in various locations) when the behavior was changed in python-trio/trio#901. Reading through that PR and corresponding issue, I think it's clear that Trio's
(cc @arthur-tacca: Have you thought about raising a GH issue with Trio about the various places this is wrong in the documentation? I agree with your findings on Discourse and it seems like the docs should presumably get fixed upstream.)
I believe one wrong assumption here is that Trio is using "which scope is responsible for [catching] this cancellation?" logic.
Yeah. A comment that I would recommend reading if you haven't yet is python-trio/trio#860 (comment). For me at least, this is the clearest explanation of the behavior that I have found (although YMMV of course). The documentation of the behavior in the docs is quite buried, and the docs are subtly wrong about it in various places (self-inconsistent), though :/
```python
if not self.cancel_scope._effectively_cancelled:
    self.cancel_scope.cancel()
```
I believe that this is incorrect. (However, this is not a regression here; it was already incorrect on master, but I noticed it during this review. See #787.)
Do you want to fix #787 in this PR alongside the other bugs? If so, I think the suggested changes would be:

- Suggested change:

  ```diff
  -if not self.cancel_scope._effectively_cancelled:
  -    self.cancel_scope.cancel()
  +self.cancel_scope.cancel()
  ```

- the hypothetical `task_done` changes I described in #787 ("Child tasks don't cancel their group's scope correctly when they raise an exception on asyncio"), assuming that that approach is viable
- a regression test (`test_shield_after_task_failed` from there)
- a changelog entry
```python
task.cancel(f"Cancelled by cancel scope {id(origin):x}")
if task is origin._host_task:
    origin._cancel_calls += 1
```
Here's a case that's still failing:

```python
async def test_cancelled_scope_based_checkpoint(self) -> None:
    """Regression test closely related to #698."""
    with CancelScope() as outer_scope:
        outer_scope.cancel()

        try:
            # The following three lines are a way to implement a checkpoint
            # function. See also https://github.com/python-trio/trio/issues/860.
            with CancelScope() as inner_scope:
                inner_scope.cancel()
                await sleep_forever()
        finally:
            assert cast(asyncio.Task, asyncio.current_task()).cancelling()

        pytest.fail("checkpoint should have raised")

    assert not cast(asyncio.Task, asyncio.current_task()).cancelling()
```
I believe this case should pass. In my toy implementation I fixed it via #774 (comment).
A note, in case needed in the future: it was decided not to deal with this right now and see if it causes issues. The discussion was at:
Co-authored-by: Ganden Schaffner <[email protected]>
For organization's sake/future reference, here's an enumeration of the potential upstream follow-ups here:
Changes

This fixes both the cancel scope exit behavior on asyncio where an outer cancel scope has been cancelled, and the constant unfruitful cancellation attempts against a task that is waiting in `TaskGroup.__aexit__()`.

Fixes #695.
Fixes #698.
Checklist

If this is a user-facing code change, like a bugfix or a new feature, please ensure that
you've fulfilled the following conditions (where applicable):

- You've added tests (in `tests/`) which would fail without your patch
- You've updated the documentation (in `docs/`, in case of behavior changes or new features)
- You've added a changelog entry (in `docs/versionhistory.rst`)

If this is a trivial change, like a typo fix or a code reformatting, then you can ignore
these instructions.
Updating the changelog
If there are no entries after the last release, use ``**UNRELEASED**`` as the version.

If, say, your patch fixes issue #123, the entry should look like this:

    * Fix big bad boo-boo in task groups
      (`#123 <https://github.com/agronholm/anyio/issues/123>`_; PR by @yourgithubaccount)
If there's no issue linked, just link to your pull request instead by updating the
changelog after you've created the PR.