Delay deciding which cancel scope a Cancelled exception belongs to #860

njsmith · 2019-01-17T09:49:28Z

Right now, whenever we create a Cancelled exception, we immediately tag it with the cancel scope that's going to catch it.

There's another option that's briefly mentioned in #606: we could wait until CancelScope.__exit__ to decide whether a given Cancelled exception "belongs" to this scope. The check would be "on the current task's stack, am I the outermost unshielded scope that's in the cancelled state?"

This can behave differently from what we have now in cases where there are multiple cancelled CancelScopes on the stack.

I'm not entirely sure if this would actually help with #606, because of complications noted there (hello KeyboardInterrupt). But it would probably simplify the cancel scope code in general, in particular _cancel_no_notify, and the issues described in this comment: #835 (comment)

The text was updated successfully, but these errors were encountered:

oremanj · 2019-02-04T00:02:42Z

I just posted #900 which offers a less elegant, but also less invasive, solution to this problem. I prefer the solution in this issue on aesthetic grounds but I couldn't convince myself that it would be safe in the presence of a cancellation landscape that might be changing as the Cancelled exception propagates. Also, it makes the delivery of a cancellation be quadratic in the depth of the cancel stack, but probably that doesn't add up to much in practice?

oremanj · 2019-02-04T00:29:18Z

OK, let's try doing this analysis semi-rigorously.

I'm interpreting "outermost unshielded scope that's in the cancelled state" here as "the thing that pending_cancel_scope() returns", with the properties:

it's always a cancel scope with cancel_called = True, or None
it's only None if the innermost cancelled scope encloses the innermost shielded scope

Looking from the perspective of a single CancelScope.__exit__ with a Cancelled exception propagating:

If cancel_called is false, we can't possibly be the target of this exception; let it continue.
If this scope is pending_cancel_scope(), swallow the exception and set cancelled_caught, we're done. By premise, we did the right thing for the current state of the cancel stack.
If not, then since we're at the top of the cancel stack, whatever pending_cancel_scope() returned must be outside us.
- It can't be None, since you only get None if the innermost shield is further in than the innermost cancelled scope, and we're a cancelled scope and we're the furthest-in thing there is.
- So it must be some cancel scope outside us (call it C), and we should let the exception keep propagating so they can catch it.
  - By the time the exception gets to C, the result of pending_cancel_scope() might well have changed. But for C to have been returned from pending_cancel_scope() in the first place, it must have had cancel_called = True, and cancel_called never goes from True to False. So if the exception gets to C, either pending_cancel_scope() will return C (i.e., nothing changed), or it will return some scope outside C, in which case it will be proper for C to let the exception continue propagating to that further-out scope. It can't return None for C for the same reasons it couldn't return None for us: C is the innermost CancelScope at that point and it's cancelled, no possibility for a shield further in.
  - Something something induction eventually we run out of cancel stack.

I think my conclusion here is that I was being overly conservative, this will work fine, and we can drop #900. But I'd like another set of eyes on it before being quite sure. :-)

njsmith · 2019-02-04T00:35:43Z

couldn't convince myself that it would be safe in the presence of a cancellation landscape that might be changing as the Cancelled exception propagates

Good question, yeah.

Right now, at the moment when we inject the Cancelled exception, we look at a single consistent snapshot of the whole stack, determine where the outermost-unshielded-cancelled-scope is, and then lock that in as the destination of the exception.

In this proposal, each time the exception passes through a cancel scope, we look at a single consistent snapshot of the whole stack, and determine whether the current scope is the outermost-unshielded-cancelled-scope. (So the naive implementation is O(stack depth * number of scopes in the cancel-requested state), which is probably much better than O(stack depth * stack depth), and I think this could be optimized further.)

Now, if we discover that this is the outermost-unshielded-cancelled-scope, and catch the exception, then that's definitely OK, because it was the right thing to do at the moment we did it, and then there's nothing afterwards that could go wrong.

The riskier case is when we decide not to catch the exception. We know that at the moment we made the decision, there was some outer scope that was a better candidate for catching it. But, what it won't actually be caught until later. Could something happen between now and then that would mess things up? In particular, the thing that would be bad is if we got into a state where there was a live Cancelled propagating, but there was no outer scope that was a candidate for catching it. So, could that outer scope somehow stop being a candidate between when we raise the exception and when the exception reaches it?

To be a candidate, a scope has to meet two criteria:

It has to be in the cancel-requested state
There can't be any other scope that's in the cancel-requested state and that falls between our candidate and the next outermost shield

Once a scope is in the cancel-requested state, it can never leave it. So if we identified a scope as being a candidate at time T1, then for all times T2 > T1 it is definitely still in the cancel-requested state. But what about the second criterion? It could certainly happen that our original candidate stops fulfilling that – while our Cancelled is propagating, an outer scope could enter the cancel-requested state, or a shield could be removed to reveal an outer scope that was already in the cancel-requested state. But, in all of these cases, it means that our candidate is replaced by another, even-more-outer candidate.

Therefore, if in a snapshot at time T1 we decide that our exception has an outer candidate, then we are guaranteed that it will eventually be caught at time T2, even if it's by a different scope than the one we originally thought.

Ah, and I see you've just posted osmething too! I will read it.

njsmith · 2019-02-04T00:36:38Z

OK looks like we made pretty much the same argument :-)

oremanj · 2019-02-04T00:37:13Z

Put another way, if a cancel scope does interpret a Cancelled exception as being "for us", we can assume that determination is correct (as a matter of definition - the mapping from cancel stacks to "which scope is cancelled here right now" defines the cancellation semantics, we're just following it). So the only real correctness concern is that everyone could conclude "not mine!" and let a Cancelled exception fly out of the entire program. But for a cancelled scope to say "not mine!", it has to find some specific other further-out cancelled scope that it believes will say "mine!", based on the current state of the cancel stack. Since scopes never become uncancelled, at minimum the outermost cancelled scope will be forced to conclude "mine!". So as long as we never throw Cancelled into a task without any cancelled scopes on its cancel stack, we'll be OK.

oremanj · 2019-02-04T00:38:36Z

Great, I'm sufficiently convinced now and will close #900 and work on this instead. :-)

oremanj · 2019-02-04T01:12:52Z

Running the testsuite with the new behavior exposed a corner case I wasn't expecting, though I don't think it's actually harmful. If you have multiple tasks that are sleeping until the same time using trio.sleep(), and when one of them wakes up it cancels the others, the others will raise Cancelled out of that same call to trio.sleep() -- because sleep() is implemented using cancellation, and when the Cancelled reaches it, it sees there's a further-out cancelled scope that it should prefer instead.

njsmith · 2019-02-04T01:30:48Z

Oh yeah, this will definitely change behavior in some edge cases. And some of our low-level tests are very reliant on edge cases, or even checking for them explicitly, so they'll need a bit of tweaking for sure...

oremanj mentioned this issue Jan 19, 2019

Add support for unbound cancel scopes #835

Merged

This was referenced Jan 28, 2019

Deterministic scheduling via Hypothesis python-trio/pytest-trio#73

Merged

Add trio.hazmat.batch_cancellations() #900

Closed

oremanj mentioned this issue Feb 4, 2019

Delay deciding which cancel scope a Cancelled exception belongs to #901

Merged

oremanj closed this as completed in #901 Feb 5, 2019

arthur-tacca mentioned this issue Mar 14, 2024

Nested TaskGroup can silently swallow cancellation request from parent TaskGroup python/cpython#116720

Closed

gschaffner mentioned this issue Sep 2, 2024

Fixed TaskGroup and CancelScope exit issues on asyncio agronholm/anyio#774

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delay deciding which cancel scope a Cancelled exception belongs to #860

Delay deciding which cancel scope a Cancelled exception belongs to #860

njsmith commented Jan 17, 2019 •

edited

Loading

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

njsmith commented Feb 4, 2019

njsmith commented Feb 4, 2019

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

njsmith commented Feb 4, 2019

Delay deciding which cancel scope a Cancelled exception belongs to #860

Delay deciding which cancel scope a Cancelled exception belongs to #860

Comments

njsmith commented Jan 17, 2019 • edited Loading

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

njsmith commented Feb 4, 2019

njsmith commented Feb 4, 2019

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

oremanj commented Feb 4, 2019

njsmith commented Feb 4, 2019

njsmith commented Jan 17, 2019 •

edited

Loading