Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay deciding which cancel scope a Cancelled exception belongs to #860

Closed
njsmith opened this issue Jan 17, 2019 · 8 comments
Closed

Delay deciding which cancel scope a Cancelled exception belongs to #860

njsmith opened this issue Jan 17, 2019 · 8 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jan 17, 2019

Right now, whenever we create a Cancelled exception, we immediately tag it with the cancel scope that's going to catch it.

There's another option that's briefly mentioned in #606: we could wait until CancelScope.__exit__ to decide whether a given Cancelled exception "belongs" to this scope. The check would be "on the current task's stack, am I the outermost unshielded scope that's in the cancelled state?"

This can behave differently from what we have now in cases where there are multiple cancelled CancelScopes on the stack.

I'm not entirely sure if this would actually help with #606, because of complications noted there (hello KeyboardInterrupt). But it would probably simplify the cancel scope code in general, in particular _cancel_no_notify, and the issues described in this comment: #835 (comment)

@oremanj
Copy link
Member

oremanj commented Feb 4, 2019

I just posted #900 which offers a less elegant, but also less invasive, solution to this problem. I prefer the solution in this issue on aesthetic grounds but I couldn't convince myself that it would be safe in the presence of a cancellation landscape that might be changing as the Cancelled exception propagates. Also, it makes the delivery of a cancellation be quadratic in the depth of the cancel stack, but probably that doesn't add up to much in practice?

@oremanj
Copy link
Member

oremanj commented Feb 4, 2019

OK, let's try doing this analysis semi-rigorously.

I'm interpreting "outermost unshielded scope that's in the cancelled state" here as "the thing that pending_cancel_scope() returns", with the properties:

  • it's always a cancel scope with cancel_called = True, or None
  • it's only None if the innermost cancelled scope encloses the innermost shielded scope

Looking from the perspective of a single CancelScope.__exit__ with a Cancelled exception propagating:

  • If cancel_called is false, we can't possibly be the target of this exception; let it continue.
  • If this scope is pending_cancel_scope(), swallow the exception and set cancelled_caught, we're done. By premise, we did the right thing for the current state of the cancel stack.
  • If not, then since we're at the top of the cancel stack, whatever pending_cancel_scope() returned must be outside us.
    • It can't be None, since you only get None if the innermost shield is further in than the innermost cancelled scope, and we're a cancelled scope and we're the furthest-in thing there is.
    • So it must be some cancel scope outside us (call it C), and we should let the exception keep propagating so they can catch it.
      • By the time the exception gets to C, the result of pending_cancel_scope() might well have changed. But for C to have been returned from pending_cancel_scope() in the first place, it must have had cancel_called = True, and cancel_called never goes from True to False. So if the exception gets to C, either pending_cancel_scope() will return C (i.e., nothing changed), or it will return some scope outside C, in which case it will be proper for C to let the exception continue propagating to that further-out scope. It can't return None for C for the same reasons it couldn't return None for us: C is the innermost CancelScope at that point and it's cancelled, no possibility for a shield further in.
      • Something something induction eventually we run out of cancel stack.

I think my conclusion here is that I was being overly conservative, this will work fine, and we can drop #900. But I'd like another set of eyes on it before being quite sure. :-)

@njsmith
Copy link
Member Author

njsmith commented Feb 4, 2019

couldn't convince myself that it would be safe in the presence of a cancellation landscape that might be changing as the Cancelled exception propagates

Good question, yeah.

Right now, at the moment when we inject the Cancelled exception, we look at a single consistent snapshot of the whole stack, determine where the outermost-unshielded-cancelled-scope is, and then lock that in as the destination of the exception.

In this proposal, each time the exception passes through a cancel scope, we look at a single consistent snapshot of the whole stack, and determine whether the current scope is the outermost-unshielded-cancelled-scope. (So the naive implementation is O(stack depth * number of scopes in the cancel-requested state), which is probably much better than O(stack depth * stack depth), and I think this could be optimized further.)

Now, if we discover that this is the outermost-unshielded-cancelled-scope, and catch the exception, then that's definitely OK, because it was the right thing to do at the moment we did it, and then there's nothing afterwards that could go wrong.

The riskier case is when we decide not to catch the exception. We know that at the moment we made the decision, there was some outer scope that was a better candidate for catching it. But, what it won't actually be caught until later. Could something happen between now and then that would mess things up? In particular, the thing that would be bad is if we got into a state where there was a live Cancelled propagating, but there was no outer scope that was a candidate for catching it. So, could that outer scope somehow stop being a candidate between when we raise the exception and when the exception reaches it?

To be a candidate, a scope has to meet two criteria:

  • It has to be in the cancel-requested state
  • There can't be any other scope that's in the cancel-requested state and that falls between our candidate and the next outermost shield

Once a scope is in the cancel-requested state, it can never leave it. So if we identified a scope as being a candidate at time T1, then for all times T2 > T1 it is definitely still in the cancel-requested state. But what about the second criterion? It could certainly happen that our original candidate stops fulfilling that – while our Cancelled is propagating, an outer scope could enter the cancel-requested state, or a shield could be removed to reveal an outer scope that was already in the cancel-requested state. But, in all of these cases, it means that our candidate is replaced by another, even-more-outer candidate.

Therefore, if in a snapshot at time T1 we decide that our exception has an outer candidate, then we are guaranteed that it will eventually be caught at time T2, even if it's by a different scope than the one we originally thought.

Ah, and I see you've just posted osmething too! I will read it.

@njsmith
Copy link
Member Author

njsmith commented Feb 4, 2019

OK looks like we made pretty much the same argument :-)

@oremanj
Copy link
Member

oremanj commented Feb 4, 2019

Put another way, if a cancel scope does interpret a Cancelled exception as being "for us", we can assume that determination is correct (as a matter of definition - the mapping from cancel stacks to "which scope is cancelled here right now" defines the cancellation semantics, we're just following it). So the only real correctness concern is that everyone could conclude "not mine!" and let a Cancelled exception fly out of the entire program. But for a cancelled scope to say "not mine!", it has to find some specific other further-out cancelled scope that it believes will say "mine!", based on the current state of the cancel stack. Since scopes never become uncancelled, at minimum the outermost cancelled scope will be forced to conclude "mine!". So as long as we never throw Cancelled into a task without any cancelled scopes on its cancel stack, we'll be OK.

@oremanj
Copy link
Member

oremanj commented Feb 4, 2019

Great, I'm sufficiently convinced now and will close #900 and work on this instead. :-)

@oremanj
Copy link
Member

oremanj commented Feb 4, 2019

Running the testsuite with the new behavior exposed a corner case I wasn't expecting, though I don't think it's actually harmful. If you have multiple tasks that are sleeping until the same time using trio.sleep(), and when one of them wakes up it cancels the others, the others will raise Cancelled out of that same call to trio.sleep() -- because sleep() is implemented using cancellation, and when the Cancelled reaches it, it sees there's a further-out cancelled scope that it should prefer instead.

@njsmith
Copy link
Member Author

njsmith commented Feb 4, 2019

Oh yeah, this will definitely change behavior in some edge cases. And some of our low-level tests are very reliant on edge cases, or even checking for them explicitly, so they'll need a bit of tweaking for sure...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants