-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ready-cache: Ensure cancelation can be observed #668
Conversation
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. Fixes #415 Co-authored-by: Eliza Weisman <eliza@buoyant.io>
tower-rs/tower#668 fixes a bug that can cause the load balancer to fail to properly observe cancelations when an endpoint is replaced or removed. This change updates the proxy to use tower from a fixed git SHA. Signed-off-by: Oliver Gould <ver@buoyant.io>
tower-rs/tower#668 fixes a bug that can cause the load balancer to fail to properly observe cancelations when an endpoint is replaced or removed. This change updates the proxy to use tower from a fixed git SHA. Signed-off-by: Oliver Gould <ver@buoyant.io>
@carllerche points out that, even with `unconstrained`, there are no guarantees that a `oneshot::Receiver` will observe the sent value immediately, even when using `tokio::task::unconstrained`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great, i think we should go ahead and merge it, release a 0.4.13, and then get it forward-ported to master
!
i commented on some very minor style nits, but there are no blockers here IMO!
tower/src/ready_cache/cache.rs
Outdated
if let Poll::Ready(r) = Pin::new(&mut fut).poll(cx) { | ||
// This MUST return ready as soon as the sender has been notified so | ||
// that we don't return a service that has been canceled, so we disable | ||
// cooperative scheduling on the receiver. Otherwise, the receiver can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit:
// cooperative scheduling on the receiver. Otherwise, the receiver can | |
// cooperative yielding on the receiver. Otherwise, the receiver can |
it's all cooperative scheduling :)
tower/src/ready_cache/cache.rs
Outdated
// This MUST return ready as soon as the sender has been notified so | ||
// that we don't return a service that has been canceled, so we disable | ||
// cooperative scheduling on the receiver. Otherwise, the receiver can | ||
// sporadically return pending even though the sender has fired. | ||
let mut cancel = | ||
tokio::task::unconstrained(self.cancel.as_mut().expect("polled after complete")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit, take it or leave it: might consider something like this, so that it's very obvious that the comment is referring to the use of unconstrained
:
// This MUST return ready as soon as the sender has been notified so | |
// that we don't return a service that has been canceled, so we disable | |
// cooperative scheduling on the receiver. Otherwise, the receiver can | |
// sporadically return pending even though the sender has fired. | |
let mut cancel = | |
tokio::task::unconstrained(self.cancel.as_mut().expect("polled after complete")); | |
let cancel = self.cancel.as_mut().expect("polled after complete"); | |
// This MUST return ready as soon as the sender has been notified so | |
// that we don't return a service that has been canceled, so we disable | |
// cooperative scheduling on the receiver. Otherwise, the receiver can | |
// sporadically return pending even though the sender has fired. | |
let mut cancel = tokio::task::unconstrained(cancel); |
Signed-off-by: Oliver Gould <ver@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this use of Notify
is actually correct since we are dropping the Notified
future at the end of the poll. i have a thought on what I think is the best way to do this, i can open a PR against this branch?
/// A `tokio::sync::oneshot` is NOT used, as a `Receiver` is not guaranteed to | ||
/// observe results as soon as a `Sender` fires. Using an `AtomicBool` allows | ||
/// the state to be observed as soon as the cancelation is triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the comment
This replaces the use of `tokio::sync::Notify` with `futures::task::AtomicWaker`. `Notify` requires the `Notified` future to be held in order to remain interested in the notification. Dropping the future returned by `Notify::notified` cancels interest in the notification. So, the current code using `Notify` is incorrect, as the `Notified` future is created on each poll and then immediately dropped, releasing interest in the wakeup. We could solve this by storing the `Notify::notified` future in the `Pending` future, but this would be a bit of a pain, as the `Notified` future borrows the `Notify`. I thought that it was a nicer alternative to just rewrite this code to use `AtomicWaker` instead. Also, `AtomicWaker` is a much simpler, lower-level primitive. It simply stores a single waker that can wake up a single task. `Notify` is capable of waking multiple tasks, either in order or all at once, which makes it more complex.
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. This branch replaces the use of `tokio::sync::oneshot` for canceling pending futures with a custom cancelation handle using an `AtomicBool` and `futures::task::AtomicWaker`. This ensures that canceled `Pending` services are always woken even when the task's budget is exceeded. Additionally, cancelation status is now always known to the `Pending` future, by checking the `AtomicBool` immediately on polls, even in cases where the canceled `Pending` future was woken by the inner `Service` becoming ready, rather than by the cancelation. Fixes #415 Signed-off-by: Oliver Gould <ver@buoyant.io> Co-authored-by: Eliza Weisman <eliza@buoyant.io>
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. This branch replaces the use of `tokio::sync::oneshot` for canceling pending futures with a custom cancelation handle using an `AtomicBool` and `futures::task::AtomicWaker`. This ensures that canceled `Pending` services are always woken even when the task's budget is exceeded. Additionally, cancelation status is now always known to the `Pending` future, by checking the `AtomicBool` immediately on polls, even in cases where the canceled `Pending` future was woken by the inner `Service` becoming ready, rather than by the cancelation. Fixes #415 Signed-off-by: Oliver Gould <ver@buoyant.io> Co-authored-by: Eliza Weisman <eliza@buoyant.io>
# 0.4.13 (June 17, 2022) ### Added - **load_shed**: Public constructor for `Overloaded` error ([#661]) ### Fixed - **util**: Fix hang with `call_all` when the `Stream` of requests is pending ([#656]) - **ready_cache**: Ensure cancelation is observed by pending services ([#668], fixes [#415]) - **docs**: Fix a missing section header due to a typo ([#646]) - **docs**: Fix broken links to `Service` trait ([#659]) [#661]: #661 [#656]: #656 [#668]: #668 [#415]: #415 [#646]: #646 [#659]: #659
# 0.4.13 (June 17, 2022) ### Added - **load_shed**: Public constructor for `Overloaded` error ([#661]) ### Fixed - **util**: Fix hang with `call_all` when the `Stream` of requests is pending ([#656]) - **ready_cache**: Ensure cancelation is observed by pending services ([#668], fixes [#415]) - **docs**: Fix a missing section header due to a typo ([#646]) - **docs**: Fix broken links to `Service` trait ([#659]) [#661]: #661 [#656]: #656 [#668]: #668 [#415]: #415 [#646]: #646 [#659]: #659
tokio::task
enforces a cooperative scheduling regime that can causeoneshot::Receiver::poll
to return pending after the sender has sent anupdate.
ReadyCache
uses a oneshot to notify pending services that theyshould not become ready. When a cancelation is not observed, the ready
cache return service instances that should have been canceled, which
breaks assumptions and causes an invalid state.
Fixes #415
Co-authored-by: Eliza Weisman eliza@buoyant.io
This is based on tower-0.4.12. We probably need to make a an 0.4 branch to get a bugfix release out.