Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a possible data race with rollback manager and plugin reload #19468

Merged
merged 8 commits into from
Mar 14, 2023

Conversation

fairclothjm
Copy link
Contributor

@fairclothjm fairclothjm commented Mar 6, 2023

The data race occurs when

  • Rollback manager triggers a rollback and performs a READ on RouteEntry.backend in MatchingBackend().
  • A reload command is issued and reloadBackendCommon() performs a WRITE on RouteEntry.backend

@fairclothjm fairclothjm requested a review from a team March 6, 2023 21:57
@fairclothjm fairclothjm changed the title fix data race on plugin reload fix data race on plugin reload and rollback Mar 6, 2023
@fairclothjm fairclothjm changed the title fix data race on plugin reload and rollback Fix a possible data race with rollback manager and plugin reload Mar 6, 2023
Copy link
Contributor

@vinay-gopalan vinay-gopalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

vault/router.go Outdated Show resolved Hide resolved
@raymonstah
Copy link
Contributor

Would it be simple to write a unit test that validates this fix?

@fairclothjm
Copy link
Contributor Author

Would it be simple to write a unit test that validates this fix?

@raymonstah Good call! I didn't think it would be possible but your comment led me to look a little closer and I realized I could set the RollbackPeriod for the test to a small value. This in turn led me to discover that I was not using the appropriate lock to address the race condition. I have updated this PR to use the lock on the routeEntry instead of the Router.

Comment on lines +467 to +470
re.l.RLock()
defer re.l.RUnlock()

return re.backend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worthwhile to extract these lines to a helper method so that clients don't have to remember to place a lock when reading backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sympathetic to this. However, I have a few concerns

  • use of routeEntry all happens within the vault package so to be sure that clients use the helper methods we would want to refactor this to another package and make the fields private so that access to the fields require getters that hold locks, but
  • this code is in the hot path of vault requests so I am hesitant to do too much refactoring without involving the vault Core team

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will leave this as-is and reach out to Core to see what they think about any more changes to the routeEntry

@fairclothjm fairclothjm merged commit 28537ef into main Mar 14, 2023
@fairclothjm fairclothjm deleted the fix-plugin-reload-race branch March 14, 2023 14:36
raymonstah pushed a commit that referenced this pull request Mar 17, 2023
)

* fix data race on plugin reload

* add changelog

* add comment for posterity

* revert comment and return assignment in router.go

* rework plugin continue on error tests to use compilePlugin

* fix race condition on route entry

* add test for plugin reload and rollback race detection

* add go doc for test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants