Use a small pool of workers to run postUnsealFuncs in parallel #18244

Merged · 11 commits merged into main from parallize-initfuncs on Dec 12, 2022

Conversation

sgmiller (Collaborator) commented Dec 6, 2022

Initially sized at 32, configurable via a VAULT_POSTUNSEAL_FUNC_CONCURRENCY environment variable in case of trouble.

@sgmiller sgmiller requested a review from a team December 6, 2022 17:20

vault/core.go:

	}
	for i, v := range c.postUnsealFuncs {
		wg.Add(1)
		workerChans[i%postUnsealFuncConcurrency] <- v
Contributor commented:

I don't know a great deal about c.postUnsealFuncs, but if this variable contains truly independent functions that we can run in parallel, I wonder whether it makes more sense to just loop over them and run each one in its own goroutine, versus having a worker pool with a static size.

My main concern with this particular piece is index wraparound when len(c.postUnsealFuncs) > postUnsealFuncConcurrency. At that point, you'll be overwriting the contents of the workerChans slice, no? Given the default size of 32, maybe the probability of this is low (I don't know what len(c.postUnsealFuncs) typically is), but since the size is tunable, it could be anything.

Thoughts?
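
To make the pattern under discussion concrete, here is a minimal runnable sketch of the bounded worker pool. The names postUnsealFuncConcurrency and workerChans mirror the diff; everything else (the pool size, the stand-in funcs, the close/wait sequencing) is illustrative, not the PR's exact code:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const postUnsealFuncConcurrency = 4

	// Stand-ins for c.postUnsealFuncs.
	var postUnsealFuncs []func()
	for i := 0; i < 16; i++ {
		i := i
		postUnsealFuncs = append(postUnsealFuncs, func() { fmt.Println("ran func", i) })
	}

	var wg sync.WaitGroup
	workerChans := make([]chan func(), postUnsealFuncConcurrency)
	for i := range workerChans {
		workerChans[i] = make(chan func())
		go func(ch chan func()) {
			for v := range ch {
				v()
				wg.Done()
			}
		}(workerChans[i])
	}

	// Round-robin distribution. A send on a channel queues (or blocks);
	// the workerChans slice entry itself is never overwritten on wraparound.
	for i, v := range postUnsealFuncs {
		wg.Add(1)
		workerChans[i%postUnsealFuncConcurrency] <- v
	}
	for _, ch := range workerChans {
		close(ch)
	}
	wg.Wait()
}
```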

Contributor commented:

The issue concretely is that we have some customers with 5k+ PKI mounts that need to be migrated when upgrading from <=1.10.x to 1.11+.

If we run 4k+ PKI mount migrations in full parallel, we absolutely slam backend storage and the Vault node's CPU, in addition to potentially spawning too many goroutines and thus starving Raft's heartbeats (if it's in use).

Setting a relatively safe upper limit here is good, I think. I'd like to see maybe 2x the CPU count by default rather than a hard-coded default (just to prevent starvation on nodes with a single CPU core).

Each mount's migration/initialize func is, yes, strictly independent, but the combined work of all of them might be enough to bring down Vault if run in parallel.

Today we run them strictly serially, which does avoid this problem, but it means we stall for IO, then do compute, then stall for more IO, and so on. If each migration takes only 100ms, then with 5k+ mounts we're sitting north of 8 minutes just to unseal, which blows past the default context deadline.
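
A minimal sketch of the sizing logic this suggests. The environment variable name comes from the PR description; defaulting to 2x runtime.NumCPU() is the reviewer's proposal here, not necessarily what was merged:

```go
import (
	"os"
	"runtime"
	"strconv"
)

// postUnsealFuncConcurrency returns the worker-pool size: the value of
// VAULT_POSTUNSEAL_FUNC_CONCURRENCY if it is a positive integer, else
// twice the CPU count so even a single-core node gets two workers.
func postUnsealFuncConcurrency() int {
	if raw := os.Getenv("VAULT_POSTUNSEAL_FUNC_CONCURRENCY"); raw != "" {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			return n
		}
	}
	return 2 * runtime.NumCPU()
}
```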

sgmiller (Collaborator, author) commented:

Yeah, I'd thought about runtime.NumCPU(), which I'm back to liking again.

cipherboy (Contributor) left a comment:

Thank you!

vault/core.go (outdated; thread resolved)
sgmiller and others added 2 commits December 12, 2022 14:26
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
vault/core.go (outdated):

	for v := range workerChans[i] {
		func() {
			defer wg.Done()
			v()
		}()
	}
Collaborator commented:

We Add every time we send to a channel, so I agree we need to Done every time we read from it. Why the extra func/defer though? Couldn't we just have a loop body of { v(); wg.Done() }?
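
Relative to the sketch above, the simplification being floated is just:

```go
go func(ch chan func()) {
	for v := range ch {
		v()       // run the postUnsealFunc
		wg.Done() // one Done per func received, matching the Add per send
	}
}(workerChans[i])
```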

Collaborator commented:

Although alternatively, we could instead Add once per postUnsealFuncConcurrency and Done when the worker exits.
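
That alternative shifts the WaitGroup accounting from one Add per func to one Add per worker (again a sketch against the names above, not the merged code):

```go
var wg sync.WaitGroup
wg.Add(postUnsealFuncConcurrency) // one Add per worker goroutine
for i := 0; i < postUnsealFuncConcurrency; i++ {
	go func(ch chan func()) {
		defer wg.Done() // Done once, when this worker's channel is closed and drained
		for v := range ch {
			v()
		}
	}(workerChans[i])
}
// ...send every postUnsealFunc (no wg.Add per send needed), close all
// channels, then wg.Wait()...
```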

sgmiller (Collaborator, author) commented:

Yeah, I suspect so. The defer would still let us Done in a panic, but we'd have bigger problems then.

@sgmiller sgmiller merged commit 3c842fb into main Dec 12, 2022
@sgmiller sgmiller deleted the parallize-initfuncs branch December 12, 2022 23:07
AnPucel pushed a commit that referenced this pull request Jan 14, 2023
* Initial worker pool

* Run postUnsealFuncs in parallel

* Use the old logic for P=1

* changelog

* Use a CPU count relative worker pool

* Update vault/core.go

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* Done must be called once per postUnsealFunc

* Defer is overkill

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
cipherboy added a commit that referenced this pull request Jan 27, 2023
On further review, with new understanding after #18244, loading
configuration and CRLs within the backend's initialize function is the
ideal approach: Factory construction is strictly serial, resulting in
backend initialization blocking until config and CRLs are loaded.
By using an InitializeFunc(...), we delay loading until after all
backends are constructed (either right on startup in 1.12+, else during
the initial PeriodicFunc(...) invocation on 1.11 and earlier).

We also invoke initialize automatically on test Factory construction.

Resolves: #17847

Co-authored-by: valli_0x <personallune@mail.ru>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
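
For reference, a minimal sketch of the pattern this commit describes, using the plugin SDK's initialize hook. The backend here is hypothetical; framework.Backend's InitializeFunc field and logical.InitializationRequest are real SDK types, but the storage key and loading logic are placeholders, not the actual cert/PKI backend code:

```go
package mybackend // hypothetical example backend

import (
	"context"

	"github.com/hashicorp/vault/sdk/framework"
	"github.com/hashicorp/vault/sdk/logical"
)

func Backend() *framework.Backend {
	b := &framework.Backend{}
	// Deferred loading: InitializeFunc runs after all backends have been
	// constructed, so Factory stays fast and strictly serial while config
	// and CRL loading happen later, during initialization.
	b.InitializeFunc = func(ctx context.Context, req *logical.InitializationRequest) error {
		// Load configuration (and CRLs) from storage here instead of in Factory.
		entry, err := req.Storage.Get(ctx, "config")
		if err != nil {
			return err
		}
		_ = entry // decode and cache the config as needed
		return nil
	}
	return b
}
```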
cipherboy added a commit that referenced this pull request Jan 27, 2023
* Move cert auth backend setup into initialize

On further review, with new understanding after #18244, loading
configuration and CRLs within the backend's initialize function is the
ideal approach: Factory construction is strictly serial, resulting in
backend initialization blocking until config and CRLs are loaded.
By using an InitializeFunc(...), we delay loading until after all
backends are constructed (either right on startup in 1.12+, else during
the initial PeriodicFunc(...) invocation on 1.11 and earlier).

We also invoke initialize automatically on test Factory construction.

Resolves: #17847

Co-authored-by: valli_0x <personallune@mail.ru>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add changelog entry

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

---------

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: valli_0x <personallune@mail.ru>