Skip to content

feat(spanner): support MultiEndpoint#9565

Merged
rahul2393 merged 5 commits intogoogleapis:mainfrom
nimf:grpc-gcp
Apr 30, 2024
Merged

feat(spanner): support MultiEndpoint#9565
rahul2393 merged 5 commits intogoogleapis:mainfrom
nimf:grpc-gcp

Conversation

@nimf
Copy link
Contributor

@nimf nimf commented Mar 12, 2024

This PR adds optional gRPC-GCP channel pool (which is available in Cloud Spanner Java client) with its new MultiEndpoint feature.

The MultiEndpoint accepts a grouped lists of endpoints and creates a gRPC-GCP channel pool for each unique endpoint. The purposes of MultiEndpoint are:

  • Fallback to an alternative endpoint (host:port) when the original
    endpoint is completely unavailable.
  • Be able to route a Cloud Spanner call to a specific group of endpoints.
  • Be able to reconfigure endpoints in runtime.

Here is an example of creating a client with MultiEndpoint

gmeConfig := &grpcgcp.GCPMultiEndpointOptions{
	MultiEndpoints: map[string]*multiendpoint.MultiEndpointOptions{
		"default": {
			Endpoints: []string{
				"psc-east4:443",
			},
		},
	},
	Default: "default",
}

client, gme, err := NewMultiEndpointClient(ctx, database, gmeConfig)

Endpoints reconfiguration

Then, with the above example, to route the calls to a different endpoint we can:

gmeConfig = &grpcgcp.GCPMultiEndpointOptions{
	MultiEndpoints: map[string]*multiendpoint.MultiEndpointOptions{
		"default": {
			Endpoints: []string{
				"psc-west2:443",
			},
		},
	},
	Default: "default",
}

err := gme.UpdateMultiEndpoints(gmeConfig)

And all the following calls in the client will go to the west2 endpoint instead of east4.

Using several MultiEndpoints

We can configure several MultiEndpoints, for example:

gmeConfig := &grpcgcp.GCPMultiEndpointOptions{
	MultiEndpoints: map[string]*multiendpoint.MultiEndpointOptions{
		"default": {
			Endpoints: []string{
				"psc-east4:443",
			},
		},
		"west": {
			Endpoints: []string{
				"psc-west2:443",
			},
		},
	},
	Default: "default",
}

client, gme, err := NewMultiEndpointClient(ctx, database, gmeConfig)

Then we can route a call to a specific MultiEndpoint ("default" is used by default) using context:

westCtx := grpcgcp.NewMEContext(ctx, "west")

iter := client.Single().Query(westCtx, spanner.NewStatement(sql))

The call above will use an endpoint from the "west" MultiEndpoint group.

Fallback

If we add more than one endpoint to a MultiEndpoint, like in the example below:

gmeConfig := &grpcgcp.GCPMultiEndpointOptions{
	MultiEndpoints: map[string]*multiendpoint.MultiEndpointOptions{
		"default": {
			Endpoints: []string{
				"psc-east4:443",
				"psc-east4-fallback:443",
				"psc-west2:443",
				"psc-west2-fallback:443",
			},
			RecoveryTimeout: recoveryTimeout,
			SwitchingDelay:  switchingDelay,
		},
		"west": {
			Endpoints: []string{
				"psc-west2:443",
				"psc-west2-fallback:443",
				"psc-east4:443",
				"psc-east4-fallback:443",
			},
			RecoveryTimeout: recoveryTimeout,
			SwitchingDelay:  switchingDelay,
		},
	},
	Default: "default",
}

client, gme, err := NewMultiEndpointClient(ctx, database, gmeConfig)

Each MultiEndpoint will use the first endpoint in the list as long as any connection to that endpoint is alive.
When all connections to the first endpoint are broken, the MultiEndpoint will wait for recoveryTimeout and if the connections are still down, fail over to the next endpoint and so forth.
When a connection to one of the previous endpoints in the list is recovered, the MultiEndpoint will switch back to that endpoint after switchingDelay.

@nimf nimf requested a review from a team as a code owner March 12, 2024 22:25
@nimf nimf requested a review from a team March 12, 2024 22:25
@codyoss
Copy link
Member

codyoss commented Mar 13, 2024

please updated your title to be in the form of feat(spanner): ... and include a little more detail in the commit title.

@nimf nimf changed the title feat: MultiEndpoint feat(spanner): MultiEndpoint Mar 13, 2024
@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 30, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 30, 2024
@rahul2393 rahul2393 changed the title feat(spanner): MultiEndpoint feat(spanner): support MultiEndpoint Apr 30, 2024
@rahul2393 rahul2393 merged commit 0ac0d26 into googleapis:main Apr 30, 2024
@nimf nimf deleted the grpc-gcp branch May 6, 2024 22:26
option.WithoutAuthentication(),
internaloption.SkipDialSettingsValidation(),
}
opts = append(opts, emulatorOpts...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that mutating a var args parameter (opts) is dangerous. This code should be making a copy first.

https://go.dev/play/p/UAEsaz4xxXg

This is a contrived example to illustrate what's happening, but there is real risk in mutating a variadic argument, especially for exported APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixing in #11151

Comment on lines +209 to +213
copts := opts

for _, do := range dopts {
copts = append(copts, option.WithGRPCDialOption(do))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could potentially be even worse. If the DialFunc is run concurrently in the ChannelPool, then this function will be overwriting memory in one goroutine while reading it in another, which could cause serious problems, and should be expected to trigger the race detector if nothing else. It looks like that doesn't happen today - grpc clients are created serially, but I could see a user complaining about startup performance and sending a pr that does them each in their own goroutine, and then this becomes broken as a result. A copy should ideally be made before mutating the opts slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that wasn't intended. Thank you for pointing this out. Sending PR #11151 to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants