The storage controller sits between administrative API clients and pageservers, and handles the details of mapping tenants to pageserver tenant shards. For example, creating a tenant is one API call to the storage controller, which is mapped into many API calls to many pageservers (for multiple shards, and for secondary locations).
It implements a pageserver-compatible API that may be used for CRUD operations on tenants and timelines, translating these requests into appropriate operations on the shards within a tenant, which may be on many different pageservers. Using this API, the storage controller may be used in the same way as the pageserver's administrative HTTP API, hiding the underlying details of how data is spread across multiple nodes.
The storage controller also manages generations, high availability (via secondary locations) and live migrations for tenants under its management. This is done with a reconciliation loop pattern, where tenants have an “intent” state and a “reconcile” task that tries to make the outside world match the intent.
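As a minimal sketch of this pattern (the types, fields, and logic below are illustrative inventions, not the controller's real internals):

```rust
// Illustrative sketch of the intent/reconcile pattern; names are invented
// for this example and do not match the controller's real types.
#[derive(Clone, PartialEq)]
struct IntentState {
    attached: Option<u64>, // pageserver NodeId the shard should be attached to
    secondary: Vec<u64>,   // pageservers that should hold secondary locations
}

struct TenantShard {
    intent: IntentState,   // what we want the world to look like
    observed: IntentState, // what we believe the world currently looks like
}

impl TenantShard {
    // A "reconcile" task runs whenever observed != intent, issuing
    // pageserver API calls until reality matches the intent.
    async fn reconcile(&mut self) {
        while self.observed != self.intent {
            // ...configure attachments/secondaries via pageserver APIs...
            self.observed = self.intent.clone(); // stand-in for the real work
        }
    }
}
```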
The storage controller’s HTTP server implements four logically separate APIs:
- The `/v1/...` path is the pageserver-compatible API. This has to be at the path root because that's where clients expect to find it on a pageserver.
- The `/control/v1/...` path is the storage controller's own API, which enables operations such as registering and managing pageservers, or executing shard splits.
- The `/debug/v1/...` path contains endpoints which are either exclusively used in tests, or are for use by engineers when supporting a deployed system.
- The `/upcall/v1/...` path contains endpoints that are called by pageservers. This includes the `/re-attach` and `/validate` APIs used by pageservers to ensure data safety with generation numbers.
The API is authenticated with a JWT token, and tokens must have the scope `pageserverapi` (i.e. the same scope as pageservers' APIs).
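For example, a client might call these APIs with a standard Bearer token. The sketch below assumes the `reqwest` crate; the endpoint path is left as a placeholder rather than a real route:

```rust
// Hypothetical client call: only the Bearer-token auth scheme and the path
// prefixes are taken from this document; substitute a real endpoint path.
use reqwest::Client;

async fn call_controller(base_url: &str, jwt: &str, path: &str) -> reqwest::Result<String> {
    Client::new()
        .get(format!("{base_url}{path}")) // e.g. a path under /control/v1/...
        .bearer_auth(jwt)                 // token must carry the pageserverapi scope
        .send()
        .await?
        .error_for_status()?
        .text()
        .await
}
```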
See the `http.rs` file in the source for where the HTTP APIs are implemented.
The storage controller uses a postgres database to persist a subset of its state. Note that the storage controller does not keep all its state in the database: this is a design choice to enable most operations to be done efficiently in memory, rather than having to read from the database. See `persistence.rs` for a more comprehensive comment explaining what we do and do not persist: a useful metaphor is that we persist objects like tenants and nodes, but we do not persist the relationships between them. The attachment state of a tenant's shards to nodes is kept in memory and rebuilt on startup.
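To make that metaphor concrete, here is a hypothetical sketch (field and type names are invented for illustration, not the real schema) of the split between persisted objects and in-memory relationships:

```rust
// Hypothetical illustration of "persist objects, not relationships";
// these are not the controller's real schema or types.

// Persisted in postgres: the objects themselves.
struct NodePersistence {
    node_id: i64,
    listen_http_addr: String,
}
struct TenantShardPersistence {
    tenant_id: String,
    shard_number: i32,
    generation: i32, // generation numbers must be durable (see below)
}

// Not persisted: which node each shard is attached to. This relationship
// is kept in memory and rebuilt on startup by querying pageservers.
struct InMemoryShardState {
    intent_attached: Option<i64>,   // node we want the shard attached to
    observed_attached: Option<i64>, // node where we last saw it attached
}
```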
The file `persistence.rs` contains all the code for accessing the database, and has a large doc comment that goes into more detail about exactly what we persist and why. The `diesel` crate is used for defining models & migrations.
Running a local cluster with `cargo neon` automatically starts a vanilla postgres process to host the storage controller's database.
If you need to modify the database schema, here’s how to create a migration:
- Install the diesel CLI with `cargo install diesel_cli`
- Use `diesel migration generate <name>` to create a new migration
- Populate the SQL files in the `migrations/` subdirectory
- Use `DATABASE_URL=... diesel migration run` to apply the migration you just wrote: this will update the `schema.rs` file automatically.
  - This requires a running database: the easiest way to get one is to run `cargo neon init ; cargo neon start`, which will leave a database available at `postgresql://localhost:1235/storage_controller`
- Commit the migration files and the changes to `schema.rs`
- If you need to iterate, you can rewind migrations with `diesel migration revert -a` and then `diesel migration run` again.
- The migrations are built into the storage controller binary and automatically run at startup after it is deployed, so once you've committed a migration no further steps are needed.
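For reference, `diesel migration run` regenerates `schema.rs` as `diesel::table!` definitions. A regenerated table might look like the following (the table and columns here are invented for illustration, not the controller's real schema):

```rust
// Hypothetical excerpt of a regenerated schema.rs.
diesel::table! {
    nodes (node_id) {
        node_id -> Int8,
        listen_http_addr -> Varchar,
        listen_http_port -> Int4,
    }
}
```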
The `storcon_cli` tool enables interactive management of the storage controller. This is usually only necessary for debugging, but may also be used to manage nodes (e.g. marking a node as offline). `storcon_cli --help` includes details on commands.
This section is aimed at engineers deploying the storage controller outside of Neon's cloud platform, as part of a self-hosted system.
General note: since the default `neon_local` environment includes a storage controller, this is a useful reference when figuring out deployment.
It is essential that the database used by the storage controller is durable (do not store it on ephemeral local disk). This database contains pageserver generation numbers, which are essential to data safety on the pageserver.
The resource requirements for the database are very low: a single CPU core and 1GiB of memory should work well for most deployments. The physical size of the database is typically under a gigabyte.
Set the URL to the database using the `--database-url` CLI option.
There is no need to run migrations manually: the storage controller automatically applies migrations when it starts up.
- The pageserver `control_plane_api` and `control_plane_api_token` settings should be set in the `pageserver.toml` file. The API setting should point to the "upcall" prefix; for example `http://127.0.0.1:1234/upcall/v1/` is used in neon_local clusters.
- Create a `metadata.json` file in the same directory as `pageserver.toml`: this enables the pageserver to automatically register itself with the storage controller when it starts up. See the example below for the format of this file.
{"host":"acmehost.localdomain","http_host":"acmehost.localdomain","http_port":9898,"port":64000}
`port` and `host` refer to the postgres port and host, and these must be accessible from wherever postgres runs. `http_port` and `http_host` refer to the pageserver's HTTP API; this must be accessible from where the storage controller runs.
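Putting the first step together, a minimal `pageserver.toml` excerpt might look like the following sketch (the values are examples, not defaults):

```toml
# Hypothetical pageserver.toml excerpt: point the pageserver at the storage
# controller's upcall API. Values here are examples, not defaults.
control_plane_api = "http://127.0.0.1:1234/upcall/v1/"
control_plane_api_token = "<JWT with pageserverapi scope>"
```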
The storage controller independently moves tenant attachments between pageservers in response to changes such as a pageserver node becoming unavailable, or the tenant's shard count changing. To enable postgres clients to handle such changes, the storage controller calls an API hook when a tenant's pageserver location changes.
The hook is configured using the storage controller's `--compute-hook-url` CLI option. If the hook requires JWT auth, the token may be provided with `--control-plane-jwt-token`. The hook will be invoked with a `PUT` request.
In the Neon cloud service, this hook is implemented by Neon's internal cloud control plane. In `neon_local` systems, the storage controller integrates directly with neon_local to reconfigure local postgres processes instead of calling the compute hook.
When implementing an on-premise Neon deployment, you must implement a service that handles the compute hook. This is not complicated: the request body has the format of the `ComputeHookNotifyRequest` structure, provided below for convenience.
```rust
struct ComputeHookNotifyRequestShard {
    // Pageserver node currently serving this shard; resolve it to the
    // node's address+port when building connection strings (see below).
    node_id: NodeId,
    shard_number: ShardNumber,
}

struct ComputeHookNotifyRequest {
    tenant_id: TenantId,
    // When present, postgres must set neon.stripe_size to this value.
    stripe_size: Option<ShardStripeSize>,
    // One entry per shard; drives neon.pageserver_connstr.
    shards: Vec<ComputeHookNotifyRequestShard>,
}
```
When a notification is received:
- Modify the postgres configuration for this tenant:
  - set `neon.pageserver_connstr` to a comma-separated list of postgres connection strings to pageservers, according to the `shards` list. The shards are identified by `NodeId`, which must be converted to the address+port of the node.
  - if `stripe_size` is not None, set `neon.stripe_size` to this value
- Send SIGHUP to postgres to reload the configuration
- Respond with 200 to the notification request. Do not return success if postgres was not updated: if an error is returned, the controller will retry the notification until it succeeds.
An example notification body:

```json
{
  "tenant_id": "1f359dd625e519a1a4e8d7509690f6fc",
  "stripe_size": 32768,
  "shards": [
    { "node_id": 344, "shard_number": 0 },
    { "node_id": 722, "shard_number": 1 }
  ]
}
```
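To make the steps above concrete, here is a minimal sketch of a hook receiver. It assumes the `axum`, `serde`, and `tokio` crates; the route path, port, simplified field types, and the node-address lookup are all invented for illustration and must be adapted to your deployment:

```rust
// Hypothetical compute-hook receiver sketch; not Neon code.
use axum::{extract::Json, http::StatusCode, routing::put, Router};
use serde::Deserialize;

#[derive(Deserialize)]
struct ComputeHookNotifyRequestShard {
    node_id: u64, // simplified stand-ins for NodeId / ShardNumber
    shard_number: u8,
}

#[derive(Deserialize)]
struct ComputeHookNotifyRequest {
    tenant_id: String,
    stripe_size: Option<u64>,
    shards: Vec<ComputeHookNotifyRequestShard>,
}

// Assumption: you have some inventory mapping a pageserver NodeId to the
// host:port of its postgres (page service) listener; this stub fakes it.
fn node_connstr(node_id: u64) -> String {
    format!("postgresql://no_user@pageserver-{node_id}.internal:6400")
}

async fn notify(Json(req): Json<ComputeHookNotifyRequest>) -> StatusCode {
    // Order connection strings by shard_number so the list is positional.
    let mut shards = req.shards;
    shards.sort_by_key(|s| s.shard_number);
    let connstr = shards
        .iter()
        .map(|s| node_connstr(s.node_id))
        .collect::<Vec<_>>()
        .join(",");

    // apply_pg_config stands in for rewriting neon.pageserver_connstr /
    // neon.stripe_size for req.tenant_id and SIGHUP-ing postgres.
    match apply_pg_config(&req.tenant_id, &connstr, req.stripe_size) {
        Ok(()) => StatusCode::OK, // only after postgres is actually updated
        // On error the controller retries the notification until it succeeds.
        Err(()) => StatusCode::INTERNAL_SERVER_ERROR,
    }
}

fn apply_pg_config(_tenant_id: &str, _connstr: &str, _stripe_size: Option<u64>) -> Result<(), ()> {
    unimplemented!("update postgres config for this tenant, then send SIGHUP")
}

#[tokio::main]
async fn main() {
    // The controller's --compute-hook-url should point at this endpoint;
    // the path and port here are arbitrary choices for the example.
    let app = Router::new().route("/notify-attach", put(notify));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```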