Improve documentation for SparkJobNamespace (kubeflow#608)
* Improve documentation for SparkJobNamespace

* Modifications after review
michaeljmarshall authored and liyinan926 committed Sep 11, 2019
1 parent 82e238d commit fb59a3a
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions docs/quick-start-guide.md
@@ -8,6 +8,7 @@ For a more detailed guide on how to use, compose, and work with `SparkApplication`s
* [Running the Examples](#running-the-examples)
* [Configuration](#configuration)
* [Upgrade](#upgrade)
* [About the Spark Job Namespace](#about-the-spark-job-namespace)
* [About the Service Account for Driver Pods](#about-the-service-account-for-driver-pods)
* [Enable Metric Exporting to Prometheus](#enable-metric-exporting-to-prometheus)
* [Driver UI Access and Ingress](#driver-ui-access-and-ingress)
@@ -20,10 +21,10 @@ To install the operator, use the Helm [chart](https://github.com/helm/charts/tree/master/incubator/sparkoperator).

```bash
$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
$ helm install incubator/sparkoperator --namespace spark-operator --set sparkJobNamespace=default
```

Installing the chart will create a namespace `spark-operator` if it doesn't exist, and Helm will set up RBAC for the operator to run in the namespace. Because the command above sets `sparkJobNamespace=default`, it will also set up RBAC in the `default` namespace so that driver pods of your Spark applications can manipulate executor pods. In addition, the chart will create a Deployment in the namespace `spark-operator`. The chart's [Spark Job Namespace](#about-the-spark-job-namespace) is set to `""` by default, in which case it does not set up that RBAC. The chart by default does not enable the [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) for Spark pod customization. When enabled, a webhook service and a secret named `spark-webhook-certs` storing the x509 certificate are created for that purpose. To install the operator **with** the mutating admission webhook on a Kubernetes cluster, install the chart with the flag `enableWebhook=true`:

```bash
$ helm install incubator/sparkoperator --namespace spark-operator --set enableWebhook=true
```

@@ -49,14 +50,16 @@ To run the Spark Pi example, run the following command:

```bash
$ kubectl apply -f examples/spark-pi.yaml
```

Note that `spark-pi.yaml` configures the driver pod to use the `spark` service account to communicate with the Kubernetes API server. You might need to replace it with the appropriate service account before submitting the job. If you installed the operator using the Helm chart and overrode `sparkJobNamespace`, the service account name ends with `-spark` and starts with the Helm release name. For example, if you would like your Spark jobs to run in a namespace called `test-ns`, first make sure it already exists, and then install the chart with the command:

```bash
$ helm install incubator/sparkoperator --namespace spark-operator --set sparkJobNamespace=test-ns
```

Then the chart will set up a service account for your Spark jobs to use in that namespace.
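
As a quick sanity check, you can confirm that the account exists before submitting jobs. A minimal sketch, assuming the chart was installed under the release name `sparkoperator` (the actual name depends on your `helm install` invocation):

```bash
# List service accounts in the job namespace; the chart-created one is named
# <helm-release-name>-spark, e.g. "sparkoperator-spark" under this assumption.
$ kubectl get serviceaccounts -n test-ns
```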

See the section on the [Spark Job Namespace](#about-the-spark-job-namespace) for details on the behavior of the default Spark Job Namespace.

Running the above `kubectl apply` command will create a `SparkApplication` object named `spark-pi`. Check the object by running the following command:

```bash
$ kubectl get sparkapplications spark-pi -o=yaml
```

@@ -155,6 +158,14 @@ $ helm upgrade <YOUR-HELM-RELEASE-NAME> --set operatorImageName=org/image --set ...

Refer to the Helm [documentation](https://docs.helm.sh/helm/#helm-upgrade) for more details on `helm upgrade`.

## About the Spark Job Namespace

The Spark Job Namespace value defines the namespace(s) where `SparkApplications` can be deployed. The Helm chart value for the Spark Job Namespace is `sparkJobNamespace`, and its default value is `""`, as defined in the Helm chart's [README](https://github.com/helm/charts/blob/master/incubator/sparkoperator/README.md). Note that in the [Kubernetes apimachinery](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/apimachinery) project, the constants `NamespaceAll` and `NamespaceNone` are both defined as the empty string; here, the empty string represents `NamespaceAll`. When set to `""`, the Spark Operator supports deploying `SparkApplications` to all namespaces. In that case, the Helm chart will create a service account in the namespace where the spark-operator is deployed, but it skips setting up the RBAC that driver pods of your `SparkApplications` need in order to manipulate executor pods. To successfully deploy `SparkApplications`, you will need to ensure the driver pod's service account meets the criteria described in the [service accounts for driver pods](#about-the-service-account-for-driver-pods) section.
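
For instance, with the default `sparkJobNamespace=""` you can grant the needed permissions yourself in each namespace where jobs will run. A minimal sketch using imperative `kubectl` commands; the namespace `my-jobs` and the account and role names are illustrative, not anything the chart creates:

```bash
# Create a service account for driver pods in the target namespace.
$ kubectl create serviceaccount spark -n my-jobs

# Allow it to manage executor pods and to create the driver's headless service.
$ kubectl create role spark-driver-role -n my-jobs \
    --verb=create,get,list,delete --resource=pods,services

# Bind the role to the service account.
$ kubectl create rolebinding spark-driver-rb -n my-jobs \
    --role=spark-driver-role --serviceaccount=my-jobs:spark
```

A manifest-based equivalent is shown in [spark-rbac.yaml](../manifest/spark-rbac.yaml).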

On the other hand, if you installed the operator using the Helm chart and overrode the `sparkJobNamespace` to some other, pre-existing namespace, the Helm chart will create the necessary service account and RBAC in the specified namespace.

The Spark Operator uses the Spark Job Namespace to identify and filter relevant events for the `SparkApplication` CRD. If you specify a namespace for Spark jobs and then submit a `SparkApplication` resource to another namespace, the Spark Operator will filter out the event, and the resource will not get deployed. If you don't specify a namespace, the Spark Operator will see `SparkApplication` events for all namespaces, and will deploy them to the namespace requested in the create call.
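
To make the filtering behavior concrete, here is a sketch; the manifest name `spark-app.yaml` is hypothetical and is assumed not to hard-code `metadata.namespace`:

```bash
# Assume the operator was installed with --set sparkJobNamespace=test-ns.

# Submitted outside the watched namespace: the event is filtered out,
# so the operator never acts on this application.
$ kubectl apply -f spark-app.yaml -n default

# Submitted in the watched namespace: the operator picks it up and runs it.
$ kubectl apply -f spark-app.yaml -n test-ns
```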

## About the Service Account for Driver Pods

A Spark driver pod needs a Kubernetes service account in the pod's namespace that has permissions to create, get, list, and delete executor pods, and to create a Kubernetes headless service for the driver. The driver will fail and exit without the service account, unless the default service account in the pod's namespace has the needed permissions. To submit and run a `SparkApplication` in a namespace, please make sure there is a service account with the permissions in the namespace and set `.spec.driver.serviceAccount` to the name of the service account. Please refer to [spark-rbac.yaml](../manifest/spark-rbac.yaml) for an example RBAC setup that creates a driver service account named `spark` in the `default` namespace, with an RBAC role binding giving the service account the needed permissions.
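
As an illustration, a `SparkApplication` can point its driver at that `spark` service account as sketched below. This is loosely based on the spark-pi example; the image tag and jar path are assumptions to adapt to your Spark version:

```bash
# Sketch: submit an application whose driver uses the "spark" service account.
$ kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4   # assumed image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar   # assumed path
  sparkVersion: "2.4.4"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark   # must carry the permissions described above
  executor:
    cores: 1
    instances: 1
    memory: 512m
EOF
```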
