
feat(spark): integrate Spark operator in Kubeflow manifests (#2889)
* feat(manifests): integrate Spark operator in Kubeflow manifests

- Add Spark operator manifests for distributed Spark workloads.
- Ensure integration with Kubeflow Pipelines so Spark jobs can be executed from pipelines.

Signed-off-by: Gezim Sejdiu <[email protected]>

* fix: resolve permission issue for test spark script in makefile

Signed-off-by: Gezim Sejdiu <[email protected]>

* chore: bump spark-operator version to v2.0.2
Fix some issues with tests.

Signed-off-by: Gezim Sejdiu <[email protected]>

* fix: fix running spark pi-python example

Signed-off-by: Gezim Sejdiu <[email protected]>

* networkpolicies

Signed-off-by: juliusvonkohout <[email protected]>

* enable webhook

Signed-off-by: juliusvonkohout <[email protected]>

* fix networkpolicy

Signed-off-by: juliusvonkohout <[email protected]>

* fix networkpolicy

Signed-off-by: juliusvonkohout <[email protected]>

* fix yamllint

Signed-off-by: juliusvonkohout <[email protected]>

* fix yamllint

Signed-off-by: juliusvonkohout <[email protected]>

* fix yamllint

Signed-off-by: juliusvonkohout <[email protected]>

* fix yamllint

Signed-off-by: juliusvonkohout <[email protected]>

* fix indentation and newline for yamllint test

Signed-off-by: juliusvonkohout <[email protected]>

* set webhook port to 443

Signed-off-by: juliusvonkohout <[email protected]>

* disable istio injection for the webhook and set runAsNonRoot on the webhook

Signed-off-by: juliusvonkohout <[email protected]>

* disable istio injection and set runAsNonRoot and user 185

Signed-off-by: juliusvonkohout <[email protected]>

* wait for the webhook to become ready

Signed-off-by: juliusvonkohout <[email protected]>

* pod logs and webhook port 9443

Signed-off-by: juliusvonkohout <[email protected]>

* remove pod logs

Signed-off-by: juliusvonkohout <[email protected]>

* remove debug stuff

Signed-off-by: juliusvonkohout <[email protected]>

* fix owners file

Signed-off-by: juliusvonkohout <[email protected]>

---------

Signed-off-by: Gezim Sejdiu <[email protected]>
Signed-off-by: juliusvonkohout <[email protected]>
Co-authored-by: juliusvonkohout <[email protected]>
GezimSejdiu and juliusvonkohout authored Oct 15, 2024
1 parent a2348b5 commit c2ad9e6
Showing 15 changed files with 24,217 additions and 22 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/linting_bash_python_yaml_files.yaml
@@ -36,7 +36,7 @@ jobs:
          fetch-depth: 0

      - name: Install yamllint
-       run: pip install yamllint
+       run: python3 -m venv myenv && source myenv/bin/activate && pip install yamllint

      - name: YAML Formatting Guidelines
        run: |
41 changes: 41 additions & 0 deletions .github/workflows/spark_test.yaml
@@ -0,0 +1,41 @@
name: Build & Apply Spark manifest in KinD
on:
  pull_request:
    paths:
      - tests/gh-actions/install_KinD_create_KinD_cluster_install_kustomize.sh
      - .github/workflows/spark_test.yaml
      - contrib/spark/**

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install KinD, Create KinD cluster and Install kustomize
        run: ./tests/gh-actions/install_KinD_create_KinD_cluster_install_kustomize.sh

      - name: Install Istio
        run: ./tests/gh-actions/install_istio.sh

      - name: Install oauth2-proxy
        run: ./tests/gh-actions/install_oauth2-proxy.sh

      - name: Install cert-manager
        run: ./tests/gh-actions/install_cert_manager.sh

      - name: Create kubeflow namespace
        run: kustomize build common/kubeflow-namespace/base | kubectl apply -f -

      - name: Install KF Multi Tenancy
        run: ./tests/gh-actions/install_multi_tenancy.sh

      - name: Create KF Profile
        run: kustomize build common/user-namespace/base | kubectl apply -f -

      - name: Build & Apply manifests
        run: |
          cd contrib/spark/
          export KF_PROFILE=kubeflow-user-example-com
          make test
43 changes: 22 additions & 21 deletions common/networkpolicies/base/kustomization.yaml
@@ -2,24 +2,25 @@ apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
-- cache-server.yaml
-- centraldashboard.yaml
-- default-allow-same-namespace.yaml
-- jupyter-web-app.yaml
-- katib-controller.yaml
-- katib-db-manager.yaml
-- katib-ui.yaml
-- kserve-models-web-app.yaml
-- kserve.yaml
-- metadata-envoy.yaml
-- metadata-grpc-server.yaml
-- minio.yaml
-- ml-pipeline-ui.yaml
-- ml-pipeline.yaml
-- model-registry.yaml
-- poddefaults.yaml
-- pvcviewer-webhook.yaml
-- seldon.yaml
-- tensorboards-web-app.yaml
-- training-operator-webhook.yaml
-- volumes-web-app.yaml
+- cache-server.yaml
+- centraldashboard.yaml
+- default-allow-same-namespace.yaml
+- jupyter-web-app.yaml
+- katib-controller.yaml
+- katib-db-manager.yaml
+- katib-ui.yaml
+- kserve-models-web-app.yaml
+- kserve.yaml
+- metadata-envoy.yaml
+- metadata-grpc-server.yaml
+- minio.yaml
+- ml-pipeline-ui.yaml
+- ml-pipeline.yaml
+- model-registry.yaml
+- poddefaults.yaml
+- pvcviewer-webhook.yaml
+- seldon.yaml
+- spark-operator-webhook.yaml
+- tensorboards-web-app.yaml
+- training-operator-webhook.yaml
+- volumes-web-app.yaml
24 changes: 24 additions & 0 deletions common/networkpolicies/base/spark-operator-webhook.yaml
@@ -0,0 +1,24 @@
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: spark-operator-webhook
  namespace: kubeflow
spec:
  podSelector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
          - spark-operator
      - key: app.kubernetes.io/component
        operator: In
        values:
          - webhook
  # https://www.elastic.co/guide/en/cloud-on-k8s/1.1/k8s-webhook-network-policies.html
  # The kubernetes api server must reach the webhook
  ingress:
    - ports:
        - protocol: TCP
          port: 9443
  policyTypes:
    - Ingress
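
For context, this policy admits ingress to the spark-operator webhook pods on port 9443, which the Kubernetes API server needs for admission calls. A quick, illustrative check (not part of this commit) that the policy exists and selects the intended pods:

```sh
# Illustrative verification; namespace and labels follow the policy above.
kubectl -n kubeflow describe networkpolicy spark-operator-webhook
kubectl -n kubeflow get pods \
  -l app.kubernetes.io/name=spark-operator,app.kubernetes.io/component=webhook
```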
17 changes: 17 additions & 0 deletions contrib/spark/Makefile
@@ -0,0 +1,17 @@
SPARK_OPERATOR_RELEASE_VERSION ?= 2.0.2
SPARK_OPERATOR_HELM_CHART_REPO ?= https://kubeflow.github.io/spark-operator

.PHONY: spark-operator/base
spark-operator/base:
	mkdir -p spark-operator/base
	cd spark-operator/base && \
	helm template -n kubeflow --include-crds spark-operator spark-operator \
	--set "spark.jobNamespaces={}" \
	--set webhook.enable=true \
	--set webhook.port=9443 \
	--version ${SPARK_OPERATOR_RELEASE_VERSION} \
	--repo ${SPARK_OPERATOR_HELM_CHART_REPO} > resources.yaml

.PHONY: test
test:
	./test.sh ${KF_PROFILE}
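
As a usage sketch (not part of the diff): the first target re-renders the operator manifests from the upstream Helm chart into `resources.yaml`, and `test` runs `test.sh` against a profile namespace passed via `KF_PROFILE`, mirroring the CI workflow above.

```sh
# Sketch of local usage, assuming helm, kustomize, and a cluster with Kubeflow multi-tenancy.
cd contrib/spark
make spark-operator/base                      # regenerate spark-operator/base/resources.yaml
export KF_PROFILE=kubeflow-user-example-com   # profile namespace used by the CI workflow
make test                                     # runs ./test.sh ${KF_PROFILE}
```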
5 changes: 5 additions & 0 deletions contrib/spark/OWNERS
@@ -0,0 +1,5 @@
approvers:
- juliusvonkohout
reviewers:
- juliusvonkohout

26 changes: 26 additions & 0 deletions contrib/spark/README.md
@@ -0,0 +1,26 @@
# Kubeflow Spark Operator

[![Integration Test](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml) [![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)

## What is Spark Operator?

The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses
[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.
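
For illustration (not part of the committed README), a minimal `SparkApplication` submitted into a profile namespace might look like the sketch below; the image tag, Spark version, resource sizes, and the `default-editor` service account are assumptions.

```sh
# Hypothetical example: submit a small PySpark Pi job as a SparkApplication custom resource.
kubectl apply -n kubeflow-user-example-com -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-python
spec:
  type: Python
  mode: cluster
  image: spark:3.5.0                 # assumed image tag
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: default-editor   # assumed profile service account
  executor:
    instances: 1
    cores: 1
    memory: 512m
EOF
kubectl -n kubeflow-user-example-com get sparkapplications
```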

## Overview

For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). It requires Spark 2.3 or above, which supports Kubernetes as a native scheduler backend.

The Kubernetes Operator for Apache Spark currently supports the following list of features:

* Supports Spark 2.3 and up.
* Enables declarative application specification and management of applications through custom resources.
* Automatically runs `spark-submit` on behalf of users for each `SparkApplication` eligible for submission.
* Provides native [cron](https://en.wikipedia.org/wiki/Cron) support for running scheduled applications.
* Supports customization of Spark pods beyond what Spark natively supports, via a mutating admission webhook, e.g. mounting ConfigMaps and volumes and setting pod affinity/anti-affinity.
* Supports automatic re-submission of `SparkApplication` objects whose specification has been updated.
* Supports automatic application restart with a configurable restart policy.
* Supports automatic retries of failed submissions with optional linear back-off.
* Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via `sparkctl`.
* Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via `sparkctl`.
* Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
6 changes: 6 additions & 0 deletions contrib/spark/UPGRADE.md
@@ -0,0 +1,6 @@
# Upgrading
```sh
# Step 1: Update SPARK_OPERATOR_RELEASE_VERSION in Makefile
# Step 2: Create new Spark operator manifest
make spark-operator/base
```
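
A concrete sketch of the two steps with a hypothetical target version; the regenerated `resources.yaml` is what the kustomization in `spark-operator/base` consumes.

```sh
# Hypothetical version bump; 2.1.0 is an example, not a tested release.
sed -i 's/^SPARK_OPERATOR_RELEASE_VERSION ?= .*/SPARK_OPERATOR_RELEASE_VERSION ?= 2.1.0/' Makefile
make spark-operator/base   # re-renders spark-operator/base/resources.yaml from the Helm chart
```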
65 changes: 65 additions & 0 deletions contrib/spark/spark-operator/base/aggregated-roles.yaml
@@ -0,0 +1,65 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-admin
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
rules: []
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-edit
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications/status
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-spark-view
  labels:
    app: spark-operator
    app.kubernetes.io/name: spark-operator
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications/status
    verbs:
      - get
---
61 changes: 61 additions & 0 deletions contrib/spark/spark-operator/base/kustomization.yaml
@@ -0,0 +1,61 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - resources.yaml
  - aggregated-roles.yaml
namespace: kubeflow
patches:
  - target:
      kind: Deployment
      labelSelector: "app.kubernetes.io/name=spark-operator"
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/securityContext
        value:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          runAsUser: 185
          seccompProfile:
            type: RuntimeDefault
  - target:
      kind: Deployment
      labelSelector: "app.kubernetes.io/name=spark-operator"
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/securityContext
        value:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          runAsUser: 185
          seccompProfile:
            type: RuntimeDefault
  - target:
      kind: Deployment
      name: spark-operator-webhook
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: spark-operator-webhook
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
  - target:
      kind: Deployment
      name: spark-operator-controller
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: spark-operator-webhook
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
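
A quick way (illustrative, not part of the commit) to confirm the patches render as intended, i.e. the non-root user and the disabled Istio sidecar injection appear in the generated output:

```sh
# Render the overlay and look for the patched fields.
kustomize build contrib/spark/spark-operator/base | grep -E "runAsUser|sidecar.istio.io/inject"
```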