Commit

Fix broken links in Training Operator #3812 (#3899)
* Fix broken links in Training Operator

Signed-off-by: varodrig <[email protected]>

* replace missing link

Signed-off-by: varodrig <[email protected]>

* update kubernetes api version

Signed-off-by: varodrig <[email protected]>

* removing old links and updating version

Signed-off-by: varodrig <[email protected]>

---------

Signed-off-by: varodrig <[email protected]>
varodrig authored Oct 10, 2024
1 parent 360573b commit cd64b6f
Showing 4 changed files with 4 additions and 15 deletions.
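The link corrections in this commit reduce to a fixed mapping from old link targets to new ones. The removed links (the MXJob bullet, the Stackdriver paragraph, the `netlify.toml` entry) are deletions, so they are not part of the mapping. Below is a minimal Python sketch that applies the mapping to markdown text, assuming the docs use inline `[text](target)` links. The target strings are copied from the hunks below. `LINK_FIXES` and `rewrite_links` are hypothetical helpers and are not part of the Kubeflow website tooling.

```python
import re

# Hypothetical mapping: old -> new link targets, copied from this commit's hunks.
LINK_FIXES = {
    "https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/tensorflow-benchmarks.yaml":
        "https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/tensorflow-benchmarks/tensorflow-benchmarks.yaml",
    "/docs/components/training/job-scheduling#running-jobs-with-gang-scheduling":
        "/docs/components/training/user-guides/job-scheduling/#running-jobs-with-gang-scheduling",
    "/docs/use-cases/job-scheduling#running-jobs-with-gang-scheduling":
        "/docs/components/training/user-guides/job-scheduling/#running-jobs-with-gang-scheduling",
    "https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#podtemplatespec-v1-core":
        "https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#podtemplatespec-v1-core",
}

# Captures the target of an inline markdown link: [text](target)
LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def rewrite_links(markdown: str) -> str:
    """Replace link targets that exactly match a key in LINK_FIXES; leave all others unchanged."""
    def swap(match: re.Match) -> str:
        target = match.group(1)
        return "](" + LINK_FIXES.get(target, target) + ")"
    return LINK_RE.sub(swap, markdown)
```

Matching whole link targets, rather than doing substring replacement, leaves unrelated text alone. Because no new target is itself a key, running the rewrite twice is harmless.
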
@@ -20,8 +20,6 @@ In Katib examples, you can find the following examples for Trial's Workers:

- [Kubeflow `PyTorchJob`](/docs/components/training/user-guides/pytorch/)

- [Kubeflow `MXJob`](/docs/components/training/user-guides/mxnet)

- [Kubeflow `XGBoostJob`](/docs/components/training/user-guides/xgboost)

- [Kubeflow `MPIJob`](/docs/components/training/user-guides/mpi)
4 changes: 2 additions & 2 deletions content/en/docs/components/training/user-guides/mpi.md
@@ -69,7 +69,7 @@ kubectl kustomize base | kubectl apply -f -

## Creating an MPI Job

-You can create an MPI job by defining an `MPIJob` config file. See [TensorFlow benchmark example](https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/tensorflow-benchmarks.yaml) config file for launching a multi-node TensorFlow benchmark training job. You may change the config file based on your requirements.
+You can create an MPI job by defining an `MPIJob` config file. See [TensorFlow benchmark example](https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/tensorflow-benchmarks/tensorflow-benchmarks.yaml) config file for launching a multi-node TensorFlow benchmark training job. You may change the config file based on your requirements.

```
cat examples/v2beta1/tensorflow-benchmarks/tensorflow-benchmarks.yaml
@@ -83,7 +83,7 @@ kubectl apply -f examples/v2beta1/tensorflow-benchmarks/tensorflow-benchmarks.ya

## Scheduling Policy

-The MPI Operator supports the [gang-scheduling](/docs/components/training/job-scheduling#running-jobs-with-gang-scheduling).
+The MPI Operator supports the [gang-scheduling](/docs/components/training/user-guides/job-scheduling/#running-jobs-with-gang-scheduling).
If you want to modify the PodGroup parameters, you can configure in the following:

```diff
11 changes: 2 additions & 9 deletions content/en/docs/components/training/user-guides/tensorflow.md
@@ -154,7 +154,7 @@ replica (as listed above) to the **TFReplicaSpec** for that replica. **TFReplica
consists of 3 fields

- **replicas** The number of replicas of this type to spawn for this `TFJob`.
-- **template** A [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#podtemplatespec-v1-core) that describes the pod to create
+- **template** A [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#podtemplatespec-v1-core) that describes the pod to create
for each replica.

- **The pod must include a container named `tensorflow`**.
@@ -592,13 +592,6 @@ further analysis.
### Stackdriver on GKE
See the guide to [logging and monitoring](/docs/gke/monitoring/) for
instructions on getting logs using Stackdriver.
As described in the guide to
[logging and monitoring](https://www.kubeflow.org/docs/gke/monitoring/#filter-with-labels),
it's possible to fetch the logs for a particular replica based on pod labels.
Using the Stackdriver UI you can use a query like
```
@@ -710,4 +703,4 @@ Here are some steps to follow to troubleshoot your job
- Learn about [distributed training](/docs/components/training/reference/distributed-training/) in Training Operator.
-- See how to [run a job with gang-scheduling](/docs/use-cases/job-scheduling#running-jobs-with-gang-scheduling).
+- See how to [run a job with gang-scheduling](/docs/components/training/user-guides/job-scheduling/#running-jobs-with-gang-scheduling).
2 changes: 0 additions & 2 deletions netlify.toml
@@ -34,8 +34,6 @@ package = "netlify-plugin-checklinks"
"public/docs/components/pipelines/fonts/",
"public/docs/components/pipelines/legacy-v1/fonts/",

"public/docs/components/training/",

"public/docs/distributions/gke/",

"public/docs/distributions/aws/",

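The `netlify.toml` hunk drops `public/docs/components/training/` from a path list configured for `netlify-plugin-checklinks`. This list is presumably a skip list, so the Training Operator pages would be link-checked again. The rough Python sketch below shows why the old gang-scheduling path fails such a check while the new one passes. It is not how `netlify-plugin-checklinks` is implemented. The `published_pages` set is a hypothetical stand-in for the built site, and `broken_doc_links` is a hypothetical helper. The sketch checks page existence only and ignores `#anchor` fragments. It needs Python 3.9+ for the `set[str]` and `list[str]` annotations.

```python
import re

# Inline markdown links to site-absolute docs paths, with an optional #anchor.
DOC_LINK_RE = re.compile(r"\]\((/docs/[^)\s#]*)(#[^)\s]*)?\)")


def normalize(path: str) -> str:
    """Treat '/docs/x' and '/docs/x/' as the same page."""
    return path.rstrip("/") + "/"


def broken_doc_links(markdown: str, published_pages: set[str]) -> list[str]:
    """Return linked /docs/ paths that do not match any page in published_pages (hypothetical site index)."""
    known = {normalize(page) for page in published_pages}
    broken = []
    for match in DOC_LINK_RE.finditer(markdown):
        path = match.group(1)  # page path without the #anchor
        if normalize(path) not in known:
            broken.append(path)
    return broken
```

With a site index that contains `/docs/components/training/user-guides/job-scheduling/`, the pre-commit link `/docs/components/training/job-scheduling#running-jobs-with-gang-scheduling` is reported as broken, and the post-commit link is not.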