This repository was archived by the owner on Jun 6, 2024. It is now read-only.
README.md (1 addition, 1 deletion)

@@ -6,7 +6,7 @@
 [](https://gitter.im/Microsoft/pai?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
-**OpenPAI [v1.7.0](./RELEASE_NOTE.md#April-2021-version-170) has been released!**
+**OpenPAI [v1.8.0](./RELEASE_NOTE.md#July-2021-version-180) has been released!**
 
 With the release of v1.0, OpenPAI is switching to a more robust, more powerful and lightweight architecture. OpenPAI is also becoming more and more modular so that the platform can be easily customized and expanded to suit new needs. OpenPAI also provides many AI user-friendly features, making it easier for end users and administrators to complete daily AI tasks.
docs/manual/cluster-admin/installation-guide.md (12 additions, 12 deletions)
@@ -8,7 +8,7 @@ To install OpenPAI >= `v1.0.0`, please first check [Installation Requirements](#
 
 The deployment of OpenPAI requires you to have **at least 3 separate machines**: one dev box machine, one master machine, and one worker machine.
 
-Dev box machine controls masters and workers through SSH during installation, maintenance, and uninstallation. There should be one, and only one dev box.
+Dev box machine controls masters and workers through SSH during installation, maintenance, and uninstallation. There should be one, and only one dev box.
 
 The master machine is used to run core Kubernetes components and core OpenPAI services. Currently, OpenPAI does not support high availability and you can only specify one master machine.
@@ -27,7 +27,7 @@ We recommend you to use CPU-only machines for dev box and master. The detailed r
 <td>Dev Box Machine</td>
 <td>
 <ul>
-<li>It can communicate with all other machines (master and worker machines).</li>
+<li>It can communicate with all other machines (master and worker machines).</li>
 <li>It is separate from the cluster which contains the master machine and worker machines.</li>
 <li>It can access the internet, especially needs to have access to the docker hub registry service or its mirror. Deployment process will pull Docker images.</li>
 </ul>
@@ -38,7 +38,7 @@ We recommend you to use CPU-only machines for dev box and master. The detailed r
 <li>SSH service is enabled.</li>
 <li>Passwordless ssh to all other machines (master and worker machines).</li>
 <li>Docker is installed.</li>
-</ul>
+</ul>
 </td>
 </tr>
 <tr>
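The passwordless-SSH requirement in the hunk above can be met with a standard key-pair setup on the dev box. A minimal sketch: the hostnames `pai-master` and `pai-worker1` are placeholders, and the `ssh-copy-id` commands are written into a helper script rather than executed, since the target machines are not reachable from an arbitrary shell.

```shell
# Placeholder hostnames for the master and worker machines.
HOSTS="pai-master pai-worker1"
KEY="$HOME/.ssh/id_rsa"

# Generate a key pair on the dev box if one does not exist yet (no passphrase).
if command -v ssh-keygen >/dev/null 2>&1 && [ ! -f "$KEY" ]; then
  mkdir -p "$HOME/.ssh"
  ssh-keygen -t rsa -b 4096 -N "" -f "$KEY" -q
fi

# Write one ssh-copy-id command per machine into a helper script,
# to be run once the machines are reachable over SSH.
: > copy_keys.sh
for h in $HOSTS; do
  echo "ssh-copy-id -i $KEY.pub $h" >> copy_keys.sh
done
```

After running `copy_keys.sh` (entering each machine's password once), `ssh pai-master` from the dev box should log in without a password prompt.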
@@ -66,16 +66,16 @@ We recommend you to use CPU-only machines for dev box and master. The detailed r
 
 The worker machines are used to run jobs. You can use multiple workers during installation.
 
-We support various types of workers: CPU workers, GPU workers, and workers with other computing devices (e.g. TPU, NPU).
+We support various types of workers: CPU workers, GPU workers, and workers with other computing devices (e.g. TPU, NPU).
 
 At the same time, we also support two schedulers: the Kubernetes default scheduler, and [hivedscheduler](https://github.com/microsoft/hivedscheduler).
 
-Hivedscheduler is the default for OpenPAI. It supports virtual cluster division, topology-aware resource guarantee, and optimized gang scheduling, which are not supported in the k8s default scheduler.
+Hivedscheduler is the default for OpenPAI. It supports virtual cluster division, topology-aware resource guarantee, and optimized gang scheduling, which are not supported in the k8s default scheduler.
 
 For now, the support for CPU/NVIDIA GPU workers and workers with other computing device is different:
 
-- For CPU workers and NVIDIA GPU workers, both k8s default scheduler and hived scheduler can be used.
+- For CPU workers and NVIDIA GPU workers, both k8s default scheduler and hived scheduler can be used.
 - For workers with other types of computing devices (e.g. TPU, NPU), currently, we only support the usage of the k8s default scheduler. You can only include workers with the same computing device in the cluster. For example, you can use TPU workers, but all workers should be TPU workers. You cannot use TPU workers together with GPU workers in one cluster.
 
 Please check the following requirements for different types of worker machines:
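The scheduler choice described above is made in the installation `config.yaml`. A sketch of the relevant switch; the key name `enable_hived_scheduler` follows recent OpenPAI releases but should be verified against the `config.yaml` template in your checked-out code, so treat it as an assumption.

```yaml
# <pai-code-dir>/contrib/kubespray/config/config.yaml (fragment; key name assumed)
# Keep hivedscheduler (the default) for CPU and NVIDIA GPU workers:
enable_hived_scheduler: true
# For workers with other computing devices (e.g. TPU, NPU), fall back to the
# Kubernetes default scheduler instead:
# enable_hived_scheduler: false
```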
@@ -116,7 +116,7 @@ Please check the following requirements for different types of worker machines:
 <ul>
 <li><b>NVIDIA GPU Driver is installed.</b> You may use <a href="./installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed">a command</a> to check it. Refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-gpu-driver">the installation guidance</a> in FAQs if the driver is not successfully installed. If you are wondering which version of GPU driver you should use, please also refer to <a href="./installation-faqs-and-troubleshooting.html#which-version-of-nvidia-driver-should-i-install">FAQs</a>.</li>
 <li><b><a href="https://github.com/NVIDIA/nvidia-container-runtime">nvidia-container-runtime</a> is installed. And be configured as the default runtime of docker.</b> Please configure it in <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">docker-config-file (daemon.json)</a>, instead of in the systemd's config. You can use command <code>sudo docker run --rm nvidia/cuda:10.0-base nvidia-smi</code> to check it. This command should output information of available GPUs if it is setup properly. Refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime">the installation guidance</a> if it is not successfully set up. We don't recommend to use <code>nvidia-docker2</code>. For a detailed comparison between <code>nvidia-container-runtime</code> and <code>nvidia-docker2</code>, please refer to <a href="https://github.com/NVIDIA/nvidia-docker/issues/1268#issuecomment-632692949">here</a>. </li>
-</ul>
+</ul>
 </td>
 </tr>
 <tr>
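The "default runtime of docker" requirement in the hunk above usually comes down to a few lines in `/etc/docker/daemon.json`. A sketch; the `path` value assumes the standard package install location of `nvidia-container-runtime`.

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

After editing the file, restart docker (`sudo systemctl restart docker`) and re-run the `nvidia-smi` check mentioned above to confirm GPUs are visible inside containers.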
@@ -139,7 +139,7 @@ Please check the following requirements for different types of worker machines:
 <li>The driver of the device is installed.</li>
 <li>The container runtime of the device is installed. And be configured as the default runtime of docker. Please configure it in <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">docker-config-file</a>, because systemd's env will be overwritten during installation.</li>
 <li>You should have a deployable <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/">device plugin</a> of the computing device. After the Kubernetes is set up, you should manually deploy it in cluster. </li>
-</ul>
+</ul>
 </td>
 </tr>
 </tbody>
@@ -163,7 +163,7 @@ cd pai
 
 Choose a version to install by checkout to a specific tag:
 
 ```bash
-git checkout v1.7.0
+git checkout v1.8.0
 ```
 
 Please edit `layout.yaml` and a `config.yaml` file under `<pai-code-dir>/contrib/kubespray/config` folder.
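A minimal sketch of those two files. Hostnames, IPs, resource numbers, and SKU names are placeholders; the field names follow the examples shipped in the OpenPAI repository, so verify them against the templates under `<pai-code-dir>/contrib/kubespray/config` before deploying.

```yaml
# --- layout.yaml (sketch, placeholder values) ---
machine-sku:
  master-machine:
    mem: 60Gi
    cpu:
      vcore: 24
  gpu-machine:
    mem: 60Gi
    gpu:
      type: nvidia-gtx-1080
      count: 4
    cpu:
      vcore: 24
machine-list:
  - hostname: pai-master
    hostip: 10.0.0.1
    machine-type: master-machine
    pai-master: "true"
  - hostname: pai-worker1
    hostip: 10.0.0.2
    machine-type: gpu-machine
    pai-worker: "true"

# --- config.yaml (sketch, placeholder values) ---
# SSH credentials the dev box uses to reach every machine above.
# user: <ssh-username>
# password: <ssh-password>
# docker_image_tag: v1.8.0
```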
@@ … @@
-By default, we don't set up `kubeconfig` or install `kubectl` client on the dev box machine, but we put the Kubernetes config file in `~/pai-deploy/kube/config`. You can use the config with any Kubernetes client to verify the installation.
+By default, we don't set up `kubeconfig` or install `kubectl` client on the dev box machine, but we put the Kubernetes config file in `~/pai-deploy/kube/config`. You can use the config with any Kubernetes client to verify the installation.
 
 Also, you can use the command `ansible-playbook -i ${HOME}/pai-deploy/kubespray/inventory/pai/hosts.yml set-kubectl.yml --ask-become-pass` to set up `kubeconfig` and `kubectl` on the dev box machine. It will copy the config to `~/.kube/config` and set up the `kubectl` client. After it is executed, you can use `kubectl` on the dev box machine directly.
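That verification step can be sketched as follows; the checks are guarded so the snippet degrades gracefully on a machine where `kubectl` or the config file is not present yet.

```shell
# Point any Kubernetes client at the config written by the installer.
export KUBECONFIG="$HOME/pai-deploy/kube/config"

# List the cluster nodes if kubectl and the config are available;
# otherwise fall back to the set-kubectl.yml playbook mentioned above.
if command -v kubectl >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  kubectl get nodes
else
  echo "kubectl or $KUBECONFIG missing; run the set-kubectl.yml playbook first"
fi
```

A healthy installation should show every master and worker machine in `Ready` state.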