* Add SKYPILOT_NUM_NODES env var
* Update docs/source/running-jobs/environment-variables.rst
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Update docs/source/running-jobs/environment-variables.rst
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* Update docs/source/running-jobs/environment-variables.rst
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
* format
* add remove version
* add smoke test for num nodes
* fix test
---------
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
docs/source/running-jobs/environment-variables.rst
+8 -1
@@ -120,8 +120,12 @@ Environment variables for ``setup``
  - Rank (an integer ID from 0 to :code:`num_nodes-1`) of the node being set up.
  - 0
  * - ``SKYPILOT_SETUP_NODE_IPS``
- - A string of IP addresses of the nodes in the cluster with the same order as the node ranks, where each line contains one IP address.
+ - A string of IP addresses of the nodes in the cluster with the same order as the node ranks, where each line contains one IP address. Note that this is not necessarily the same as the nodes in ``run`` stage, as the ``setup`` stage runs on all nodes of the cluster, while the ``run`` stage can run on a subset of nodes.
  - 1.2.3.4
+ 3.4.5.6
+ * - ``SKYPILOT_NUM_NODES``
+ - Number of nodes in the cluster. Same value as ``$(echo "$SKYPILOT_NODE_IPS" | wc -l)``.
+ - 2
  * - ``SKYPILOT_TASK_ID``
  - A unique ID assigned to each task.
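As a rough illustration of the relationship described in the ``setup`` hunk above, the following shell sketch counts the entries in ``SKYPILOT_SETUP_NODE_IPS`` and compares the result with ``SKYPILOT_NUM_NODES``. It assumes the two agree during ``setup``, which the diff implies but does not state outright. The helper ``count_ips`` and the sample fallback values are hypothetical and exist only so the sketch can run outside a SkyPilot task.

```shell
# Hypothetical helper (not part of SkyPilot): count the non-empty lines
# in a newline-separated IP list passed as the first argument.
count_ips() {
  printf '%s\n' "$1" | grep -c .
}

# Sample fallbacks (assumed values) so the sketch runs outside a task.
sample_ips=$(printf '1.2.3.4\n3.4.5.6')
setup_ips="${SKYPILOT_SETUP_NODE_IPS:-$sample_ips}"
num_nodes="${SKYPILOT_NUM_NODES:-2}"

# Compare the derived count with the value SkyPilot exports.
if [ "$(count_ips "$setup_ips")" -eq "$num_nodes" ]; then
  echo "IP list matches SKYPILOT_NUM_NODES=${num_nodes}"
else
  echo "IP list and SKYPILOT_NUM_NODES disagree" >&2
fi
```

Counting non-empty lines rather than piping to a bare ``wc -l`` is a deliberate choice here: ``echo "" | wc -l`` reports 1 for an empty list, while this helper reports 0.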
@@ -159,6 +163,9 @@ Environment variables for ``run``
  * - ``SKYPILOT_NODE_IPS``
  - A string of IP addresses of the nodes reserved to execute the task, where each line contains one IP address. Read more :ref:`here <dist-jobs>`.
  - 1.2.3.4
+ * - ``SKYPILOT_NUM_NODES``
+ - Number of nodes assigned to execute the current task. Same value as ``$(echo "$SKYPILOT_NODE_IPS" | wc -l)``. Read more :ref:`here <dist-jobs>`.
+ - 1
  * - ``SKYPILOT_NUM_GPUS_PER_NODE``
  - Number of GPUs reserved on each node to execute the task; the same as the
  count in ``accelerators: <name>:<count>`` (rounded up if a fraction). Read
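One way the ``run``-stage variables in this hunk might be combined in a distributed job is sketched below: take the first line of ``SKYPILOT_NODE_IPS`` as the head node address and multiply ``SKYPILOT_NUM_NODES`` by ``SKYPILOT_NUM_GPUS_PER_NODE`` to get a total process count. Treating the first IP as the head node is an assumption (the diff states the rank ordering only for the ``setup`` list), and ``head_ip``, ``world_size``, and the fallback values are hypothetical names chosen for illustration.

```shell
# Hypothetical helpers (not part of SkyPilot).
# head_ip: print the first line of a newline-separated IP list.
head_ip() {
  printf '%s\n' "$1" | head -n 1
}

# world_size: number of nodes times GPUs per node.
world_size() {
  echo $(( $1 * $2 ))
}

# Sample fallbacks (assumed values) so the sketch runs outside a task.
sample_ips=$(printf '1.2.3.4\n3.4.5.6')
node_ips="${SKYPILOT_NODE_IPS:-$sample_ips}"
num_nodes="${SKYPILOT_NUM_NODES:-2}"
gpus_per_node="${SKYPILOT_NUM_GPUS_PER_NODE:-4}"

echo "head node: $(head_ip "$node_ips")"
echo "total processes: $(world_size "$num_nodes" "$gpus_per_node")"
```

Reading ``SKYPILOT_NUM_NODES`` directly saves re-deriving the node count from the IP list each time it is needed.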