Set TPU VM the default option #1758

Merged
merged 21 commits into from
Dec 28, 2023

infwinston
Member

@infwinston infwinston commented Mar 10, 2023

This PR changes the default TPU architecture from TPU node to TPU VM, as the former is being deprecated by GCP.
For example, the tpu-v4 architecture doesn't support TPU node (ref):

For Cloud TPU v4, only the TPU VM architecture is supported, so all TPUs referenced in this document use the TPU VM architecture.

Tested (run the relevant ones):

  • Disable tpu node for v4
  • Smoke tests
  • Docs
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh
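
The default change and the v4 restriction described in this comment could look roughly like the sketch below. This is not the PR's diff: the helper name `resolve_tpu_vm`, its signature, and the error message are hypothetical, and the `tpu_vm` key under the accelerator arguments is assumed to be the switch between the two architectures.

```python
from typing import Any, Dict, Optional


def resolve_tpu_vm(accelerator: str,
                   accelerator_args: Optional[Dict[str, Any]] = None) -> bool:
    """Return True if the request should use the TPU VM architecture.

    Hypothetical helper: the name and signature are illustrative only.
    """
    args = accelerator_args or {}
    # Assumed new default: TPU VM unless the user explicitly opts out.
    use_tpu_vm = bool(args.get('tpu_vm', True))
    # TPU v4 only supports the TPU VM architecture, so reject TPU node.
    if accelerator.startswith('tpu-v4') and not use_tpu_vm:
        raise ValueError(
            f'{accelerator} does not support TPU node; use TPU VM instead.')
    return use_tpu_vm


print(resolve_tpu_vm('tpu-v2-8'))                     # default: TPU VM
print(resolve_tpu_vm('tpu-v2-8', {'tpu_vm': False}))  # explicit TPU node
```

Under these assumptions, a task that omits the argument gets TPU VM, a task that sets it to false still gets TPU node, and a v4 request for TPU node fails early instead of at provisioning time.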

@infwinston infwinston marked this pull request as draft March 10, 2023 08:28
@github-actions
Contributor

github-actions bot commented Jul 9, 2023

This PR is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jul 9, 2023
@infwinston infwinston removed the Stale label Jul 9, 2023
@Michaelvll
Collaborator

@infwinston Any updates?

@Michaelvll
Collaborator

We need someone to take over this PR to get it in. This was requested by one of our users.

@Michaelvll Michaelvll added the help wanted and P0 labels Nov 11, 2023
@Michaelvll Michaelvll marked this pull request as ready for review December 25, 2023 07:12
@Michaelvll
Collaborator

Michaelvll commented Dec 25, 2023

Tested (88562fe):

  • Backward compatibility: on master, sky launch examples/tpu/tpu_app.yaml -c test-tpu-node; then on this PR, sky exec test-tpu-node examples/tpu/tpu_app.yaml; sky launch examples/tpu/tpu_app.yaml -c test-tpu-node
  • sky launch examples/tpu/tpuvm_mnist.yaml
  • pytest tests/test_smoke.py --tpu

@Michaelvll Michaelvll requested review from concretevitamin and cblmemo and removed request for concretevitamin December 25, 2023 11:39
Member

@concretevitamin concretevitamin left a comment

Thanks @Michaelvll @infwinston. Quick question.

if version < 7:
    launched_resources = state['launched_resources']

    # Backward compatibility: we change the default value for TPU VM to
Member

QQ: why handle this here vs. in the resources versioning (_VERSION = 13)?

Collaborator

Good catch! Moved it to resources instead. : )
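
For readers following the thread, the versioned-state pattern discussed here can be sketched as below. This is a toy stand-in, not the project's actual class: the version numbers, the attribute names, and the choice to pin `tpu_vm` to False for state saved before the default changed are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


class Resources:
    """Toy stand-in for a pickled resources object (hypothetical)."""

    # Bumped whenever the saved layout or a default value changes.
    _VERSION = 2

    def __init__(self, accelerator: str,
                 accelerator_args: Optional[Dict[str, Any]] = None):
        self._version = self._VERSION
        self.accelerator = accelerator
        self.accelerator_args = accelerator_args

    def __setstate__(self, state: Dict[str, Any]) -> None:
        version = state.pop('_version', 1)
        if version < 2:
            # Assumed migration: state saved before the default flipped
            # relied on the implicit TPU node default, so pin it explicitly.
            if (state.get('accelerator') or '').startswith('tpu-'):
                args = dict(state.get('accelerator_args') or {})
                args.setdefault('tpu_vm', False)
                state['accelerator_args'] = args
        state['_version'] = self._VERSION
        self.__dict__.update(state)


# Simulate loading state written by an older release. pickle calls
# __setstate__ in the same way when it restores an object.
old_state = {'_version': 1, 'accelerator': 'tpu-v2-8',
             'accelerator_args': None}
restored = Resources.__new__(Resources)
restored.__setstate__(old_state)
print(restored.accelerator_args)
```

Keeping the migration next to the class that owns the default means every code path that loads old state gets the same fix, rather than only the one handle that happens to wrap it.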

@concretevitamin
Member

@Michaelvll
Collaborator

Thanks @concretevitamin for the review! Updated the docs. PTAL

@Michaelvll
Collaborator

Michaelvll commented Dec 28, 2023

Tested:

  • master branch: sky launch -c test-tpu-node test.yaml; this PR: sky exec test-tpu-node test-with-acc-args.yaml
    resources:
      accelerators: tpu-v2-8

    run: |
      echo "Hello World"
  • master branch: sky launch -c test-tpu-vm test.yaml; this PR: sky exec test-tpu-vm test-without-args.yaml

@Michaelvll Michaelvll merged commit 5e9ebb8 into master Dec 28, 2023
@Michaelvll Michaelvll deleted the make-tpuvm-default branch December 28, 2023 06:44
@Michaelvll Michaelvll mentioned this pull request Jan 1, 2024