SlurmRay is a module for effortlessly distributing tasks on a Slurm cluster using the Ray library. SlurmRay was initially designed for the Curnagl cluster at the University of Lausanne, but it should run on any Slurm cluster with minimal configuration.
SlurmRay is designed to run both locally and on a cluster without any code changes: you can develop and test a script on a local machine until it works, then run it with the full resources of the cluster without modifying it (see the local-mode sketch after the example below).
pip install slurmray
from slurmray.RayLauncher import RayLauncher
import ray
import torch
def function_inside_function():
    with open("slurmray/RayLauncher.py", "r") as f:
        return f.read()[0:10]
def example_func(x):
    result = (
        ray.cluster_resources(),
        f"GPU is available : {torch.cuda.is_available()}",
        x + 1,
        function_inside_function(),
    )
    return result
launcher = RayLauncher(
    project_name="example",  # Name of the project (a directory with this name is created in the current directory)
    func=example_func,  # Function to execute
    args={"x": 1},  # Arguments of the function
    files=["slurmray/RayLauncher.py"],  # Files to push to the cluster (their paths are recreated on the cluster)
    modules=[],  # Modules to load on the Curnagl cluster (CUDA & CUDNN are added automatically if use_gpu=True)
    node_nbr=1,  # Number of nodes to use
    use_gpu=True,  # Set to True to request an A100 GPU
    memory=8,  # In megabytes
    max_running_time=5,  # In minutes
    runtime_env={"env_vars": {"NCCL_SOCKET_IFNAME": "eno1"}},  # Example of an environment variable
    server_run=True,  # Run the code on the cluster rather than locally
    server_ssh="curnagl.dcsr.unil.ch",  # Address of the SLURM server
    server_username="hjamet",  # Username used to connect to the server
    server_password=None,  # Will be prompted for in the terminal
)
result = launcher()
print(result)
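When executed, result holds the tuple built by example_func: the cluster's available resources, a string reporting whether a GPU is visible, x + 1, and the first ten characters of the pushed file.

Because the same script runs locally or remotely, a quick way to test before submitting is to flip server_run to False. A minimal sketch, assuming the omitted RayLauncher parameters fall back to sensible local defaults (only project_name, func, args, and server_run are taken from the example above):

# Hypothetical local run: reuses example_func from the example above and
# assumes the omitted RayLauncher parameters have local defaults.
local_launcher = RayLauncher(
    project_name="example",
    func=example_func,
    args={"x": 1},
    server_run=False,  # Execute on the local machine instead of the cluster
)
print(local_launcher())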
The Launcher documentation is available here.