Launchers

Stoke supports the following launchers:

PyTorch DDP

Prefer the torch.distributed.launch utility described in the PyTorch distributed documentation (note: the local_rank requirement propagates through to Stoke).

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --use_env train.py
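With --use_env, torch.distributed.launch exports each process's local rank as the LOCAL_RANK environment variable. Without it, the launcher instead appends a --local_rank=N argument to the script (spelled --local-rank from PyTorch 2.0 onward). The sketch below shows one way the training script might resolve its local rank under either convention. resolve_local_rank is a hypothetical helper, not part of Stoke or PyTorch, and how the value is then passed on to Stoke is not shown here.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_local_rank(argv: Optional[Sequence[str]] = None,
                       environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: find the local rank set by torch.distributed.launch."""
    env = os.environ if environ is None else environ
    # With --use_env the launcher exports LOCAL_RANK for each process.
    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])
    # Without --use_env it appends --local_rank=N to the script arguments
    # (spelled --local-rank from PyTorch 2.0 on), so accept both spellings.
    # Falls back to 0 (a single, non-distributed process) if neither is set.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=0)
    args, _ = parser.parse_known_args(argv)  # ignore the script's other flags
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

Using parse_known_args keeps the helper from rejecting the script's own command-line flags.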

Horovod

Refer to the Horovod documentation.

horovodrun -np 4 -H localhost:4 python train.py
or
horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
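In the horovodrun examples, -H lists each host with its number of slots (host:slots, typically one slot per GPU), and -np is the total number of processes, which here equals the sum of the slots (4 hosts × 4 slots = 16 in the multi-node case). A small sketch that derives both values from a host-to-slots mapping; horovod_hosts is a hypothetical helper for illustration only.

```python
from typing import Dict, Tuple


def horovod_hosts(slots_per_host: Dict[str, int]) -> Tuple[int, str]:
    """Hypothetical helper: return (-np value, -H value) for horovodrun.

    -np is taken as the total number of slots, matching the examples above.
    """
    if not slots_per_host or any(n < 1 for n in slots_per_host.values()):
        raise ValueError("need at least one host, each with at least one slot")
    # host:slots pairs joined by commas, in insertion order
    host_spec = ",".join(f"{host}:{n}" for host, n in slots_per_host.items())
    return sum(slots_per_host.values()), host_spec


if __name__ == "__main__":
    num_procs, host_spec = horovod_hosts({f"server{i}": 4 for i in range(1, 5)})
    # prints: horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
    print(f"horovodrun -np {num_procs} -H {host_spec} python train.py")
```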

Horovod w/ OpenMPI

Refer to the Horovod documentation on running with Open MPI. This setup can also be used on Kubernetes (k8s) via the MPI Operator.

mpirun -np 4 \
    --allow-run-as-root -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
or
mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
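If these commands are generated from a script (for example, to vary the host list), the flags can be assembled programmatically. The sketch below rebuilds the mpirun invocations above as an argument list suitable for subprocess.run. build_mpirun_cmd is hypothetical, and the comments summarize the flags as the Horovod and Open MPI documentation describe them.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_mpirun_cmd(num_procs: int,
                     hosts: Optional[Dict[str, int]] = None,
                     allow_root: bool = False,
                     exports: Sequence[str] = ("NCCL_DEBUG=INFO", "LD_LIBRARY_PATH", "PATH"),
                     script: Sequence[str] = ("python", "train.py")) -> List[str]:
    """Hypothetical helper: rebuild the mpirun invocations shown above."""
    cmd = ["mpirun", "-np", str(num_procs)]
    if allow_root:
        cmd.append("--allow-run-as-root")  # e.g. inside containers running as root
    if hosts:
        # host:slots pairs, same syntax as the -H flag above
        cmd += ["-H", ",".join(f"{host}:{slots}" for host, slots in hosts.items())]
    # Do not pin each process to a single core; place processes slot by slot.
    cmd += ["-bind-to", "none", "-map-by", "slot"]
    for var in exports:
        # -x VAR copies VAR from the launching shell; -x VAR=value sets it.
        cmd += ["-x", var]
    # Select the ob1 PML and exclude the openib BTL; per the Horovod docs this
    # forces TCP instead of RDMA for MPI traffic.
    cmd += ["-mca", "pml", "ob1", "-mca", "btl", "^openib"]
    return cmd + list(script)


if __name__ == "__main__":
    cmd = build_mpirun_cmd(4, allow_root=True)
    print(shlex.join(cmd))
    # To actually launch: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting issues with arguments such as ^openib.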

DeepSpeed w/ OpenMPI

Prefer the OpenMPI launch shown here over the native DeepSpeed launcher; DeepSpeed automatically discovers devices, etc. via mpi4py. This setup can also be used on Kubernetes (k8s) via the MPI Operator.

mpirun -np 4 \
    --allow-run-as-root -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
or
mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
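One common way to derive local ranks with mpi4py, and roughly the approach DeepSpeed's MPI discovery appears to take, is to gather every rank's host name and count same-host peers. The pure-Python sketch below illustrates that idea without requiring MPI: a rank's local rank is the number of lower-numbered ranks on the same host. It is an illustration of the technique, not DeepSpeed's code; local_ranks_from_hostnames is hypothetical, and under mpi4py its input could come from MPI.COMM_WORLD.allgather(MPI.Get_processor_name()).

```python
from typing import List, Sequence


def local_ranks_from_hostnames(hostnames: Sequence[str]) -> List[int]:
    """Hypothetical helper: hostnames[r] is the host name reported by global rank r.

    A rank's local rank is the number of lower-numbered ranks on the same
    host, so the ranks on each node are numbered 0, 1, 2, ...
    """
    return [sum(1 for other in hostnames[:rank] if other == name)
            for rank, name in enumerate(hostnames)]


if __name__ == "__main__":
    # 2 nodes x 4 slots; with -map-by slot, ranks 0-3 typically land on
    # server1 and ranks 4-7 on server2
    names = ["server1"] * 4 + ["server2"] * 4
    print(local_ranks_from_hostnames(names))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Counting by host name rather than assuming contiguous placement keeps the result correct even if ranks are interleaved across nodes.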

PyTorch DDP w/ OpenMPI

This uses DeepSpeed functionality to automatically discover devices, etc. via mpi4py. This setup can also be used on Kubernetes (k8s) via the MPI Operator.

mpirun -np 4 \
    --allow-run-as-root -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
or
mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python train.py
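For comparison, here is a sketch that skips mpi4py entirely and reads the per-process variables that Open MPI's mpirun already exports (OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_LOCAL_RANK). It maps them onto the RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT variables that torch.distributed's default env:// initialization reads, plus LOCAL_RANK for choosing the GPU. This is a swapped-in technique, not what Stoke or DeepSpeed does; torch_env_from_openmpi is hypothetical, and the 127.0.0.1 / 29500 defaults only suit a single node (for multiple nodes, export the rank 0 host, e.g. with -x MASTER_ADDR=server1).

```python
import os
from typing import Dict, Mapping, Optional


def torch_env_from_openmpi(environ: Optional[Mapping[str, str]] = None,
                           master_addr: str = "127.0.0.1",
                           master_port: int = 29500) -> Dict[str, str]:
    """Hypothetical helper: map Open MPI's per-process variables to the
    variables used by torch.distributed's env:// initialization."""
    env = os.environ if environ is None else environ
    return {
        "RANK": env["OMPI_COMM_WORLD_RANK"],              # global rank
        "WORLD_SIZE": env["OMPI_COMM_WORLD_SIZE"],        # total processes (-np)
        "LOCAL_RANK": env["OMPI_COMM_WORLD_LOCAL_RANK"],  # rank on this node, used to pick the GPU
        # An already-exported value (e.g. mpirun -x MASTER_ADDR=server1) wins;
        # the defaults only make sense on a single node.
        "MASTER_ADDR": env.get("MASTER_ADDR", master_addr),
        "MASTER_PORT": env.get("MASTER_PORT", str(master_port)),
    }
```

Inside a process started by mpirun, one might call os.environ.update(torch_env_from_openmpi()) before torch.distributed.init_process_group(backend="nccl"). Outside mpirun the OMPI_* variables are absent and the helper raises KeyError.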
