Setting Up the Environment
We encourage users to do everything inside a Docker container spawned with our pre-built Docker image.
Tip
Docker may not be an option for some users. In that case,
- Python still needs the Linux SYS_ADMIN capability to change the GPU's power limit. One dirty way is to run Python with sudo.
- Skim through our Dockerfile (shown below) to make sure you have the stuff that's being installed.
- Follow the instructions in Installing and Building.
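For users running outside Docker, a quick way to see whether the current Python process already holds SYS_ADMIN is to inspect its effective capability mask. The following is a minimal sketch, not part of Zeus: the helper names `has_capability` and `read_effective_caps` are hypothetical, and it assumes a Linux system where `/proc/self/status` exposes a `CapEff` field.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in Linux capability masks


def has_capability(cap_eff_hex: str, cap_bit: int = CAP_SYS_ADMIN) -> bool:
    """Return True if `cap_bit` is set in a hexadecimal capability mask."""
    return bool((int(cap_eff_hex, 16) >> cap_bit) & 1)


def read_effective_caps(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff (effective capabilities) field of the current process."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError(f"CapEff not found in {status_path}")
```

Calling `has_capability(read_effective_caps())` from the Python process you intend to train with should return `True` before you try to change the GPU's power limit.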
Zeus Docker image
We provide a pre-built Docker image on Docker Hub.
On top of the nvidia/cuda:11.8.0-base-ubuntu22.04 image (the base image named in the Dockerfile's FROM line), the following are added:
- Miniconda3 23.3.1, PyTorch 2.0.1, torchvision 0.15.2
- A copy of the Zeus repo in /workspace/zeus.
- An editable install of the zeus package in /workspace/zeus/zeus.
Users can override the copy of the repo by mounting the edited repo into the container. See instructions below.
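Once inside the container, it can be useful to confirm that the installed package versions match the ones listed above. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_versions` are hypothetical and not part of Zeus. It assumes that wheels installed from the cu118 index may carry a local version suffix such as `+cu118`, which is stripped before comparing.

```python
from importlib import metadata

# Versions listed for the image in this section.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2"}


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected: dict) -> dict:
    """Map each package name to a tuple (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        got = installed_version(name)
        # Drop a local version suffix (e.g. "+cu118") before comparing.
        matches = got is not None and got.split("+")[0] == want
        report[name] = (want, got, matches)
    return report
```

Running `check_versions(EXPECTED)` inside the container reports, per package, whether the installed version agrees with the list above.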
Dockerfile
# Copyright (C) 2023 Jae-Won Chung <jwnchung@umich.edu>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Build instructions
# If you're building this image locally, make sure you specify `TARGETARCH`.
# Currently, this image supports `amd64` and `arm64`. For instance:
# docker build -t mlenergy/zeus:master --build-arg TARGETARCH=amd64 .
FROM nvidia/cuda:11.8.0-base-ubuntu22.04
# Basic installs
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ='America/Detroit'
RUN apt-get update -qq \
    && apt-get -y --no-install-recommends install \
         build-essential software-properties-common wget git tar rsync cmake \
    && apt-get clean all \
    && rm -r /var/lib/apt/lists/*
# Install Miniconda3 23.3.1
ENV PATH="/root/.local/miniconda3/bin:$PATH"
ARG TARGETARCH
RUN if [ "$TARGETARCH" = "amd64" ]; then \
      export CONDA_INSTALLER_PATH="Miniconda3-py39_23.3.1-0-Linux-x86_64.sh"; \
    elif [ "$TARGETARCH" = "arm64" ]; then \
      export CONDA_INSTALLER_PATH="Miniconda3-py39_23.3.1-0-Linux-aarch64.sh"; \
    else \
      echo "Unsupported architecture ${TARGETARCH}" && exit 1; \
    fi \
    && mkdir -p /root/.local \
    && wget "https://repo.anaconda.com/miniconda/$CONDA_INSTALLER_PATH" \
    && mkdir /root/.conda \
    && bash "$CONDA_INSTALLER_PATH" -b -p /root/.local/miniconda3 \
    && rm -f "$CONDA_INSTALLER_PATH" \
    && ln -sf /root/.local/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh
# Install PyTorch and CUDA Toolkit
RUN pip install --no-cache-dir torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
# Place stuff under /workspace
WORKDIR /workspace
# Snapshot of Zeus
ADD . /workspace/zeus
# When an outside zeus directory is mounted, have it apply immediately.
RUN cd /workspace/zeus && pip install --no-cache-dir -e .
Tip
If you want to build our Docker image locally, you should specify TARGETARCH to be one of amd64 or arm64 based on your environment's architecture:
docker build -t mlenergy/zeus:master --build-arg TARGETARCH=amd64 .
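If you are unsure which value applies to your machine, the architecture reported by Python can be mapped onto the two values the Dockerfile accepts. This is a minimal sketch; the function name `target_arch` and the mapping table are illustrative assumptions, not something Zeus provides.

```python
import platform

# Common platform.machine() values mapped to the TARGETARCH values
# accepted by the Dockerfile (amd64 and arm64).
_ARCH_MAP = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def target_arch(machine=None) -> str:
    """Return the TARGETARCH value for `machine` (default: this host)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_MAP[machine]
    except KeyError:
        raise ValueError(f"Unsupported architecture {machine}") from None
```

The returned string can be passed directly as `--build-arg TARGETARCH=<value>` in the build command above.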
Dependencies
Spawn the container
The default command would be:
docker run -it \
    --gpus all \
    --cap-add SYS_ADMIN \
    --ipc host \
    mlenergy/zeus:latest \
    bash
- --gpus all: Mounts all GPUs into the Docker container. nvidia-docker2 provides this option.
- --cap-add SYS_ADMIN: The SYS_ADMIN capability is needed to manage the power configurations of the GPU via NVML.
- --ipc host: PyTorch DataLoader workers need enough shared memory for IPC. Without this, they may run out of shared memory and die.
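To see how much shared memory DataLoader workers actually have inside a running container, you can inspect the filesystem that backs it. This is a minimal sketch, assuming shared memory is mounted at `/dev/shm` as is usual on Linux; the helper name `shm_size_gib` is hypothetical.

```python
import shutil


def shm_size_gib(path: str = "/dev/shm") -> float:
    """Return the total size, in GiB, of the filesystem mounted at `path`."""
    return shutil.disk_usage(path).total / 2**30
```

Comparing the result of `shm_size_gib()` in a container started with and without `--ipc host` shows how the option changes the space available for IPC.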
Use the -v option to mount outside data into the container.
For instance, if you would like your changes to zeus/ outside the container to be immediately applied inside the container, mount the repository into the container.
You can also mount training data into the container.
# Working directory is repository root
docker run -it \
    --gpus all \
    --cap-add SYS_ADMIN \
    --ipc host \
    -v "$(pwd)":/workspace/zeus \
    -v /data/imagenet:/data/imagenet:ro \
    mlenergy/zeus:latest \
    bash
- --gpus all: Mounts all GPUs into the Docker container. nvidia-docker2 provides this option.
- --cap-add SYS_ADMIN: The SYS_ADMIN capability is needed to manage the power configurations of the GPU via NVML.
- --ipc host: PyTorch DataLoader workers need enough shared memory for IPC. Without this, they may run out of shared memory and die.
- -v "$(pwd)":/workspace/zeus: Mounts the repository directory into the Docker container. Since the zeus installation inside the container is editable, changes you make outside apply immediately.
- -v /data/imagenet:/data/imagenet:ro: Mounts training data into the container read-only.
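If you launch containers from scripts, the command above can be assembled programmatically. The following is a minimal sketch; the function names `docker_run_args` and `command_line` are hypothetical helpers rather than part of Zeus, and the sketch assumes each data directory is mounted read-only at the same path inside the container, as in the example above.

```python
import os
import shlex


def docker_run_args(image="mlenergy/zeus:latest", repo=None, data_dirs=()):
    """Build the `docker run` argument list used in this section."""
    args = [
        "docker", "run", "-it",
        "--gpus", "all",            # expose all GPUs
        "--cap-add", "SYS_ADMIN",   # needed to change GPU power settings
        "--ipc", "host",            # shared memory for DataLoader workers
    ]
    if repo is not None:
        # Mount the local repository over the copy inside the image.
        args += ["-v", f"{os.path.abspath(repo)}:/workspace/zeus"]
    for host_dir in data_dirs:
        # Mount data read-only at the same path inside the container.
        args += ["-v", f"{host_dir}:{host_dir}:ro"]
    return args + [image, "bash"]


def command_line(args) -> str:
    """Render an argument list as a shell-quoted command string."""
    return shlex.join(args)
```

For example, `command_line(docker_run_args(repo=".", data_dirs=["/data/imagenet"]))` produces a single-line equivalent of the command shown above, which can be printed for inspection or passed as a list to `subprocess.run`.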