Running the Tensorflow Object Detection API in a Singularity container


TL;DR Here’s my Singularity Definition File for the Tensorflow Object Detection API (including X11 forwarding).

I love the Tensorflow Object Detection API. It sits on top of Tensorflow and makes powerful object detection models like ResNet, CenterNet, and EfficientDet (relatively) easy to implement. In the past, I’ve used the API to detect fluorescent kernels on maize ears. Unfortunately, setting up the Object Detection API for that project was a pain. I had to manually install the right versions of CUDA/Tensorflow on my cluster, and I still worry that the code is difficult for other people (and maybe even for me) to run. To solve these problems, I’m developing a new object detection pipeline in a containerized environment. Using a container should reduce the effort needed to run the object detection code, make the methods more reproducible, and make the pipeline portable between the cluster and the cloud. In addition to setting up the basic container, I’m including the configurations I used to allow X11 forwarding from the cluster. This lets me do things like view images and work through the example code included with the Object Detection API.

Conveniently, the Object Detection API comes with a Dockerfile which can be used to build a Docker container. Inconveniently, Docker can’t be run on most university clusters, including the cluster here at the University of Arizona. Singularity solves this problem: it doesn’t require Docker’s extensive permissions and it interacts better with the shared environment. Here I’ll describe the method I used to convert the Object Detection API Dockerfile into a working Singularity container.

Object Detection API Dockerfile

First, take a look at the Dockerfile included in the Object Detection API:

FROM tensorflow/tensorflow:2.2.0-gpu

ARG DEBIAN_FRONTEND=noninteractive

# Install apt dependencies
RUN apt-get update && apt-get install -y \
    git \
    gpg-agent \
    python3-cairocffi \
    protobuf-compiler \
    python3-pil \
    python3-lxml \
    python3-tk \
    wget

# Install gcloud and gsutil commands
# https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu
RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update -y && apt-get install google-cloud-sdk -y

# Add new user to avoid running as root
RUN useradd -ms /bin/bash tensorflow
USER tensorflow
WORKDIR /home/tensorflow

# Copy this version of of the model garden into the image
COPY --chown=tensorflow . /home/tensorflow/models

# Compile protobuf configs
RUN (cd /home/tensorflow/models/research/ && protoc object_detection/protos/*.proto --python_out=.)
WORKDIR /home/tensorflow/models/research/

RUN cp object_detection/packages/tf2/setup.py ./
ENV PATH="/home/tensorflow/.local/bin:${PATH}"

RUN python -m pip install -U pip
RUN python -m pip install .

ENV TF_CPP_MIN_LOG_LEVEL 3

It seems pretty straightforward: it uses the tensorflow:2.2.0-gpu container as a foundation, installs some dependencies with apt-get, adds a new user to avoid running as root, then copies the Object Detection API into the image and installs it. I first tried the conversion utility from Singularity Python to convert the Dockerfile to a Singularity Definition File. While I ended up converting everything by hand, the utility gave me a good first look at how the different parts of the build files compare between the two platforms. I’ll go through my changes, then share the completed Definition File below.
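
If you want to try the automated conversion yourself as a starting point, the Singularity Python package provides a command-line converter. A minimal sketch, assuming the spython package from PyPI and that you run it from the directory containing the Dockerfile:

python -m pip install spython
spython recipe Dockerfile > Singularity.def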

Singularity Definition File

The first line of a Singularity Definition File defines the bootstrap agent. This tells Singularity how the container will be constructed. Here we tell it to use an image hosted on Docker Hub, then give it the address of the Docker Hub image in the second line (note that we’ve switched to Tensorflow 2.6.0 for compatibility with some of the other packages).

Bootstrap: docker
From: tensorflow/tensorflow:2.6.0-gpu

Next is the %post section. This is where I download and install the required packages. It starts by setting DEBIAN_FRONTEND=noninteractive, which tells the Debian package tools to install everything without prompting for user input.

%post
    export DEBIAN_FRONTEND=noninteractive

It then installs all the dependencies that require the Debian apt package manager. Note that I added x11-apps and xauth to the dependencies. These will enable X11 forwarding when we run the container on the cluster.

    apt-get update && apt-get install -y \
    git \
    gpg-agent \
    python3-cairocffi \
    protobuf-compiler \
    python3-pil \
    python3-lxml \
    python3-tk \
    wget \
    x11-apps \
    xauth

Now comes the download and installation of the Object Detection API (still in the %post section). First the script installs the Google Cloud SDK, which provides the gcloud and gsutil utilities required by some parts of the API. Next, the script clones the API repository into the container using git. It then runs the protocol buffer (protobuf) compiler over the API’s .proto files to generate the Python modules the API needs; protobuf is an efficient, language-neutral format for structured data, similar in spirit to XML, and here the .proto files define the API’s configuration messages (models, training options, and so on). Finally, the script copies the API’s setup.py into place and installs the package, along with its Python dependencies, using pip.

    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update -y && apt-get install google-cloud-sdk -y

    # Download the git version of the model garden
    cd /opt
    git clone https://github.com/tensorflow/models.git

    # Compile protobuf configs
    (cd /opt/models/research/ && protoc object_detection/protos/*.proto --python_out=.)
    cd /opt/models/research/

    cp object_detection/packages/tf2/setup.py ./
    python -m pip install -U pip	
    python -m pip install .

In the next part of the %post section, I install three Python packages that are required for the API tutorial but are not included in the standard installation. If you are making your own Singularity Definition File, you can add any additional apt or Python packages that you require by adding them to the relevant sections of %post.

    python -m pip install imageio ipython google-colab
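
For example, if your own pipeline also needed OpenCV and pandas (purely illustrative choices), the additions to %post might look something like this:

    # Illustrative only: extra apt and pip packages for a hypothetical pipeline
    apt-get install -y libgl1
    python -m pip install opencv-python pandas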

Finally, I download the model checkpoint for the tutorial and make sure the permissions allow it to be accessed.

    # Downloading the model checkpoint for the demo (optional if not running demo)
    wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
    tar -xf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
    mv ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint /opt/models/research/object_detection/test_data/
	
    # Fix permissions problem
    chmod -R 777 /opt/models/research/object_detection/test_data/checkpoint

The next section of the Definition File is called %environment. This is where you put environment variables that will be set when you run the container. The original Dockerfile sets TF_CPP_MIN_LOG_LEVEL to 3, so I’ve carried it over here. This Tensorflow option controls how much logging is suppressed: 0 logs all messages, 1 omits INFO, 2 omits INFO and WARNING, and 3 omits INFO, WARNING, and ERROR. Since I want to see error messages for troubleshooting, I’ve left the line commented out.

%environment
    # Makes Tensorflow warnings quiet (optional)
    # export TF_CPP_MIN_LOG_LEVEL=3
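
If you later want quieter logs without rebuilding, Singularity can inject environment variables at runtime: host variables prefixed with SINGULARITYENV_ are passed into the container. A sketch, shown with the container we build in the next section:

SINGULARITYENV_TF_CPP_MIN_LOG_LEVEL=3 singularity run ./singularity/tf_od.sif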

The following two sections are %runscript and %startscript. Each writes to a file inside the container that’s executed at runtime: %runscript runs when the container is invoked with the singularity run command, while %startscript runs when the container is started as an instance. In this case, I’ll use the same two commands in both sections: first change to the API’s research directory, then execute any arguments that were passed when the container was run.

%runscript
    cd /opt/models/research/ && \
    exec /bin/bash "$@"

%startscript
    cd /opt/models/research/ && \
    exec /bin/bash "$@"
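
To make the difference concrete, here is roughly how each script gets triggered once the container is built. The instance name is arbitrary, and with this simple %startscript the instance mostly just illustrates the mechanism:

# %runscript: executed by a direct run of the container
singularity run ./singularity/tf_od.sif

# %startscript: executed when the container is started as a background instance
singularity instance start ./singularity/tf_od.sif tf_od_instance
singularity instance stop tf_od_instance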

And finally, the %help section describes the purpose of the container.

%help
    This container runs the Tensorflow 2 Object Detection API. It was modified for Singularity from the Dockerfile contained in the Object Detection API repository: https://github.com/tensorflow/models/blob/master/research/object_detection/dockerfiles/tf2/Dockerfile

Building the container

Now that we have a Singularity Definition File, we can build the container. On the University of Arizona cluster, users don’t have the root authority required to build containers. However, Singularity has the option to use Remote Builder to get around root requirements by building the container in the cloud. After creating a Remote Builder account, set up the authentication token by running singularity remote login. Now you can build the container using the following command (from the base directory of the repository):

singularity -v build --remote ./singularity/tf_od.sif ./singularity/Singularity

The build process takes about 15 minutes and produces a container image in SIF format.
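
Once the build completes, a quick way to sanity-check the image is to look at its metadata and print the %help text we defined earlier:

# Show the container's metadata
singularity inspect ./singularity/tf_od.sif
# Print the %help section from the Definition File
singularity run-help ./singularity/tf_od.sif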

Running the container with X11 forwarding

To run the demo, first sign in to your HPC environment with ssh, using the -X option to enable X11 forwarding. At the University of Arizona, once you have signed in to the bastion host, you must also use the -X option when signing into the login node, e.g. shell -X. You can then run the demo from an interactive GPU node or using a Slurm script.
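
The exact hosts and hops vary between institutions, but the sign-in sequence looks roughly like this (the hostname below is a placeholder):

# Forward X11 through the bastion host (hostname is a placeholder)
ssh -X username@hpc.example.edu
# At the University of Arizona, also forward X11 when moving to the login node
shell -X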

singularity exec --nv -B ~/.Xauthority ./singularity/tf_od.sif python3 ./python/eager_few_shot_od_training_tf2_singularity.py &>/dev/null &

The above command runs the demo using the newly created Singularity container. The --nv option properly configures the container for Nvidia GPUs, while the -B option adds a file to the container that’s necessary for X11 forwarding. Here I’ve silenced output messages with &>/dev/null and put the process in the background with the final &. With the process in the background you can monitor GPU activity with nvtop or nvidia-smi. The demo displays images at several steps through X11 and finally outputs images with bounding boxes and an animated gif. The training and inference took less than five minutes using an Nvidia V100S GPU.
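
If you would rather submit the demo as a batch job, a minimal Slurm script might look like the sketch below. The partition name, resource requests, and time limit are placeholders for your cluster’s settings, and note that the X11 image windows generally only appear when running from an interactive session:

#!/bin/bash
#SBATCH --job-name=tf_od_demo
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=tf_od_demo.%j.out

# Run the demo inside the container with GPU support and the X11 auth file bound in
singularity exec --nv -B ~/.Xauthority ./singularity/tf_od.sif \
    python3 ./python/eager_few_shot_od_training_tf2_singularity.py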

Conclusion

Hopefully this guide will help you run the Object Detection API in a Singularity container. While all clusters are unique, my goal is to at least provide a starting point for running containerized object detection. If you have any questions please feel free to reach out!

Repository containing all code described in this post.


Published: Sep 23, 2021 by Cedar Warman