# Kidnapping Machine Learning in Docker Containers

Docker has become an essential tool for developers, especially in machine learning. It lets you create isolated environments so that your projects run consistently across different systems. In this post, we'll walk through installing Docker on Linux, building a Docker image from a Dockerfile, and setting up a machine learning project with `pyenv`, `venv`, and Docker Compose.

## Table of Contents

1. [Installing Docker on Linux](#installing-docker-on-linux)
    
2. [Creating a Dockerfile for Machine Learning](#creating-a-dockerfile-for-machine-learning)
    
3. [Building and Running the Docker Container](#building-and-running-the-docker-container)
    
4. [Using Docker Compose for Orchestration](#using-docker-compose-for-orchestration)
    
5. [Conclusion](#conclusion)
    

---

## Installing Docker on Linux

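Before we start, make sure your system is up to date. The commands in this section assume Ubuntu, since the repository URLs below point to Docker's Ubuntu packages: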

```bash
sudo apt update && sudo apt upgrade -y
```

### Step 1: Install Docker

1. **Install Required Packages:**
    
    ```bash
    sudo apt install apt-transport-https ca-certificates curl gnupg lsb-release software-properties-common
    ```
    
2. **Add Docker’s Official GPG Key:**
    
    ```bash
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    ```
    
3. **Add Docker Repository:**
    
    ```bash
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    ```
    
4. **Install Docker Engine:**
    
    ```bash
    sudo apt update
    sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ```
    
5. **Verify Docker Installation:**
    
    ```bash
    sudo docker --version
    ```
    
6. **Manage Docker as a Non-root User:**
    
    ```bash
    sudo usermod -aG docker $USER
    newgrp docker
    ```
    
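    Now you can run Docker commands without `sudo`. `newgrp docker` only affects the current shell, so log out and back in for the group change to apply to every session.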
    
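Before moving on, it can help to confirm that the group change took effect and that the daemon is reachable. The following is a minimal check, assuming the machine has internet access to pull Docker's public `hello-world` test image. The helper name `docker_group_status` and the echoed messages are illustrative, not part of Docker.

```bash
# Hypothetical helper: report whether this shell session has the docker group.
docker_group_status() {
    if id -nG | grep -qw docker; then
        echo "active"
    else
        echo "inactive"
    fi
}

echo "docker group: $(docker_group_status)"

# Pull and run Docker's small test image; --rm removes the container afterwards.
docker run --rm hello-world \
    || echo "docker run failed: check that the daemon is running and the group is active"
```

If the group shows as inactive, logging out and back in (or running `newgrp docker` again) is usually enough.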

---

## Creating a Dockerfile for Machine Learning

A Dockerfile is a text file containing the instructions Docker follows to build an image. For a machine learning project, we’ll install `pyenv` for Python version management and use Python's built-in `venv` module for virtual environments.

### Step 2: Create a Dockerfile

1. **Create a Project Directory:**
    
    ```bash
    mkdir ml-project
    cd ml-project
    ```
    
2. **Create a `Dockerfile`:**
    
    ```bash
    touch Dockerfile
    ```
    
3. **Edit the `Dockerfile`:**
    
    ```Dockerfile
    # Use an official Python runtime as a parent image
    FROM python:3.9-slim
    
    # Set environment variables
    ENV PYTHONUNBUFFERED=1 \
        PYENV_ROOT=/root/.pyenv \
        PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
    
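    # Install system dependencies: build tools and headers that pyenv needs to compile Python
    # (libffi-dev and liblzma-dev enable the ctypes and lzma modules)
    RUN apt-get update && apt-get install -y \
        build-essential \
        curl \
        git \
        libssl-dev \
        zlib1g-dev \
        libbz2-dev \
        libreadline-dev \
        libsqlite3-dev \
        libffi-dev \
        liblzma-dev \
        wget \
        && rm -rf /var/lib/apt/lists/*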
    
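    # Install pyenv (-f fails on HTTP errors, -L follows redirects)
    RUN curl -fsSL https://pyenv.run | bash
    
    # Install a specific Python version using pyenv
    # (compiled from source, so this step can take several minutes)
    RUN pyenv install 3.9.7 && pyenv global 3.9.7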
    
    # Create a virtual environment
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    
    # Set the working directory
    WORKDIR /app
    
    # Install Python dependencies (copied on their own first so this layer
    # stays cached until requirements.txt changes)
    COPY requirements.txt .
    RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir -r requirements.txt
    
    # Copy the rest of the project into the container at /app
    COPY . .
    
    # Command to run on container start
    CMD ["python", "your_script.py"]
    ```
    
4. **Create a `requirements.txt` File:**
    
    ```bash
    touch requirements.txt
    ```
    
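    Add your Python dependencies to this file. Include `notebook` if you plan to run the Jupyter service from the Docker Compose section, since its start command calls `jupyter notebook`:
    
    ```text
    numpy
    pandas
    scikit-learn
    tensorflow
    notebook
    ```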
    
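The Dockerfile above copies the whole project directory with `COPY . .` and runs `your_script.py` on start, so both are worth setting up before the first build. The sketch below, run from inside `ml-project`, writes a `.dockerignore` so that local clutter stays out of the image and creates a placeholder `your_script.py`. The ignore entries (including the `data/` directory) and the script body are assumptions for illustration; adjust them to your project.

```bash
# Keep version-control data, caches, and local environments out of the build context.
# The entries are illustrative; data/ is a hypothetical directory for large datasets.
cat > .dockerignore <<'EOF'
.git
__pycache__/
*.pyc
.venv/
data/
EOF

# Placeholder entry point matching the CMD in the Dockerfile.
# It only reports versions, which confirms that the dependencies were installed.
cat > your_script.py <<'EOF'
import sys

import numpy as np
import sklearn

print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
EOF
```

Replace the placeholder with your real training or inference script once the image builds cleanly.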

---

## Building and Running the Docker Container

### Step 3: Build the Docker Image

1. **Build the Image:**
    
    ```bash
    docker build -t ml-project .
    ```
    
2. **Run the Container:**
    
    ```bash
    docker run -it --rm ml-project
    ```
    
    This starts the container and runs the script specified in the `CMD` instruction. `-it` attaches an interactive terminal, and `--rm` removes the container when it exits.
    
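Arguments placed after the image name replace the `CMD`, which is handy for inspecting the environment without editing the Dockerfile. A small sketch, assuming the image was built with the `ml-project` tag as shown above; the `IMAGE` variable and the fallback messages are illustrative.

```bash
IMAGE=ml-project

# Show which interpreter is first on PATH inside the container, and its version.
# With the Dockerfile above, the path should point into /opt/venv/bin.
docker run --rm "$IMAGE" sh -c 'command -v python && python --version' \
    || echo "run failed: has the image been built?"

# Mount the project directory over /app so code edits are visible without a rebuild.
docker run --rm -v "$(pwd)":/app "$IMAGE" python your_script.py \
    || echo "run failed: check that your_script.py exists in $(pwd)"
```

The bind mount in the second command is a convenience for development; for a shareable image, rebuild so that the code is baked in.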

---

## Using Docker Compose for Orchestration

Docker Compose is a tool for defining and running multi-container Docker applications. It’s particularly useful for machine learning projects that need several services at once, such as a Jupyter Notebook server and a database.

### Step 4: Create a `docker-compose.yml` File

1. **Create a `docker-compose.yml` File:**
    
    ```bash
    touch docker-compose.yml
    ```
    
2. **Edit the `docker-compose.yml` File:**
    
    ```yaml
    services:
      ml-service:
        image: ml-project
        build: .
        volumes:
          - .:/app
        ports:
          - "8888:8888"
        command: jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root
    ```
    
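    This configuration will:
    
    * Build the Docker image from the `Dockerfile` and tag it `ml-project`.
        
    * Mount the current directory to `/app` inside the container.
        
    * Publish container port 8888 on host port 8888 for Jupyter Notebook.
        
    * Override the image's `CMD` so the container starts Jupyter Notebook instead of `your_script.py`.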
        
3. **Run Docker Compose:**
    
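    ```bash
    docker compose up
    ```
    
    This uses the Compose plugin (`docker compose`); if you have the older standalone binary instead, the equivalent command is `docker-compose up`. Jupyter prints a URL containing an access token in the container logs. Open that URL, or navigate to [`http://localhost:8888`](http://localhost:8888) in your browser and paste the token when prompted.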
    
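A few companion commands are useful around `up`. This sketch assumes the Compose plugin (`docker compose`) and the `ml-service` name from the file above; the `SERVICE` variable and fallback messages are illustrative.

```bash
SERVICE=ml-service

# Validate docker-compose.yml and print the resolved configuration.
docker compose config || echo "compose config failed: check docker-compose.yml"

# Start in the background (-d), rebuilding the image if the Dockerfile changed.
docker compose up -d --build || echo "compose up failed"

# Show the service logs; the Jupyter URL with its access token appears here.
docker compose logs "$SERVICE" || echo "no logs: is the service running?"

# Stop and remove the containers and the default network when finished.
docker compose down || echo "compose down failed"
```

Running detached with `-d` keeps the terminal free, at the cost of having to ask for the logs explicitly to find the token.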

---

## Conclusion

In this post, we set up Docker for a machine learning project on Linux: we installed Docker, wrote a Dockerfile that uses `pyenv` and `venv`, built and ran the container, and used Docker Compose for orchestration. This setup makes your machine learning projects easier to reproduce and share, particularly if you pin the package versions in `requirements.txt`.

Docker is a powerful tool that can significantly streamline your development workflow, especially in the field of machine learning. By containerizing your projects, you can avoid the common "it works on my machine" problem and focus on building great models.
