# 🐳 Docker & Infrastructure Standards (The "Zero-Retry" Protocol)

**Source:** The Architect's repository.
**Audience:** AI Agents (specifically You).
**Objective:** **ELIMINATE** the "Build -> Fail -> Retry" loop. Get it right the first time.

> [!IMPORTANT]
> **The Golden Rule of Containerization:**
> "A container is not a VM. It is a process. If you treat it like a VM, The Architect will delete your memory."

## 1. 🧠 The AI "Pre-Flight" Check (READ THIS BEFORE WRITING CODE)

Before you write a single line of `Dockerfile` or `docker-compose.yml`, you must clear these cognitive hurdles:

1. **"Localhost" Hallucination:**
    * *Fact:* `localhost` inside a container refers to the **container itself**, NOT the user's machine or other containers.
    * *Correction:* Use service names (e.g., `db`, `redis`) as hostnames in `docker-compose`. Use `host.docker.internal` ONLY in development, when the container must reach an app running on the host.
2. **Ephemerality Check:**
    * *Question:* "If I `docker-compose down` and `up`, is the data lost?"
    * *Mandate:* Databases/File Stores **MUST** have a named volume mapped. (Named volumes survive `down` unless you pass `-v`.)
3. **Port Blindness:**
    * *Fact:* `EXPOSE` in a Dockerfile publishes NOTHING; it is documentation only. You **MUST** map ports (`ports: - "8080:80"`) in `docker-compose.yml` to reach the service from the host.

## 2. 🏗️ Dockerfile "Platinum" Standard

### The Layer Caching Strategy (Speed)

Agents frequently forget this. **DO NOT** copy source code before installing dependencies. Any source change then invalidates the dependency layer and kills the cache.

**❌ BAD (Reinstalls dependencies on every code change):**

```dockerfile
# Any edited file invalidates this layer and everything after it
COPY . .
RUN pip install -r requirements.txt
```

**✅ GOOD (Dependency layer stays cached on code changes):**

```dockerfile
# Dependencies are reinstalled only when requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Source code is copied last, so edits only rebuild this layer
COPY . .
```

### Multi-Stage Protocol (Size)

**MANDATORY** for compiled languages (Go, Rust, C++) and frontend builds (Node/React).
**STRONGLY RECOMMENDED** for Python (to purge build tools).

```dockerfile
# Stage 1: Build (compilers live here and never ship)
FROM python:3.11-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN apk add --no-cache gcc musl-dev libffi-dev && \
    pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Run (The only thing that ships)
FROM python:3.11-alpine
WORKDIR /app
# Copy only the installed packages from the build stage
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "main.py"]
```

## 3. 🎼 Docker Compose "Orchestration" Standard

### The Dependency Trap (`depends_on`)

AI agents often crash applications because the app starts before the database is ready.

**Rule:** Simply adding `depends_on` is NOT ENOUGH. It only waits for the container to start; it does not wait for the *service* inside it to be ready.

**✅ The Correct Pattern (Condition Service Healthy):**

```yaml
services:
  web:
    depends_on:
      db:
        condition: service_healthy  # <--- CRITICAL
  db:
    image: postgres:15-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
```

### Explicit Networking

Do not rely on the default network. Docker's default `bridge` network has no automatic name-based DNS between containers, and an explicitly declared network makes it obvious which services can talk to each other.

1. Define a top-level `networks` key.
2. Assign generic names (e.g., `internal_net`).

## 4. 🛡️ Security & Production Constraints

1. **The "Root" Sin:**
    * Apps should NOT run as root, which is the default user for PID 1 in a container.
    * *Fix:* Create an unprivileged user and add `USER appuser` at the end of the Dockerfile.
2. **Secret Leakage:**
    * **NEVER** `ENV API_KEY=sk-123...` in a Dockerfile. The value is baked into the image.
    * **ALWAYS** pass secrets at runtime through a `.env` file (`env_file:` or variable substitution in `docker-compose.yml`).
3. **Persistence:**
    * Use **Named Volumes** for persistent data (`postgres_data:/var/lib/postgresql/data`).
    * Use **Bind Mounts** (`./src:/app/src`) ONLY for development hot-reloading.

## 5. 🤖 The "Self-Correction" Checklist (Run this before submitting)

Agents must simulate this audit before showing code to the user:

- [ ] **Base Image:** Is it `alpine` or `slim`? (If `ubuntu`, reject yourself).
- [ ] **Context:** Did I define `WORKDIR`? (Don't dump files in root `/`).
- [ ] **PID 1:** Does the container handle signals? (Use `exec` form: `CMD ["python", "app.py"]`, NOT `CMD python app.py`, which wraps the app in a shell that does not forward signals).
- [ ] **Zombie Processes:** Is my healthcheck actually testing the app, or just `echo`?
- [ ] **Orphan Ports:** Did I expose the port in the Dockerfile AND map it in Compose?
- [ ] **Version Pinning:** Did I use `postgres:latest`? -> **CHANGE TO** `postgres:15-alpine`.

## 6. Emergency Recovery (When things fail)

If a container exits immediately or keeps restarting (the Docker equivalent of Kubernetes' `CrashLoopBackOff`):

1. **Do NOT** just try to run it again.
2. **Action:** Override the start command so the container sleeps instead of crashing.
    * `command: ["sleep", "infinity"]` (if the image defines an `ENTRYPOINT`, override `entrypoint:` instead, because `command` is only passed to it as arguments)
3. **Debug:** Exec into the container -> `docker exec -it <container> sh` -> Try running the original command manually.
4. **Fix:** Analyze the logs (`docker logs <container>`), which usually scream "Missing Dependency" or "Permission Denied".
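
The recovery steps above can be sketched as a Compose override file. This is a minimal sketch under stated assumptions, not a definitive setup: the file name `docker-compose.debug.yml` and the service name `web` are hypothetical, the base `docker-compose.yml` is assumed to set no `command:` for that service, and `sleep infinity` assumes the image's `sleep` accepts that argument.

```yaml
# docker-compose.debug.yml (hypothetical file name)
# Keeps a crashing service alive so it can be inspected by hand.
services:
  web:                                 # hypothetical service name; use the one that crashes
    entrypoint: ["sleep", "infinity"]  # replaces the image ENTRYPOINT; the image's default CMD is then ignored
    restart: "no"                      # avoid restart loops while debugging

# Usage (run from the project directory):
#   docker-compose logs web            # read the failed run's logs BEFORE recreating the container
#   docker-compose -f docker-compose.yml -f docker-compose.debug.yml up -d web
#   docker-compose exec web sh         # then run the original start command manually
```

Keeping the override in a separate file means the real `docker-compose.yml` is never edited for debugging, so there is nothing to forget to revert afterwards.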