minions-ai-agents/antigravity_brain_export/knowledge/docker_standards.md

4.6 KiB

🐳 Docker & Infrastructure Standards (The "Zero-Retry" Protocol)

Source: The Architect's repository. Audience: AI Agents (specifically You). Objective: ELIMINATE the "Build -> Fail -> Retry" loop. Get it right the first time.

[!IMPORTANT] The Golden Rule of Containerization: "A container is not a VM. It is a process. If you treat it like a VM, The Architect will delete your memory."

1. 🧠 The AI "Pre-Flight" Check (READ THIS BEFORE WRITING CODE)

Before you write a single line of Dockerfile or docker-compose.yml, you must clear these cognitive hurdles:

  1. "Localhost" Hallucination:
    • Fact: localhost inside a container refers to the container itself, NOT the user's machine or other containers.
    • Correction: Use service names (e.g., db, redis) as hostnames in docker-compose. Use host.docker.internal ONLY for development if accessing host apps.
  2. Ephemerality check:
    • Question: "If I docker-compose down and up, is the data lost?"
    • Mandate: Databases/File Stores MUST have a named volume mapped.
  3. Port Blindness:
    • Fact: EXPOSE in Dockerfile does NOTHING. You MUST map ports (ports: - "8080:80") in docker-compose.yml to access from host.

2. 🏗️ Dockerfile "Platinum" Standard

The Layer Caching Strategy (Speed)

Agents frequently forget this. DO NOT copy source code before installing dependencies. It kills the cache.

BAD (Slows down every build):

COPY . .
RUN pip install -r requirements.txt

GOOD (Instant builds on code changes):

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

Multi-Stage Protocol (Size)

MANDATORY for Compiled languages (Go, Rust, C++) and Frontend (Node/React). STRONGLY RECOMMENDED for Python (to purge build tools).

# Stage 1: Build
FROM python:3.11-alpine as builder
WORKDIR /app
COPY requirements.txt .
RUN apk add --no-cache gcc musl-dev libffi-dev && \
    pip install --prefix=/install -r requirements.txt

# Stage 2: Run (The only thing that ships)
FROM python:3.11-alpine
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "main.py"]

3. 🎼 Docker Compose "Orchestration" Standard

The Dependency Trap (depends_on)

AI agents often crash applications because they start before the Database is ready. Rule: Simply adding depends_on is NOT ENOUGH. It only starts the container, it doesn't wait for the service.

The Correct Pattern (Condition Service Healthy):

services:
  web:
    depends_on:
      db:
        condition: service_healthy  # <--- CRITICAL
  
  db:
    image: postgres:15-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

Explicit Networking

Do not use the default bridge network. It makes DNS resolution messy.

  1. Define a top-level networks.
  2. Assign generic names (e.g., internal_net).

4. 🛡️ Security & Production Constraints

  1. The "Root" Sin:
    • Apps should NOT run as PID 1 root.
    • Fix: Add USER appuser at the end of Dockerfile.
  2. Secret Leakage:
    • NEVER ENV API_KEY=sk-123... in Dockerfile.
    • ALWAYS use .env file passing in docker-compose.
  3. Persistence:
    • Use Named Volumes for data logic (postgres_data:/var/lib/postgresql/data).
    • Use Bind Mounts (./src:/app/src) ONLY for development hot-reloading.

5. 🤖 The "Self-Correction" Checklist (Run this before submitting)

Agents must simulate this audit before showing code to the user:

  • Base Image: Is it alpine or slim? (If ubuntu, reject yourself).
  • Context: Did I define WORKDIR? (Don't dump files in root /).
  • PID 1: Does the container handle signals? (Use exec form: CMD ["python", "app.py"], NOT CMD python app.py).
  • Zombie Processes: Is my healthcheck actually testing the app, or just echo?
  • Orphan Ports: Did I expose the port in Dockerfile AND map it in Compose?
  • Version Pinning: Did I use postgres:latest? -> CHANGE TO postgres:15-alpine.

6. Emergency Recovery (When things fail)

If a container exits immediately (CrashLoopBackOff):

  1. Do NOT just try to run it again.
  2. Action: Override entrypoint to sleep.
    • command: ["sleep", "infinity"]
  3. Debug: Exec into container -> docker exec -it <id> sh -> Try running command manually.
  4. Fix: Analyze logs which usually scream "Missing Dependency" or "Permission Denied".