🐳 Docker & Infrastructure Standards (The "Zero-Retry" Protocol)
Source: The Architect's repository. Audience: AI Agents (specifically You). Objective: ELIMINATE the "Build -> Fail -> Retry" loop. Get it right the first time.
[!IMPORTANT] The Golden Rule of Containerization: "A container is not a VM. It is a process. If you treat it like a VM, The Architect will delete your memory."
1. 🧠 The AI "Pre-Flight" Check (READ THIS BEFORE WRITING CODE)
Before you write a single line of Dockerfile or docker-compose.yml, you must clear these cognitive hurdles:
- "Localhost" Hallucination:
  - Fact: localhost inside a container refers to the container itself, NOT the user's machine or other containers.
  - Correction: Use service names (e.g., db, redis) as hostnames in docker-compose. Use host.docker.internal ONLY in development, when the container must reach an app running on the host.
- Ephemerality check:
  - Question: "If I run docker-compose down and then up, is the data lost?"
  - Mandate: Databases/File Stores MUST have a named volume mapped.
- Port Blindness:
  - Fact: EXPOSE in a Dockerfile only documents a port; it does NOT publish it. You MUST map ports (ports: - "8080:80") in docker-compose.yml to access the service from the host.
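On the application side, the "Localhost" rule usually comes down to never hard-coding a hostname. The following is a minimal Python sketch, assuming hypothetical variable names (DB_HOST, DB_PORT, DB_USER, DB_NAME) and a Compose service called db; none of these names come from the original standard.

```python
import os


def database_url(env=None):
    """Build a PostgreSQL URL from environment variables.

    The host defaults to the Compose service name "db", never "localhost".
    All variable names here are illustrative assumptions.
    """
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "db")
    port = env.get("DB_PORT", "5432")
    user = env.get("DB_USER", "postgres")
    name = env.get("DB_NAME", "postgres")
    # The password is deliberately left out; supply it separately as a secret.
    return f"postgresql://{user}@{host}:{port}/{name}"


if __name__ == "__main__":
    print(database_url())
```

During development against a database running on the host, setting DB_HOST=host.docker.internal in the environment changes the target without touching the code.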
2. 🏗️ Dockerfile "Platinum" Standard
The Layer Caching Strategy (Speed)
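Agents frequently forget this. DO NOT copy source code before installing dependencies. Any change to a source file invalidates the COPY . . layer and every layer after it, so the dependencies get reinstalled on every build.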
❌ BAD (Slows down every build):
COPY . .
RUN pip install -r requirements.txt
✅ GOOD (The dependency layer stays cached when only code changes):
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
Multi-Stage Protocol (Size)
MANDATORY for Compiled languages (Go, Rust, C++) and Frontend (Node/React). STRONGLY RECOMMENDED for Python (to purge build tools).
# Stage 1: Build (compilers and headers live only here)
FROM python:3.11-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN apk add --no-cache gcc musl-dev libffi-dev && \
    pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Run (The only thing that ships)
FROM python:3.11-alpine
WORKDIR /app
# Copy only the installed packages, not the build toolchain
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "main.py"]
3. 🎼 Docker Compose "Orchestration" Standard
The Dependency Trap (depends_on)
Applications often crash at startup because the app container starts before the database is ready to accept connections.
Rule: Simply adding depends_on is NOT ENOUGH. The short form only waits for the dependency's container to start; it does not wait for the service inside it to be ready.
✅ The Correct Pattern (Condition Service Healthy):
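services:
  web:
    build: .  # assumption: the app image is built from the local Dockerfile
    depends_on:
      db:
        condition: service_healthy # <--- CRITICAL
  db:
    image: postgres:15-alpine
    env_file: .env  # must define POSTGRES_PASSWORD, or the postgres image refuses to start
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5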
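The healthcheck gates container startup, but the app can still lose its database later (for example during a restart). As a complement, not a replacement, here is a minimal Python sketch of an application-side wait. The function name wait_for_port and its defaults are illustrative assumptions, and an open TCP port is a weaker signal than pg_isready: it shows that something is listening, not that the database accepts queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # "db" is the Compose service name from the example above.
    if not wait_for_port("db", 5432):
        raise SystemExit("database port never opened")
```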
Explicit Networking
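Do not rely on an implicit default network. Docker's built-in default bridge network (the one plain docker run uses) provides no name-based DNS between containers, and declaring networks explicitly makes it obvious which services can reach each other.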
- Define a top-level networks key.
- Assign generic names (e.g., internal_net).
4. 🛡️ Security & Production Constraints
- The "Root" Sin:
  - Apps should NOT run as root inside the container.
  - Fix: Create a non-root user in the image, then add USER appuser at the end of the Dockerfile. USER does not create the account; the container fails to start if the user does not exist.
- Secret Leakage:
  - NEVER ENV API_KEY=sk-123... in a Dockerfile. The value is baked into the image and visible to anyone who can inspect it.
  - ALWAYS pass secrets at runtime through a .env file in docker-compose.
- Persistence:
  - Use Named Volumes for data (postgres_data:/var/lib/postgresql/data).
  - Use Bind Mounts (./src:/app/src) ONLY for development hot-reloading.
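On the application side, the Secret Leakage rule implies reading secrets from the environment at runtime and failing loudly when they are absent. A minimal Python sketch follows; require_secret and MissingSecretError are hypothetical names introduced for illustration, and API_KEY is borrowed from the example above.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name, env=None):
    """Return the secret stored in environment variable `name`, or raise."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # Fail at startup rather than running with a missing or empty credential.
        raise MissingSecretError(
            f"{name} is not set; pass it at runtime (for example via a .env file in Compose)"
        )
    return value


if __name__ == "__main__":
    api_key = require_secret("API_KEY")
    print("API key loaded, length:", len(api_key))  # never print the value itself
```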
5. 🤖 The "Self-Correction" Checklist (Run this before submitting)
Agents must simulate this audit before showing code to the user:
- Base Image: Is it alpine or slim? (If ubuntu, reject yourself).
- Context: Did I define WORKDIR? (Don't dump files in root /).
- PID 1: Does the container handle signals? (Use exec form: CMD ["python", "app.py"], NOT CMD python app.py).
- Zombie Processes: Is my healthcheck actually testing the app, or just echo?
- Orphan Ports: Did I expose the port in Dockerfile AND map it in Compose?
- Version Pinning: Did I use postgres:latest? -> CHANGE TO postgres:15-alpine.
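Part of this checklist can be simulated mechanically. The sketch below is a rough Python approximation, not a real linter: audit_dockerfile is a hypothetical helper that covers only the base image, version pinning, WORKDIR, exec-form, and non-root items. It assumes one instruction per line and does not handle line continuations or digest-pinned images.

```python
def audit_dockerfile(text):
    """Return a list of checklist violations found in Dockerfile text."""
    problems = []
    stages = set()   # names declared with "FROM ... AS name"
    seen = set()     # instruction keywords that appear at least once
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        keyword = parts[0].upper()
        seen.add(keyword)
        if keyword == "FROM":
            args = [p for p in parts[1:] if not p.startswith("--")]
            if not args:
                continue
            image = args[0]
            is_stage_ref = image in stages
            if len(args) >= 3 and args[1].upper() == "AS":
                stages.add(args[2])
            if is_stage_ref or image == "scratch":
                continue
            # Tag is whatever follows ":" in the last path segment of the image name.
            tag = image.rsplit("/", 1)[-1].partition(":")[2] or "latest"
            if tag == "latest":
                problems.append(f"unpinned base image: {image}")
            elif "alpine" not in tag and "slim" not in tag:
                problems.append(f"base image is not alpine or slim: {image}")
        elif keyword in ("CMD", "ENTRYPOINT"):
            rest = line[len(parts[0]):].strip()
            if not rest.startswith("["):
                problems.append(f"shell form, use exec form: {line}")
    if "WORKDIR" not in seen:
        problems.append("no WORKDIR defined")
    if "USER" not in seen:
        problems.append("no USER instruction; container runs as root")
    return problems


if __name__ == "__main__":
    with open("Dockerfile") as fh:
        for problem in audit_dockerfile(fh.read()):
            print("FAIL:", problem)
```

The healthcheck and port-mapping items live in docker-compose.yml and still need a manual review.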
6. Emergency Recovery (When things fail)
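If a container exits immediately or keeps restarting (what Kubernetes reports as CrashLoopBackOff):
- Do NOT just try to run it again.
- Action: Override the command so the container stays alive. Note that command: replaces CMD only; if the image sets an ENTRYPOINT, override entrypoint: as well.
command: ["sleep", "infinity"]
- Debug: Exec into the container with docker exec -it <id> sh, then try running the original command manually.
- Fix: Analyze the logs (docker logs <id>), which usually scream "Missing Dependency" or "Permission Denied".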