# 🐳 Docker & Infrastructure Standards (The "Zero-Retry" Protocol)
**Source:** The Architect's repository.

**Audience:** AI Agents (specifically You).

**Objective:** **ELIMINATE** the "Build -> Fail -> Retry" loop. Get it right the first time.
> [!IMPORTANT]
> **The Golden Rule of Containerization:**
> "A container is not a VM. It is a process. If you treat it like a VM, The Architect will delete your memory."
## 1. 🧠 The AI "Pre-Flight" Check (READ THIS BEFORE WRITING CODE)
Before you write a single line of `Dockerfile` or `docker-compose.yml`, you must clear these cognitive hurdles:
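1. **"Localhost" Hallucination:**
   * *Fact:* `localhost` inside a container refers to the **container itself**, NOT the user's machine or other containers.
   * *Correction:* Use service names (e.g., `db`, `redis`) as hostnames in `docker-compose`. Use `host.docker.internal` ONLY in development, to reach apps running on the host (on Linux this also requires an `extra_hosts` entry mapping it to `host-gateway`).
2. **Ephemerality Check:**
   * *Question:* "If I `docker-compose down` and `up`, is the data lost?"
   * *Mandate:* Databases/File Stores **MUST** have a named volume mapped. Data written only to the container's own filesystem is deleted with the container; named volumes survive `down` (only `down -v` removes them).
3. **Port Blindness:**
   * *Fact:* `EXPOSE` in a Dockerfile only documents the port (and is used by `docker run -P`); it does NOT publish it. You **MUST** map ports (`ports: - "8080:80"`) in `docker-compose.yml` to reach the service from the host. Containers on the same network reach each other without any mapping.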
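The three checks above fit in one small `docker-compose.yml`. A minimal sketch, assuming a web app that listens on port `8000` inside its container and a `POSTGRES_PASSWORD` variable supplied from the shell or a `.env` file; the service, volume, and variable names are illustrative, not taken from The Architect's repository:

```yaml
services:
  web:
    build: .
    environment:
      # Reach the database by its service name, never "localhost"
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD}@db:5432/postgres
    ports:
      - "8080:8000"   # host:container, required to reach the app from the host
    extra_hosts:
      # Dev only: makes host.docker.internal resolve on Linux Engine
      - "host.docker.internal:host-gateway"
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data   # survives down/up

volumes:
  postgres_data:
```

`db` publishes no ports on purpose: `web` reaches it over the Compose network, so a host mapping for `5432` is only needed if you want to connect from the host itself.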
## 2. 🏗️ Dockerfile "Platinum" Standard
### The Layer Caching Strategy (Speed)
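Agents frequently forget this. **DO NOT** copy source code before installing dependencies. It kills the cache: any source change invalidates the `COPY . .` layer and every layer after it, so dependencies are reinstalled on every build.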
**❌ BAD (Slows down every build):**
```dockerfile
COPY . .
RUN pip install -r requirements.txt
```
**✅ GOOD (Instant builds on code changes):**
```dockerfile
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```
### Multi-Stage Protocol (Size)
**MANDATORY** for Compiled languages (Go, Rust, C++) and Frontend (Node/React).
**STRONGLY RECOMMENDED** for Python (to purge build tools).
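```dockerfile
# Stage 1: Build (compilers live here and are thrown away)
FROM python:3.11-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN apk add --no-cache gcc musl-dev libffi-dev && \
    pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Run (The only thing that ships)
FROM python:3.11-alpine
WORKDIR /app
# Copy only the installed packages, not the build toolchain
COPY --from=builder /install /usr/local
# If a compiled extension links a system library (e.g. libffi),
# install its runtime package here: RUN apk add --no-cache libffi
COPY . .
# Exec form: python runs as PID 1 and receives SIGTERM directly
CMD ["python", "main.py"]
```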
## 3. 🎼 Docker Compose "Orchestration" Standard
### The Dependency Trap (`depends_on`)
Applications written by AI agents often crash because they start before the database is ready to accept connections.
**Rule:** Simply adding `depends_on` is NOT ENOUGH. On its own it only controls start order: it waits for the `db` container to start, not for the *service* inside it to be ready.
**✅ The Correct Pattern (Condition Service Healthy):**
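```yaml
services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy # <--- CRITICAL
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} # required: the image exits on first start without it
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
```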
### Explicit Networking
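Do not rely on default networking. The legacy default `bridge` network (what plain `docker run` uses) has no DNS for container names, and Compose's implicit per-project `default` network hides which services are actually meant to talk to each other.
1. Define a top-level `networks` block.
2. Give each network a descriptive name (e.g., `internal_net`) and attach services to it explicitly.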
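The two steps above, as a minimal sketch assuming two services named `web` and `db` (names are illustrative):

```yaml
services:
  web:
    build: .
    networks:
      - internal_net
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    networks:
      - internal_net

networks:
  internal_net:
    driver: bridge   # user-defined bridge: service names resolve via Docker's embedded DNS
```

`web` can now reach the database at `db:5432`; a service that is not attached to `internal_net` (or another network shared with `db`) cannot.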
## 4. 🛡️ Security & Production Constraints
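1. **The "Root" Sin:**
   * Apps should NOT run as root. By default the container's main process (PID 1) runs as UID 0.
   * *Fix:* Create an unprivileged user and switch to it before `CMD` (on Alpine: `RUN addgroup -S app && adduser -S -G app appuser`, then `USER appuser`).
2. **Secret Leakage:**
   * **NEVER** `ENV API_KEY=sk-123...` in a Dockerfile. `ENV` values are baked into the image and visible to anyone who runs `docker history` or `docker inspect` on it.
   * **ALWAYS** pass secrets at runtime from a `.env` file (`env_file:` in `docker-compose`). Keep `.env` out of git and list it in `.dockerignore` so `COPY . .` cannot bake it into the image.
3. **Persistence:**
   * Use **Named Volumes** for data (`postgres_data:/var/lib/postgresql/data`).
   * Use **Bind Mounts** (`./src:/app/src`) ONLY for development hot-reloading.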
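The secret and persistence rules map directly onto Compose keys. A minimal sketch, assuming a `.env` file next to `docker-compose.yml` that holds `POSTGRES_PASSWORD` and any API keys; the service and volume names are illustrative:

```yaml
services:
  web:
    build: .
    # Secrets are injected at runtime from .env; nothing is baked into the image
    env_file:
      - .env
    volumes:
      # Bind mount: development hot-reload only, remove for production
      - ./src:/app/src
  db:
    image: postgres:15-alpine
    env_file:
      - .env
    volumes:
      # Named volume: managed by Docker, survives `docker compose down`
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
```

In production, drop the `./src` bind mount so the container runs the code copied in at build time rather than whatever is on the host.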
## 5. 🤖 The "Self-Correction" Checklist (Run this before submitting)
Agents must simulate this audit before showing code to the user:
- [ ] **Base Image:** Is it `alpine` or `slim`? (If `ubuntu`, reject yourself).
- [ ] **Context:** Did I define `WORKDIR`? (Don't dump files in root `/`).
- [ ] **PID 1:** Does the container handle signals? (Use `exec` form: `CMD ["python", "app.py"]`, NOT `CMD python app.py`).
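- [ ] **Zombie Containers:** Is my healthcheck actually testing the app (e.g., `pg_isready`, a request to the app's own endpoint), or just `echo`? A fake check reports a dead app as healthy.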
- [ ] **Orphan Ports:** Did I expose the port in Dockerfile AND map it in Compose?
- [ ] **Version Pinning:** Did I use `postgres:latest`? -> **CHANGE TO** `postgres:15-alpine`.
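A healthcheck that only runs `echo` stays green while the app is dead. A minimal sketch of one that exercises the app itself, assuming a hypothetical `/health` endpoint on port `8000` and a Python-based image (the check uses the Python standard library because Python is guaranteed to exist in that image, while `curl` may not be installed):

```yaml
services:
  web:
    build: .
    healthcheck:
      # localhost is correct here: the check runs inside the web container itself.
      # urlopen raises (non-zero exit = unhealthy) on connection refused or an HTTP error status.
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s   # grace period while the app boots
```

With this in place, other services can gate on `condition: service_healthy` for `web` the same way `web` gates on `db`.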
## 6. Emergency Recovery (When things fail)
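If a container exits immediately or keeps restarting (a restart loop; Kubernetes calls this CrashLoopBackOff):
1. **Do NOT** just try to run it again.
2. **Action:** Override the entrypoint so the container stays up.
   * `entrypoint: ["sleep", "infinity"]` (a `command:` override only replaces `CMD`; if the image's `ENTRYPOINT` is what crashes, it still runs.)
3. **Debug:** Exec into the container (`docker exec -it <id> sh`) and run the original command manually.
4. **Fix:** Read the logs (`docker logs <id>` or `docker compose logs <service>`); they usually scream "Missing Dependency" or "Permission Denied".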
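The entrypoint override can live in a separate file so the main `docker-compose.yml` stays untouched. A minimal sketch, assuming the crashing service is called `web` and the override file is named `docker-compose.debug.yml` (both hypothetical):

```yaml
# docker-compose.debug.yml: merged on top of docker-compose.yml
services:
  web:
    # Replaces the image ENTRYPOINT (and drops its default CMD) so the container idles
    entrypoint: ["sleep", "infinity"]
    # Stop the restart policy from fighting you while you debug
    restart: "no"
```

Start it with `docker compose -f docker-compose.yml -f docker-compose.debug.yml up -d web`, then open a shell with `docker compose exec web sh`. Delete the override once the fix is in.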