# 🔥 Troubleshooting Guide
Common issues and solutions for the Antigravity Brain system.

---
## 🚨 API Errors
### Error: 429 RESOURCE_EXHAUSTED (Quota Exceeded)
**Symptoms:**
- Agents stuck in retry loop
- "You exceeded your current quota" in logs
**Causes:**
- API rate limit reached (for example, `gemini-2.0-flash-exp` allows only 10 requests per minute)
- Memory tools calling the API too frequently
**Solutions:**
1. **Switch to higher-quota model:**
```env
LLM_MODEL_FAST=gemini-2.5-flash-lite-preview-06-17 # 4000 RPM
```
2. **Wait for quota reset** (per-minute limits typically reset within 1 minute)
3. **Check current quota:**
- Gemini: https://console.cloud.google.com/apis/api/generativelanguage.googleapis.com/quotas
- OpenAI: https://platform.openai.com/usage
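
Agents stuck in a retry loop often retry immediately, which keeps the quota exhausted. The following is a minimal sketch of exponential backoff with jitter. It assumes a hypothetical `QuotaError` exception standing in for whatever your LLM client raises on a 429; `call_with_backoff` and its parameters are illustrative names, not part of the project.

```python
import random
import time


class QuotaError(Exception):
    """Hypothetical stand-in for the 429 error raised by your LLM client."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on QuotaError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of looping forever
            # Delay doubles on each attempt (1s, 2s, 4s, ...) and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter spreads out retries from agents sharing one quota.
            sleep(delay + random.uniform(0, delay * 0.1))
```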
---
### Error: OPENAI_API_KEY must be set
**Symptoms:**
- Memory tools fail with "api_key client option must be set"
**Cause:**
- Mem0's LLM configuration is missing, so Mem0 falls back to its default OpenAI client and asks for an OpenAI key
**Solution:**
- Ensure `src/config.py` has LLM config in `get_mem0_config()`
- Check that `LLM_PROVIDER` in `.env` matches a supported provider
---
### Error: Template variable 'X' not found
**Symptoms:**
- Agent crashes when loading knowledge
**Cause:**
- `{variable}` syntax in markdown files is interpreted as a CrewAI template variable
**Solution:**
- Remove literal braces from code examples: write a plain placeholder such as `PATH_VAR` instead of `{path}`
- Check `src/knowledge/standards/*.md` for unescaped `{}`
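
To locate the offending files, a short script can list every brace-wrapped token. This is a sketch under one assumption: that a template variable looks like `{identifier}`. The exact pattern CrewAI treats as a placeholder may differ, and `find_placeholders` and `scan_directory` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: a template variable looks like {identifier}; adjust the pattern
# if your CrewAI version treats other brace forms as placeholders.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def find_placeholders(text):
    """Return (line_number, token) pairs for every {identifier} in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_directory(root="src/knowledge/standards"):
    """Print file, line, and token for each placeholder in the markdown files under root."""
    for path in sorted(Path(root).glob("*.md")):
        for lineno, token in find_placeholders(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {token}")


# Usage (from the repository root): scan_directory()
```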
---
## 🐳 Docker Issues
### Container keeps restarting
**Check logs:**
```bash
docker logs antigravity_brain --tail 100
```
**Common causes:**
- Missing `.env` file
- Invalid API key
- Python import errors
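
A missing or incomplete `.env` is quick to rule out before reading the logs. The sketch below parses `.env`-style text and reports absent or empty keys. The `REQUIRED_KEYS` list is an assumption assembled from the variables this guide mentions, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: these are the variables mentioned in this guide; your deployment
# may require a different set.
REQUIRED_KEYS = ["LLM_PROVIDER", "LLM_MODEL_FAST", "LLM_MODEL_SMART", "QDRANT_HOST"]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and any surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the given text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Print a short report for the .env file at path."""
    env_path = Path(path)
    if not env_path.exists():
        print(f"{path} not found")
        return
    missing = missing_keys(env_path.read_text(encoding="utf-8"))
    print("missing: " + ", ".join(missing) if missing else "all required keys set")
```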
---
### Qdrant connection refused
**Symptoms:**
- "Connection refused" to port 6333
**Solutions:**
1. **Check Qdrant is running** (on Linux/macOS, use `grep` in place of `findstr`):
```bash
docker ps | findstr qdrant
```
2. **Check host configuration:**
```env
QDRANT_HOST=qdrant # Docker service name, NOT localhost
```
3. **Restart Qdrant:**
```bash
docker-compose restart qdrant
```
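
To confirm whether the port is reachable from where the app actually runs, a plain TCP check is often enough. This is a minimal sketch; `can_connect` is a hypothetical helper, and the host names in the comments follow the `QDRANT_HOST` guidance above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable host names.
        return False


# Inside the app container:  can_connect("qdrant", 6333)
# From the Docker host:      can_connect("localhost", 6333)
```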
---
### Changes not reflected
**Problem:** Code changes don't appear after saving
**Solution:**
```bash
docker-compose restart app
```
Or for full rebuild:
```bash
docker-compose build --no-cache app
docker-compose up -d
```
---
## 🤖 Agent Issues
### Agent not found
**Error:** `No persona found matching 'agent-name'`
**Solutions:**
1. **Check filename matches:**
```
src/agents/personas/persona-<exact-name>.md
```
2. **Use correct search pattern:**
```python
# This searches for *agent-name* in filename
AgentFactory.create_agent("agent-name")
```
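
To see which persona files a given name would match, the lookup can be approximated outside the app. This sketch assumes a case-insensitive substring match against files named `persona-*.md`; the helpers are hypothetical and may not mirror `AgentFactory`'s lookup exactly.

```python
from pathlib import Path


def filter_personas(name, filenames):
    """Return persona filenames containing name (case-insensitive), sorted."""
    needle = name.lower()
    return sorted(
        filename
        for filename in filenames
        if filename.startswith("persona-")
        and filename.endswith(".md")
        and needle in filename.lower()
    )


def matching_personas(name, personas_dir="src/agents/personas"):
    """Apply filter_personas to the markdown files found in personas_dir."""
    return filter_personas(name, (path.name for path in Path(personas_dir).glob("*.md")))
```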
---
### Agent loads but doesn't respond
**Possible causes:**
1. **LLM API error** - Check logs for 429/401
2. **Empty backstory** - Ensure persona has content
3. **Tool errors** - Check if tools are raising exceptions
**Debug** (on Linux/macOS, use `grep -E "Error|ERROR|Exception"` in place of `findstr`):
```bash
docker logs antigravity_brain 2>&1 | findstr "Error ERROR Exception"
```
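
For a filter that behaves the same on every platform, the log text can be piped through a few lines of Python. This is a sketch: `error_lines` is a hypothetical helper, and the pattern adds the 429 and 401 status codes mentioned above to the usual error markers.

```python
import re
import sys

# Matches "error" and "exception" in any letter case, plus standalone 429/401 codes.
ERROR_PATTERN = re.compile(r"error|exception|\b(429|401)\b", re.IGNORECASE)


def error_lines(lines):
    """Yield only the lines that look like errors, without trailing newlines."""
    for line in lines:
        if ERROR_PATTERN.search(line):
            yield line.rstrip("\n")


# Usage, with the log piped to stdin:
#   for hit in error_lines(sys.stdin):
#       print(hit)
```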
---
### Agent using wrong model
**Check:** Ensure `model_tier` is set correctly:
```python
agent = AgentFactory.create_agent("name", model_tier="smart") # Uses LLM_MODEL_SMART
agent = AgentFactory.create_agent("name", model_tier="fast") # Uses LLM_MODEL_FAST
```
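
To confirm what each tier resolves to in the current environment, the mapping can be checked directly. This sketch assumes only the two tiers shown above and the environment variables named in their comments; `resolve_model` is a hypothetical helper, not the project's own lookup.

```python
import os

# Mapping taken from the comments above; other tiers, if any, are not covered.
TIER_ENV_VARS = {"smart": "LLM_MODEL_SMART", "fast": "LLM_MODEL_FAST"}


def resolve_model(tier, env=os.environ):
    """Return the model name configured for tier, or None if it is unset or empty."""
    try:
        var = TIER_ENV_VARS[tier]
    except KeyError:
        raise ValueError(f"unknown model_tier: {tier!r}") from None
    return env.get(var) or None
```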
---
## 🧠 Memory Issues
### Memory not saving
**Check Qdrant dashboard:**
```
http://localhost:6333/dashboard
```
**Verify collection exists:** Should see `itguys_antigravity_v1` (or your `MEMORY_PROJECT_ID`)

---
### Memory search returns nothing
**Possible causes:**
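1. **Embedding mismatch** - Don't change `MEMORY_EMBEDDING_PROVIDER` after data exists; vectors from different embedding models are not comparable, so existing memories stop matching new queries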
2. **Wrong project ID** - Check `MEMORY_PROJECT_ID` matches
3. **Qdrant empty** - No memories saved yet
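
The last two causes can be checked against Qdrant's REST API, which lists collections at `GET /collections` and reports a `points_count` in the details of each collection. This is a sketch using only the standard library. It assumes the collection is named after your `MEMORY_PROJECT_ID` and that Qdrant is reachable at `http://localhost:6333`; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def collection_names(payload):
    """Extract collection names from the JSON body of GET /collections."""
    return [item["name"] for item in payload["result"]["collections"]]


def fetch_json(url, timeout=5.0):
    """GET url and decode the JSON response."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def diagnose(project_id, base_url="http://localhost:6333"):
    """Report whether the collection exists and how many points it holds."""
    names = collection_names(fetch_json(f"{base_url}/collections"))
    if project_id not in names:
        return f"collection {project_id!r} not found; available: {names}"
    info = fetch_json(f"{base_url}/collections/{project_id}")
    return f"collection {project_id!r} holds {info['result'].get('points_count')} points"


# Usage: print(diagnose("itguys_antigravity_v1"))
```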
---
## 🌐 Web UI Issues
### Chainlit not loading
**Check port** (on Linux/macOS, use `grep` in place of `findstr`):
```bash
netstat -an | findstr 8000
```
**Check container** (on Linux/macOS, use `grep -i` in place of `findstr -i`):
```bash
docker logs antigravity_brain | findstr -i "error"
```
---
### "Could not reach server" toast
**Cause:** Usually LLM API timeout or error

**Solution:** Check API key and quota

---
## 📋 Quick Diagnostic Commands
```bash
# Check all containers
docker ps
# View real-time logs
docker logs antigravity_brain -f
# Check for errors only (Linux/macOS: grep -E "ERROR|Error|error|Exception")
docker logs antigravity_brain 2>&1 | findstr "ERROR Error error Exception"
# Restart everything
docker-compose down && docker-compose up -d
# Full rebuild
docker-compose build --no-cache && docker-compose up -d
# Check Qdrant health
curl http://localhost:6333/healthz
# Test Gemini API
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite-preview-06-17:generateContent?key=YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
```
---
## 🆘 Getting Help
1. **Check logs first** - Most issues show up in the Docker logs
2. **Verify .env** - Most config issues come down to environment variables
3. **Test API separately** - Ensure your API key works outside the app
4. **Check quotas** - Rate limits are a frequent cause of failures