Agent Safety & Guardrails
Keep your agents from going off the rails
🔑 Key Concepts
- Guardrails in code: Don't rely on system prompts alone for safety. A model can be talked out of its instructions, so implement the checks in the execution layer, where code can't be convinced to ignore rules.
- Prompt injection defence: Combine several layers. Sanitise user input, validate model output before acting on it, give each tool the least privilege it needs, and sanitise tool output before it re-enters the context.
- Rate limiting: Set hard caps per task, for example 10 tool calls, 3 retries per tool, and 60 seconds of total run time, to prevent runaway loops and costs.
- Human-in-the-loop: For high-stakes actions (sending email, deploying, deleting data), pause for human approval before executing.
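The guardrails, rate limits, and approval step above can be sketched as a single wrapper that every tool call passes through. This is a minimal illustration, not a reference implementation: `ToolGuard`, `GuardrailError`, the `approve` callback, and the tool names are all hypothetical, and it assumes "3 retries" means one initial attempt plus up to three retries, with every attempt counted against the call budget.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class GuardrailError(Exception):
    """Raised when a tool call violates a guardrail."""


@dataclass
class ToolGuard:
    tools: Dict[str, Callable[..., Any]]      # allowlist of callable tools
    approve: Callable[[str, dict], bool]      # asks a human; returns True to proceed
    high_stakes: frozenset = frozenset({"send_email", "deploy", "delete"})
    max_calls: int = 10                       # attempts allowed per task
    max_retries: int = 3                      # retries after the first attempt
    max_seconds: float = 60.0                 # time budget per task
    calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def call(self, name: str, **kwargs: Any) -> Any:
        # Checks live in code, so no prompt can talk the agent past them.
        if name not in self.tools:
            raise GuardrailError(f"tool not allowed: {name}")
        if time.monotonic() - self.started > self.max_seconds:
            raise GuardrailError("time budget exceeded")
        if name in self.high_stakes and not self.approve(name, kwargs):
            raise GuardrailError(f"human rejected: {name}")

        last_error = None
        for _ in range(1 + self.max_retries):
            if self.calls >= self.max_calls:
                raise GuardrailError("tool call budget exceeded")
            self.calls += 1
            try:
                return self.tools[name](**kwargs)
            except Exception as exc:          # tool failed; retry if budget allows
                last_error = exc
        raise GuardrailError(
            f"{name} failed after {self.max_retries} retries"
        ) from last_error
```

In use, the agent loop would create one `ToolGuard` per task and route every tool request through `guard.call(...)`. The `approve` callback could be a console prompt or a ticket in a review queue; the time check here only runs before a call starts, so a tool that hangs would still need its own timeout.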
💡 Practice: Try implementing each concept yourself before moving on. Reading about guardrails and building guardrails are very different things.
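One possible starting point for the prompt injection defences listed above is sketched here, covering output validation, least-privilege tools, and tool output sanitisation. Everything in it is an assumption made for illustration: the `ALLOWED_TOOLS` table, the JSON shape of a tool call (`{"tool": ..., "args": ...}`), and the `SUSPICIOUS` pattern, which is far too narrow to rely on by itself in a real system.

```python
import json
import re

# Least privilege: only these tools, and only these argument names, are permitted.
ALLOWED_TOOLS = {"search": {"query"}, "read_file": {"path"}}

# Illustrative pattern only; real detection needs broader, tested rules.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|system prompt",
    re.IGNORECASE,
)


def validate_tool_call(raw: str) -> dict:
    """Output validation: parse the model's proposed call and reject
    anything that is not on the allowlist."""
    call = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("tool")
    args = call.get("args", {})
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {name!r}")
    if not isinstance(args, dict) or set(args) - ALLOWED_TOOLS[name]:
        raise ValueError(f"unexpected arguments for {name}")
    return {"tool": name, "args": args}


def wrap_tool_output(text: str, limit: int = 2000) -> str:
    """Tool output sanitisation: truncate, flag suspicious text, and
    label the result as untrusted data rather than instructions."""
    text = text[:limit]
    flag = "; possible injection" if SUSPICIOUS.search(text) else ""
    return f"[untrusted tool output{flag}]\n{text}\n[end of tool output]"
```

A reasonable exercise is to extend this yourself: validate argument values as well as names (for example, restrict `path` to one directory), and decide what the agent should do when output is flagged, such as dropping it or pausing for human review.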