Agent Safety & Guardrails

Keep your agents from going off the rails

🔑 Key Concepts

Guardrails in code — Don't rely on system prompts for safety. Implement checks in the execution layer — code can't be convinced to ignore rules.
Prompt injection defence — Input sanitisation, output validation, least-privilege tools, tool output sanitisation.
Rate limiting — Set hard caps per task: max 10 tool calls, max 3 retries per tool, max 60 seconds of runtime. This prevents runaway costs.
Human-in-the-loop — For high-stakes actions (email, deploy, delete), pause for human approval before executing.
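
As a rough illustration of how the checks above might live in the execution layer rather than in the prompt, here is a minimal Python sketch. Every name in it (GuardedExecutor, Limits, GuardrailError, the approve callback, the tool names) is hypothetical and not taken from any particular agent framework. It also assumes that the 60-second limit means wall-clock time per task and that retries do not count against the tool-call budget.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


class GuardrailError(Exception):
    """Raised when the execution layer blocks or gives up on a tool call."""


@dataclass(frozen=True)
class Limits:
    max_tool_calls: int = 10   # per task
    max_retries: int = 3       # per tool call, on top of the first attempt
    max_seconds: float = 60.0  # assumption: wall-clock time per task


class GuardedExecutor:
    """Hypothetical wrapper around tool calls; create one instance per task."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        approve: Callable[[str, Dict[str, Any]], bool],
        high_stakes: FrozenSet[str] = frozenset({"send_email", "deploy", "delete"}),
        limits: Limits = Limits(),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.tools = tools              # allowlist: only these tools can run
        self.approve = approve          # human-in-the-loop callback
        self.high_stakes = high_stakes
        self.limits = limits
        self.clock = clock
        self.calls = 0
        self.started = clock()

    def call(self, name: str, **args: Any) -> Any:
        # Least privilege: anything not explicitly registered is rejected.
        if name not in self.tools:
            raise GuardrailError(f"tool not allowed: {name}")
        if self.calls >= self.limits.max_tool_calls:
            raise GuardrailError("tool-call budget exhausted")
        if self.clock() - self.started > self.limits.max_seconds:
            raise GuardrailError("time budget exhausted")
        # High-stakes actions pause here until a human says yes.
        if name in self.high_stakes and not self.approve(name, args):
            raise GuardrailError(f"approval denied: {name}")

        self.calls += 1
        last_error: Exception = GuardrailError("tool was never attempted")
        # One initial attempt plus up to max_retries retries.
        for _ in range(1 + self.limits.max_retries):
            try:
                return self.tools[name](**args)
            except Exception as exc:
                last_error = exc
        raise GuardrailError(
            f"{name} failed after {self.limits.max_retries} retries"
        ) from last_error


if __name__ == "__main__":
    executor = GuardedExecutor(
        tools={"search": lambda query: f"results for {query}"},
        # Deny by default; a real callback would ask a person.
        approve=lambda name, args: False,
    )
    print(executor.call("search", query="agent guardrails"))
```

Because the model only ever reaches tools through call(), no instruction in a prompt or in a tool's output can raise the limits or skip the approval step. Note that this sketch checks the time budget only between calls; interrupting a tool that is already running would need a separate timeout mechanism.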

💡 Practice: Try implementing each concept yourself before moving on. Reading about guardrails and building them are very different things.