Tokens & Context Windows
How LLMs process text and why token limits matter
🔑 Key Concepts
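- Tokenisation — Text is split into tokens, which may be whole words, word pieces, or punctuation. 'Hello world' → ['Hello', ' world']. In English, one token ≈ 4 characters or ¾ of a word; other languages and code often need more tokens for the same amount of text.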
- Context window — The maximum number of tokens the model can process in one request. GPT-4o: 128K tokens (~96K words). Input and output both count toward the limit, so a longer prompt leaves less room for the reply.
- Token costs — API usage is billed per token, and input tokens are cheaper than output tokens. GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens. Prices change, so check the current price list.
- Counting tokens — Use tiktoken (Python) to count tokens before sending a request. Don't guess: exceeding the limit causes errors or silent truncation.
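The concepts above can be combined into a small pre-flight check. The following is a minimal sketch, not a definitive implementation: the function names (count_tokens, fits_context, estimate_cost_usd) are hypothetical, the limit and prices are simply the GPT-4o figures quoted above, and count_tokens assumes the third-party tiktoken package is installed in a version that knows the "gpt-4o" model name.

```python
# Figures copied from the notes above; treat them as assumptions and
# check current model documentation before relying on them.
CONTEXT_WINDOW = 128_000         # GPT-4o context window, in tokens
INPUT_USD_PER_MILLION = 2.50     # price per 1M input tokens
OUTPUT_USD_PER_MILLION = 10.00   # price per 1M output tokens


def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens in `text` using tiktoken (pip install tiktoken)."""
    # Imported lazily so the arithmetic helpers below work without tiktoken.
    import tiktoken

    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


def fits_context(input_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Input and output share the window, so budget for both together."""
    return input_tokens + max_output_tokens <= window


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts and per-million prices."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000
```

For example, a 10,000-token prompt with a 1,000-token reply fits comfortably in the window and costs about $0.035 at these prices (10,000 × $2.50 + 1,000 × $10, divided by 1,000,000). In practice you would pass count_tokens(prompt) as input_tokens and the reply limit you plan to request as max_output_tokens.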
💡 Practice: Try implementing each concept yourself before moving on. Reading about RAG and building RAG are very different things.