Tokens & Context Windows

How LLMs process text and why token limits matter

🔑 Key Concepts

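Tokenisation — The model reads text as tokens (sub-word units), not as characters or whole words. 'Hello world' → ['Hello', ' world']; note that the leading space belongs to the second token. Rule of thumb for English: one token ≈ 4 characters or ¾ of a word. Code and other languages often need more tokens for the same amount of text.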
Context window — Maximum number of tokens the model can process in one request. GPT-4o: 128K tokens (~96K words). Input and output both count, so a long prompt leaves less room for the response.
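Token costs — You pay per token, and input tokens are usually cheaper than output tokens. GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens. Prices change, so check the provider's current pricing before budgeting.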
Counting tokens — Use tiktoken (Python) to count tokens before sending a request to an OpenAI model. Never guess: exceeding the limit causes errors or silent truncation.
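
To make the four concepts concrete, here is a minimal sketch that counts tokens, checks whether a request fits the context window, and estimates its cost. The function names (count_tokens, fits_in_context, estimate_cost_usd) are hypothetical helpers, not part of any library. The constants are the GPT-4o figures quoted above and should be treated as a snapshot. The sketch assumes tiktoken's "o200k_base" encoding for GPT-4o, and if tiktoken is not installed or cannot load its vocabulary it falls back to the rough 4-characters-per-token rule, which is only an estimate.

```python
# Snapshot figures quoted in the notes above for GPT-4o; limits and prices change.
CONTEXT_WINDOW = 128_000      # total tokens per request (input + output)
INPUT_USD_PER_1M = 2.50       # price per 1M input tokens
OUTPUT_USD_PER_1M = 10.00     # price per 1M output tokens


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken, or estimate them if tiktoken is unavailable."""
    try:
        import tiktoken

        # Assumption: "o200k_base" is the encoding that matches GPT-4o.
        encoding = tiktoken.get_encoding("o200k_base")
        return len(encoding.encode(text))
    except Exception:
        # Fallback only: the rough "1 token ≈ 4 characters" rule, rounded up.
        return (len(text) + 3) // 4


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """Input and reserved output must fit in the window together."""
    return prompt_tokens + max_output_tokens <= window


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_1M
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_1M)


if __name__ == "__main__":
    prompt = "Summarise the following report in three bullet points."
    n = count_tokens(prompt)
    print(n, fits_in_context(n, max_output_tokens=500), estimate_cost_usd(n, 500))
```

Reserving the output budget up front (max_output_tokens) is a deliberate choice: the check fails before the request is sent, instead of the response being cut off partway through.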

💡 Practice: Try implementing each concept yourself before moving on. Reading about RAG and building RAG are very different things.