What Are Large Language Models?
Understanding the engine that powers everything you'll build
🔑 Key Concepts
- Large Language Models (LLMs) are neural networks trained on massive text datasets to predict the next token in a sequence
- Parameters: the "size" of the model. OpenAI has not disclosed GPT-4's parameter count; unofficial estimates put it at around 1.8 trillion. More parameters generally mean a more capable model, but also a slower and more expensive one
- Training: pre-training (learn from internet text), fine-tuning (specialise on specific data), RLHF (reinforcement learning from human feedback, which aligns the model with human preferences)
- Generative: LLMs generate text one token at a time. They don't "retrieve" answers; they predict likely sequences
- Context window: how much text the model can process at once. GPT-4.1: 1M tokens. Claude Sonnet 4.6: 1M tokens
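To see why parameter count drives cost, here is a rough back-of-the-envelope sketch. It only estimates the memory needed to hold the weights (parameters × bytes per parameter) and ignores activations, caches, and everything else a real deployment needs. The function name `weight_memory_gb` and the example model sizes are made up for illustration.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Illustrative only; real serving needs extra memory on top of the weights.

# Bytes used per parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the scaling.
    for name, params in [("7B model", 7e9), ("70B model", 70e9)]:
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gb(params, precision)
            print(f"{name} @ {precision}: {size:.0f} GB")
```

Ten times the parameters means ten times the memory at the same precision, which is one reason larger models cost more to run.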
📐 How LLMs Work (Simplified)
1. Text is tokenised: split into tokens (words, subwords, or characters)
2. Tokens become embeddings: numerical vectors that capture meaning
3. The transformer architecture processes the tokens using self-attention (each token attends to the other tokens in the context; in GPT-style models, only the preceding ones)
4. The model predicts the next token given all previous tokens
5. The predicted token is appended to the sequence and step 4 repeats until a stop token is generated or a length limit is reached
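The loop described above can be sketched with a toy model. This is not a transformer: embeddings and self-attention are swapped out for a simple table of "which token followed which" (a bigram model), and the tokeniser just splits on whitespace. The corpus, the `<eos>` stop-token name, and all function names are assumptions made for this example. What it does show faithfully is the predict, append, repeat cycle.

```python
from collections import Counter, defaultdict

STOP = "<eos>"  # hypothetical stop-token name for this toy example


def tokenise(text: str) -> list[str]:
    """Whitespace tokeniser. Real LLMs use subword tokenisers instead."""
    return text.lower().split()


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count which token follows which. A stand-in for the transformer."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenise(sentence) + [STOP]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], prompt: str, max_tokens: int = 20) -> str:
    """Greedy generation: always append the most frequent next token."""
    tokens = tokenise(prompt)
    if not tokens:
        return ""
    for _ in range(max_tokens):  # length limit
        followers = counts.get(tokens[-1])
        if not followers:
            break  # token never seen in training, nothing to predict
        next_token = followers.most_common(1)[0][0]
        if next_token == STOP:
            break  # stop token reached
        tokens.append(next_token)
    return " ".join(tokens)


# Tiny made-up training corpus.
corpus = [
    "llms predict the next token",
    "transformers predict the next token",
    "models predict text",
]
counts = train_bigram(corpus)
print(generate(counts, "models"))
```

Note that the output for the prompt "models" is a sentence that never appears in the corpus. The model is not looking anything up; it is chaining together likely next tokens.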
💡 Key Insight: LLMs are probability engines. They don't "know" facts — they generate statistically likely text. This is why they can be confident and wrong (hallucination).
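A small sketch makes the "probability engine" idea concrete. A model assigns a score (a logit) to each candidate next token, softmax turns the scores into probabilities, and one token is sampled. The scores below are invented for illustration and do not come from any real model; `softmax` and `sample` are helper names assumed for this example.

```python
import math
import random
from collections import Counter


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for the prompt "The capital of Australia is".
logits = {"Canberra": 2.0, "Sydney": 1.6, "Melbourne": 0.5}

probs = softmax(logits)
for token, p in probs.items():
    print(f"{token}: {p:.2f}")

# Sample many completions to see how often each token is chosen.
rng = random.Random(0)
draws = Counter(sample(probs, rng) for _ in range(1000))
print(draws)
```

With these invented scores the correct answer is the most likely token, yet "Sydney" still carries roughly a third of the probability. When it is sampled, the model states it just as fluently as the right answer, which is what a hallucination looks like from the inside. Lowering `temperature` concentrates probability on the top token; raising it spreads probability out.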
🛠️ The Major LLM Providers
| Provider | Model | Context | Best For |
|---|---|---|---|
| OpenAI | GPT-4o | 128K | General purpose, best quality |
| OpenAI | GPT-4o-mini | 128K | Cheaper, fast, good enough for most tasks |
| Anthropic | Claude Sonnet 4.6 | 1M | Long context, coding, analysis |
| Google | Gemini 2.5 Pro | 1M | Long context, reasoning |
| Meta | Llama 4 Scout | 10M | Open-weight, massive context |
| Mistral | Mistral Large | 128K | EU-hosted, strong multilingual |