What Are Large Language Models?

Understanding the engine that powers everything you'll build

🔑 Key Concepts

Large Language Models (LLMs) are neural networks trained on massive text datasets to predict the next token in a sequence.
Parameters: the learned weights that give the model its "size". OpenAI has not published GPT-4's parameter count; unofficial estimates put it at around 1.8 trillion. More parameters generally mean a more capable model, but also slower and more expensive inference.
Training: pre-training (learn from large text corpora, much of it internet text), fine-tuning (specialise on specific data), RLHF (reinforcement learning from human feedback, to align with human preferences).
Generative: LLMs generate text one token at a time. They don't "retrieve" answers; they predict likely sequences.
Context window: how much text the model can process at once, measured in tokens. GPT-4.1: 1M tokens. Claude Sonnet 4.6: 1M tokens.
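
To get a feel for why parameter count drives cost, note that every parameter is a stored number, so the weights alone need roughly (parameters × bytes per parameter) of memory. The sketch below is a back-of-the-envelope estimate only; the function name, the 7-billion and 70-billion model sizes, and the 2-bytes-per-parameter default (16-bit weights) are assumptions chosen for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights. Activations and other
    runtime memory are ignored, so real usage is higher.
    """
    return num_params * bytes_per_param / 1e9


# Hypothetical model sizes, for illustration only.
print(weight_memory_gb(7e9))    # 14.0
print(weight_memory_gb(70e9))   # 140.0
```

Ten times the parameters means ten times the memory just to hold the weights, which is one reason larger models are slower and more expensive to serve.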

📐 How LLMs Work (Simplified)

1. Text is tokenised: split into tokens (words, subwords, or characters)
2. Tokens become embeddings: numerical vectors capturing meaning
3. The Transformer architecture processes the tokens using self-attention (each token attends to other tokens in the context; in GPT-style models, only to itself and earlier tokens)
4. The model predicts the next token given all previous tokens
5. The predicted token is appended and step 4 repeats until a stop token is generated or a length limit is reached
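
The five steps above can be sketched as a toy program. This is not how a production model is built: the six-word vocabulary, the random untrained embeddings, the single attention step without learned projections, and every function name are assumptions made for illustration, so the generated text is meaningless. Only the shape of the loop carries over.

```python
import math
import random

VOCAB = ["<stop>", "the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)
DIM = 4                                               # embedding size (assumption)
rng = random.Random(0)
# Random, untrained embeddings: one vector per vocabulary entry.
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenise(text):
    """Step 1: map text to token ids (whitespace split; unknown words raise ValueError)."""
    return [VOCAB.index(word) for word in text.lower().split()]


def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors):
    """Step 3: the last token attends to itself and all earlier tokens."""
    query = vectors[-1]
    weights = softmax([dot(query, v) / math.sqrt(DIM) for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]


def next_token_probs(token_ids):
    vectors = [EMBED[t] for t in token_ids]    # Step 2: ids -> embeddings
    context = attend(vectors)                  # Step 3: self-attention
    logits = [dot(context, e) for e in EMBED]  # Step 4: score every vocabulary entry
    return softmax(logits)


def generate(prompt, max_new_tokens=5):
    ids = tokenise(prompt)
    for _ in range(max_new_tokens):            # Step 5: repeat until stop or limit
        probs = next_token_probs(ids)
        nxt = max(range(len(VOCAB)), key=probs.__getitem__)  # greedy: most likely token
        if VOCAB[nxt] == "<stop>":
            break
        ids.append(nxt)
    return [VOCAB[i] for i in ids]


print(generate("the cat"))
```

A real LLM differs in scale and in having trained weights, many attention layers, and a subword tokeniser, but it runs the same predict-append-repeat loop.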

💡 Key Insight: LLMs are probability engines. They don't "know" facts — they generate statistically likely text. This is why they can be confident and wrong (hallucination).
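
A small sketch makes the "probability engine" idea concrete. The probabilities below are made up for illustration (they do not come from any real model); they stand in for a model that has seen "Sydney" near "Australia" more often than "Canberra".

```python
import random

# Hypothetical next-token probabilities for the prompt
# "The capital of Australia is" (made-up numbers, illustration only).
next_token_probs = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}

# Greedy decoding picks the most probable token, whether or not it is true.
greedy = max(next_token_probs, key=next_token_probs.get)
print(greedy)  # Sydney

# Sampling draws tokens in proportion to their probability,
# so the correct answer appears only some of the time.
rng = random.Random(0)
samples = rng.choices(
    list(next_token_probs),
    weights=list(next_token_probs.values()),
    k=1000,
)
counts = {token: samples.count(token) for token in next_token_probs}
print(counts)
```

Nothing in this process checks the output against a source of truth: the most likely continuation wins, which is how a fluent but wrong answer gets produced.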

🛠️ The Major LLM Providers

Provider	Model	Context	Best For
OpenAI	GPT-4o	128K	General purpose, best quality
OpenAI	GPT-4o-mini	128K	Cheaper, fast, good enough for most tasks
Anthropic	Claude Sonnet 4.6	1M	Long context, coding, analysis
Google	Gemini 2.5 Pro	1M	Long context, reasoning
Meta	Llama 4 Scout	10M	Open-weight, massive context
Mistral	Mistral Large	128K	EU-hosted, strong multilingual
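
One practical use of the Context column is checking whether a prompt will fit before choosing a model. The sketch below copies a few rows from the table and assumes K = 1,000 and M = 1,000,000 tokens; the function name and the 1,000-token output reserve are hypothetical, and real limits should be confirmed in each provider's documentation.

```python
# Context windows in tokens, copied from a few rows of the table above
# (assuming K = 1,000 and M = 1,000,000).
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude Sonnet 4.6": 1_000_000,
    "Llama 4 Scout": 10_000_000,
    "Mistral Large": 128_000,
}


def models_that_fit(prompt_tokens, max_output_tokens=1_000):
    """Return the models whose window holds the prompt plus the reserved output budget."""
    needed = prompt_tokens + max_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


print(models_that_fit(50_000))
# ['GPT-4o', 'Claude Sonnet 4.6', 'Llama 4 Scout', 'Mistral Large']
print(models_that_fit(300_000))
# ['Claude Sonnet 4.6', 'Llama 4 Scout']
```

The output budget is reserved because the prompt and the generated reply share the same window.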
