AI Engineer Roadmap
The complete 6-month curriculum.
The 6-Month Roadmap
From zero to AI engineer, month by month
Coding & Fundamentals
Become a functional Python developer. AI engineering is first and foremost software engineering.
LLM Application Development
Go from "I know Python" to "I can build working applications with LLMs."
RAG (Retrieval-Augmented Generation)
The single most common pattern in production AI. LLMs don't know your data; RAG bridges that gap.
Agents, Tools & Workflows
Build AI agents that can take actions, use tools, and handle multi-step workflows.
Deployment & Production
Take your AI apps from localhost to production. Level up from "builder" to "professional."
Specialize & Ship
Pick a specialization, polish your portfolio, and start positioning yourself for employment.
Month 1: Foundations
AI engineering is first and foremost software engineering. Month 1 builds the foundations you'll rely on every single day: writing clean Python, understanding the math behind ML, and using professional developer tools.
Month 1 at a Glance
You don't become an AI engineer by memorising API docs. You become one by understanding the building blocks: how to write code that other people can read, how to think in vectors and probabilities, and how to collaborate like a professional developer. Month 1 covers three interlocking skill areas: Python fundamentals, math for machine learning, and Git and the command line.
Each area feeds into the next. You'll use Git to version your Python scripts, and math concepts when you hit ML in Month 2.
Python Fundamentals
The language of AI: fluency is non-negotiable
Python is the lingua franca of AI. Every framework, every library, every deployment tool speaks Python. You don't need to be a Python expert on day one, but you need to be fluent enough to express any idea without fighting the language.
Core topics:
- Data types and structures: ints, floats, strings, lists, tuples, dicts, sets. Know when to use each. List comprehensions and dict unpacking are daily tools.
- Control flow: loops (for, while), conditionals, try/except/finally, context managers (with statements). Exception handling is how real code survives.
- Functions: def, arguments, keyword args, *args/**kwargs, lambda functions, type hints. Type hints aren't optional in professional code; they're documentation that the IDE reads.
- Object-oriented programming: classes, inheritance, dunder methods (__init__, __repr__, __str__), properties, and composition over inheritance.
- Decorators and generators: decorators wrap functions (used heavily by FastAPI and Flask). Generators yield values lazily, which is essential for processing datasets that don't fit in memory.
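As a minimal sketch of decorators and generators working together, the snippet below defines a call-counting decorator and a generator that yields fixed-size batches lazily. The names count_calls, batches, and square are hypothetical and chosen only for illustration.

```python
import functools
from typing import Callable, Iterable, Iterator


def count_calls(func: Callable) -> Callable:
    """Decorator that records how many times func has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def batches(items: Iterable[int], size: int) -> Iterator[list[int]]:
    """Generator that yields lists of at most `size` items, one batch at a time."""
    batch: list[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


@count_calls
def square(x: int) -> int:
    return x * x


# range() is itself lazy, so no full list of inputs is ever built.
results = [[square(x) for x in batch] for batch in batches(range(5), size=2)]
print(results)        # [[0, 1], [4, 9], [16]]
print(square.calls)   # 5
```

functools.wraps keeps the wrapped function's name and docstring intact, which matters once frameworks start inspecting your functions.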
Math for Machine Learning
Vectors, gradients, and probability: intuition over proofs
You don't need a maths degree to be an AI engineer. But you do need intuition for the three mathematical pillars of machine learning. Focus on understanding why things work, not on memorising formulas.
- Linear algebra: vectors (addition, dot product, magnitude), matrices (multiplication, transpose, inverse), eigenvalues and eigenvectors. The dot product is the single most important operation in ML (it's how attention works, how embeddings compare, how neural networks compute).
- Calculus: derivatives (what a derivative tells you about a function's slope), partial derivatives, the chain rule, gradients. Gradient descent is just "follow the slope downhill" extended to high dimensions.
- Probability: probability distributions (normal, binomial, uniform), conditional probability, Bayes' theorem, expected value. Bayesian thinking is the foundation of uncertainty estimation in ML.
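To make two of these ideas concrete, here is a small sketch in plain Python: a dot product (plus the cosine similarity built from it) and one-dimensional gradient descent. The function names and the example function f(x) = (x - 3)^2 are illustrative assumptions, not part of the curriculum.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product normalised by both magnitudes; 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def gradient_descent(grad, x0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))                             # 3.0
print(dot([1.0, 2.0], [3.0, 4.0]))                 # 11.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed learning rate is enough for this simple bowl-shaped function.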
Three essential resources for building intuition:
- 3Blue1Brown's "Essence of Linear Algebra": best visual explanations on YouTube
- StatQuest for probability and statistics: clear, simple, no fluff
- Khan Academy's calculus course: work through derivatives and gradient sections
Git & Command Line
Every AI engineer lives in the terminal
If Python is your hammer, the terminal is your workbench. Git is how you save your progress and collaborate. These aren't optional; they're the tools you'll use every minute of every workday.
Git essentials:
- init, add, commit, push, pull: the basic rhythm
- branch, checkout, merge: feature branches prevent chaos
- rebase, stash, log, diff: debugging your history
- Resolving merge conflicts: they happen daily, learn to handle them calmly
CLI essentials:
- Navigating: ls, cd, pwd, find, tree
- File ops: cp, mv, rm, cat, head, tail, grep
- Processes: ps, top, kill, nohup
- Permissions: chmod, chown
CLI Data Analysis Tool
Project: Build a real CLI tool end-to-end
Your Month 1 project is a command-line data analysis tool. It reads CSV files, computes statistics (mean, median, std dev), generates simple reports, and outputs formatted tables. This project ties together Python, CLI, and Git.
What you'll build:
- Python script with argparse for CLI arguments
- Read CSV with Python's csv module (no Pandas yet)
- Compute summary statistics per column
- Generate a Markdown report file
- Version-controlled with Git (multiple branches)
Extensions: add filtering, grouping, JSON/Excel output, and basic plotting with matplotlib.
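One possible starting skeleton for the project is sketched below. It is an assumption about structure, not a reference solution: the function names column_stats and to_markdown are hypothetical, and it leans on the standard-library statistics module as a shortcut where the day-by-day plan asks you to implement mean, median, and std yourself.

```python
import argparse
import csv
import io
import statistics


def column_stats(reader: csv.DictReader) -> dict[str, dict[str, float]]:
    """Compute mean, median, and sample std dev for every numeric column."""
    columns: dict[str, list[float]] = {}
    for row in reader:
        for name, value in row.items():
            try:
                columns.setdefault(name, []).append(float(value))
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric cells
    return {
        name: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in columns.items() if values
    }


def to_markdown(stats: dict[str, dict[str, float]]) -> str:
    """Render the statistics as a Markdown table."""
    lines = ["| column | mean | median | std |", "| --- | --- | --- | --- |"]
    for name, s in stats.items():
        lines.append(f"| {name} | {s['mean']:.2f} | {s['median']:.2f} | {s['std']:.2f} |")
    return "\n".join(lines)


def main() -> None:
    parser = argparse.ArgumentParser(description="Summarise a CSV file.")
    parser.add_argument("csv_path", help="path to the input CSV file")
    args = parser.parse_args()
    with open(args.csv_path, newline="") as f:
        print(to_markdown(column_stats(csv.DictReader(f))))


# In-memory demo; call main() instead to read a path from the command line.
demo = io.StringIO("name,age,score\nAda,36,90\nLin,28,80\nSam,,70\n")
print(to_markdown(column_stats(csv.DictReader(demo))))
```

Keeping the statistics and the rendering in separate functions makes each one easy to test and leaves room for the JSON or Excel output extensions.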
Weekly Breakdown
Python Crash Course
Data types, control flow, functions, and basic I/O. Write 100 lines of Python a day.
- Variables, lists, dicts, tuples
- If/else, for/while loops
- Functions, scope, imports
- File reading/writing
OOP, Git & Math
Python OOP, Git fundamentals, linear algebra foundations.
- Classes, inheritance, dunder methods
- Git init through push
- Branches, merging, conflicts
- Vectors, dot products, matrices
Advanced Python + Calculus
Decorators, generators, context managers. Derivatives and gradients.
- Decorators in practice
- Generator expressions
- Context managers
- Derivatives, chain rule, gradients
Probability + Project Week
Probability distributions, Bayes. Build the CLI data analysis tool.
- Probability distributions
- Bayes' theorem intuition
- Argparse CLI setup
- CSV parsing, stats, report output
Day-by-Day Plan (30 Days)
Python environment
Install Python 3.12+, VS Code, pip. Hello World, basic types.
Lists & dicts
List methods, dict operations. List comprehensions.
Control flow
If/elif/else, for loops with range/enumerate, while loops.
Functions 1
Def, return, default args, keyword args, *args/**kwargs.
Functions 2
Lambda, map/filter/reduce, type hints, docstrings.
File I/O
Open/read/write txt and CSV files. Context managers (with).
Error handling
Try/except/finally, custom exceptions, assertion.
Classes 1
Class definition, __init__, self, attributes, methods.
Classes 2
Inheritance, super(), dunder methods (__repr__, __str__, __len__).
Git basics
Install Git. Init, add, commit, status, log. Push to GitHub.
Git branching
Branch, checkout, merge. Resolve a merge conflict.
Vectors
What is a vector. Addition, scaling, dot product intuition.
Matrices
Matrix multiplication, transpose, identity matrix.
Eigen-stuff
Eigenvalues and eigenvectors: what they represent geometrically.
Decorators
Write decorators: timing, logging, caching.
Generators
Yield keyword, generator expressions vs list comps.
Itertools
chain, product, combinations, groupby: tools you will reach for daily.
Derivatives
What is a derivative. Power rule, product rule.
Chain rule
The chain rule: the foundation of backpropagation.
Gradients
Partial derivatives, gradient = vector of partials. Gradient descent intuition.
Probability basics
Sample space, events, probability axioms. Conditional probability.
Bayes rule
Bayes' theorem with examples. Prior, likelihood, posterior.
Distributions
Normal, uniform, binomial distributions. Mean, variance, std.
CLI with argparse
Argparse basics. Positional and optional args. Help text.
CSV parsing
CSV module. Read columns, handle headers, missing data.
Stats functions
Implement mean, median, std from scratch. No NumPy yet.
Report generation
Format output as Markdown table. Write to file.
Git workflow
Create branches for features. Merge. Tag a release.
Polish & README
Write README.md with examples. Clean up code.
Review & reflect
Run through all 30 days of notes. Fill gaps. Set Month 2 goals.
Resources: Month 1
Automate the Boring Stuff
3Blue1Brown: Linear Algebra
StatQuest: Statistics & Probability
Learning Git (Atlassian)
Exercism Python Track
CLI Crash Course (freeCodeCamp)
Month 1 Quiz
Test your foundations knowledge: 5 questions
Month 2: Data & ML Basics
Month 2 bridges coding foundations and practical machine learning. You'll learn to wrangle real-world data, understand the core ML algorithms, and build your first complete ML pipeline with scikit-learn.
Month 2 at a Glance
Machine learning is 80% data work and 20% model selection. Month 2 teaches both halves: how to clean, explore, and visualise data, then how to choose and apply the right algorithm. By the end, you'll complete an end-to-end ML pipeline on a real dataset.
Pandas & NumPy Deep Dive
Wrangling data is 80% of ML work
NumPy gives you fast numerical operations on arrays and matrices. Pandas adds labelled data structures (DataFrames and Series) on top. Together they handle 95% of data manipulation in the ML workflow.
NumPy essentials:
- Arrays vs Python lists: memory efficiency and vectorised operations
- Indexing, slicing, reshaping (reshape, flatten, transpose)
- Universal functions (ufuncs): element-wise operations without loops
- Broadcasting: operating on arrays of different shapes
- Linear algebra: np.dot, np.linalg.inv, np.linalg.eig
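A short sketch of broadcasting, ufuncs, and reshaping in action follows. The array values are made up for illustration, and it assumes NumPy is installed.

```python
import numpy as np

prices = np.array([[10.0, 20.0, 30.0],
                   [40.0, 50.0, 60.0]])      # shape (2, 3)
factors = np.array([0.5, 1.0, 2.0])          # shape (3,)

# Broadcasting: the (3,) array is stretched across both rows of the (2, 3) array.
scaled = prices * factors

# Ufunc: np.sqrt is applied element-wise with no Python loop.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# reshape reinterprets the same six values as a (3, 2) array.
reshaped = prices.reshape(3, 2)

# np.dot on two 1-D arrays is the ordinary dot product.
squared_norm = np.dot(factors, factors)

print(scaled.tolist())    # [[5.0, 20.0, 60.0], [20.0, 50.0, 120.0]]
print(roots.tolist())     # [1.0, 2.0, 3.0]
print(reshaped.shape)     # (3, 2)
print(squared_norm)       # 5.25
```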
Pandas essentials:
- Creating DataFrames from CSV, JSON, dicts
- Selecting rows/columns with loc, iloc, boolean indexing
- Handling missing data: isna(), fillna(), dropna()
- Group operations: groupby(), agg(), apply()
- Merging and joining DataFrames
Data Visualisation
You can't build good models without understanding your data
Visualisation is the fastest way to spot patterns, outliers, and data quality issues. matplotlib and seaborn are the standard tools.
- matplotlib: the foundation. Figure and axes objects, line plots, scatter plots, histograms, subplots. Learn to customise: titles, labels, legends, colours.
- seaborn: higher-level API built on matplotlib. Beautiful defaults. Key plots: pairplot() (scatter matrix), heatmap() (correlation matrix), boxplot(), violinplot(), countplot().
- What to look for: missing values patterns, skewed distributions, outliers, correlations, class imbalances. Every insight you gain from visualisation is one less assumption that breaks in production.
ML Fundamentals
The core concepts that apply to every algorithm
Before you touch any algorithm, understand the principles that govern them all. These concepts are the foundation you'll refer back to for your entire career.
- Supervised vs unsupervised vs reinforcement learning: labelled data vs patterns without labels vs learning through interaction. Most of your early work will be supervised.
- Train/test split: you never evaluate on data the model has seen. The split is sacred. Typical splits: 80/20 or 70/30.
- Overfitting and underfitting: a model that memorises the training data (high variance) vs one that's too simple to capture the pattern (high bias). The bias-variance tradeoff is the central tension in ML.
- Cross-validation: splitting data into K folds, training on K-1 and evaluating on the held-out fold. K=5 or K=10 are standard. CV gives a more reliable performance estimate than a single split.
- Feature engineering and scaling: the quality of your features determines the ceiling of your model's performance. Normalise or standardise numeric features for most algorithms.
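The K-fold idea can be sketched in a few lines of plain Python. The helper k_fold_indices is hypothetical and does no shuffling; in practice you would normally shuffle first or use scikit-learn's KFold.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


for train, test in k_fold_indices(n_samples=10, k=5):
    print("train:", train, "test:", test)
```

Every sample lands in the held-out fold exactly once, so averaging the K scores uses all of the data for evaluation without ever testing on a row the model trained on.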
Classic ML Algorithms
The workhorses of practical machine learning
These algorithms are the foundation of classical ML. Some (random forests) still win Kaggle competitions. Others (linear/logistic regression) are essential baselines for any problem.
- Linear regression: predicts a continuous value. Assumes linear relationship between features and target. Simple, interpretable, fast. The first thing you try on any regression problem.
- Logistic regression: despite the name, it's for classification. Applies sigmoid function to linear output to produce probabilities between 0 and 1. Great baseline for binary classification.
- Decision trees: hierarchical if/else rules learned from data. Interpretable but prone to overfitting. The building block for ensemble methods.
- Random forests: hundreds of decision trees trained on random subsets of data and features. Averages their predictions. Robust, handles non-linear relationships, no feature scaling needed.
- Support vector machines (SVMs): find the hyperplane that maximises the margin between classes. Works well with kernel tricks for non-linear boundaries. Good for medium-sized datasets.
scikit-learn Workflow
One consistent API for all algorithms
scikit-learn is the standard library for classical ML in Python. Its consistent API (fit(), predict(), transform()) means once you learn one model, you know them all. The standard workflow:
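from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Example data so the snippet runs as-is; substitute your own X and y.
X, y = load_iris(return_X_y=True)
# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))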
Pipeline essentials: Pipeline chains preprocessing and modelling into one object. GridSearchCV automates hyperparameter tuning with cross-validation. ColumnTransformer applies different transformations to different columns.
These three tools (Pipeline, GridSearchCV, ColumnTransformer) are what separate beginner scikit-learn code from professional code.
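A minimal sketch of Pipeline and GridSearchCV working together is shown below, assuming scikit-learn is installed. The Iris dataset, the step names "scale" and "clf", and the small grid of C values are illustrative choices; ColumnTransformer is left out because Iris has only numeric columns.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# on its own training portion only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step name + double underscore + parameter name addresses a pipeline step.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

Putting the scaler inside the pipeline is the design point: tuning a model on data that was scaled before splitting quietly leaks information from the validation folds.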
Evaluation Metrics
Accuracy is not enough
Choosing the right metric is as important as choosing the right algorithm. The wrong metric can make a bad model look good.
- Accuracy: correct predictions / total predictions. Misleading on imbalanced datasets: if 99% of the rows belong to one class, a model that always predicts that class scores 99% accuracy while finding none of the rare cases.
- Precision: true positives / (true positives + false positives). "When we predict positive, how often are we right?"
- Recall: true positives / (true positives + false negatives). "Of all actual positives, how many did we find?"
- F1 score: harmonic mean of precision and recall. Balances both. Good default for classification.
- ROC-AUC: area under the receiver operating characteristic curve. Measures the model's ability to distinguish between classes at all thresholds. 0.5 = random, 1.0 = perfect.
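The formulas above fit in one small function. This sketch (the name classification_metrics is hypothetical, and it returns 0.0 wherever a ratio would divide by zero) shows how an always-negative model gets high accuracy but zero recall on imbalanced data.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 99 negatives and 1 positive; the "model" always predicts negative.
y_true = [0] * 99 + [1]
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```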
Weekly Breakdown
NumPy & Pandas
Master the data science stack. All daily coding from now on uses NumPy and Pandas.
- NumPy arrays, indexing, broadcasting
- Pandas DataFrames, reading CSV
- Filtering, grouping, merging
- Missing data handling
Data Exploration & Visualisation
Explore real datasets. Build intuition through visualisation.
- matplotlib: line, scatter, histogram
- seaborn: pairplot, heatmap, boxplot
- Correlation analysis
- Data quality assessment
Classic ML Algorithms
Implement and understand each algorithm. Focus on intuition, not formulas.
- Linear & logistic regression
- Decision trees & random forests
- Support vector machines
- scikit-learn Pipeline + GridSearchCV
ML Pipeline Project
Build an end-to-end ML pipeline on a real dataset. Document everything.
- Pick a dataset (Titanic, housing, etc.)
- EDA, feature engineering
- Model comparison (3+ algorithms)
- Evaluation report with metrics
Day-by-Day Plan (30 Days)
NumPy intro
Arrays, arange, zeros, ones, reshape. Basic operations.
NumPy advanced
Broadcasting, masking, universal functions, random module.
Pandas intro
Series, DataFrame. Read CSV. Head, info, describe.
Pandas selection
loc, iloc, boolean indexing, setting values.
Pandas cleaning
isna, fillna, dropna, duplicated, replace. Data types.
Pandas grouping
groupby, agg, apply, transform. Pivot tables.
Pandas merging
Merge, join, concat. Inner vs outer vs left joins.
matplotlib basics
Figure, axes, plot, scatter. Titles, labels, legends.
matplotlib advanced
Subplots, histograms, custom styles, saving figures.
seaborn intro
pairplot, heatmap, boxplot, countplot, violinplot.
EDA practice
Load Titanic dataset. Full EDA with visualisations. Write findings.
ML concepts 1
Supervised vs unsupervised. Train/test split. Overfitting.
ML concepts 2
Cross-validation (K-fold, stratified). Bias-variance tradeoff.
Feature scaling
StandardScaler, MinMaxScaler. When and why to scale.
Linear regression
sklearn LinearRegression. MSE, R². Interpret coefficients.
Logistic regression
sklearn LogisticRegression. Sigmoid function. Decision boundary.
Decision trees
DecisionTreeClassifier/Regressor. Visualise the tree. Feature importance.
Random forests
RandomForestClassifier. n_estimators, max_depth tuning.
Support vector machines
SVC, kernel tricks (rbf, poly, linear). C and gamma parameters.
Model comparison
Compare 4+ algorithms on same dataset. Which wins and why?
Metrics: accuracy
When accuracy works and when it lies. Confusion matrix.
Precision & recall
Precision, recall, F1. Precision-recall curves.
ROC-AUC
ROC curve, AUC score. Threshold selection.
Pipelines
sklearn Pipeline. Chain preprocessing + model. Make column transformer.
GridSearchCV
Hyperparameter tuning with cross-validation. Parallel search.
Project: EDA
Pick a dataset. Load, clean, explore. Visualise everything.
Project: features
Create features. Handle categoricals (OneHot/Ordinal). Scale.
Project: models
Train 3+ models. Tune hyperparameters. Compare results.
Project: report
Write findings. Make visualisations. Document in README.
Review
Revisit all metrics. Can you explain when to use each? Set Month 3 goals.
Resources: Month 2
Python Data Science Handbook
StatQuest: ML Algorithms
Kaggle Learn: Pandas & ML
scikit-learn User Guide
Hands-On ML (Géron)
Kaggle Competitions
Month 2 Quiz
Test your ML basics knowledge: 5 questions
Month 3: Deep Learning
Deep learning powers everything from ChatGPT to self-driving cars. Month 3 takes you from the perceptron to modern architectures: building neural networks from scratch, then using PyTorch to train them at scale. You'll end with a deployed image classifier.
Month 3 at a Glance
Deep learning is not magic; it's calculus applied to layered function approximation. Month 3 builds up from the simplest building block (the perceptron) through full-scale architectures (CNNs, RNNs, Transformers). You'll implement everything in PyTorch, the industry-standard deep learning framework.
Neural Network Fundamentals
From perceptron to multi-layer networks
A neural network is a stack of matrix multiplications with non-linear activation functions in between. Every neuron computes a weighted sum of its inputs, passes it through an activation function, and sends the result forward. With enough hidden units, even a single hidden layer can approximate any continuous function on a closed, bounded input range to arbitrary accuracy; this is the Universal Approximation Theorem.
- The perceptron: a single neuron with a step activation. Linearly separable problems only. The foundation everything builds on.
- Activation functions: ReLU (most common), sigmoid (for binary classification output), tanh, softmax (for multi-class). Each has a specific purpose.
- Forward propagation: input → hidden layers → output. Each layer is W·x + b, then activation.
- Backpropagation: the chain rule applied to compute gradients through all layers. The "learning" in deep learning. You don't implement it by hand (PyTorch does it for you), but you must understand what it does: it tells each weight how much it contributed to the error.
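As a minimal sketch of the first bullet, here is a single perceptron in plain Python, trained with the classic perceptron learning rule (not backpropagation) on logical AND. The function names, learning rate, and epoch count are illustrative assumptions.

```python
def step(z: float) -> int:
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights: list[float], bias: float, x: list[int]) -> int:
    """Forward pass of one neuron: weighted sum plus bias, then activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(data, lr: float = 1.0, epochs: int = 20):
    """Perceptron learning rule: nudge weights by lr * error * input."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_data)
predictions = [predict(weights, bias, x) for x, _ in and_data]
print(predictions)  # [0, 0, 0, 1]
```

Swap the targets for XOR and the same loop never settles, which is the "linearly separable problems only" limitation in practice.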
PyTorch: Tensors & Autograd
The foundation of everything you'll build
PyTorch is the most popular deep learning framework for research and production. Its key features are tensors (like NumPy arrays but GPU-accelerated) and autograd (automatic differentiation).
- Tensors: create from lists or NumPy arrays. Move to GPU with .cuda() or .to('cuda').
- Tensor operations: .view() (reshape), .permute(), indexing, broadcasting. All familiar from NumPy but GPU-accelerated.
- Autograd: set requires_grad=True. Operations on such tensors are recorded in a computation graph. Call .backward() on the result to compute all gradients. This is the magic that makes training possible.
- Gradients: after loss.backward(), each leaf tensor created with requires_grad=True (model weights, for example) has .grad populated with the gradient of the loss with respect to that tensor.
import torch
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad) # tensor([2., 4., 6.])
PyTorch: Datasets & Training Loops
The standard patterns you'll use in every project
PyTorch provides abstractions for data loading and model building that make training repeatable and scalable. These patterns are used in every PyTorch project.
- Dataset and DataLoader: subclass torch.utils.data.Dataset to define your data. Wrap in DataLoader for batching, shuffling, and parallel loading with multiple workers.
- nn.Module: subclass to define your model. Define layers in __init__, implement forward pass in forward(). Compose layers with nn.Sequential for simple networks.
- Loss functions and optimisers: nn.CrossEntropyLoss for classification, nn.MSELoss for regression. optim.Adam is the default optimiser (adaptive learning rate, works well out of the box).
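import torch.nn as nn
import torch.optim as optim

# Two-layer classifier: 784 inputs (e.g. flattened 28x28 images), 10 classes.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# dataloader is assumed to be a DataLoader yielding (inputs, labels) batches.
for epoch in range(10):
    for xb, yb in dataloader:
        pred = model(xb)            # forward pass
        loss = loss_fn(pred, yb)    # compare predictions with labels
        optimizer.zero_grad()       # clear gradients from the previous step
        loss.backward()             # backpropagate
        optimizer.step()            # update the weights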
Convolutional Neural Networks
The architecture that sees the world
CNNs are designed for grid-structured data, images above all. Instead of dense connections (every neuron connected to every neuron in the next layer), CNNs use sliding filters (kernels) that learn local patterns.
- Convolution: a filter slides over the input, computing dot products at each position. Produces a feature map showing where patterns were detected. Multiple filters → multiple feature maps.
- Pooling: downsampling operation (usually max pooling) that reduces spatial dimensions and adds some translation invariance. Common: 2×2 max pool with stride 2, which halves the height and width.
- LeNet-5: one of the earliest successful CNNs (1998). Conv → Pool → Conv → Pool → FC. Still a valid pattern for simple problems.
- ResNet: introduced skip connections (residual connections) that allow training very deep networks (50+ layers) by letting gradients flow directly through the identity path. The breakthrough that made very deep networks practical.
Start with a simple ConvNet on MNIST or CIFAR-10, then graduate to using a pretrained ResNet.
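Before reaching for nn.Conv2d, it can help to see convolution and pooling as plain loops. The sketch below is a teaching aid under simple assumptions (single channel, no padding, stride 1 for the convolution); the toy image and edge-detecting kernel are made up.

```python
def conv2d(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """Valid (no padding, stride 1) 2D cross-correlation, as in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(fmap: list[list[float]]) -> list[list[float]]:
    """2x2 max pooling with stride 2; halves height and width."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# 5x5 image with a vertical edge: dark (0) columns on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
# Kernel that responds where brightness increases from left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, non-zero only at the edge
pooled = max_pool2x2(feature_map)     # 2x2 map
print(feature_map[0])  # [0, 0, 2, 0]
print(pooled)          # [[0, 2], [0, 2]]
```

The feature map lights up only where the edge sits, and pooling keeps that signal while shrinking the map, which is the pattern every CNN layer pair repeats.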
RNNs, LSTMs & Transformers
Processing sequences: text, time series, audio
When data has a temporal or sequential structure (text, speech, stock prices), you need models that process one element at a time while maintaining state.
- RNNs: the simplest sequence model. Pass a hidden state from one timestep to the next. Suffers from vanishing gradients for long sequences. Good for short sequences only.
- LSTMs: Long Short-Term Memory. Adds forget, input, and output gates. Controls what to remember and forget. Handles long sequences (up to hundreds of tokens). Standard for sequence tasks before Transformers.
- Attention mechanism: "which part of the input should I focus on?" Attention computes weighted sums of input elements, where weights depend on the current context. The core idea behind Transformers.
- Transformers: "Attention is All You Need" (2017). No recurrence; all tokens are processed in parallel using self-attention. Positional encodings inject order information. The foundation of GPT, BERT, and every modern LLM.
For Month 3, focus on understanding the attention mechanism conceptually and using PyTorch's built-in transformer layers.
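The attention computation itself is only a few lines. This is a sketch of scaled dot-product self-attention in NumPy, assuming NumPy is installed; the learned query, key, and value projection matrices of a real Transformer layer are deliberately omitted, and the random input stands in for token embeddings.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)    # similarity of every query to every key
    weights = softmax(scores)          # each row sums to 1
    return weights @ v, weights        # weighted sum of the values


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, 8 features each
# Self-attention: queries, keys, and values all come from the same tokens.
output, weights = attention(x, x, x)
print(output.shape, weights.shape)     # (4, 8) (4, 4)
```

Row i of weights says how much token i draws from every other token, which is the "which part of the input should I focus on?" question answered numerically.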
Transfer Learning & Fine-Tuning
Standing on the shoulders of giants
Training a deep network from scratch requires massive data and compute. Transfer learning lets you take a model pretrained on a huge dataset (like ImageNet with 14 million images) and adapt it to your specific task with minimal data.
- Feature extraction: freeze the pretrained model's weights (except the final classification layer). The pretrained model acts as a sophisticated feature extractor. Replace the last layer with one for your task. Train only that layer.
- Fine-tuning: unfreeze some or all of the pretrained layers and train everything with a low learning rate. Adapts the pretrained features to your specific domain. Requires more data but often gives better results.
- Common pretrained models: ResNet, EfficientNet, ViT (Vision Transformer) for images. BERT, RoBERTa for text. Available in torchvision.models and the transformers library.
Training Best Practices
Making models that actually converge
Knowing the architecture is half the battle. Knowing how to train it is the other half. These techniques separate working models from stuck models.
- Learning rate scheduling: start with a higher LR, then decrease over time. Common schedules: StepLR (reduce by factor every N epochs), CosineAnnealing, ReduceLROnPlateau (reduce when validation loss plateaus).
- Regularisation: dropout (randomly zero out neurons during training), weight decay (L2 penalty on weights), data augmentation (create synthetic variations of training data). Each reduces overfitting.
- Batch normalisation: normalise layer inputs to have zero mean and unit variance. Stabilises training, allows higher learning rates, and provides some regularisation.
- Early stopping: monitor validation loss. Stop training when it hasn't improved for N epochs (patience). Saves time and prevents overfitting.
- Gradient clipping: cap gradient values to prevent exploding gradients in RNNs and deep networks. Simple but critical.
The combo that works for most projects: Adam optimiser + ReduceLROnPlateau + early stopping + moderate dropout (0.3 to 0.5).
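Early stopping is simple enough to write yourself. The class below is a framework-independent sketch; the name EarlyStopping, the min_delta parameter, and the made-up loss values are assumptions for illustration.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss   # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training run.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)  # 5 0.6
```

Note that the run stops before the later 0.50 loss is ever seen; patience is a trade-off between wasted epochs and stopping too soon, so save the best checkpoint as you go.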
Weekly Breakdown
PyTorch & Neural Net Basics
PyTorch tensors, autograd, building and training simple networks.
- Tensors, GPU acceleration
- Autograd computation graph
- nn.Module, linear layers
- Training loop from scratch
CNN & Computer Vision
Build CNNs for image classification. Learn the vision pipeline.
- Convolution, pooling layers
- torchvision datasets & transforms
- Build LeNet, study ResNet
- Transfer learning with pretrained models
Sequence Models & Attention
RNNs, LSTMs, and the attention mechanism.
- RNN with nn.RNN
- LSTM for sentiment analysis
- Attention mechanism intuition
- Transformer architecture basics
Image Classifier Project
Full project: train and evaluate a custom image classifier.
- Custom dataset (torchvision or web scrape)
- Data augmentation pipeline
- Transfer learning + fine-tuning
- Evaluation, visualisation, export
Day-by-Day Plan (30 Days)
PyTorch install
Install PyTorch. Tensor basics: creation, operations, shapes.
GPU tensors
Move tensors to GPU. Check CUDA availability. Benchmark speed.
Autograd
requires_grad, backward, grad. Computation graph intuition.
nn.Module
Subclass Module. Linear, ReLU layers. Forward method.
nn.Sequential
Sequential API. Build a 3-layer network in 5 lines.
Loss & optimiser
CrossEntropyLoss, MSELoss. SGD, Adam optimiser.
Training loop
Write training loop from scratch. Epochs, batches, zero_grad.
Datasets & DataLoader
torchvision.datasets. Custom Dataset class. DataLoader with batching.
MNIST classifier
Train a fully-connected net on MNIST. Achieve >97% accuracy.
Perceptron code
Implement a single perceptron from scratch with NumPy.
Backprop from scratch
Watch Karpathy's micrograd. Implement simple backprop.
Conv2d layer
nn.Conv2d parameters: in_channels, out_channels, kernel_size, stride, padding.
Pooling layer
MaxPool2d, AvgPool2d. How pooling affects spatial dimensions.
LeNet-5 from scratch
Implement LeNet-5 in PyTorch. Train on MNIST with >99% accuracy.
CIFAR-10 CNN
Build a simple CNN for CIFAR-10. Target >75% accuracy.
ResNet study
Load pretrained ResNet18. Understand residual blocks. Visualise.
Transfer learning 1
Load ResNet18, freeze features, replace classifier. Train on custom data.
Transfer learning 2
Fine-tuning: unfreeze layers, differential learning rates.
Data augmentation
torchvision.transforms: RandomCrop, RandomHorizontalFlip, ColorJitter, Normalize.
LR scheduling
StepLR, ReduceLROnPlateau, CosineAnnealing. Plot learning rate curves.
Regularisation
Dropout, weight decay. Compare overfitting with and without.
Batch norm & early stopping
nn.BatchNorm1d/2d. Implement early stopping callback.
RNN basics
nn.RNN. Input shape (seq_len, batch, features). Hidden state.
LSTM
nn.LSTM. Sentiment classification with LSTMs on text data.
Attention mechanism
Understand attention: query, key, value. Implement simple attention.
Transformers intro
nn.Transformer. Encoder-decoder architecture. Positional encoding.
Project: dataset
Collect or download custom image dataset. Organise into folders. Visualise samples.
Project: training
Build pipeline with aug, transfer learning. Train, validate, early stop.
Project: eval
Confusion matrix, per-class metrics, t-SNE visualisation. Save model.
Review & celebrate
Review all 3 months. You've built foundations most self-taught engineers never master.
Resources: Month 3
Karpathy's Neural Networks Lectures
PyTorch Official Tutorials
d2l.ai: Dive into Deep Learning
3Blue1Brown: Neural Networks
DeepLearning.AI Specialization
The Annotated Transformer
Month 3 Quiz
Test your deep learning knowledge: 5 questions
Month 4: Agents (How the Sausage Is Made)
AI agents are the hottest topic in 2026. But what do they actually look like under the hood? Here's the real engineering.
Agent Architecture at a Glance
An agent isn't magic; it's a loop. A large language model generates structured tool calls, your code executes them, and the results feed back into the model's context for the next reasoning step. That loop is the entire game.
This is known as the ReAct pattern (Reasoning + Acting). It was introduced in a 2022 paper (published at ICLR 2023) and has since become the foundation of many agent frameworks.
The ReAct Pattern (Deep)
Reason → Act → Observe: the agent loop
Every agent starts with a prompt that defines its personality, tools, and constraints. Every user message gets appended to a conversation. The LLM generates either a final answer or a structured tool call. Your framework intercepts structured outputs, calls the appropriate function with parsed arguments, and feeds the result back into the conversation. The LLM then decides what to do next.
Key implementation details from the training modules:
- LangChain's create_react_agent handles the prompt template and parsing. You provide tools and LLM.
- Always set max_iterations (10-15). Unbounded agents loop forever and burn tokens at $0.15/minute.
- Verbose mode (verbose=True) shows the thought trace. Essential for debugging agent decisions.
The prompt template typically looks like this pseudocode:
You are a helpful AI assistant with access to tools.
To use a tool, respond with:
Thought: [your reasoning]
Action: [tool_name]
Action Input: [tool_parameters]
To give a final answer:
Thought: [your reasoning]
Final Answer: [your response]
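The loop around that template can be sketched without any framework. Everything here is a simplified assumption: run_agent, fake_llm, and the add tool are hypothetical, and the scripted fake_llm stands in for a real model call so the example runs offline.

```python
import re
from typing import Callable


def run_agent(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
              question: str, max_iterations: int = 10) -> str:
    """Reason -> Act -> Observe loop over the text format shown above."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*)", reply)
        action_input = re.search(r"Action Input:\s*(.*)", reply)
        if action and action_input and action.group(1).strip() in tools:
            observation = tools[action.group(1).strip()](action_input.group(1).strip())
        else:
            observation = "Error: unknown tool or malformed action."
        # Feed the result back so the next model call can see it.
        transcript += f"Observation: {observation}\n"
    return "Stopped: reached max_iterations without a final answer."


def fake_llm(transcript: str) -> str:
    """Scripted stand-in for a real model call."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: add\nAction Input: 2 3"
    return "Thought: I have the result.\nFinal Answer: 2 + 3 = 5"


def add(text: str) -> str:
    return str(sum(int(part) for part in text.split()))


answer = run_agent(fake_llm, {"add": add}, "What is 2 + 3?")
print(answer)  # 2 + 3 = 5
```

The max_iterations bound is what keeps a confused model from looping indefinitely: when the budget runs out, the loop returns a clear message instead of spending more tokens.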
Function Calling (Deep)
Giving agents structured ways to interact with the world
Function calling is how the model tells your code "call this function with these arguments." The model outputs JSON, then your code executes it and returns the result. This is the critical bridge between "thinking" and "doing."
From the training:
- Tool schemas need name, description (when to use it, not just what it does), and parameter schema with types. Bad descriptions cause wrong tool selection.
- Error handling matters: feed tool errors back to the model. If the API returns 404, let the agent adapt and retry with different parameters.
- Safety patterns: sandbox code execution, read-only for databases, human approval for destructive actions.
A function call from the model looks like this JSON:
{
  "function": "search_web",
  "arguments": {
    "query": "latest AI agent frameworks 2026"
  }
}
Your code parses this, calls search_web("latest AI agent frameworks 2026"), and returns the result text. The model then uses that result to continue reasoning.
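A rough sketch of that parse-and-dispatch step, using the JSON shape shown above. REGISTRY and dispatch are hypothetical names, and search_web here is a stub rather than a real search client. Errors come back as data so they can be fed to the model instead of crashing the loop.

```python
import json
from typing import Any, Callable


def search_web(query: str) -> str:
    """Placeholder tool: a real version would call a search API."""
    return f"(stub) results for: {query}"


# Only functions listed here can be called by the model.
REGISTRY: dict[str, Callable[..., Any]] = {"search_web": search_web}


def dispatch(raw_call: str) -> dict[str, Any]:
    """Parse a model-emitted call, run it, and return a result or an error."""
    try:
        call = json.loads(raw_call)
        name = call["function"]
        args = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return {"ok": False, "error": f"malformed call: {exc}"}
    func = REGISTRY.get(name) if isinstance(name, str) else None
    if func is None:
        return {"ok": False, "error": f"unknown function: {name}"}
    try:
        return {"ok": True, "result": func(**args)}
    except Exception as exc:  # includes TypeError from wrong argument names
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

The allow-list is the design point: the model can only name a function, and your code decides whether that name maps to anything executable.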
Building Great Tools
Tool design is agent architecture
Tools are the most important design decision in any agent system. A well-designed tool can reduce token usage by 50% and dramatically improve success rates. The key principles:
- One responsibility per tool: a tool that does one thing well is easier for the model to choose and call correctly than a Swiss Army knife.
- Composite tools: when a sequence of calls always runs together, wrap it in one tool. Instead of "get_user" + "get_orders" + "format_email", create "send_order_summary_email." Fewer steps means fewer error opportunities.
- Input validation: the model might pass wrong types or out-of-range values. Always validate in the tool function before executing business logic.
- Caching: cache tool results for identical inputs. An agent exploring options might call the same tool with the same parameters multiple times. Caching saves time and money.
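Validation and caching can live in the same tool. The sketch below assumes a hypothetical read-only get_recent_orders tool. The lookup is faked, and CALLS exists only to show how often the expensive part actually runs.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts real executions so the cache effect is visible


@lru_cache(maxsize=256)
def _lookup_orders(user_id: int, limit: int) -> tuple[str, ...]:
    """Hypothetical expensive call (database or HTTP); cached on its arguments."""
    CALLS["count"] += 1
    return tuple(f"order-{user_id}-{n}" for n in range(1, limit + 1))


def get_recent_orders(user_id, limit=5) -> dict:
    """Tool entry point: validate model-supplied input before doing any work."""
    if isinstance(user_id, bool) or not isinstance(user_id, int) or user_id <= 0:
        return {"ok": False, "error": "user_id must be a positive integer"}
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 50:
        return {"ok": False, "error": "limit must be an integer from 1 to 50"}
    return {"ok": True, "orders": list(_lookup_orders(user_id, limit))}
```

Only cache tools that read data. A cached write, or a cached read of fast-changing data, gives the agent a stale view of the world.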
Agent Memory Systems
Agents that remember across conversations
LLMs are stateless: every call is a fresh start. Memory is what gives agents continuity. There are three types every AI engineer should understand:
- Short-term memory: the current conversation's message history. Managed with a buffer window (last N messages) or summarization (compress older context into a summary). The simplest and most common approach.
- Long-term memory: facts, preferences, and learned patterns stored in a vector database. Retrieved via semantic similarity when relevant conversations occur.
- Episodic memory: past interactions stored as episodes. "Last time the user asked about X, we found Y." This is the most advanced form and closest to human memory.
The most effective pattern: give agents save_memory and recall_memory tools. Let the agent decide what's worth remembering. You provide the infrastructure, and the agent handles the curation.
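As a toy illustration of the short-term buffer and the save_memory / recall_memory tools: the AgentMemory class is hypothetical, and word overlap stands in for the vector similarity search a real long-term store would use.

```python
from collections import deque


class AgentMemory:
    def __init__(self, window: int = 4) -> None:
        self.recent = deque(maxlen=window)  # short-term: last N messages only
        self.facts: list[str] = []  # long-term: a vector DB in a real system

    def add_message(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})

    def save_memory(self, fact: str) -> str:
        """Tool: store a fact the agent judged worth keeping."""
        if fact not in self.facts:
            self.facts.append(fact)
        return "saved"

    def recall_memory(self, query: str, top_k: int = 2) -> list[str]:
        """Tool: return stored facts ranked by word overlap with the query."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(f.lower().split())), f) for f in self.facts]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [fact for _, fact in ranked[:top_k]]
```

The deque drops the oldest message automatically once the window is full, which is the buffer-window behaviour described above. Swapping recall_memory for an embedding lookup changes the ranking, not the interface the agent sees.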
Multi-Agent Architectures
Teams of specialists outperform a single generalist
When a single agent tries to handle too many responsibilities, quality degrades. The solution: multiple specialist agents that each own one domain.
Common patterns:
- Supervisor pattern: one orchestrator delegates to specialists (researcher, writer, critic). The supervisor monitors progress and handles failures.
- Frameworks: CrewAI (role-based teams with simple YAML config), AutoGen (Microsoft, conversation-based), LangGraph (stateful orchestration with graphs).
- Cost optimization: use cheap models (GPT-4o-mini, Claude Haiku) for routine agents. Save expensive models only for quality-critical roles like final output generation.
- Coordination overhead: every message between agents adds latency and cost. Design your agent graph to minimize hops.
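The supervisor pattern reduces to a small amount of control flow. In this sketch the three specialists are plain functions with hypothetical names. In a real system each would wrap its own LLM call, and none of this reflects how CrewAI, AutoGen, or LangGraph implement it.

```python
from typing import Callable


def researcher(task: str) -> str:
    return f"notes on {task}"


def writer(notes: str) -> str:
    return f"draft based on {notes}"


def critic(draft: str) -> str:
    return f"approved: {draft}"


# Fixed order of specialists; each receives the previous one's output.
PIPELINE: list[tuple[str, Callable[[str], str]]] = [
    ("researcher", researcher), ("writer", writer), ("critic", critic),
]


def supervise(task: str, max_retries: int = 1) -> dict:
    """Delegate to each specialist in turn, retrying a failed step before giving up."""
    payload, hops = task, 0
    for name, agent in PIPELINE:
        for attempt in range(max_retries + 1):
            hops += 1  # every delegation is one hop of latency and cost
            try:
                payload = agent(payload)
                break
            except Exception as exc:
                if attempt == max_retries:
                    return {"ok": False, "failed_at": name, "error": str(exc), "hops": hops}
    return {"ok": True, "output": payload, "hops": hops}
```

Counting hops in the result makes the coordination overhead visible, which helps when deciding whether a specialist earns its place in the graph.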
Agent Safety & Guardrails
Security isn't optional for autonomous systems
An agent that can call tools can also cause damage. Safety is not something you add at the end. It's the foundation. Key principles:
- Guardrails in code, not prompts: code can't be convinced to ignore rules. A system prompt can be jailbroken, but a rate limiter in the execution layer enforces its limit no matter what the model outputs.
- Prompt injection defence: three layers: input sanitization, output validation, and least-privilege tool access. Assume any user input is malicious.
- Rate limiting: max 10 tool calls per task, max 3 retries per tool, max 60 seconds execution time. Prevent runaway costs from buggy agents.
- Human-in-the-loop: for high-stakes actions (sending emails, deploying code, deleting data), pause for human approval before executing. This is non-negotiable in production.
Training module 07-safety covers all of this in depth with practical code examples.
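For a feel of what "guardrails in code" means, here is a minimal sketch of an execution-layer wrapper that enforces the limits above. It is not the module's code: Guardrail, BudgetExceeded, and the tool names in DESTRUCTIVE are hypothetical, and "3 retries" is read here as three attempts in total.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes past a hard limit."""


class Guardrail:
    DESTRUCTIVE = {"send_email", "deploy_code", "delete_data"}  # hypothetical tool names

    def __init__(self, max_calls=10, max_retries=3, max_seconds=60.0, approve=None):
        self.max_calls, self.max_retries, self.max_seconds = max_calls, max_retries, max_seconds
        self.approve = approve or (lambda tool, args: False)  # deny unless a human says yes
        self.calls = 0
        self.started = time.monotonic()

    def run(self, tool_name, func, **args):
        """Execute one tool call under the call, retry, time, and approval limits."""
        if tool_name in self.DESTRUCTIVE and not self.approve(tool_name, args):
            return {"ok": False, "error": "human approval required"}
        last_error = ""
        for _ in range(self.max_retries):
            if self.calls >= self.max_calls:
                raise BudgetExceeded("tool-call limit reached")
            if time.monotonic() - self.started > self.max_seconds:
                raise BudgetExceeded("time limit reached")
            self.calls += 1
            try:
                return {"ok": True, "result": func(**args)}
            except Exception as exc:
                last_error = str(exc)
        return {"ok": False, "error": f"gave up after {self.max_retries} attempts: {last_error}"}
```

The approve callback is where a real system would block on a human decision (a Slack button, a review queue). The default denies everything destructive, so forgetting to wire it up fails safe.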
Practice Project: Build a Research Agent
Your Month 4 project is to build an autonomous research agent with tools and memory. It takes a research question, searches the web, reads pages, synthesises findings, and produces a cited report.
Month 5: Production & Deployment - How the Sausage Is Made
Taking AI apps from your laptop to serving real users. This is where you become a professional, not just a builder.
The Production AI Stack
Production AI has moving parts that most tutorials skip. Here's what a real deployment looks like from top to bottom:
Each layer is independently deployable, testable, and scalable. That's the whole point of production engineering.
FastAPI for AI Services
Your AI needs a robust, async API layer
FastAPI is the standard for Python AI backends. Async endpoints handle concurrent requests efficiently, Pydantic models validate request/response schemas, and automatic OpenAPI docs make integration a breeze.
- Async endpoints: LLM calls are I/O bound. Async handlers let your server handle other requests while waiting for the model.
- Dependency injection: use FastAPI's Depends() for auth, DB connections, and LLM clients. Clean, testable, and composable.
- Middleware: logging, CORS, rate limiting, and request ID tracking. Essential for production observability.
- Background tasks: offload non-critical work (logging, telemetry) with BackgroundTasks.
The training lesson on FastAPI walks through building a complete AI API endpoint step by step.
Streaming API Responses
Real-time token delivery via Server-Sent Events
Users don't want to stare at a spinner for 10 seconds waiting for the entire response. Streaming delivers tokens as they're generated, dramatically improving perceived performance.
- SSE vs WebSocket: SSE is simpler (plain HTTP), one-directional (server→client), and auto-reconnects. Perfect for LLM streaming.
- Implementation: use StreamingResponse with media_type='text/event-stream'. Yield 'data: {token}\n\n' for each chunk.
- Client side: EventSource API in JavaScript. The onmessage callback fires for each token. Trivial to implement.
- Error handling: send 'data: [ERROR] message\n\n' on failure. The SSE client auto-reconnects.
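The framing itself needs nothing beyond the standard library. The sketch below builds the frames a generator would yield into StreamingResponse. sse_frame and stream_tokens are hypothetical helper names, and the final [DONE] marker is an assumed convention, not part of the SSE format.

```python
import json
from typing import Iterable, Iterator


def sse_frame(payload: str, event=None) -> str:
    """Encode one Server-Sent Events frame; multi-line payloads need one data: line each."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {part}" for part in payload.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield SSE frames for each token, then an explicit end or error marker."""
    try:
        for token in tokens:
            # JSON-encode so whitespace and newlines inside a token survive transport.
            yield sse_frame(json.dumps({"token": token}))
        yield sse_frame("[DONE]")
    except Exception as exc:
        yield sse_frame(f"[ERROR] {exc}")
```

JSON-encoding each token matters more than it looks: a raw token containing a newline would otherwise split into two data: lines and arrive at the client altered.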
Docker & Containers
Consistent environments from dev to prod
"It works on my machine" is not a deployment strategy. Docker gives you reproducible environments across all stages.
- Multi-stage builds: builder stage installs all dependencies, runtime stage copies only what's needed. Smaller images = faster deploys.
- Docker Compose: define your full stack (API + vector DB + Redis + Postgres) in one YAML file. docker compose up starts everything.
- Health checks: HEALTHCHECK in Dockerfile. Load balancers and orchestrators use this to route traffic away from broken containers.
- Secrets management: API keys via environment variables, never baked into images. Use Docker secrets or mounted .env files.
CI/CD Pipelines
Automate everything. Trust nothing manual.
A proper CI/CD pipeline automates the entire path from git push to production deployment. GitHub Actions is the standard choice.
- Pipeline stages: Lint → Test → Build Docker → Push to registry → Deploy. Every merge to main triggers the full pipeline.
- AI quality gates: run evaluation metrics in CI. If the RAGAS faithfulness score drops below a threshold, block the deployment.
- Docker layer caching: copy requirements.txt first (changes rarely), install deps, then copy source (changes often). Drastically speeds up build times.
- Smoke tests: hit your deployed API with a simple request immediately after deployment. Catch breakage in seconds, not hours.
Observability & Cost Tracking
You can't manage what you don't measure
Production AI systems are opaque by default. Observability makes them transparent. Cost tracking keeps your budget in check.
- Structured logging: every request logs: request ID, model used, token count, latency, cost, and success/failure. JSON format for easy ingestion.
- LLM cost tracking: per feature, per user, per day. Break down costs: RAG costs vs chat costs vs summary costs. You can't optimise what you don't measure.
- Dashboards: Grafana or Langfuse for latency, error rate, token usage, and cost trends. Set alerts on cost spikes (+50% in a day = investigate).
- LiteLLM proxy: route all LLM calls through a single proxy. One dashboard for all providers. Logs every call automatically with cost calculation.
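One way to picture the per-request log line and cost calculation: the sketch below builds a JSON record with the fields listed above. The model name and the prices in PRICES are invented for illustration and are not real provider pricing; llm_cost and log_llm_call are hypothetical helpers.

```python
import json
import time
import uuid

# Illustrative prices in USD per million tokens; NOT real provider pricing.
PRICES = {"small-model": {"input": 0.50, "output": 1.50}}


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times per-million price, input and output priced separately."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def log_llm_call(feature, model, input_tokens, output_tokens, started, ok=True) -> str:
    """Build one JSON log line per request; print it or hand it to a logger."""
    record = {
        "request_id": str(uuid.uuid4()),
        "feature": feature,  # lets you split RAG vs chat vs summary costs later
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
        "cost_usd": round(llm_cost(model, input_tokens, output_tokens), 6),
        "ok": ok,
    }
    return json.dumps(record)
```

Record started = time.monotonic() just before the model call and pass it in afterwards. Tagging every record with a feature name is what makes the per-feature breakdown a simple group-by in your dashboard.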
Testing AI Systems
Non-deterministic doesn't mean untestable
"LLMs give random outputs" is not an excuse to skip testing. AI systems need a different approach to quality assurance.
- Separate concerns: deterministic code (routing, formatting, DB queries) gets exact assertions. LLM outputs get semantic assertions with similarity checks.
- Mock LLM calls: use respx or pytest-httpserver in unit tests. Real API calls only in scheduled integration tests.
- Semantic assertions: assert is_similar(response, expected, threshold=0.85). Use LLM-as-judge for automated evaluation.
- Regression suite: maintain 50–100 representative queries with expected behaviour patterns. Re-run on every pipeline change.
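A small sketch of the split between exact and semantic assertions. The is_similar below is a stand-in that compares word counts with cosine similarity. A real version would compare embeddings, and the other helper names are hypothetical. The LLM is passed in as a function, so a test can substitute a fake.

```python
import math
import re
from collections import Counter


def is_similar(response: str, expected: str, threshold: float = 0.85) -> bool:
    """Cosine similarity over word counts; a crude stand-in for embedding similarity."""
    a = Counter(re.findall(r"\w+", response.lower()))
    b = Counter(re.findall(r"\w+", expected.lower()))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return norm > 0 and dot / norm >= threshold


def format_answer(text: str, sources: list[str]) -> str:
    """Deterministic code: exact assertions apply here."""
    return f"{text.strip()} [sources: {', '.join(sources)}]"


def answer(question: str, llm) -> str:
    """The LLM is injected so tests can pass a fake instead of calling an API."""
    return format_answer(llm(question), ["doc-1"])


def fake_llm(question: str) -> str:
    """Canned reply used in unit tests."""
    return "Paris is the capital of France. "
```

In a pytest suite, answer("capital of France?", fake_llm) gets an exact equality check, while real model output goes through is_similar against a reference answer.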
Security & Compliance
PII leaks and prompt injection can kill companies
Security in AI systems is different from traditional security. The attack surface includes the model itself, the data pipeline, and the tool execution layer.
- PII detection: use Microsoft Presidio for automated PII scanning and redaction. Redact before sending data to any LLM.
- GDPR requirements: right to deletion must cover all data stores including vector databases. Plan for data portability and consent management from day one.
- Prompt injection: input sanitization, output validation, least-privilege tool access. Guardrails in code are the only reliable defence.
- Data residency: EU user data must stay on EU-hosted models/services. Check Data Processing Agreements (DPAs) with LLM providers. Use zero-retention APIs for sensitive data.
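To make "redact before sending" concrete, here is a deliberately simplified sketch. It does not use Presidio: two regular expressions stand in for a real detector, they cover only emails and phone-like numbers, and PII_PATTERNS and redact are hypothetical names.

```python
import re

# Simplified patterns for illustration only; production systems need a dedicated
# detector (names, addresses, and IDs are not covered here).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with typed placeholders and report how many were found."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, counts[label] = pattern.subn(f"<{label}>", text)
    return text, counts
```

Typed placeholders such as <EMAIL> keep the sentence readable for the model, and the counts give you a number to log without logging the PII itself.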
Practice Project: Production AI API
Your Month 5 capstone: build a production-grade AI API with the full stack (FastAPI + RAG + Docker + CI/CD + observability + security). This project goes on your resume as proof you can ship real systems.
Month 6: Specialise & Career - How the Sausage Is Made
The final stretch. Pick a lane, build your portfolio, and position yourself for that first AI engineering role.
Your Career Options
Not all AI engineering roles are the same. Choose your specialization based on your interests and market demand:
AI Product Engineer
Build AI-powered features into products. Most common role. Combines full-stack skills with LLM integration.
Hottest
Applied ML Engineer
Fine-tune, train, and deploy custom models. More math, more GPU, more research. Closer to traditional ML.
Growing
AI Platform Engineer
Build infrastructure for AI teams. ML pipelines, model serving, feature stores, monitoring. DevOps + AI.
In Demand
Conversational AI Engineer
Voice assistants, chatbots, customer service automation. Rasa, Voiceflow, custom dialogue systems.
Niche
Choosing Your Specialisation
Go deep, not wide
Generalists get generic jobs. Specialists get calls from recruiters. Month 6 is about picking one area, going deep, and making that decision visible to employers.
Considerations when choosing:
- Market demand: which roles have the most openings in your location or for remote work?
- Your interests: do you enjoy building products (AI Product Engineer) or understanding model internals (Applied ML)?
- Salary ceiling: some specialisations pay more than others. Check levels.fyi for current data.
- Growth trajectory: some paths (AI Product Engineer) offer broader career options. Others (Conversational AI) are more niche but less competitive.
Whatever you choose, commit for 6-12 months. Switching specialisations too often means you never build depth.
Building Your Portfolio
3 deployed projects beat 30 tutorial repos
Your portfolio is everything. Recruiters look at GitHub before they look at your resume. Here's what works:
- The 3-project rule: RAG system (with evaluation metrics), Agent system (with tools and memory), Production system (with CI/CD and observability). Deploy all three.
- Killer READMEs: problem statement, architecture diagram, tech stack, live demo link, key challenges faced, measurable results. This is your project's sales pitch.
- Deploy everything: a GitHub repo with no live link is homework, not a portfolio project. Deploy on Railway, Render, Fly.io, or your own VPS.
- Impact metrics: "Built a RAG system" says nothing. "Built a RAG system that reduced support ticket resolution time by 40%" says everything.
Personal Branding
Be findable. Be memorable. Be the expert in X.
You can be the best AI engineer in the world, but if nobody knows you exist, you're invisible. Personal brand is the multiplier on your technical skills.
- Write online: blog posts, Twitter threads, LinkedIn articles. Build in public. Share what you're learning. Visibility compounds over months.
- Open source contributions: contribute to LangChain, LlamaIndex, Chroma, or any tool you use. Even documentation fixes count. Your GitHub contribution graph is your living resume.
- Demo videos: 2-minute Loom or screen recordings that walk through your deployed projects. Recruiters watch videos before they read READMEs.
- Consistency beats intensity: one post per week for 6 months beats 20 posts in a week followed by silence.
Interview Preparation
System design, coding, and AI-specific questions
AI engineering interviews are a mix of traditional software engineering and AI-specific knowledge. They typically cover:
- System design: "Design a RAG system for 10K documents." Draw architecture, explain trade-offs (chunk size, embedding model, vector DB choice), discuss failure modes, and justify your decisions.
- AI coding challenges: implement basic vector search, build a simple RAG pipeline from scratch, write a function calling loop.
- Conceptual questions: "Fine-tune vs RAG: when would you choose each?" "How do you handle hallucinations?" "How do you evaluate LLM output quality?" Have structured, rehearsed answers.
- STAR stories: prepare 5 go-to stories: a production incident you resolved, a failed experiment you learned from, an optimisation you made, a design decision you drove, a moment of significant learning.
Practice system design out loud. Record yourself. Refine your explanations until they're clear and confident.
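As a warm-up for the "implement basic vector search" challenge, a brute-force version fits in a few lines. This is one reasonable answer, not the expected one. The vectors in INDEX are toy values standing in for real embeddings, and the document IDs are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every vector, keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
INDEX = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "careers-page": [0.0, 0.1, 0.9],
}
```

In an interview, say out loud that this is O(n) per query and that vector databases use approximate indexes to avoid scanning everything. That trade-off is usually the follow-up question.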
Salary & Negotiation
Know your worth. Don't leave money on the table.
Salary negotiation is a skill, and the worst-case scenario of asking is getting a "no." The best-case is significantly more comp.
- UK AI salaries (2026): Junior £50K–£70K, Mid-level £70K–£110K, Senior £110K–£180K+. Remote roles to US companies pay significantly more.
- Total compensation: base salary + equity + bonus + benefits. A startup offering £80K + 0.1% equity is a very different proposition from £90K at Big Tech.
- Never anchor first: when asked about salary expectations, reply: "I'm flexible. What's the range for this role?" Let them set the anchor.
- Competing offers: the single strongest negotiation position is having 2+ offers simultaneously. Start the job search process at multiple companies concurrently.
Training module 05-salary has full scripts for negotiation calls at different seniority levels.
Continued Learning System
AI moves too fast to learn reactively
Getting the job is the start, not the finish. AI engineering evolves weekly. You need a learning system to stay current:
- Information diet: follow 20-30 key people, subscribe to 5 quality newsletters. Recommended: Simon Willison's blog, Interconnects, Ahead of AI, The Batch (Andrew Ng).
- 1 paper, 1 project rule: every research paper you read should produce one small experiment. Reading without implementing is entertainment, not education.
- 30 minutes daily: a daily habit of learning beats weekend crash courses. Block the time, protect it, show up.
- Accountability partner: find someone at a similar level. Share weekly goals, review each other's work, stay motivated together.
Building Your Network
80% of jobs come through connections
Networking isn't schmoozing. It's building genuine professional relationships with people in your field. And it's the most effective job search strategy:
- Communities: AI Discord servers (OpenAI, LangChain, LlamaIndex), Slack groups (Rocket.Chat AI, MLOps.community), local meetups. Find where your people gather and participate consistently.
- Speaking: start with 10-minute lightning talks at local meetups. Speaking positions you as someone worth paying attention to. The effect compounds.
- Help others: answer questions in community forums, review others' code, mentor people junior to you. Generosity builds reputation. Reputation attracts opportunities.
- Conferences: attend 1-2 per year. NeurIPS (research), AI Engineer World's Fair (practical engineering), and local meetups (networking).
Capstone Project: Your Signature Work
Month 6 is your capstone: one impressive project in your chosen specialisation. The project that people Google and find YOU. Not a tutorial, not a copy. Original work that showcases your unique skills.
Full Training Module Quick-Reference
Month 4: Agents
Month 5: Production
AI Engineer vs ML Engineer
Different jobs, different skill sets, different career paths
AI Engineer
ML Engineer
Salary & Career Progression
The field is new: fast promotion potential
| Level | Experience | Focus | Salary Range |
|---|---|---|---|
| Junior AI Engineer | 0–2 years | Build & maintain RAG pipelines | $100K–$140K |
| Mid-Level | 2–4 years | Design RAG architectures, build agents | $140K–$200K |
| Senior | 4–7 years | Architecture decisions, team leadership | $200K–$350K+ |
| Staff / Architect | 7+ years | Company-wide AI strategy, mentoring | $300K–$500K+ |
| Head of AI / VP | 8+ years | AI roadmap, budget, executive stakeholders | $350K–$600K+ |
7-Day Quick Start
Don't just bookmark. Start today
First API Call
Sign up for OpenAI API. Write a Python script to call GPT-4.1 mini. Working chatbot in an hour.
Second Provider
Sign up for Anthropic. Rewrite for Claude. Compare APIs and responses.
Local Models
Install Ollama. Pull llama3. Compare quality & speed vs cloud APIs.
First Vector Search
pip install chromadb. Embed 20 paragraphs. Query with natural language. ~50 lines.
Job Market
Browse 10 AI engineer job postings. Map skills to roadmap. Find your top 3 gaps.
Build Intuition
Watch Karpathy's "Let's build GPT". Understand transformers and attention.
Start Portfolio
Create ai-engineering-portfolio repo. Write README with 3 project ideas. Block 1hr/day.