Data Structures

Lists, dicts, sets, tuples — the building blocks of every Python program. You'll use dicts constantly when working with JSON APIs.

Lists — Ordered, Mutable Sequences

python
# Create a list
models = ["gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"]

# Access by index (0-based)
print(models[0])     # "gpt-4o"
print(models[-1])    # "gemini-2.5-pro" (last item)

# Modify
models.append("llama3")      # add to end
models.insert(0, "mistral")   # insert at position
models.remove("gemini-2.5-pro") # remove by value
popped = models.pop()       # remove & return last

# Check membership
print("gpt-4o" in models)    # True

# Slicing
print(models[1:3])    # items at index 1 and 2
print(models[:2])     # first two items
print(models[::2])    # every other item

# Useful methods
len(models)           # length
sorted(models)        # sorted copy
" | ".join(models)    # "mistral | gpt-4o | claude-sonnet-4.6"

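The difference between a "sorted copy" and sorting in place trips up many beginners, so here is a small sketch. The list contents are only illustrative; the point is which calls return a new list and which change the existing one.

```python
models = ["mistral", "gpt-4o", "claude-sonnet-4.6"]

# sorted() returns a new list; the original order is untouched
alphabetical = sorted(models)
print(alphabetical)  # ['claude-sonnet-4.6', 'gpt-4o', 'mistral']
print(models)        # ['mistral', 'gpt-4o', 'claude-sonnet-4.6']

# list.sort() reorders the list in place and returns None
result = models.sort()
print(result)        # None
print(models)        # ['claude-sonnet-4.6', 'gpt-4o', 'mistral']

# Plain assignment does not copy: both names refer to the same list
alias = models
snapshot = models.copy()   # an independent shallow copy
models.append("llama3")
print(len(alias), len(snapshot))  # 4 3
```

If you need the original order later, prefer `sorted()` or take a `.copy()` before modifying.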
Dictionaries — Key-Value Pairs (Most Used!)

Most web APIs return JSON, which Python parses into dicts and lists. You'll work with nested dicts constantly in AI engineering.

python
# Create a dict
config = {
    "model": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 1000,
}

# Access values
print(config["model"])           # "gpt-4o"
print(config.get("top_p", 1.0))  # 1.0 (default if missing — safe!)

# ⚠️ KeyError if key doesn't exist:
# config["top_p"]  → KeyError!

# Modify
config["temperature"] = 0.9   # update
config["stream"] = True       # add new key
del config["max_tokens"]       # delete key

# Iterate
for key, value in config.items():
    print(f"{key}: {value}")

# Check if key exists
if "model" in config:
    print(f"Using model: {config['model']}")
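To connect this to JSON: the standard-library `json` module turns a JSON string into a dict and back again. The `raw` string below is a made-up payload for illustration, not the output of any particular API.

```python
import json

# A JSON document as it might arrive over the network (hypothetical payload)
raw = '{"model": "gpt-4o", "temperature": 0.7, "stream": false, "stop": null}'

# json.loads: JSON text -> Python objects
config = json.loads(raw)
print(type(config).__name__)     # dict
print(config["stream"])          # False  (JSON false -> Python False)
print(config["stop"])            # None   (JSON null  -> Python None)
print(config.get("top_p", 1.0))  # 1.0    (key absent, default used)

# json.dumps: Python objects -> JSON text
config["temperature"] = 0.9
print(json.dumps(config, indent=2))
```

Note the type mapping: JSON `true`/`false`/`null` become `True`/`False`/`None`, and JSON arrays become lists.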

Nested Dicts — API Response Pattern

python
# Simplified shape of an OpenAI chat completion response (nested dicts + lists)
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 2,
        "total_tokens": 12
    },
    "model": "gpt-4o-mini"
}

# Access nested data
reply = response["choices"][0]["message"]["content"]
tokens = response["usage"]["total_tokens"]
model = response["model"]

# Safe access with .get() for optional fields
finish = response["choices"][0].get("finish_reason", "unknown")
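Chained indexing like `response["choices"][0]["message"]["content"]` raises `KeyError` or `IndexError` as soon as one level is missing. One way to guard against that is a small helper. `extract_reply` and the `error_payload` shape are hypothetical names invented for this sketch, not part of any client library.

```python
def extract_reply(response, default=""):
    """Return the first choice's message content, or `default` if any level is missing."""
    choices = response.get("choices") or []   # covers a missing key and an empty list
    if not choices:
        return default
    message = choices[0].get("message") or {}
    return message.get("content") or default


complete = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
empty = {"choices": []}
error_payload = {"error": {"message": "rate limited"}}  # hypothetical error shape

print(extract_reply(complete))                     # Hello!
print(extract_reply(empty, default="(no reply)"))  # (no reply)
print(repr(extract_reply(error_payload)))          # ''
```

Whether to return a default or raise an error is a design choice: a default keeps a pipeline running, while an exception makes malformed responses impossible to miss.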

Sets — Unique Items, Fast Lookup

python
# Sets: unordered, no duplicates, fast membership testing
supported = {"gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"}

# Add and remove
supported.add("llama3")
supported.discard("gemini-2.5-pro")  # safe remove (no error if missing)

# Set operations (perfect for comparing lists!)
available = {"gpt-4o", "mistral", "llama3"}
can_use = supported & available         # intersection
all_models = supported | available      # union
only_supported = supported - available  # difference

# Remove duplicates from a list
models = ["gpt-4o", "gpt-4o", "claude-sonnet-4.6", "gpt-4o"]
unique = list(set(models))  # e.g. ["gpt-4o", "claude-sonnet-4.6"] (order is not guaranteed)
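Because sets are unordered, `list(set(...))` can return the items in any order. If the original order matters, a common alternative is `dict.fromkeys`, which relies on dicts preserving insertion order (guaranteed since Python 3.7). A minimal sketch:

```python
models = ["gpt-4o", "gpt-4o", "claude-sonnet-4.6", "gpt-4o"]

# set(): duplicates removed, but iteration order is arbitrary
unique_any_order = set(models)
print(sorted(unique_any_order))  # ['claude-sonnet-4.6', 'gpt-4o']

# dict.fromkeys(): duplicates removed, first-seen order kept
unique_in_order = list(dict.fromkeys(models))
print(unique_in_order)           # ['gpt-4o', 'claude-sonnet-4.6']
```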

Tuples — Immutable Sequences

python
# Tuples: like lists but can't be modified after creation
pricing = ("gpt-4o-mini", 0.00015, 0.0006)  # model, input, output per 1K tokens

# Unpacking
model, input_price, output_price = pricing
print(f"{model}: ${input_price}/1K in, ${output_price}/1K out")

# Useful as dict keys (lists can't be dict keys)
coordinates = {(37.77, -122.42): "San Francisco", (40.71, -74.01): "New York"}
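To see what "immutable" and "lists can't be dict keys" mean in practice, this sketch deliberately triggers both errors and catches them. The replacement price `0.0002` is an arbitrary illustrative value.

```python
pricing = ("gpt-4o-mini", 0.00015, 0.0006)

# Tuples do not support item assignment
try:
    pricing[1] = 0.0002
except TypeError as exc:
    print(f"Tuple is immutable: {exc}")

# To "change" a tuple, build a new one
updated = (pricing[0], 0.0002, pricing[2])
print(updated)  # ('gpt-4o-mini', 0.0002, 0.0006)

# Lists are mutable, therefore unhashable, therefore not usable as dict keys
try:
    lookup = {[37.77, -122.42]: "San Francisco"}
except TypeError as exc:
    print(f"List as key failed: {exc}")

# The same key as a tuple works
lookup = {(37.77, -122.42): "San Francisco"}
print(lookup[(37.77, -122.42)])  # San Francisco
```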

List Comprehensions — Pythonic Data Transforms

python
# Instead of this:
squares = []
for i in range(10):
    squares.append(i ** 2)

# Write this:
squares = [i ** 2 for i in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# With condition (filter)
pricing = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015, "claude-opus-4-7": 0.015}
cheap = [model for model, price in pricing.items() if price < 0.001]
# ["gpt-4o-mini"]

# Dict comprehension (prices above are per 1K tokens, so multiplying by 1000 gives the price per 1M)
cost_per_1m = {m: round(p * 1000, 4) for m, p in pricing.items()}
# {"gpt-4o": 5.0, "gpt-4o-mini": 0.15, "claude-opus-4-7": 15.0}

# Nested comprehension (flatten)
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6]
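Comprehensions are especially handy on lists of dicts, which is the shape chat histories usually take. The `messages` list below is made-up sample data for this sketch.

```python
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Thanks"},
]

# List comprehension: filter by role, keep only the text
user_texts = [m["content"] for m in messages if m["role"] == "user"]
print(user_texts)     # ['Hi', 'Thanks']

# Set comprehension: which distinct roles appear?
roles = {m["role"] for m in messages}
print(sorted(roles))  # ['assistant', 'system', 'user']

# Generator expression: total characters without building a temporary list
total_chars = sum(len(m["content"]) for m in messages)
print(total_chars)    # 30
```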

Which Data Structure to Use?

text
Cheat Sheet
list    → ordered, mutable, duplicates OK   → [1, 2, 2, 3]
dict    → key-value, fast lookup            → {"name": "Alice"}
set     → unique items, fast membership      → {1, 2, 3}
tuple   → immutable, safe as dict key        → (37.77, -122.42)

AI Engineering patterns:
- API responses    → nested dicts (response["choices"][0]["message"])
- Model configs    → dicts ({"model": "gpt-4o", "temp": 0.7})
- Token lists      → lists (["Hello", "world", "!"])
- Unique IDs       → sets (check if doc_id already processed)
- Coordinates      → tuples ((lat, lon) as dict keys)

Exercise — Parse an API Response

Given the nested dict from the "Nested Dicts" section above, write code to: (1) extract the assistant's reply, (2) calculate the cost assuming $0.00015/1K input tokens and $0.0006/1K output tokens, (3) create a list of (model, cost) tuples from a dict of pricing.
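Try it yourself first. For comparison, here is one possible solution sketch. It assumes the response dict from the "Nested Dicts" section, treats `prompt_tokens` as input and `completion_tokens` as output, and uses a small hypothetical `pricing` dict (per-1K prices) for part 3.

```python
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 2, "total_tokens": 12},
    "model": "gpt-4o-mini",
}

INPUT_PRICE_PER_1K = 0.00015   # USD per 1K input tokens (from the exercise)
OUTPUT_PRICE_PER_1K = 0.0006   # USD per 1K output tokens (from the exercise)

# (1) Extract the assistant's reply
reply = response["choices"][0]["message"]["content"]

# (2) Cost = tokens / 1000 * price per 1K, summed over input and output
usage = response["usage"]
cost = (
    usage["prompt_tokens"] / 1000 * INPUT_PRICE_PER_1K
    + usage["completion_tokens"] / 1000 * OUTPUT_PRICE_PER_1K
)
print(f"{reply!r} cost ${cost:.8f}")  # 'Hello!' cost $0.00000270

# (3) List of (model, cost) tuples from a pricing dict (hypothetical values)
pricing = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}
model_costs = [(model, price) for model, price in pricing.items()]
print(model_costs)  # [('gpt-4o', 0.005), ('gpt-4o-mini', 0.00015)]
```

The arithmetic for part 2: 10 / 1000 × 0.00015 = 0.0000015 for input, 2 / 1000 × 0.0006 = 0.0000012 for output, so about $0.0000027 in total.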

✅ You've completed this step when you can confirm: