Data Structures

Lists, dicts, sets, tuples — the building blocks of every Python program. You'll use dicts constantly when working with JSON APIs.

Learning Objectives

Create and manipulate lists, dicts, sets, and tuples
Use list comprehensions for clean, fast data transforms
Choose the right data structure for each task
Nest data structures for complex API responses
Understand mutability vs immutability

Lists — Ordered, Mutable Sequences

python

# Create a list
models = ["gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"]

# Access by index (0-based)
print(models[0])     # "gpt-4o"
print(models[-1])    # "gemini-2.5-pro" (last item)

# Modify
models.append("llama3")      # add to end
models.insert(0, "mistral")   # insert at position
models.remove("gemini-2.5-pro") # remove by value
popped = models.pop()       # remove & return last

# Check membership
print("gpt-4o" in models)    # True

# Slicing
print(models[1:3])    # items at index 1 and 2
print(models[:2])     # first two items
print(models[::2])    # every other item

# Useful methods
len(models)           # length (3 after the changes above)
sorted(models)        # returns a sorted copy; models itself is unchanged
" | ".join(models)    # "mistral | gpt-4o | claude-sonnet-4.6"

Dictionaries — Key-Value Pairs (Most Used!)

APIs return JSON, which Python parses into dicts and lists. You'll work with nested dicts constantly in AI engineering.
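To make the JSON-to-dict step concrete, here is a minimal sketch using the standard library's `json` module. The `raw` string is a made-up response body used only for illustration, not output from a real API.

```python
import json

# Hypothetical raw response body: a string of JSON text
raw = '{"model": "gpt-4o", "temperature": 0.7, "stop": null, "tags": ["a", "b"]}'

data = json.loads(raw)   # JSON object -> dict, array -> list, null -> None
print(type(data))        # <class 'dict'>
print(data["tags"][0])   # a

# Going the other way: dict -> JSON text, e.g. for a request body
payload = json.dumps(data, indent=2)
```

Many HTTP client libraries do this parsing for you, so in practice you often receive the dict directly.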

python

# Create a dict
config = {
    "model": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 1000,
}

# Access values
print(config["model"])           # "gpt-4o"
print(config.get("top_p", 1.0))  # 1.0 (default if missing — safe!)

# ⚠️ KeyError if key doesn't exist:
# config["top_p"]  → KeyError!

# Modify
config["temperature"] = 0.9   # update
config["stream"] = True       # add new key
del config["max_tokens"]       # delete key

# Iterate
for key, value in config.items():
    print(f"{key}: {value}")

# Check if key exists
if "model" in config:
    print(f"Using model: {config['model']}")
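A common follow-on to the operations above is combining a default config with per-request overrides. The sketch below assumes two hypothetical dicts, `defaults` and `overrides`; it shows one way to merge them without modifying either.

```python
defaults = {"model": "gpt-4o", "temperature": 0.7, "max_tokens": 1000}
overrides = {"temperature": 0.2, "stream": True}  # hypothetical per-request settings

# Unpack both dicts into a new one; keys from the later dict win
config = {**defaults, **overrides}
print(config)
# {'model': 'gpt-4o', 'temperature': 0.2, 'max_tokens': 1000, 'stream': True}

# The inputs are left untouched
print(defaults["temperature"])  # 0.7
```

On Python 3.9 and later, `defaults | overrides` produces the same merged dict.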

Nested Dicts — API Response Pattern

python

# Simplified OpenAI-style chat completion response (nested dicts + lists)
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 2,
        "total_tokens": 12
    },
    "model": "gpt-4o-mini"
}

# Access nested data
reply = response["choices"][0]["message"]["content"]
tokens = response["usage"]["total_tokens"]
model = response["model"]

# Safe access with .get() for optional fields
finish = response["choices"][0].get("finish_reason", "unknown")
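Chained indexing raises as soon as one level is missing, and `.get()` only protects a single level. As a rough sketch, a small helper can walk a path and fall back to a default. The name `dig` is hypothetical, not a standard library function.

```python
def dig(data, *path, default=None):
    """Follow keys/indexes in path; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


response = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}

print(dig(response, "choices", 0, "message", "content"))      # Hello!
print(dig(response, "usage", "total_tokens", default=0))      # 0
print(dig(response, "choices", 1, "message", default="n/a"))  # n/a
```

Catching `TypeError` covers the case where an intermediate value is `None` or some other non-indexable object.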

Sets — Unique Items, Fast Lookup

python

# Sets: unordered, no duplicates, fast membership testing
supported = {"gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"}

# Add and remove
supported.add("llama3")
supported.discard("gemini-2.5-pro")  # safe remove (no error if missing)

# Set operations (perfect for comparing lists!)
available = {"gpt-4o", "mistral", "llama3"}
can_use = supported & available         # intersection
all_models = supported | available      # union
only_supported = supported - available  # difference

# Remove duplicates from a list
models = ["gpt-4o", "gpt-4o", "claude-sonnet-4.6", "gpt-4o"]
unique = list(set(models))  # e.g. ["gpt-4o", "claude-sonnet-4.6"] (set order is not guaranteed)
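Because sets are unordered, converting a list to a set and back can shuffle the items. If the original order matters, one option is to deduplicate through dict keys instead, as in this short sketch:

```python
models = ["gpt-4o", "gpt-4o", "claude-sonnet-4.6", "gpt-4o"]

# Dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(models))
print(unique_in_order)  # ['gpt-4o', 'claude-sonnet-4.6']

# If any stable order will do, sort the set instead
unique_sorted = sorted(set(models))
print(unique_sorted)    # ['claude-sonnet-4.6', 'gpt-4o']
```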

Tuples — Immutable Sequences

python

# Tuples: like lists but can't be modified after creation
pricing = ("gpt-4o-mini", 0.00015, 0.0006)  # model, input, output per 1K tokens

# Unpacking
model, input_price, output_price = pricing
print(f"{model}: ${input_price}/1K in, ${output_price}/1K out")

# Useful as dict keys (lists can't be dict keys)
coordinates = {(37.77, -122.42): "San Francisco", (40.71, -74.01): "New York"}
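To see what "can't be modified" and "lists can't be dict keys" look like in practice, the following sketch triggers both errors on purpose and catches them. The replacement price `0.0002` is an arbitrary illustrative value.

```python
pricing = ("gpt-4o-mini", 0.00015, 0.0006)

try:
    pricing[1] = 0.0002  # tuples do not support item assignment
except TypeError as exc:
    print(f"TypeError: {exc}")

# "Changing" a tuple means building a new one from slices
updated = pricing[:1] + (0.0002,) + pricing[2:]
print(updated)  # ('gpt-4o-mini', 0.0002, 0.0006)

try:
    lookup = {[37.77, -122.42]: "San Francisco"}  # a list is unhashable
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The original `pricing` tuple is unchanged throughout; only `updated` holds the new value.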

List Comprehensions — Pythonic Data Transforms

python

# Instead of this:
squares = []
for i in range(10):
    squares.append(i ** 2)

# Write this:
squares = [i ** 2 for i in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# With condition (filter)
pricing = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015, "claude-opus-4-7": 0.015}
cheap = [model for model, price in pricing.items() if price < 0.001]
# ["gpt-4o-mini"]

# Dict comprehension: convert per-1K-token prices to per-1M-token prices
cost_per_1m = {m: p * 1000 for m, p in pricing.items()}
# ≈ {"gpt-4o": 5.0, "gpt-4o-mini": 0.15, "claude-opus-4-7": 15.0}

# Nested comprehension (flatten)
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6]
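The same patterns apply to API-shaped data. The sketch below assumes a hypothetical response with two choices and shows a transform, a filter, and a set comprehension over it.

```python
# Hypothetical response with several choices
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"},
        {"message": {"role": "assistant", "content": "Hi the"}, "finish_reason": "length"},
    ]
}

# Transform: pull the text out of every choice
contents = [choice["message"]["content"] for choice in response["choices"]]
print(contents)  # ['Hello!', 'Hi the']

# Filter: keep only choices that finished normally
complete = [
    choice["message"]["content"]
    for choice in response["choices"]
    if choice.get("finish_reason") == "stop"
]
print(complete)  # ['Hello!']

# Set comprehension: the distinct finish reasons seen
reasons = {choice["finish_reason"] for choice in response["choices"]}
```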

Which Data Structure to Use?

Cheat Sheet

list    → ordered, mutable, duplicates OK   → [1, 2, 2, 3]
dict    → key-value, fast lookup            → {"name": "Alice"}
set     → unique items, fast membership      → {1, 2, 3}
tuple   → immutable, safe as dict key        → (37.77, -122.42)

AI Engineering patterns:
- API responses    → nested dicts (response["choices"][0]["message"])
- Model configs    → dicts ({"model": "gpt-4o", "temperature": 0.7})
- Token lists      → lists (["Hello", "world", "!"])
- Unique IDs       → sets (check if doc_id already processed)
- Coordinates      → tuples ((lat, lon) as dict keys)
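As an illustration of the "Unique IDs → sets" pattern, here is a minimal sketch that skips documents already seen. The IDs in `incoming` are made up for the example.

```python
incoming = ["doc-1", "doc-2", "doc-1", "doc-3", "doc-2"]  # hypothetical IDs

processed = set()  # membership checks are O(1) on average
to_process = []

for doc_id in incoming:
    if doc_id in processed:
        continue  # already handled, skip the duplicate
    processed.add(doc_id)
    to_process.append(doc_id)

print(to_process)  # ['doc-1', 'doc-2', 'doc-3']
```

Using a list for `processed` would also work, but each `in` check would then scan the whole list.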

Exercise — Parse an API Response

Given the nested dict from the "Nested Dicts" section above, write code to: (1) extract the assistant's reply, (2) calculate the request cost from the token counts under "usage", assuming $0.00015 per 1K input tokens and $0.0006 per 1K output tokens, (3) create a list of (model, cost) tuples from a dict of pricing.

✅ You've completed this step when you can confirm:

You can create and manipulate lists and dicts fluently
You can navigate nested API response dicts
You write comprehensions instead of loops for simple transforms
You know when to use sets vs lists