Data Structures
Lists, dicts, sets, tuples — the building blocks of every Python program. You'll use dicts constantly when working with JSON APIs.
Learning Objectives
- Create and manipulate lists, dicts, sets, and tuples
- Use list comprehensions for clean, fast data transforms
- Choose the right data structure for each task
- Nest data structures for complex API responses
- Understand mutability vs immutability
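The last objective, mutability versus immutability, is easiest to see in a few lines. The following is a minimal sketch; the variable names are illustrative and not part of the lesson's own examples.

```python
# Lists are mutable: two names can refer to the same list object.
original = ["gpt-4o", "llama3"]
alias = original                 # no copy is made, both names share one list
alias.append("mistral")
print(original)                  # ['gpt-4o', 'llama3', 'mistral']

# A shallow copy breaks the link between the two names.
copied = original.copy()
copied.append("gemini-2.5-pro")
print(len(original), len(copied))  # 3 4

# Tuples are immutable: item assignment raises TypeError.
pricing = ("gpt-4o-mini", 0.00015, 0.0006)
error_message = ""
try:
    pricing[1] = 0.0002
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The aliasing case is the one that tends to cause surprises: passing a list or dict to a function hands over the same object, so changes made inside the function are visible to the caller.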
Lists — Ordered, Mutable Sequences
python
# Create a list
models = ["gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"]
# Access by index (0-based)
print(models[0]) # "gpt-4o"
print(models[-1]) # "gemini-2.5-pro" (last item)
# Modify
models.append("llama3") # add to end
models.insert(0, "mistral") # insert at position
models.remove("gemini-2.5-pro") # remove by value
popped = models.pop() # remove & return last
# Check membership
print("gpt-4o" in models) # True
# Slicing
print(models[1:3]) # items at index 1 and 2
print(models[:2]) # first two items
print(models[::2]) # every other item
# Useful methods
len(models) # length (3 at this point)
sorted(models) # sorted copy; the original list is unchanged
" | ".join(models) # "mistral | gpt-4o | claude-sonnet-4.6"
Dictionaries — Key-Value Pairs (Most Used!)
Most web APIs return JSON, which Python parses into dicts (and lists). You'll work with nested dicts constantly in AI engineering.
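To make the JSON-to-dict step concrete, here is a minimal sketch using the standard-library json module. The payload string is made up for illustration and is not a real API response.

```python
import json

# Hypothetical JSON text, as it might arrive in an HTTP response body.
raw = (
    '{"model": "gpt-4o", '
    '"usage": {"prompt_tokens": 10, "completion_tokens": 2}, '
    '"stream": false, "stop": null}'
)

data = json.loads(raw)                 # JSON object -> Python dict
print(type(data).__name__)             # dict
print(data["usage"]["prompt_tokens"])  # 10
print(data["stream"], data["stop"])    # False None

# The reverse direction: dict -> JSON string, e.g. for a request body.
payload = json.dumps({"model": "gpt-4o", "temperature": 0.7})
print(payload)
```

Note the type mapping: JSON objects become dicts, arrays become lists, true/false become True/False, and null becomes None.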
python
# Create a dict
config = {
    "model": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 1000,
}
# Access values
print(config["model"]) # "gpt-4o"
print(config.get("top_p", 1.0)) # 1.0 (default if missing — safe!)
# ⚠️ KeyError if key doesn't exist:
# config["top_p"] → KeyError!
# Modify
config["temperature"] = 0.9 # update
config["stream"] = True # add new key
del config["max_tokens"] # delete key
# Iterate
for key, value in config.items():
    print(f"{key}: {value}")
# Check if key exists
if "model" in config:
    print(f"Using model: {config['model']}")
Nested Dicts — API Response Pattern
python
# Typical OpenAI API response (nested dicts + lists)
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 2,
        "total_tokens": 12
    },
    "model": "gpt-4o-mini"
}
# Access nested data
reply = response["choices"][0]["message"]["content"]
tokens = response["usage"]["total_tokens"]
model = response["model"]
# Safe access with .get() for optional fields
finish = response["choices"][0].get("finish_reason", "unknown")
Sets — Unique Items, Fast Lookup
python
# Sets: unordered, no duplicates, fast membership testing
supported = {"gpt-4o", "claude-sonnet-4.6", "gemini-2.5-pro"}
# Add and remove
supported.add("llama3")
supported.discard("gemini-2.5-pro") # safe remove (no error if missing)
# Set operations (perfect for comparing lists!)
available = {"gpt-4o", "mistral", "llama3"}
can_use = supported & available # intersection
all_models = supported | available # union
only_supported = supported - available # difference
# Remove duplicates from a list
models = ["gpt-4o", "gpt-4o", "claude-sonnet-4.6", "gpt-4o"]
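unique = list(set(models)) # contains "gpt-4o" and "claude-sonnet-4.6"; order is not guaranteed
unique_ordered = list(dict.fromkeys(models)) # ["gpt-4o", "claude-sonnet-4.6"] (keeps first-seen order)
Tuples — Immutable Sequences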
python
# Tuples: like lists but can't be modified after creation
pricing = ("gpt-4o-mini", 0.00015, 0.0006) # model, input price, output price (USD per 1K tokens)
# Unpacking
model, input_price, output_price = pricing
print(f"{model}: ${input_price}/1K in, ${output_price}/1K out")
# Useful as dict keys (lists can't be dict keys)
coordinates = {(37.77, -122.42): "San Francisco", (40.71, -74.01): "New York"}
List Comprehensions — Pythonic Data Transforms
python
# Instead of this:
squares = []
for i in range(10):
    squares.append(i ** 2)
# Write this:
squares = [i ** 2 for i in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# With condition (filter)
pricing = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015, "claude-opus-4-7": 0.015}
cheap = [model for model, price in pricing.items() if price < 0.001]
# ["gpt-4o-mini"]
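# Dict comprehension (assuming the prices above are USD per 1K tokens, x1000 gives USD per 1M tokens)
cost_per_1m = {m: p * 1000 for m, p in pricing.items()}
# roughly {"gpt-4o": 5.0, "gpt-4o-mini": 0.15, "claude-opus-4-7": 15.0} (float rounding may add trailing digits)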
# Nested comprehension (flatten)
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6]
Which Data Structure to Use?
text
Cheat Sheet
list → ordered, mutable, duplicates OK → [1, 2, 2, 3]
dict → key-value, fast lookup → {"name": "Alice"}
set → unique items, fast membership → {1, 2, 3}
tuple → immutable, safe as dict key → (37.77, -122.42)
AI Engineering patterns:
- API responses → nested dicts (response["choices"][0]["message"])
- Model configs → dicts ({"model": "gpt-4o", "temperature": 0.7})
- Token lists → lists (["Hello", "world", "!"])
- Unique IDs → sets (check if doc_id already processed)
- Coordinates → tuples ((lat, lon) as dict keys)
Exercise — Parse an API Response
Given the response dict from the "Nested Dicts" section above, write code to: (1) extract the assistant's reply, (2) calculate the request cost, assuming $0.00015 per 1K input tokens and $0.0006 per 1K output tokens, and (3) build a list of (model, cost) tuples from a dict of per-model pricing.
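One possible solution is sketched here, to compare against after attempting the exercise. Step (3) is open to interpretation: this sketch assumes a pricing dict that maps each model to a single flat price per 1K tokens (the values are borrowed from the comprehension examples) and applies it to the total token count. The constant names are illustrative.

```python
# Same shape as the response in the "Nested Dicts" section.
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 2, "total_tokens": 12},
    "model": "gpt-4o-mini",
}

INPUT_PRICE_PER_1K = 0.00015   # USD per 1K input tokens (from the exercise)
OUTPUT_PRICE_PER_1K = 0.0006   # USD per 1K output tokens (from the exercise)

# (1) Extract the assistant's reply.
reply = response["choices"][0]["message"]["content"]

# (2) Cost = tokens / 1000 * price per 1K, summed over input and output.
usage = response["usage"]
cost = (
    usage["prompt_tokens"] / 1000 * INPUT_PRICE_PER_1K
    + usage["completion_tokens"] / 1000 * OUTPUT_PRICE_PER_1K
)

# (3) Assumed pricing dict: one flat USD price per 1K tokens per model.
pricing = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015, "claude-opus-4-7": 0.015}
model_costs = [
    (model, usage["total_tokens"] / 1000 * price)
    for model, price in pricing.items()
]

print(reply)              # Hello!
print(f"${cost:.7f}")     # $0.0000027
for model, model_cost in model_costs:
    print(f"{model}: ${model_cost:.7f}")
```

Because these are floating-point amounts, compare them with a tolerance (for example math.isclose) rather than with ==.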