Block 2: Programming and Database Skills
Topic 2.2 · 2 Objectives
Python's power is amplified by its extensive ecosystem of modules and packages. A module is a single .py file containing definitions and statements. A package is a directory of modules, conventionally marked by an __init__.py file. Understanding how to import, manage, and create modules is fundamental to effective Python development.
Python provides several import styles, each suited for different scenarios:
import module
Imports the entire module. You access its contents with dot notation.
With a standard import, you must always prefix function calls with the module name (e.g., math.sqrt()). Calling sqrt() alone will raise a NameError.
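A minimal sketch of the standard import style described above, using only the standard library. The variable names root and error_message are illustrative choices, not part of any API.

```python
import math

# Names inside the module are reached through the module object.
root = math.sqrt(16)
print(root)  # 4.0

# The bare name was never bound in this namespace, so it is undefined here.
try:
    sqrt(16)
except NameError as exc:
    error_message = str(exc)
    print(error_message)
```

The import statement binds only the name math, which is why the unqualified call fails.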
from module import name
Imports specific functions, classes, or variables directly into the current namespace.
Avoid from module import * in production code. It pollutes the namespace and makes it unclear where names originate, which can lead to subtle bugs when two modules export identically named functions.
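The name-collision risk can be sketched with two explicit selective imports, which behave the same way a pair of wildcard imports would for this one name. The example assumes only the standard-library math and cmath modules, both of which define sqrt.

```python
from math import sqrt

first = sqrt(4)
print(first)   # 2.0

# A second import of the same name silently rebinds sqrt in this namespace.
from cmath import sqrt

second = sqrt(4)
print(second)  # (2+0j)
```

Nothing warns you that sqrt now returns complex numbers. With wildcard imports the rebinding is even harder to spot, because the colliding name never appears in the source.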
import module as alias
Creates a shorter alias for frequently used modules. This is standard practice in the data science ecosystem.
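A short sketch of aliasing, using standard-library modules so it runs without third-party packages. The aliases stats and dt are conventions chosen for this illustration.

```python
import statistics as stats
from datetime import datetime as dt

# The alias is just another name for the same module object.
average = stats.mean([2, 4, 6])
print(average)  # 4

# Aliasing a class avoids confusion between datetime (module) and datetime (class).
parsed = dt.strptime("2024-01-15", "%Y-%m-%d")
print(parsed.year)  # 2024
```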
| Style | Syntax | Usage | Access |
|---|---|---|---|
| Standard | import math | Full module needed | math.sqrt() |
| Selective | from math import sqrt | Few specific names | sqrt() |
| Aliased | import numpy as np | Shorten long names | np.array() |
| Selective + Alias | from datetime import datetime as dt | Avoid name conflicts | dt.now() |
| Wildcard | from math import * | Quick scripts only | sqrt() |
Python ships with a rich standard library. For the exam, you should know these key modules and their purposes:
| Module | Purpose | Key Functions / Classes |
|---|---|---|
| csv | Read/write CSV files | reader(), writer(), DictReader(), DictWriter() |
| os | OS interaction, file paths | getcwd(), listdir(), path.join(), path.exists() |
| math | Mathematical functions | sqrt(), ceil(), floor(), log(), pi |
| statistics | Statistical calculations | mean(), median(), stdev(), mode() |
| datetime | Date/time manipulation | datetime.now(), timedelta, strftime(), strptime() |
| collections | Specialized data structures | Counter, defaultdict, OrderedDict, namedtuple |
| json | JSON encoding/decoding | load(), dump(), loads(), dumps() |
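To show how several of these modules combine in a typical data task, here is a small sketch. The CSV text, column names, and the summary structure are made up for illustration. io.StringIO stands in for a real file.

```python
import csv
import io
import json
import statistics
from collections import Counter

# Hypothetical in-memory CSV standing in for a file on disk.
raw = "city,temp\nOslo,3\nLima,19\nOslo,5\n"

# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw)))
temps = [float(row["temp"]) for row in rows]

# Counter tallies how many readings each city has.
city_counts = Counter(row["city"] for row in rows)

summary = {
    "mean_temp": statistics.mean(temps),
    "readings_per_city": dict(city_counts),
}

# json.dumps serializes the summary to a string.
summary_json = json.dumps(summary, sort_keys=True)
print(summary_json)
```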
PIP (Pip Installs Packages) is Python's standard package manager. It downloads and installs packages from the Python Package Index (PyPI).
| Command | Description | Example |
|---|---|---|
| pip install package | Install a package | pip install pandas |
| pip install package==version | Install specific version | pip install pandas==2.1.0 |
| pip uninstall package | Remove a package | pip uninstall pandas |
| pip list | Show installed packages | pip list |
| pip freeze | Output installed packages in requirements format | pip freeze > requirements.txt |
| pip install -r file | Install from requirements file | pip install -r requirements.txt |
| pip show package | Display package info | pip show numpy |
| pip install --upgrade package | Upgrade a package | pip install --upgrade pandas |
pip list vs. pip freeze:
pip list displays packages in a human-readable table. pip freeze outputs in package==version format, ideal for generating requirements.txt files that can recreate an environment.
Any Python file can serve as a module. You can organize reusable code into your own modules and packages.
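When you import data_utils, Python first checks modules that are already loaded or built in, then searches the directories in sys.path in order: (1) the directory containing the script being run (the current directory in an interactive session), (2) directories listed in the PYTHONPATH environment variable, (3) the standard library, (4) site-packages (where pip installs packages). You can inspect the search path with import sys; print(sys.path).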
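The following sketch creates a tiny data_utils module on the fly and imports it, to make the search-path mechanism concrete. The module contents (a clean_name helper) and the use of a temporary directory are assumptions for illustration. In a real project the file would simply sit next to your script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module to a temporary directory (hypothetical contents).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "data_utils.py").write_text(
    "def clean_name(name):\n"
    "    return name.strip().title()\n"
)

# Put that directory on the search path so the import system can find it.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import data_utils

cleaned = data_utils.clean_name("  ada lovelace ")
print(cleaned)              # Ada Lovelace
print(data_utils.__file__)  # shows which file was actually imported
```

Printing __file__ is a quick way to confirm which copy of a module Python picked up when several directories on sys.path contain one.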
Robust data scripts anticipate and gracefully handle errors. Python's exception handling mechanism lets you catch errors at runtime and respond appropriately, rather than letting the entire program crash.
The full exception handling structure has four clauses:
| Clause | When It Runs | Required? |
|---|---|---|
| try | Wraps the code that might raise an exception | Yes |
| except | Runs if the specified exception occurs | Yes (at least one), unless finally is present |
| else | Runs only if no exception was raised | No (requires an except clause) |
| finally | Always runs, exception or not | No |
try → if exception, jump to matching except; if no exception, run else. In both cases, finally always executes last. This makes finally perfect for releasing resources like file handles and database connections.
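The flow above can be traced with a small sketch that records which clauses run. The function name safe_ratio and the steps list are illustrative only.

```python
def safe_ratio(numerator, denominator):
    """Return (result, steps) so the order of clauses is visible."""
    steps = ["try"]
    result = None
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        steps.append("except")   # runs only when the division fails
    else:
        steps.append("else")     # runs only when the try block succeeded
    finally:
        steps.append("finally")  # runs in both cases
    return result, steps

print(safe_ratio(10, 2))  # (5.0, ['try', 'else', 'finally'])
print(safe_ratio(10, 0))  # (None, ['try', 'except', 'finally'])
```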
Knowing which exception corresponds to which error is essential for the exam and for writing robust data code.
| Exception | Raised When | Typical Data Scenario |
|---|---|---|
| ValueError | Right type, wrong value | int("abc"), float("N/A") |
| TypeError | Wrong type for operation | "5" + 3, passing wrong arg type |
| KeyError | Missing dictionary key | row["nonexistent_col"] |
| IndexError | Index out of range | data[100] on a 50-item list |
| FileNotFoundError | File does not exist | open("missing.csv") |
| ZeroDivisionError | Division by zero | Computing a ratio with a zero denominator |
| ImportError | Module cannot be imported | import nonexistent_lib (raises the subclass ModuleNotFoundError) |
| AttributeError | Object has no such attribute | Calling .append() on a tuple |
| NameError | Variable not defined | Using a variable before assignment |
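Several rows of the table can be checked directly. The helper exception_name below is a hypothetical utility written for this sketch. It catches Exception broadly on purpose, only to report the type name.

```python
def exception_name(action):
    """Run a zero-argument callable and return the name of what it raises."""
    try:
        action()
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__
    return None

row = {"price": "9.99"}
data = list(range(50))

observed = {
    "int('abc')": exception_name(lambda: int("abc")),
    "'5' + 3": exception_name(lambda: "5" + 3),
    "row['nonexistent_col']": exception_name(lambda: row["nonexistent_col"]),
    "data[100]": exception_name(lambda: data[100]),
    "1 / 0": exception_name(lambda: 1 / 0),
    "(1, 2).append(3)": exception_name(lambda: (1, 2).append(3)),
}

for expression, name in observed.items():
    print(f"{expression:24} -> {name}")
```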
Use the raise statement to intentionally trigger exceptions when your code detects invalid conditions. This is critical for input validation in data pipelines.
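A minimal sketch of raising exceptions during input validation. The function parse_age, its accepted types, and the 0 to 130 range are assumptions chosen for the example.

```python
def parse_age(value):
    """Convert a raw field to an int age, raising on invalid input."""
    if not isinstance(value, (str, int)):
        raise TypeError(f"age must be str or int, got {type(value).__name__}")
    age = int(value)  # int() itself raises ValueError for text like "abc"
    if not 0 <= age <= 130:
        raise ValueError(f"age out of range: {age}")
    return age

print(parse_age("42"))  # 42

try:
    parse_age("-5")
except ValueError as exc:
    range_error = str(exc)
    print(range_error)  # age out of range: -5
```

The caller decides how to respond (skip the row, stop the run, and so on). The function's only job is to refuse bad input with a clear message.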
Python tracebacks are read bottom to top. The last line shows the exception type and message; lines above show the call stack with the most recent call at the bottom.
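To practice reading a traceback, the sketch below triggers a failed conversion and captures the traceback text with the standard-library traceback module. The functions to_float and load_prices are hypothetical.

```python
import traceback

def to_float(value):
    return float(value)

def load_prices(raw_values):
    return [to_float(v) for v in raw_values]

try:
    load_prices(["9.99", "N/A"])
except ValueError:
    # format_exc() returns the same text Python would print on a crash.
    trace_text = traceback.format_exc()
    print(trace_text)

# The final non-empty line carries the exception type and message.
last_line = trace_text.strip().splitlines()[-1]
print(last_line)
```

Reading upward from that last line, the nearest frame is to_float, where the error occurred. The frames above it show how the program got there.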
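In a traceback for a failed conversion, for example, the last line gives the exception type (ValueError) and the message (could not convert string to float: 'N/A'), and the frame just above it shows the statement that failed, return float(value).

Data processing scripts frequently encounter malformed data, missing files, and network issues. Here are practical patterns for robust data code.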
Avoid Bare except
A bare except: catches everything, including KeyboardInterrupt and SystemExit, so Ctrl+C and sys.exit() may fail to stop your program. Always specify exception types.
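A small sketch of catching only the exceptions you expect. The function to_number is a hypothetical helper.

```python
def to_number(text):
    """Return float(text), or None when the text is not numeric."""
    try:
        return float(text)
    except (ValueError, TypeError):
        # Only the failures we expect from float(); KeyboardInterrupt and
        # SystemExit still propagate, so Ctrl+C keeps working.
        return None

print(to_number("3.5"))  # 3.5
print(to_number("N/A"))  # None
print(to_number(None))   # None

# KeyboardInterrupt derives from BaseException rather than Exception, so
# 'except Exception' would not catch it, but a bare 'except:' would.
print(issubclass(KeyboardInterrupt, Exception))  # False
```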
Use finally for Resource Cleanup
Always close files, database connections, and network sockets in a finally block (or use with statements which handle this automatically).
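The two cleanup styles can be compared side by side. This sketch writes a throwaway temporary file first so that it is self-contained. The file contents are made up.

```python
import os
import tempfile

# Create a small throwaway file to read back (hypothetical data).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as handle:
    handle.write("id,value\n1,10\n")

# Manual cleanup: finally closes the file even if reading raises.
f = open(path)
try:
    header = f.readline().strip()
finally:
    f.close()

# Equivalent with a context manager: the file is closed on leaving the block.
with open(path) as g:
    first_line = g.readline().strip()

print(header, f.closed, g.closed)  # id,value True True
os.remove(path)
```

The with form is usually preferred because the cleanup cannot be forgotten.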
Check data types, ranges, and required fields at the start of functions. Raise descriptive exceptions for invalid inputs rather than letting cryptic errors surface later in the pipeline.
Swallowing exceptions with pass masks bugs. At minimum, log the error so it can be diagnosed later.
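A minimal sketch of logging instead of silently passing, using the standard logging module. The logger name "pipeline" and the function parse_prices are assumptions for illustration.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("pipeline")  # hypothetical logger name

def parse_prices(raw_values):
    """Convert values to floats, logging and skipping the ones that fail."""
    prices, skipped = [], 0
    for raw in raw_values:
        try:
            prices.append(float(raw))
        except ValueError:
            skipped += 1
            # Record what was skipped so the problem can be diagnosed later.
            logger.warning("skipping unparseable price %r", raw)
    return prices, skipped

prices, skipped = parse_prices(["9.99", "N/A", "12"])
print(prices, skipped)  # [9.99, 12.0] 1
```

Returning the skipped count alongside the data gives callers a cheap way to detect that an input file was unusually dirty.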
Use else to Separate Success Logic
Code in the else block only runs when try succeeds. This keeps error-handling code separate from normal flow, improving readability.
Test your knowledge with 10 multiple-choice questions.

1. Which statement imports the sqrt function from the math module?
   A. import sqrt from math
   B. from math import sqrt
   C. import math.sqrt
   D. math import sqrt

   Explanation: from module import name imports a specific name from a module. Option A reverses the order. Option C raises ModuleNotFoundError, because sqrt is a function rather than a submodule of math. Option D is invalid syntax.

2. What is the difference between pip list and pip freeze?
   - pip list shows only standard library modules; pip freeze shows all
   - pip list shows a human-readable table; pip freeze outputs in package==version format for requirements files
   - pip freeze locks packages so they cannot be upgraded

   Explanation: pip list displays installed packages in a formatted table. pip freeze outputs them in package==version format, which can be redirected to a requirements.txt file for environment reproducibility.

3. Which exception is raised by int("hello")?
   A. TypeError
   B. ValueError
   C. NameError
   D. SyntaxError

   Explanation: int() receives a string (correct type), but "hello" is not a valid integer representation (wrong value). This triggers a ValueError. A TypeError would occur if you passed a type that int() cannot convert at all (e.g., a list).

4. When does the else block execute in a try/except/else/finally structure?
   - … except
   - … try block
   - … finally block is also present

   Explanation: The else clause runs only if the try block completes without raising any exception. It is useful for code that should run on success but should not be inside the try block (to avoid accidentally catching its exceptions).

5. What does this code print?

       import numpy as np
       print(type(np))

   A. <class 'numpy'>
   B. <class 'module'>
   C. <class 'package'>
   D. <class 'alias'>

   Explanation: The alias np is simply an alternate name for the numpy module object, so type(np) returns <class 'module'>.

6. Which pip command generates a file that can recreate the current environment?
   A. pip list > requirements.txt
   B. pip freeze > requirements.txt
   C. pip export > requirements.txt
   D. pip save > requirements.txt

   Explanation: pip freeze outputs packages in the package==version format that pip install -r requirements.txt expects. pip list produces a formatted table that is not directly usable by pip install -r. Options C and D are not valid pip commands.

7. What does this code print?

       try:
           x = 10 / 0
       except ZeroDivisionError:
           print("A")
       else:
           print("B")
       finally:
           print("C")

   Explanation: The try block raises ZeroDivisionError, so the except block runs and prints "A". Since an exception occurred, the else block is skipped. The finally block always runs, printing "C". Result: A C.

8. … from module import *?

9. Which exception should replace ??? in this function?

       def set_quantity(qty):
           if qty <= 0:
               raise ???("Quantity must be positive.")
           return qty

   A. TypeError
   B. ValueError
   C. KeyError
   D. IndexError

   Explanation: ValueError is appropriate when the type is correct but the value does not meet requirements. TypeError would be used if the argument was the wrong type entirely (e.g., a string instead of a number).

10. Which module provides Counter and defaultdict?
    A. statistics
    B. itertools
    C. collections
    D. functools

    Explanation: The collections module provides specialized container types including Counter (for counting hashable objects), defaultdict (dict with default factory), OrderedDict, namedtuple, and deque.