Topic 2.2: Module Management and Exception Handling

2.2.1 Import Modules and Manage Packages with PIP

Python's power is amplified by its extensive ecosystem of modules and packages. A module is a single .py file containing definitions and statements. A package is a directory of modules organized with an __init__.py file. Understanding how to import, manage, and create modules is fundamental to effective Python development.

Standard, Selective, and Aliased Imports

Python provides several import styles, each suited for different scenarios:

1. Standard Import: `import module`

Imports the entire module. You access its contents with dot notation.

# Standard import — import the entire module
import math

# Access functions via dot notation
result = math.sqrt(144)    # 12.0
pi_val = math.pi             # 3.141592653589793
log_val = math.log(100, 10)  # 2.0

import os
cwd = os.getcwd()            # Current working directory

Exam Tip:

With a standard import, you must always prefix function calls with the module name (e.g., math.sqrt()). Calling sqrt() alone will raise a NameError.
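A minimal sketch of the behaviour described in the tip: with only `import math` in effect, the qualified call works, while the bare name `sqrt` is undefined. The variable names `root` and `message` are illustrative only.

```python
import math

# The qualified name works because `math` is bound in the current namespace.
root = math.sqrt(144)

# The bare name `sqrt` was never imported, so looking it up raises NameError.
try:
    sqrt(144)
except NameError as exc:
    message = str(exc)

print(root)     # 12.0
print(message)  # name 'sqrt' is not defined
```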

2. Selective Import: `from module import name`

Imports specific functions, classes, or variables directly into the current namespace.

# Selective import — import specific names
from math import sqrt, pi, ceil

result = sqrt(144)   # 12.0 (no prefix needed)
rounded = ceil(4.2)   # 5

# Import multiple items from collections
from collections import Counter, defaultdict

word_counts = Counter(["apple", "banana", "apple", "cherry"])
# Counter({'apple': 2, 'banana': 1, 'cherry': 1})

# Wildcard import (generally discouraged)
from math import *
# Brings everything into namespace — can cause name collisions

Warning:

Avoid from module import * in production code. It pollutes the namespace and makes it unclear where names originate, which can lead to subtle bugs when two modules export identically named functions.
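The collision risk can be sketched with two standard library modules that both export `sqrt`. To keep the demonstration contained, this sketch runs the wildcard imports inside a throwaway dictionary (`namespace`, a name chosen here for illustration) instead of the real module namespace. The overwriting behaves the same way in an ordinary script.

```python
# Simulate two wildcard imports into one shared namespace.
namespace = {}

exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]     # this is math.sqrt

exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]    # cmath.sqrt has silently replaced math.sqrt

print(first_sqrt(4))               # 2.0
print(second_sqrt(4))              # (2+0j)
print(first_sqrt is second_sqrt)   # False
```

No error or warning is raised when the second import replaces the first, which is why the problem tends to surface far from the import lines.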

3. Aliased Import: `import module as alias`

Creates a shorter alias for frequently used modules. This is standard practice in the data science ecosystem.

# Aliased imports — data science conventions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Use the alias for all calls
arr = np.array([1, 2, 3, 4, 5])
df = pd.DataFrame({"col": [10, 20, 30]})

# You can also alias selective imports
from datetime import datetime as dt
now = dt.now()

Summary of Import Styles

Style	Syntax	Usage	Access
Standard	`import math`	Full module needed	`math.sqrt()`
Selective	`from math import sqrt`	Few specific names	`sqrt()`
Aliased	`import numpy as np`	Shorten long names	`np.array()`
Selective + Alias	`from datetime import datetime as dt`	Avoid name conflicts	`dt.now()`
Wildcard	`from math import *`	Quick scripts only	`sqrt()`

Python Standard Library Modules

Python ships with a rich standard library. For the exam, you should know these key modules and their purposes:

Module	Purpose	Key Functions / Classes
`csv`	Read/write CSV files	`reader()`, `writer()`, `DictReader()`, `DictWriter()`
`os`	OS interaction, file paths	`getcwd()`, `listdir()`, `path.join()`, `path.exists()`
`math`	Mathematical functions	`sqrt()`, `ceil()`, `floor()`, `log()`, `pi`
`statistics`	Statistical calculations	`mean()`, `median()`, `stdev()`, `mode()`
`datetime`	Date/time manipulation	`datetime.now()`, `timedelta`, `strftime()`, `strptime()`
`collections`	Specialized data structures	`Counter`, `defaultdict`, `OrderedDict`, `namedtuple`
`json`	JSON encoding/decoding	`load()`, `dump()`, `loads()`, `dumps()`

# csv module — reading CSV files
import csv

# The csv documentation recommends opening files with newline=""
with open("sales.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["product"], row["revenue"])

# statistics module — quick stats
import statistics

data = [85, 90, 78, 92, 88, 76, 95]
print(statistics.mean(data))    # 86.28571428571429
print(statistics.median(data))  # 88
print(statistics.stdev(data))   # about 7.09 (sample standard deviation)

# datetime module — working with dates
from datetime import datetime, timedelta

now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M"))  # e.g. 2026-04-25 14:30

one_week_ago = now - timedelta(days=7)
print(one_week_ago.strftime("%B %d, %Y"))  # e.g. April 18, 2026

# collections module — Counter for frequency analysis
from collections import Counter, defaultdict

colors = ["red", "blue", "red", "green", "blue", "red"]
freq = Counter(colors)
print(freq.most_common(2))  # [('red', 3), ('blue', 2)]

# defaultdict provides default values for missing keys
scores = defaultdict(list)
scores["Alice"].append(95)
scores["Bob"].append(87)
scores["Alice"].append(91)
print(scores)  # defaultdict(<class 'list'>, {'Alice': [95, 91], 'Bob': [87]})
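The `json` module appears in the table above but not in the examples. A small sketch of a round trip through `dumps()` and `loads()` follows; the `record` dictionary holds made-up sample values.

```python
import json

# Made-up sample record for illustration.
record = {"product": "widget", "revenue": 1250.5, "tags": ["new", "sale"]}

# dumps() serializes a Python object to a JSON string.
text = json.dumps(record, indent=2)

# loads() parses a JSON string back into Python objects.
restored = json.loads(text)

print(type(text).__name__)   # str
print(restored == record)    # True
```

`dump()` and `load()` work the same way but write to and read from an open file object instead of a string.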

Managing Packages with PIP

PIP (Pip Installs Packages) is Python's standard package manager. It downloads and installs packages from the Python Package Index (PyPI).

Essential PIP Commands

Command	Description	Example
`pip install`	Install a package	`pip install pandas`
`pip install <package>==<version>`	Install a specific version	`pip install pandas==2.1.0`
`pip uninstall`	Remove a package	`pip uninstall pandas`
`pip list`	Show installed packages	`pip list`
`pip freeze`	Output installed packages in requirements format	`pip freeze > requirements.txt`
`pip install -r`	Install from requirements file	`pip install -r requirements.txt`
`pip show`	Display package info	`pip show numpy`
`pip install --upgrade`	Upgrade a package	`pip install --upgrade pandas`

# Terminal / Command line — PIP commands

# Install a package from PyPI
$ pip install pandas

# Install a specific version
$ pip install pandas==2.1.0

# Upgrade an existing package
$ pip install --upgrade pandas

# Uninstall a package
$ pip uninstall pandas

# List all installed packages
$ pip list
# Package    Version
# ---------- -------
# numpy      1.26.4
# pandas     2.2.1
# ...

# Export installed packages to a requirements file
$ pip freeze > requirements.txt

# Install all packages from a requirements file
$ pip install -r requirements.txt

# Show details about a specific package
$ pip show numpy
# Name: numpy
# Version: 1.26.4
# Summary: Fundamental package for array computing...
# Location: /usr/lib/python3/dist-packages

Key Distinction — pip list vs. pip freeze:

pip list displays packages in a human-readable table. pip freeze outputs in package==version format, ideal for generating requirements.txt files that can recreate an environment.
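To see why the `package==version` format is convenient for tooling, here is a rough sketch that parses such lines into a dictionary. The sample text reuses the versions from the listing above. The variable names `freeze_output` and `pins` are illustrative, and real requirements files can contain other line types that this sketch ignores.

```python
# Sample text in the format that `pip freeze` prints.
freeze_output = """\
numpy==1.26.4
pandas==2.2.1
"""

pins = {}
for line in freeze_output.splitlines():
    # partition() splits on the first "==" and reports whether it was found.
    name, separator, version = line.partition("==")
    if separator:
        pins[name] = version

print(pins)  # {'numpy': '1.26.4', 'pandas': '2.2.1'}
```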

Creating and Importing Custom Modules

Any Python file can serve as a module. You can organize reusable code into your own modules and packages.

# File: data_utils.py (your custom module)

def clean_column_name(name):
    """Standardize a column name to lowercase with underscores."""
    return name.strip().lower().replace(" ", "_")

def remove_outliers(data, threshold=3):
    """Remove values more than `threshold` std devs from the mean."""
    import statistics
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    return [x for x in data if abs(x - mean) <= threshold * stdev]

VALID_FORMATS = [".csv", ".json", ".xlsx", ".parquet"]

# File: main.py (importing your custom module)

# Standard import — the module file must be in the same directory
# or on Python's module search path (sys.path)
import data_utils

clean = data_utils.clean_column_name("  Sales Revenue  ")
print(clean)  # "sales_revenue"

# Selective import
from data_utils import remove_outliers, VALID_FORMATS

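# 200 lies only about 2.04 sample standard deviations from the mean here,
# so the default threshold of 3 would keep it; pass a tighter threshold.
cleaned = remove_outliers([10, 12, 11, 200, 13, 9], threshold=2)
print(cleaned)  # [10, 12, 11, 13, 9]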

Creating a Package

# Package directory structure:
# my_analytics/
#     __init__.py       <-- makes it a package
#     cleaning.py
#     analysis.py
#     visualization.py

# File: my_analytics/__init__.py
# __init__.py can expose key names for convenience
from .cleaning import clean_column_name
from .analysis import run_summary

# File: another script outside the package
# With the names exposed above, you can then do:
from my_analytics import clean_column_name
# or
from my_analytics.analysis import run_summary
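The package layout can be tried end to end without touching your project. The sketch below builds a tiny package in a temporary directory and imports it. The package name `demo_analytics` is hypothetical, and the temporary directory is added to `sys.path` only for the duration of the import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny package on disk: demo_analytics/__init__.py + cleaning.py
    package_dir = Path(tmp_dir) / "demo_analytics"
    package_dir.mkdir()
    (package_dir / "cleaning.py").write_text(
        "def clean_column_name(name):\n"
        "    return name.strip().lower().replace(' ', '_')\n",
        encoding="utf-8",
    )
    (package_dir / "__init__.py").write_text(
        "from .cleaning import clean_column_name\n",
        encoding="utf-8",
    )

    # Make the parent directory importable, then import the package by name.
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        demo_analytics = importlib.import_module("demo_analytics")
        cleaned = demo_analytics.clean_column_name("  Sales Revenue  ")
    finally:
        sys.path.remove(tmp_dir)

print(cleaned)  # sales_revenue
```

Because `__init__.py` re-exports `clean_column_name`, the function is reachable directly on the package object without naming the `cleaning` submodule.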

Module Search Order:

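When you import data_utils, Python first checks modules that are already loaded or built into the interpreter, then searches the directories in sys.path in order: (1) the directory containing the script being run (or the current directory in an interactive session), (2) directories listed in the PYTHONPATH environment variable, (3) the standard library, (4) site-packages (where pip installs packages). Because the script's directory comes first, a local file named, say, statistics.py will shadow the standard library module of the same name. You can inspect the search path with import sys; print(sys.path).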
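A short sketch for inspecting the search path and checking where a module would be loaded from. It uses `importlib.util.find_spec`, which locates a module without importing it. The name `definitely_not_installed_pkg` is a made-up placeholder assumed not to exist on your system.

```python
import importlib.util
import sys

# sys.path is an ordinary list of directory strings, searched front to back.
for position, entry in enumerate(sys.path[:5]):
    print(position, repr(entry))

# find_spec reports where a module would be loaded from.
spec = importlib.util.find_spec("json")
print(spec.origin)   # a path inside the standard library

# For a top-level name that cannot be found, find_spec returns None.
missing = importlib.util.find_spec("definitely_not_installed_pkg")
print(missing)       # None
```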

2.2.2 Exception Handling and Script Robustness

Robust data scripts anticipate and gracefully handle errors. Python's exception handling mechanism lets you catch errors at runtime and respond appropriately, rather than letting the entire program crash.

try / except / else / finally Blocks

The full exception handling structure has four clauses:

Clause	When It Runs	Required?
`try`	Code that might raise an exception	Yes
`except`	Runs if the specified exception occurs	Yes (at least one), unless `finally` is present
`else`	Runs only if no exception was raised	No
`finally`	Always runs, exception or not	No

# Basic try/except
try:
    number = int(input("Enter a number: "))
    result = 100 / number
except ValueError:
    print("Invalid input. Please enter a valid integer.")
except ZeroDivisionError:
    print("Cannot divide by zero.")

# Full try/except/else/finally
def read_config(filepath):
    f = None        # stays None if open() fails
    content = None
    try:
        f = open(filepath, "r")
        content = f.read()
    except FileNotFoundError:
        print(f"Config file not found: {filepath}")
    except PermissionError:
        print(f"No permission to read: {filepath}")
    else:
        # Only runs if no exception occurred
        print(f"Successfully loaded {len(content)} characters.")
    finally:
        # Always runs; ideal for cleanup
        if f is not None:
            f.close()

    return content

Execution Flow:

try → if exception, jump to matching except; if no exception, run else. In both cases, finally always executes last. This makes finally perfect for releasing resources like file handles and database connections.
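The flow described above can be checked with a small tracing function. This is an illustrative sketch; `trace` and `steps` are names invented for the example.

```python
def trace(divisor):
    """Record which clauses run for a given divisor."""
    steps = []
    try:
        steps.append("try")
        10 / divisor
    except ZeroDivisionError:
        steps.append("except")
    else:
        steps.append("else")
    finally:
        steps.append("finally")
    return steps

print(trace(2))  # ['try', 'else', 'finally']
print(trace(0))  # ['try', 'except', 'finally']
```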

# Catching multiple exceptions in one handler
data = {"quantity": "12"}  # sample record (illustrative values)
key = "quantity"

try:
    value = data[key]
    result = int(value)
except (KeyError, ValueError, TypeError) as e:
    print(f"Data error: {e}")
    result = None

# Using the exception object for details
try:
    scores = [85, 92, 78]
    print(scores[10])
except IndexError as e:
    print(f"Error type: {type(e).__name__}")  # IndexError
    print(f"Message: {e}")                      # list index out of range

Common Built-in Exceptions

Knowing which exception corresponds to which error is essential for the exam and for writing robust data code.

Exception	Raised When	Typical Data Scenario
`ValueError`	Right type, wrong value	`int("abc")`, `float("N/A")`
`TypeError`	Wrong type for operation	`"5" + 3`, passing wrong arg type
`KeyError`	Missing dictionary key	`row["nonexistent_col"]`
`IndexError`	Index out of range	`data[100]` on a 50-item list
`FileNotFoundError`	File does not exist	`open("missing.csv")`
`ZeroDivisionError`	Division by zero	Computing a ratio with a zero denominator
`ImportError`	Module cannot be imported	`import nonexistent_lib` (raises the subclass `ModuleNotFoundError`)
`AttributeError`	Object has no such attribute	Calling `.append()` on a tuple
`NameError`	Variable not defined	Using a variable before assignment

# Demonstrating common exceptions

# ValueError
try:
    age = int("twenty")
except ValueError as e:
    print(f"ValueError: {e}")
    # ValueError: invalid literal for int() with base 10: 'twenty'

# TypeError
try:
    result = "price: " + 49.99
except TypeError as e:
    print(f"TypeError: {e}")
    # TypeError: can only concatenate str (not "float") to str

# KeyError
try:
    record = {"name": "Alice", "age": 30}
    email = record["email"]
except KeyError as e:
    print(f"KeyError: missing key {e}")
    # KeyError: missing key 'email'

# FileNotFoundError
try:
    with open("data_2026.csv") as f:
        data = f.read()
except FileNotFoundError:
    print("File not found. Check the file path.")

# ZeroDivisionError
try:
    conversion_rate = 0 / 0
except ZeroDivisionError:
    print("Cannot compute rate: division by zero.")
    conversion_rate = 0.0
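Several of these exceptions are related through inheritance, which matters because an except clause also catches subclasses of the type it names. A brief sketch using only built-in exception classes follows; the helper `lookup` is hypothetical.

```python
print(issubclass(ModuleNotFoundError, ImportError))  # True
print(issubclass(FileNotFoundError, OSError))        # True
print(issubclass(KeyError, LookupError))             # True
print(issubclass(IndexError, LookupError))           # True

def lookup(container, key):
    """Catch both KeyError and IndexError through their shared parent."""
    try:
        return container[key]
    except LookupError as exc:
        return f"{type(exc).__name__} handled"

print(lookup({"a": 1}, "b"))   # KeyError handled
print(lookup([1, 2, 3], 10))   # IndexError handled
```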

Raising Exceptions and Custom Messages

Use the raise statement to intentionally trigger exceptions when your code detects invalid conditions. This is critical for input validation in data pipelines.

# Raising exceptions for validation
def validate_age(age):
    if not isinstance(age, (int, float)):
        raise TypeError(f"Age must be numeric, got {type(age).__name__}")
    if age < 0 or age > 150:
        raise ValueError(f"Age must be between 0 and 150, got {age}")
    return True

# Usage
try:
    validate_age(-5)
except ValueError as e:
    print(e)  # Age must be between 0 and 150, got -5

try:
    validate_age("thirty")
except TypeError as e:
    print(e)  # Age must be numeric, got str

# Validating a dataset before processing
def validate_dataset(df):
    """Validate that a DataFrame meets minimum requirements."""
    required_cols = ["id", "date", "value"]

    if df.empty:
        raise ValueError("Dataset is empty.")

    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")

    if df["id"].duplicated().any():
        raise ValueError("Duplicate IDs found in dataset.")

    return True

# Re-raising an exception after logging
import logging

def process_file(path):
    try:
        with open(path) as f:
            data = f.read()
    except FileNotFoundError:
        logging.error(f"File not found: {path}")
        raise  # Re-raise the same exception
    return data
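A related option, sketched here as one possibility, is to raise a new exception with a more descriptive message while keeping the original error attached via `raise ... from`. The function `parse_price` is a hypothetical example.

```python
def parse_price(raw):
    """Convert a raw field to float, adding context if conversion fails."""
    try:
        return float(raw)
    except ValueError as exc:
        # `from exc` stores the original error on the new one as __cause__.
        raise ValueError(f"Invalid price field: {raw!r}") from exc

try:
    parse_price("N/A")
except ValueError as exc:
    message = str(exc)
    cause = exc.__cause__

print(message)       # Invalid price field: 'N/A'
print(repr(cause))   # ValueError("could not convert string to float: 'N/A'")
```

When such an exception goes uncaught, the traceback shows both errors, with the original labelled as the direct cause.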

Interpreting Error Messages and Tracebacks

Python tracebacks are read bottom to top. The last line shows the exception type and message; lines above show the call stack with the most recent call at the bottom.

# Example traceback:
# Traceback (most recent call last):
#   File "pipeline.py", line 45, in <module>
#     result = process_records(data)
#   File "pipeline.py", line 32, in process_records
#     cleaned = clean_value(record["price"])
#   File "pipeline.py", line 18, in clean_value
#     return float(value)
# ValueError: could not convert string to float: 'N/A'

Reading Tracebacks:

Bottom line: The exception type (ValueError) and the message (could not convert string to float: 'N/A').
Second from bottom: The exact line of code that caused the error — return float(value).
Working upward: The chain of function calls that led to the error, with file names and line numbers.
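The same reading order applies when you capture a traceback as text, for example to write it to a log. A small sketch using the standard `traceback` module follows; `clean_value` mirrors the function named in the example traceback, and `last_line` is an illustrative variable.

```python
import traceback

def clean_value(value):
    return float(value)

try:
    clean_value("N/A")
except ValueError:
    # format_exc() returns the traceback being handled as a single string.
    tb_text = traceback.format_exc()

# The final line holds the exception type and message.
last_line = tb_text.strip().splitlines()[-1]
print(last_line)  # ValueError: could not convert string to float: 'N/A'
```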

# Common traceback patterns in data code:

# 1. KeyError in pandas — misspelled column name
# KeyError: 'reveneu'
# Fix: Check df.columns and correct the spelling

# 2. FileNotFoundError — wrong path or filename
# FileNotFoundError: [Errno 2] No such file or directory: 'dta/sales.csv'
# Fix: Check os.path.exists() and correct the path

# 3. TypeError — operations on incompatible types
# TypeError: unsupported operand type(s) for +: 'int' and 'str'
# Fix: Convert types before operations

# 4. ImportError — module not installed
# ModuleNotFoundError: No module named 'sklearn'
# Fix: pip install scikit-learn

Real-World Error Handling in Data Code

Data processing scripts frequently encounter malformed data, missing files, and network issues. Here are practical patterns for robust data code.

Reading Files Safely

import csv
import os

def load_csv_safely(filepath):
    """Load a CSV file with comprehensive error handling."""
    if not os.path.exists(filepath):
        print(f"Error: File '{filepath}' does not exist.")
        return []

    if not filepath.endswith(".csv"):
        print(f"Warning: '{filepath}' may not be a CSV file.")

    rows = []
    try:
        # The csv documentation recommends opening files with newline=""
        with open(filepath, "r", encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                rows.append(row)
    except UnicodeDecodeError:
        print("Encoding error. Trying latin-1...")
        with open(filepath, "r", encoding="latin-1") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
    except csv.Error as e:
        print(f"CSV parsing error: {e}")
    else:
        print(f"Successfully loaded {len(rows)} rows.")

    return rows

Parsing Data with Type Conversion

def parse_numeric_column(records, column):
    """Safely convert a column to float, handling errors."""
    parsed = []
    errors = []

    for i, record in enumerate(records):
        try:
            value = record[column]
            parsed.append(float(value))
        except KeyError:
            errors.append(f"Row {i}: column '{column}' not found")
        except (ValueError, TypeError):
            errors.append(f"Row {i}: cannot convert '{record.get(column)}' to float")

    if errors:
        print(f"Encountered {len(errors)} errors in column '{column}':")
        for err in errors[:5]:  # Show first 5 errors
            print(f"  - {err}")

    return parsed

Handling Common Pandas Errors

import pandas as pd

# Safe file loading with pandas
def load_dataframe(filepath):
    try:
        df = pd.read_csv(filepath)
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return pd.DataFrame()
    except pd.errors.EmptyDataError:
        print(f"File is empty: {filepath}")
        return pd.DataFrame()
    except pd.errors.ParserError as e:
        print(f"Parse error: {e}")
        return pd.DataFrame()
    else:
        print(f"Loaded DataFrame: {df.shape[0]} rows, {df.shape[1]} columns")
        return df

# Safe column access
def safe_column_mean(df, column):
    try:
        return df[column].mean()
    except KeyError:
        print(f"Column '{column}' not found. Available: {list(df.columns)}")
        return None
    except TypeError:
        print(f"Column '{column}' is not numeric.")
        return None

Complete Data Pipeline with Error Handling

import csv
import statistics

def analyze_sales(filepath):
    """End-to-end pipeline with robust error handling."""

    # Step 1: Load data
    try:
        with open(filepath, "r") as f:
            reader = csv.DictReader(f)
            records = list(reader)
    except FileNotFoundError:
        raise FileNotFoundError(f"Sales file not found: {filepath}")

    if not records:
        raise ValueError("No records found in file.")

    # Step 2: Parse and validate revenue values
    revenues = []
    skipped = 0

    for row in records:
        try:
            revenue = float(row["revenue"])
            if revenue < 0:
                raise ValueError("Negative revenue")
            revenues.append(revenue)
        except (ValueError, KeyError):
            skipped += 1
            continue

    # Step 3: Compute statistics
    try:
        result = {
            "total": sum(revenues),
            "mean": statistics.mean(revenues),
            "median": statistics.median(revenues),
            "stdev": statistics.stdev(revenues),
            "count": len(revenues),
            "skipped": skipped,
        }
    except statistics.StatisticsError:
        print("Not enough data to compute statistics.")
        return None

    return result

Best Practices for Robust Scripts

Catch Specific Exceptions, Not Bare except

A bare except: catches everything, including KeyboardInterrupt and SystemExit, so pressing Ctrl+C or calling sys.exit() may fail to stop your program. Always specify exception types.

# BAD — bare except catches everything, hides bugs
try:
    result = process(data)
except:
    pass

# GOOD — catch specific exceptions
try:
    result = process(data)
except (ValueError, TypeError) as e:
    print(f"Processing error: {e}")
    result = None

# ACCEPTABLE — catch Exception (still skips SystemExit, KeyboardInterrupt)
import logging

try:
    result = process(data)
except Exception as e:
    logging.error(f"Unexpected error: {e}")
    result = None
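The difference between a bare except and `except Exception` comes from the built-in exception hierarchy: KeyboardInterrupt and SystemExit derive from BaseException, not from Exception. The sketch below raises KeyboardInterrupt by hand to simulate Ctrl+C. The function names are illustrative.

```python
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(SystemExit, Exception))             # False
print(issubclass(KeyboardInterrupt, BaseException))  # True

def swallow_everything():
    try:
        raise KeyboardInterrupt  # simulate Ctrl+C
    except:  # bare except: deliberately bad, shown for contrast
        return "interrupt swallowed"

def catch_exception_only():
    try:
        raise KeyboardInterrupt  # simulate Ctrl+C
    except Exception:
        return "never reached"

print(swallow_everything())  # interrupt swallowed

try:
    catch_exception_only()
except KeyboardInterrupt:
    outcome = "interrupt propagated"

print(outcome)  # interrupt propagated
```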

Use finally for Resource Cleanup

Always close files, database connections, and network sockets in a finally block, or use with statements, which handle this automatically.

# Best practice: use context managers (with statement)
with open("data.csv") as f:
    data = f.read()
# File is automatically closed, even if an exception occurs

# Equivalent manual approach with finally
f = None
try:
    f = open("data.csv")
    data = f.read()
finally:
    if f:
        f.close()

Validate Inputs Early

Check data types, ranges, and required fields at the start of functions. Raise descriptive exceptions for invalid inputs rather than letting cryptic errors surface later in the pipeline.

def calculate_growth_rate(current, previous):
    """Calculate percentage growth between two values."""
    if not isinstance(current, (int, float)):
        raise TypeError(f"current must be numeric, got {type(current).__name__}")
    if not isinstance(previous, (int, float)):
        raise TypeError(f"previous must be numeric, got {type(previous).__name__}")
    if previous == 0:
        raise ValueError("previous cannot be zero (division by zero).")

    return ((current - previous) / previous) * 100
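One detail worth knowing when validating numeric inputs with isinstance: bool is a subclass of int, so True and False pass an `(int, float)` check. If that matters for your data, a stricter check along these lines is one option. The helper `is_real_number` is hypothetical.

```python
def is_real_number(value):
    """Hypothetical helper: True for int or float values, but not for bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

print(isinstance(True, (int, float)))  # True, because bool subclasses int
print(is_real_number(True))            # False
print(is_real_number(3.5))             # True
print(is_real_number("3.5"))           # False
```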

Log Errors, Do Not Silence Them

Swallowing exceptions with pass masks bugs. At minimum, log the error so it can be diagnosed later.
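As a rough sketch of this practice, the function below logs the failure with `logger.exception`, which records the message together with the current traceback, and then returns a fallback value. The logger name `pipeline_demo` and the function `safe_ratio` are made up for the example. With no logging configuration, the message goes to stderr through Python's default last-resort handler.

```python
import logging

logger = logging.getLogger("pipeline_demo")  # hypothetical logger name

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None after logging a failure."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # exception() logs at ERROR level and appends the traceback.
        logger.exception("Could not compute ratio for %r / %r", numerator, denominator)
        return None

print(safe_ratio(6, 3))  # 2.0
print(safe_ratio(1, 0))  # None
```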

Use else to Separate Success Logic

Code in the else block only runs when try succeeds. This keeps error-handling code separate from normal flow, improving readability.
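A small sketch of the idea: only the risky conversion sits inside try, and the follow-up work goes in else, so an error in the follow-up is not mistaken for a conversion failure. The function `load_ratio` is a hypothetical example.

```python
def load_ratio(text, denominator):
    try:
        numerator = int(text)   # only this line is guarded
    except ValueError:
        return None
    else:
        # Runs only if int() succeeded. A ZeroDivisionError raised here is
        # not caught by the ValueError handler; it propagates to the caller.
        return numerator / denominator

print(load_ratio("10", 4))   # 2.5
print(load_ratio("abc", 4))  # None

try:
    load_ratio("10", 0)
except ZeroDivisionError:
    propagated = True

print(propagated)  # True
```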

Practice Quiz: Module Management and Exception Handling

Test your knowledge with 10 multiple-choice questions.

Q1. What is the correct way to import only the sqrt function from the math module?

A) import sqrt from math

B) from math import sqrt

C) import math.sqrt

D) math import sqrt

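Correct: B. The syntax from module import name imports a specific name from a module. Option A reverses the order and is a syntax error. Option C fails with ModuleNotFoundError, because the import module.name form only works for submodules and sqrt is a function, not a submodule. Option D is invalid syntax.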

Q2. What is the difference between pip list and pip freeze?

A) pip list shows only standard library modules; pip freeze shows all

B) They are identical commands with different names

C) pip list shows a human-readable table; pip freeze outputs in package==version format for requirements files

D) pip freeze locks packages so they cannot be upgraded

Correct: C. pip list displays installed packages in a formatted table. pip freeze outputs them in package==version format, which can be redirected to a requirements.txt file for environment reproducibility.

Q3. What exception is raised by int("hello")?

A) TypeError

B) ValueError

C) NameError

D) SyntaxError

Correct: B. int() receives a string (correct type), but "hello" is not a valid integer representation (wrong value). This triggers a ValueError. A TypeError would occur if you passed a type that int() cannot convert at all (e.g., a list).

Q4. When does the else block execute in a try/except/else/finally structure?

A) When an exception is caught by except

B) Always, regardless of exceptions

C) Only when no exception was raised in the try block

D) Only when a finally block is also present

Correct: C. The else clause runs only if the try block completes without raising any exception. It is useful for code that should run on success but should not be inside the try block (to avoid accidentally catching its exceptions).

Q5. What does the following code print?
import numpy as np
print(type(np))

A) <class 'numpy'>

B) <class 'module'>

C) <class 'package'>

D) <class 'alias'>

Correct: B. When you import a module (with or without an alias), the variable refers to a module object. The alias np is simply an alternate name for the same module object. type(np) returns <class 'module'>.

Q6. Which pip command generates a file that can recreate the current environment?

A) pip list > requirements.txt

B) pip freeze > requirements.txt

C) pip export > requirements.txt

D) pip save > requirements.txt

Correct: B. pip freeze outputs packages in the package==version format that pip install -r requirements.txt expects. pip list produces a formatted table that is not directly usable by pip install -r. Options C and D are not valid pip commands.

Q7. What happens when you run the following code?
try:
    x = 10 / 0
except ZeroDivisionError:
    print("A")
else:
    print("B")
finally:
    print("C")

A) Prints: A B C

B) Prints: A C

C) Prints: A

D) Prints: B C

Correct: B. The try block raises ZeroDivisionError, so the except block runs and prints "A". Since an exception occurred, the else block is skipped. The finally block always runs, printing "C". Result: A C.

Q8. What is the primary risk of using from module import *?

A) It is slower than a standard import

B) It only works with standard library modules

C) It can cause namespace collisions and makes it unclear where names originate

D) It installs the module from PyPI automatically

Correct: C. Wildcard imports dump all public names from a module into the current namespace. If two modules export a function with the same name, the second import silently overwrites the first. It also makes code harder to read because the origin of each name is unclear.

Q9. What exception type would you use to validate that a function argument is a positive number?
def set_quantity(qty):
    if qty <= 0:
        raise ???("Quantity must be positive.")
    return qty

A) TypeError

B) ValueError

C) KeyError

D) IndexError

Correct: B. The argument is the right type (a number), but the value is invalid (non-positive). ValueError is appropriate when the type is correct but the value does not meet requirements. TypeError would be used if the argument was the wrong type entirely (e.g., a string instead of a number).

Q10. Which standard library module provides Counter and defaultdict?

A) statistics

B) itertools

C) collections

D) functools

Correct: C. The collections module provides specialized container types including Counter (for counting hashable objects), defaultdict (dict with default factory), OrderedDict, namedtuple, and deque.
