Block 2: Programming and Database Skills
Topic 2.1 · 5 Objectives
Python is dynamically typed, meaning you do not declare a variable's type explicitly. The interpreter infers the type at runtime based on the assigned value. The core built-in data types you must know for the exam are:
| Type | Example | Description |
|---|---|---|
| int | 42 | Whole numbers, arbitrary precision |
| float | 3.14 | 64-bit floating point (IEEE 754) |
| str | "hello" | Immutable sequence of Unicode characters |
| bool | True / False | Logical values; subclass of int |
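The rows of the table can be checked directly in the interpreter. The following is a minimal sketch; the variable names are illustrative only.

```python
# Types belong to values, so the same name can be rebound to a value of another type.
value = 42
type_before = type(value)        # int
value = "forty-two"
type_after = type(value)         # str

# bool is a subclass of int, so True counts as 1 in arithmetic.
bool_is_int = isinstance(True, int)
true_count = sum([True, False, True])

# int has arbitrary precision, so this does not overflow.
big = 2 ** 100

# float is a binary 64-bit value, so some decimal sums are not exact.
float_sum_exact = (0.1 + 0.2 == 0.3)
```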
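Python resolves variable names using the LEGB rule, searching in this order:
1. Local: names assigned inside the current function.
2. Enclosing: names in the scope of any enclosing function.
3. Global: names defined at the top level of the module.
4. Built-in: names that Python predefines (e.g., print, len, range).

Use global to rebind a global variable inside a function, and nonlocal to rebind a variable in an enclosing function's scope. Avoid both when possible; prefer passing arguments and returning values.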
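The scope rules and the two keywords can be sketched as follows. All function and variable names here are hypothetical and chosen only for illustration.

```python
counter = 0  # global (module-level) name

def increment():
    global counter           # without this, counter += 1 raises UnboundLocalError
    counter += 1

def make_accumulator():
    total = 0                # lives in the enclosing scope of add()
    def add(amount):
        nonlocal total       # rebind the enclosing variable instead of creating a local
        total += amount
        return total
    return add

def shadow_demo():
    len = "local name"       # a local name shadows the built-in len inside this function only
    return len

increment()
acc = make_accumulator()
acc(5)
running_total = acc(10)
```

Outside shadow_demo, len still refers to the built-in function, because the local assignment never touches the built-in scope.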
List comprehensions provide a concise, readable way to create lists by applying an expression to each item in an iterable, optionally filtering with a condition.
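As a small illustration (the sample values are made up), a comprehension and the equivalent loop produce the same list:

```python
readings = [3, 8, 12, 5, 20]

# Expression applied to every item.
doubled = [x * 2 for x in readings]

# Expression plus a filtering condition.
large = [x for x in readings if x > 6]

# The same filter written as an explicit loop, for comparison.
large_loop = []
for x in readings:
    if x > 6:
        large_loop.append(x)
```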
Functions encapsulate reusable logic. In data analysis, they help create modular, testable data-processing pipelines.
Positional arguments are matched by their position in the function call. Keyword arguments are matched by name, which makes calls more readable and order-independent.
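The difference can be seen with a single function called three ways. This is a sketch; describe_column is a hypothetical helper, not part of any library.

```python
def describe_column(name, dtype, nullable):
    """Return a short text description of a column."""
    return f"{name}: {dtype}, nullable={nullable}"

# Positional: values are matched to parameters by order.
by_position = describe_column("price", "float", False)

# Keyword: values are matched by name, so the order is free.
by_keyword = describe_column(nullable=False, name="price", dtype="float")

# Mixed: positional arguments must come before keyword arguments.
mixed = describe_column("price", dtype="float", nullable=False)
```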
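Never use a mutable object (like a list or dict) as a default argument value. Default values are evaluated once, when the function is defined, so the same object is shared across all calls that omit the argument. Use None as the default instead and create the mutable object inside the function body.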
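The pitfall and the usual fix can be sketched side by side. Both function names are hypothetical.

```python
def append_bad(item, bucket=[]):
    # The default list is created once and reused by every call.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # A fresh list is created on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

bad_first = append_bad(1)
bad_second = append_bad(2)     # same list object as bad_first
good_first = append_good(1)
good_second = append_good(2)   # independent list
```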
*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dictionary. They enable flexible function signatures.
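A minimal sketch of both collectors, using a hypothetical summarize function that simply returns what it received:

```python
def summarize(label, *args, **kwargs):
    """Return the label, the extra positional args, and the extra keyword args."""
    return label, args, kwargs

result = summarize("sales", 10, 20, 30, unit="USD", rounded=True)

# The same operators unpack a sequence or dict at the call site.
values = [10, 20, 30]
options = {"unit": "USD", "rounded": True}
unpacked = summarize("sales", *values, **options)
```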
Functions can return multiple values as a tuple. This is commonly used to return both a result and a status, or multiple computed statistics at once.
When a function uses return a, b, c, Python actually returns a single tuple (a, b, c). You can unpack it into separate variables or keep it as a tuple.
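For example, a hypothetical min_max_mean helper can return three statistics at once, which the caller may keep as a tuple or unpack:

```python
def min_max_mean(values):
    """Return the minimum, maximum, and mean of values."""
    return min(values), max(values), sum(values) / len(values)

# Keep the result as a single tuple...
stats = min_max_mean([4, 8, 6])

# ...or unpack it into separate variables.
low, high, mean = min_max_mean([4, 8, 6])
```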
Python's strength in data analysis comes from its rich ecosystem of specialized libraries. This course expects you to know which library to choose for a given task.
| Library | Purpose | Key Use Cases |
|---|---|---|
| pandas | Data manipulation and analysis | DataFrames, reading CSV/Excel, groupby, merging, reshaping, time series |
| numpy | Numerical computing | N-dimensional arrays, vectorized math, linear algebra, random sampling |
| matplotlib | Core plotting library | Line plots, bar charts, histograms, scatter plots, full customization |
| seaborn | Statistical visualization | Heatmaps, pair plots, box plots, violin plots, distribution plots |
| scikit-learn | Machine learning | Classification, regression, clustering, model evaluation, preprocessing |
| scipy | Scientific computing | Statistical tests, optimization, interpolation, signal processing |
| statistics | Basic statistics (stdlib) | Mean, median, mode, stdev, variance; no install required |
Tabular data? Use pandas. Numerical arrays? Use numpy. Statistical tests? Use scipy.stats. Machine learning? Use scikit-learn. Plotting? Start with matplotlib; use seaborn for statistical plots. Simple stats without imports? Use statistics (standard library).
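As a quick illustration of the last case, the standard-library statistics module covers basic descriptive statistics with no third-party install. The scores below are made-up sample data.

```python
import statistics

scores = [70, 80, 80, 90, 100]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)
sample_stdev = statistics.stdev(scores)   # sample standard deviation (n - 1 in the denominator)
```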
Lists are ordered, mutable sequences. They are the most versatile data structure in Python and are used extensively for storing collections of data points.
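A short sketch of common list operations, using made-up readings:

```python
readings = [12, 7, 19]

readings.append(4)         # add to the end
readings[0] = 10           # replace an element in place
readings.sort()            # sort in place, ascending
first_two = readings[:2]   # slicing returns a new list
last = readings.pop()      # remove and return the last element
```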
Tuples are ordered, immutable sequences. Once created, their elements cannot be changed. Use tuples for data that should not be modified, like database records or coordinates.
You cannot add, remove, or change elements of a tuple after creation. Attempting coordinates[0] = 41.0 raises a TypeError. This makes tuples safer for data that should not change and slightly faster than lists.
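The behavior described above can be sketched as follows. The coordinate values and the city label are illustrative only.

```python
coordinates = (40.7, -74.0)

# Unpacking reads the elements without modifying the tuple.
latitude, longitude = coordinates

# Item assignment is rejected.
try:
    coordinates[0] = 41.0
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# A tuple of hashable elements can serve as a dictionary key.
city_by_location = {coordinates: "New York"}
```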
Dictionaries store key-value pairs with O(1) average lookup time. They are ideal for mapping identifiers to records, configuration settings, and JSON-like data.
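A minimal sketch of typical dictionary operations on a made-up record:

```python
employee = {"id": 101, "name": "Ada", "dept": "Analytics"}

name = employee["name"]                  # direct lookup; raises KeyError if the key is missing
salary = employee.get("salary", 0)       # lookup with a default for a missing key

employee["dept"] = "Research"            # update an existing key
employee["location"] = "Remote"          # add a new key

# Iterate over key-value pairs.
text_fields = [key for key, value in employee.items() if isinstance(value, str)]
```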
Sets are unordered collections of unique elements. They support fast membership testing and mathematical set operations.
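The main set operations can be sketched with two small sets of made-up values:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

union = a | b           # elements in either set
intersection = a & b    # elements in both sets
difference = a - b      # elements in a but not in b
symmetric = a ^ b       # elements in exactly one of the sets

# Converting a list to a set removes duplicates.
unique_ids = set([7, 7, 8, 9, 9])
has_eight = 8 in unique_ids
```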
Strings are immutable sequences of characters. Data analysts frequently use string methods for cleaning and formatting text data.
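A short cleaning sketch on a made-up raw value. Because strings are immutable, each method returns a new string and leaves the original unchanged.

```python
raw = "  Alice SMITH ,new york "

name_part, city_part = raw.split(",")

clean_name = name_part.strip().title()
clean_city = city_part.strip().title()
label = f"{clean_name} ({clean_city})"

date_text = "2024-01-05".replace("-", "/")
```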
| Scenario | Best Choice | Reason |
|---|---|---|
| Ordered collection that changes over time | list | Mutable, maintains insertion order |
| Fixed record (e.g., database row) | tuple | Immutable, hashable when all of its elements are hashable, slightly faster |
| Lookup by key / label mapping | dict | O(1) average key lookup, key-value semantics |
| Remove duplicates / membership tests | set | Unique elements, O(1) average membership test |
| Text data / labels | str | Rich methods for manipulation and formatting |
If a question asks about storing unique items or fast lookups, think set or dict. If order matters and elements may repeat, think list. If data should not change, think tuple.
PEP 8 is the official style guide for Python code. Following PEP 8 ensures consistency and readability across projects and teams.
| Entity | Convention | Example |
|---|---|---|
| Variables and functions | snake_case | total_revenue, calculate_mean() |
| Constants | UPPER_SNAKE_CASE | MAX_RETRIES, DEFAULT_TIMEOUT |
| Classes | PascalCase | DataProcessor, SalesReport |
| Modules / packages | lowercase (short) | utils.py, analysis.py |
| Private attributes | Leading underscore | _internal_cache |
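The conventions in the table can be seen together in one short sketch. The names are taken from the table's examples; the class body itself is hypothetical.

```python
MAX_RETRIES = 3                      # constant: UPPER_SNAKE_CASE


class SalesReport:                   # class: PascalCase
    def __init__(self, amounts):
        self._amounts = list(amounts)    # private attribute: leading underscore

    def total_revenue(self):         # method: snake_case
        return sum(self._amounts)


def calculate_mean(values):          # function: snake_case
    return sum(values) / len(values)


report = SalesReport([100, 250])
total_revenue = report.total_revenue()   # variable: snake_case
```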
PEP 257 defines conventions for writing docstrings, the string literals that appear as the first statement in a module, function, class, or method.
- Use triple double quotes ("""...""") for all docstrings.
- The closing """ goes on its own line for multi-line docstrings.

Watch for questions that test whether you can spot style violations: using camelCase for variables, mixing tabs and spaces, putting imports at the bottom, missing blank lines between functions, or lines exceeding 79 characters.
1. What does the following code print?

    data = [10, 20, 30, 40, 50]
    result = data[1:4]
    print(result)

A) [10, 20, 30, 40]
B) [20, 30, 40]
C) [20, 30, 40, 50]
D) [10, 20, 30]

Answer: B. data[1:4] returns elements at indices 1, 2, and 3. The start index is inclusive and the stop index is exclusive, so we get [20, 30, 40].

2. Which function definition correctly gives a parameter a default value?

A) def process(data, verbose) = True:
B) def process(data, verbose=True):
C) def process(verbose=True, data):
D) def process(data; verbose=True):

Answer: B. Default values are assigned with = inside the parentheses. Parameters with defaults must come after parameters without defaults, which makes C) a SyntaxError.

3. What does **kwargs collect in a function definition?

**kwargs collects any keyword arguments that are not explicitly defined in the function signature into a dictionary. *args collects extra positional arguments into a tuple.

4. Which library provides functions for t-tests?

A) pandas
B) numpy
C) scipy.stats
D) statistics

Answer: C. scipy.stats provides ttest_ind() for independent two-sample t-tests and ttest_rel() for paired t-tests. The statistics module only covers basic descriptive stats. numpy and pandas do not include hypothesis testing functions.

5. What does the following code print?

    result = [x**2 for x in range(6) if x % 2 == 0]
    print(result)

A) [0, 1, 4, 9, 16, 25]
B) [0, 4, 16]
C) [4, 16, 36]
D) [1, 9, 25]

Answer: B. range(6) produces 0 through 5. The condition x % 2 == 0 filters to even numbers: 0, 2, 4. Squaring these gives [0, 4, 16].

6. Which function name follows PEP 8?

A) CalculateMean (PascalCase)
B) calculateMean (camelCase)
C) calculate_mean (snake_case)
D) CALCULATE_MEAN (UPPER_SNAKE_CASE)

Answer: C. PEP 8 recommends snake_case for functions and variables, PascalCase for classes, and UPPER_SNAKE_CASE for constants.

7. What does the following code print?

    a = {1, 2, 3, 4}
    b = {3, 4, 5, 6}
    print(a - b)

A) {1, 2}
B) {5, 6}
C) {3, 4}
D) {1, 2, 5, 6}

Answer: A. The - operator computes the set difference: elements in a that are not in b. Since 3 and 4 are in both sets, only {1, 2} remain. Note: a ^ b (symmetric difference) would give {1, 2, 5, 6}.

8. Which is a correctly formatted one-line docstring according to PEP 257?

A) # Return the square of x.
B) 'Return the square of x.'
C) """Return the square of x."""
D) """return the square of x"""

Answer: C. PEP 257 recommends triple double quotes ("""). A one-line docstring should be a complete sentence starting with a capital letter and ending with a period, all on a single line. Option A is a comment (not a docstring), B uses single quotes, and D lacks capitalization and a period.
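The difference between a comment and a docstring can be checked through the __doc__ attribute. The sketch below uses hypothetical functions to show a one-line docstring, a multi-line docstring with the closing quotes on their own line, and a function that has only a comment.

```python
def square(x):
    """Return the square of x."""
    return x * x


def normalize(values):
    """Scale values so that they sum to 1.

    The summary line is followed by a blank line and a longer
    description. The closing quotes sit on their own line.
    """
    total = sum(values)
    return [v / total for v in values]


def cube(x):
    # Return the cube of x. (A comment is not stored as documentation.)
    return x ** 3


square_doc = square.__doc__
cube_doc = cube.__doc__
```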