Block 2: Programming and Database Skills
Topic 2.3 · 3 Objectives
Object-Oriented Programming (OOP) lets you bundle related data and behavior into classes. In data analysis workflows, classes model real-world entities such as records, responses, configurations, and exporters. Understanding how to define and use classes is essential for the exam.
class Keyword
A class is a blueprint for creating objects. You define a class using the class keyword, followed by a name (conventionally in PascalCase) and a colon:
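Here is a minimal sketch. SurveyResponse is a hypothetical name chosen for illustration. A class can be as small as a name plus a docstring, and calling the class like a function creates a new object from the blueprint.

```python
class SurveyResponse:
    """Blueprint for a single survey response (hypothetical example)."""


# Calling the class creates a new, independent object (an instance).
first = SurveyResponse()
second = SurveyResponse()

print(type(first).__name__)               # SurveyResponse
print(isinstance(first, SurveyResponse))  # True
print(first is second)                    # False: two separate objects
```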
__init__ is the initializer, usually called the constructor. It runs automatically right after a new object is created. Its first parameter, named self by convention, refers to the instance being initialized. Every attribute assigned to self becomes an instance variable unique to that object.
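The sketch below uses hypothetical names (SurveyResponse, respondent_id, answers) and shows __init__ filling in per-object state. Each instance ends up with its own values for the attributes assigned through self.

```python
class SurveyResponse:
    def __init__(self, respondent_id, answers):
        # __init__ runs automatically right after the object is created.
        # Each assignment to self.<name> creates an instance variable.
        self.respondent_id = respondent_id
        self.answers = answers


alice = SurveyResponse("r001", {"q1": 5})
bob = SurveyResponse("r002", {"q1": 3})

print(alice.respondent_id, alice.answers)  # r001 {'q1': 5}
print(bob.respondent_id, bob.answers)      # r002 {'q1': 3}
```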
Instance methods are functions defined inside a class that operate on a specific object. They always take self as their first parameter, giving them access to the object's attributes and other methods:
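Here is a sketch of two instance methods on the same hypothetical SurveyResponse class. One method reads the object's data through self. The other calls a sibling method through self.

```python
class SurveyResponse:
    def __init__(self, respondent_id, answers):
        self.respondent_id = respondent_id
        self.answers = answers  # maps question id -> numeric score

    def average_score(self):
        # Reads this object's own data through self
        if not self.answers:
            return 0.0
        return sum(self.answers.values()) / len(self.answers)

    def summary(self):
        # Methods can call other methods on the same object via self
        return f"{self.respondent_id}: average {self.average_score():.1f}"


response = SurveyResponse("r001", {"q1": 4, "q2": 5})
print(response.average_score())  # 4.5
print(response.summary())        # r001: average 4.5
```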
_protected and __private
Python uses naming conventions rather than strict access modifiers to signal how attributes should be used:
| Convention | Example | Meaning |
|---|---|---|
| No prefix | self.name | Public — part of the class's external API |
| Single underscore | self._cache | Protected — "internal, don't touch" (not enforced) |
| Double underscore | self.__secret | Private — triggers name mangling (_ClassName__secret) |
Using __ (a double-underscore prefix) does NOT make an attribute truly private. Inside a class body, Python rewrites self.__attr to self._ClassName__attr (name mangling). This is designed to avoid accidental name collisions in subclasses, not to provide security.
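The following sketch uses a hypothetical Config class to show all three conventions side by side. It also shows that name mangling only renames the double-underscore attribute: the data remains reachable under its mangled name.

```python
class Config:
    def __init__(self):
        self.name = "daily_report"  # public
        self._cache = {}            # protected by convention only
        self.__token = "abc123"     # stored as _Config__token

    def token_length(self):
        # Inside the class body, self.__token is rewritten to self._Config__token
        return len(self.__token)


cfg = Config()
print(cfg._cache)          # {}  (accessible; the underscore is only a hint)
print(cfg.token_length())  # 6
print(cfg._Config__token)  # abc123  (the mangled name still works)

try:
    cfg.__token            # outside a class body, no rewriting happens
except AttributeError:
    print("AttributeError: no attribute named __token")

print(sorted(vars(cfg)))   # ['_Config__token', '_cache', 'name']
```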
@property
The @property decorator lets you expose computed or validated attributes that look like simple attribute access from the outside:
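Here is a sketch with a hypothetical Temperature class. The celsius property validates every assignment through a setter. The fahrenheit property is a read-only computed value. Callers still write temp.celsius with no parentheses.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below, so it is validated

    @property
    def celsius(self):
        # Getter: accessed as temp.celsius (no parentheses)
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Setter: runs on every assignment to temp.celsius
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only computed attribute (no setter defined)
        return self._celsius * 9 / 5 + 32


temp = Temperature(25)
print(temp.celsius)     # 25
print(temp.fahrenheit)  # 77.0
temp.celsius = 30
print(temp.fahrenheit)  # 86.0
```

Assigning to temp.fahrenheit raises AttributeError because that property has no setter.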
Because a property is accessed with plain attribute syntax, code that read obj.celsius before the property was added continues to work unchanged.
Objects maintain their own internal state, which methods can modify over time. This pattern is common in data pipelines where records pass through processing stages:
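The sketch below models one record moving through processing stages. It assumes a hypothetical Record class with raw, value, stage and history attributes. Each method call updates the object's state, and validate() refuses to run unless the record is already in the "cleaned" stage.

```python
class Record:
    """A data record that moves through processing stages (hypothetical)."""

    def __init__(self, raw):
        self.raw = raw
        self.value = None
        self.stage = "raw"
        self.history = ["raw"]

    def clean(self):
        # Strip whitespace and normalize case; updates the object's state
        self.value = self.raw.strip().lower()
        self._advance("cleaned")

    def validate(self):
        # Only cleaned records can be validated
        if self.stage != "cleaned":
            raise RuntimeError(f"cannot validate a record in stage {self.stage!r}")
        self._advance("valid" if self.value else "invalid")

    def _advance(self, new_stage):
        self.stage = new_stage
        self.history.append(new_stage)


rec = Record("  Yes ")
rec.clean()
rec.validate()
print(rec.value, rec.stage)  # yes valid
print(rec.history)           # ['raw', 'cleaned', 'valid']
```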
Real data projects often combine multiple classes. Three key OOP patterns appear in this course: composition, inheritance, and polymorphism.
Composition means one class contains instances of another class as attributes. This is a "has-a" relationship. For example, a Survey has multiple Question objects:
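Here is a minimal composition sketch. The Survey and Question classes and their method names are hypothetical. The Survey stores Question objects in a list and delegates per-question work to them.

```python
class Question:
    def __init__(self, text):
        self.text = text
        self.answers = []

    def add_answer(self, answer):
        self.answers.append(answer)


class Survey:
    def __init__(self, title):
        self.title = title
        self.questions = []  # a Survey "has" Question objects

    def add_question(self, text):
        question = Question(text)
        self.questions.append(question)
        return question

    def response_counts(self):
        # Delegates to the contained Question objects
        return {q.text: len(q.answers) for q in self.questions}


survey = Survey("Customer feedback")
q1 = survey.add_question("How satisfied are you?")
q2 = survey.add_question("Would you recommend us?")
q1.add_answer(4)
q1.add_answer(5)
q2.add_answer("yes")
print(survey.response_counts())
# {'How satisfied are you?': 2, 'Would you recommend us?': 1}
```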
Inheritance creates an "is-a" relationship. A subclass inherits all methods and attributes from its parent and can override or extend them:
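Here is a sketch using hypothetical Dataset and SalesDataset classes. The subclass inherits __init__, size() and describe() unchanged and adds one method of its own.

```python
class Dataset:
    def __init__(self, rows):
        self.rows = list(rows)

    def size(self):
        return len(self.rows)

    def describe(self):
        return f"{type(self).__name__} with {self.size()} rows"


class SalesDataset(Dataset):
    # Inherits __init__, size() and describe(); adds one new method
    def total(self):
        return sum(self.rows)


sales = SalesDataset([120, 80, 50])
print(sales.size())                # 3
print(sales.describe())            # SalesDataset with 3 rows
print(sales.total())               # 250
print(isinstance(sales, Dataset))  # True: a SalesDataset "is a" Dataset
```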
When a subclass defines a method with the same name as one in the parent class, the subclass version overrides the parent. Use super() when you still need to call the parent's version:
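The sketch below uses hypothetical classes to show both uses of super(). CleanDataset extends the parent's __init__ rather than replacing it. It also overrides describe() while still reusing the parent's text.

```python
class Dataset:
    def __init__(self, data):
        self.data = list(data)

    def describe(self):
        return f"{len(self.data)} rows"


class CleanDataset(Dataset):
    def __init__(self, data):
        # Run the parent's setup first so self.data exists...
        super().__init__(data)
        # ...then extend it: drop missing values
        self.data = [x for x in self.data if x is not None]

    def describe(self):
        # Override describe(), but reuse the parent's version via super()
        return super().describe() + " (missing values removed)"


print(Dataset([1, None, 3]).describe())       # 3 rows
print(CleanDataset([1, None, 3]).describe())  # 2 rows (missing values removed)
```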
Polymorphism means calling the same method name on different objects and getting behavior specific to each object's class. This is powerful in data pipelines where you process items uniformly without knowing their exact type:
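The following sketch assumes hypothetical Exporter, CSVExporter and JSONExporter classes, plus a minimal run_export_pipeline function. Every exporter answers the same export(records) call in its own format. The CSV output is deliberately simplified and does no quoting or escaping.

```python
import json


class Exporter:
    """Base class: every exporter provides export(records) -> str."""

    def export(self, records):
        raise NotImplementedError


class CSVExporter(Exporter):
    def export(self, records):
        # Simplified CSV: no quoting or escaping of special characters
        header = ",".join(records[0].keys())
        lines = [",".join(str(v) for v in rec.values()) for rec in records]
        return "\n".join([header] + lines)


class JSONExporter(Exporter):
    def export(self, records):
        return json.dumps(records)


def run_export_pipeline(exporters, records):
    # No type checks: each exporter supplies its own export() behavior
    return [exporter.export(records) for exporter in exporters]


records = [{"id": 1, "score": 4}, {"id": 2, "score": 5}]
for output in run_export_pipeline([CSVExporter(), JSONExporter()], records):
    print(output)
```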
run_export_pipeline does not need to check the type of each exporter. It simply calls .export(), and each subclass provides its own implementation. This is the core idea behind polymorphism and a common exam topic.
Combining composition, inheritance, and polymorphism in a single data pipeline:
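Here is a compact sketch with hypothetical Step, DropMissing, Scale and Pipeline classes. Composition appears as a Pipeline holding a list of steps. Inheritance appears as each step subclassing Step. Polymorphism appears as run() calling apply() without knowing which kind of step it has.

```python
class Step:
    """Base class for pipeline steps; subclasses override apply()."""

    def apply(self, rows):
        return rows


class DropMissing(Step):           # inheritance: DropMissing is a Step
    def apply(self, rows):
        return [r for r in rows if r is not None]


class Scale(Step):                 # inheritance: Scale is a Step
    def __init__(self, factor):
        self.factor = factor

    def apply(self, rows):
        return [r * self.factor for r in rows]


class Pipeline:
    def __init__(self, steps):
        self.steps = list(steps)   # composition: a Pipeline has Steps

    def run(self, rows):
        for step in self.steps:
            rows = step.apply(rows)  # polymorphism: same call, different behavior
        return rows


pipeline = Pipeline([DropMissing(), Scale(10)])
print(pipeline.run([1, None, 3]))  # [10, 30]
```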
Understanding the difference between object identity and equality is crucial for avoiding subtle bugs and is a frequent exam topic.
When you assign an object to a new variable, you create a new reference to the same object in memory, not a copy:
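Here is a sketch with a hypothetical Batch class. Plain assignment only creates an alias. copy.deepcopy from the standard library is one way to get an independent copy, including a new inner list.

```python
import copy


class Batch:
    def __init__(self, items):
        self.items = items


a = Batch([1, 2])
b = a                 # new reference to the SAME object, not a copy
b.items.append(3)
print(a.items)        # [1, 2, 3]: the change is visible through a
print(a is b)         # True

c = copy.deepcopy(a)  # a genuinely independent copy
c.items.append(4)
print(a.items)        # [1, 2, 3]
print(c.items)        # [1, 2, 3, 4]
```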
A particularly tricky case occurs when a mutable list is passed into an object's constructor:
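The sketch below compares two hypothetical classes. Survey stores the caller's list directly. SafeSurvey makes a defensive shallow copy with list(). Only the first is affected when the caller later changes its own list.

```python
class Survey:
    def __init__(self, questions):
        self.questions = questions        # stores a reference to the caller's list


class SafeSurvey:
    def __init__(self, questions):
        self.questions = list(questions)  # defensive (shallow) copy


shared = ["Q1", "Q2"]
risky = Survey(shared)
safe = SafeSurvey(shared)

shared.append("Q3")      # the caller keeps modifying its own list
print(risky.questions)   # ['Q1', 'Q2', 'Q3']: changed behind the object's back
print(safe.questions)    # ['Q1', 'Q2']
```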
== vs is
Python provides two distinct comparison operators:
| Operator | Checks | Question It Answers |
|---|---|---|
| == | Equality (value) | "Do these have the same content?" |
| is | Identity (memory address) | "Are these the exact same object?" |
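Use is only for None checks (if x is None) and when you specifically need identity. Use == for all value comparisons. Never rely on is for integer or string comparison in production code: whether two equal values happen to be the same object is an implementation detail (CPython, for example, caches small integers).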
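The sketch below contrasts the two operators on lists with equal content. It also shows the idiomatic None check inside a hypothetical describe() helper.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True:  same content
print(a is b)  # False: two different list objects
print(a is c)  # True:  c is another name for a


def describe(value=None):
    # Idiomatic None check uses "is", because None is a singleton
    if value is None:
        return "missing"
    return f"value: {value}"


print(describe())   # missing
print(describe(0))  # value: 0
```

A plain "if not value" test would wrongly treat 0 as missing. "is None" distinguishes an absent value from a falsy one.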
__eq__()
By default, == on custom objects checks identity (same as is). You can override this by implementing the __eq__ dunder method:
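Here is a sketch with a hypothetical Point class that compares by value. Returning NotImplemented for unrelated types lets Python fall back to its default handling instead of raising an error.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by value; defer to Python for unrelated types
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)
print(p1 == p2)      # True:  same x and y
print(p1 is p2)      # False: still two separate objects
print(p1 == (1, 2))  # False: different types
print(p1 != p2)      # False: != is derived from __eq__ automatically
```

Defining __eq__ without __hash__ sets the class's __hash__ to None. Instances then become unhashable and cannot be placed in a set or used as dict keys until you also define __hash__.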
__repr__ and __str__ Methods
| Method | Purpose | Called By |
|---|---|---|
| __repr__ | Unambiguous, developer-facing representation | repr(), the interactive shell, items displayed inside containers such as lists |
| __str__ | Readable, user-facing representation | str(), print() |
If you implement only one of the two, implement __repr__. Python falls back to __repr__ when __str__ is not defined, but not the other way around. Ideally, the __repr__ output is valid Python that could recreate the object.
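The sketch below uses two hypothetical classes. Question defines both methods. Answer defines only __repr__, so str() and print() fall back to it. Printing a list shows that containers display their items with __repr__.

```python
class Question:
    def __init__(self, qid, text):
        self.qid = qid
        self.text = text

    def __repr__(self):
        # Developer-facing; looks like the call that would recreate the object
        return f"Question({self.qid!r}, {self.text!r})"

    def __str__(self):
        # User-facing; used by print() and str()
        return f"Q{self.qid}: {self.text}"


class Answer:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Answer({self.value!r})"
    # No __str__: str() and print() fall back to __repr__


q = Question(1, "How satisfied are you?")
print(repr(q))    # Question(1, 'How satisfied are you?')
print(q)          # Q1: How satisfied are you?
print([q])        # [Question(1, 'How satisfied are you?')]
print(Answer(5))  # Answer(5)
```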
A complete example putting identity and comparison together:
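Here is a combined sketch with a hypothetical Response class. It defines __eq__, a matching __hash__ and __repr__, then contrasts equal-but-distinct objects with an alias. It assumes responses are not modified after being placed in a set or used as dict keys, since the hash depends on their attributes.

```python
class Response:
    def __init__(self, respondent_id, answer):
        self.respondent_id = respondent_id
        self.answer = answer

    def __eq__(self, other):
        if not isinstance(other, Response):
            return NotImplemented
        return (self.respondent_id, self.answer) == (other.respondent_id, other.answer)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes
        return hash((self.respondent_id, self.answer))

    def __repr__(self):
        return f"Response({self.respondent_id!r}, {self.answer!r})"


r1 = Response("r001", "yes")
r2 = Response("r001", "yes")  # same data, different object
r3 = r1                       # alias of r1

print(r1 == r2, r1 is r2)     # True False
print(r1 == r3, r1 is r3)     # True True
print(r2 in [r1])             # True: "in" compares with == (after an identity shortcut)
print(len({r1, r2, r3}))      # 1: duplicates collapse in a set
print([r1, r2])               # [Response('r001', 'yes'), Response('r001', 'yes')]
```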
Practice Questions

1. What is the purpose of the __init__ method in a Python class?
Answer: __init__ is the constructor method. It is automatically called when a new instance is created and is used to set up the object's initial state through self.attribute = value assignments.

2. What happens when you try to access an attribute defined as self.__secret from outside the class MyClass?
Options: None · AttributeError because of name mangling · PermissionError
Answer: AttributeError because of name mangling. Python renames __secret to _MyClass__secret internally, so accessing obj.__secret directly raises AttributeError. The attribute still exists as obj._MyClass__secret.

3. Which decorator lets you access a method using attribute syntax?
Options: @staticmethod · @classmethod · @property · @attribute
Answer: @property. The @property decorator defines a getter that allows a method to be accessed using attribute syntax (e.g., obj.name instead of obj.name()). You can also define a setter with @name.setter.

4. If a.items is [1, 2] and you run b = a followed by b.items.append(3), what does a.items contain?
Options: [1, 2] · [1, 2, 3] · [3]
Answer: [1, 2, 3]. b = a does not create a copy; it creates an alias. Both a and b reference the same object in memory. When b.items.append(3) modifies the list, the change is visible through a as well.

5. What kind of relationship exists when a Survey class contains a list of Question objects?
Answer: Composition, a "has-a" relationship: a Survey has Question objects. This is different from inheritance, which is an "is-a" relationship.

6. Dog and Cat subclasses each override Animal.speak(). What does calling speak() on each animal in a list of Dog and Cat objects return?
Options: ["...", "...", "..."] · ["Woof", "Meow", "Woof"] · ["Woof", "Woof", "Meow"]
Answer: Each object uses its own class's speak(). The Dog objects return "Woof" and the Cat object returns "Meow". The base class method is overridden by each subclass.

7. What is the difference between == and is in Python?
Options: == checks identity while is checks equality · == checks value equality while is checks if two references point to the same object · is only works with numbers and strings
Answer: == checks value equality while is checks if two references point to the same object. The == operator compares values (calling __eq__). The is operator compares identity: whether two variables reference the exact same object in memory. Use is primarily for None checks.

8. What does super().__init__(data) do inside a subclass constructor?
Answer: It calls the parent class's __init__ method to initialize inherited attributes. super() returns a proxy object for the parent class. Calling super().__init__(data) invokes the parent's constructor so that any attributes or setup logic defined in the parent class are properly initialized in the subclass instance.

9. If a class defines __repr__ but NOT __str__, what does print(obj) use?
Options: AttributeError · __repr__
Answer: __repr__. When __str__ is not defined, Python falls back to __repr__ for print() and str() calls. The reverse is not true: repr() will never fall back to __str__. This is why __repr__ is the more important method to implement.

10. A Point class does not define __eq__. Given two Point objects p1 and p2 with the same x and y values, what does p1 == p2 evaluate to?
Options: True, because they have the same x and y values · False, because __eq__ is not defined, so it defaults to identity comparison · True, because Python compares all attributes automatically
Answer: False. Without a custom __eq__ method, Python's default == falls back to identity comparison (same as is). Since p1 and p2 are two different objects in memory, p1 == p2 returns False. You must implement __eq__ for value-based equality.