#!/usr/bin/env python
# coding: utf-8
# # Descriptors: The magic behind attribute access in Python
#
#
# # What is encapsulation about? (IMNSHO)
#
# 1. *Encapsulation* **is not** about hiding data.
# 2. *Access control* **is** about hiding data.
# 3. Encapsulation and access control are two different independent things.
# * You **don't need** access control to have encapsulation.
# * You can encapsulate behavior **without** having to restrict access.
# 4. Encapsulation separates the concept of **what something does** from **how it is implemented**.
# 5. Encapsulation decouples a programming construct's **public interface/API** from its **implemenation**.
# 6. When calling code wants to retrieve a value, it should not depend on from where the value comes. Internally, the class can store the value in a field or retrieve it from some external resource (such as a file or a database). Perhaps the value is not stored at all, but calculated on-the-fly. This should not matter to the calling code.
# # Before we begin: Underscores in attributes in Python
#
# 1. Single underscore before a name (e.g. `_foo`)
# * Used as a convention, these attributes should be treated as a non-public part of the API (whether it is a function, a method or a data member) and considered an implementation detail and subject to change without notice (source: [Python documentation](https://docs.python.org/3/tutorial/classes.html#tut-private)).
# * It's more than a convention and actually does mean something to the interpreter; if you `from import *`, none of the names that start with an `_` will be imported unless the module's/package's `__all__` list explicitly contains them.
# 2. Double underscore before a name (e.g. `__foo`)
# * This is not a convention, any identifier of the form `__foo` (at least two leading underscores, at most one trailing underscore) is textually replaced with `_classname__foo`, where classname is the current class name with leading underscore(s) stripped. This is called *name mangling*. (source: [Python documentation](https://docs.python.org/3/tutorial/classes.html)).
# * Name mangling is helpful for letting subclasses override methods without breaking intraclass method calls.
# 3. Double underscore before and after a name (e.g. `__foo__`)
# * Methods that use this naming format are called *special* or *magic* methods and are automatically invoked when certain syntax is used. We typically override these methods to implement the desired behaviour in classes (e.g. constructors, operator overloading, indexing etc).
# * *Special* attributes that provide access to the implementation and are not intended for general use. Examples from class special attributes: `__name__` is the class name, `__module__` is the module name in which the class was defined, `__dict__` is the dictionary containing the class’s namespace, `__bases__` is a tuple containing the base classes.
# In[1]:
# name mangling mechanism
class Mapping:
def __init__(self, iterable):
self.items_list = []
# self.__update inside the class is equivalent to self._Mapping__update
# the same function will be called even if __update is overridden in inheriting classes
self.__update(iterable)
def __update(self, iterable):
for item in iterable:
self.items_list.append(item)
class MappingSub(Mapping):
def __update(self, keys, values):
# provides new signature for __update() but does not break __init__()
for item in zip(keys, values):
self.items_list.append(item)
# In[2]:
m = Mapping([1, 2])
ms = MappingSub([1,2])
print('__update' in dir(m), '__update' in dir(ms))
m._Mapping__update([3, 4])
print(m.items_list)
ms._Mapping__update([3, 4]) # call update function of Mapping class
ms._MappingSub__update([5, 6], ['five', 'six']) # call update of MappingSub class
print(ms.items_list)
# # What is an attribute?
#
# * Quite simply, an attribute is a way to get from one object to another.
# * Apply the power of the almighty dot `objectname.attributename` and voila! you now have the handle to another object.
# * You also have the power to create attributes, by assignment: `objectname.attributename = anotherobject`.
# * Which object does an attribute access return, though? And where does the object set as an attribute end up?
#
# Depending on the programming language:
# * You don't have any control to attribute access (Java plebs).
# * You control attribute access through properties (C# cool kids).
# * You can completely customize attribute access in addition to properties (Python master race).
#
# # Instance attribute access
# * When we access an instance we actually call its `__getattribute__` method, i.e. `a.x -> a.__getattribute__(x)`.
# * `__getattribute__` has an order of priority that describes where to look for attributes and how to react to them.
# * Classes and instances have a `__dict__` where user provided attributes are stored and looked up.
# * Python provides extra attributes, most of which are not stored in `__dict__` (e.g. special methods).
# * `__dict__` is looked up first and this is how we override special methods.
# * We can also override this behavior to save memory for classes with a few fields using `__slots__`. However we cannot add new attributes to `__slots__`.
# In[3]:
class B:
x = 1
class A(B):
y = 2
def __getattr__(self, value):
return str(value)
a = A()
print("a.x: {}, a.y: {}".format(a.x, a.y)) # x from B, y from A
a.y = 3
print("a.x: {}, a.y: {}, A.y: {}".format(a.x, a.y, A.y)) # x from B, y from a (overrides y in A)
print("a.z: {}".format(a.z)) # call __getattr__
print(A.__dict__)
print(a.__dict__)
# # Descriptor protocol
# Raymond Hettinger ([Python Documentation](https://docs.python.org/3/howto/descriptor.html)):
# > In general, a descriptor is an object attribute with "binding behavior", one whose attribute access has been overridden by methods in the descriptor protocol.
#
# * Those methods are `__get__`, `__set__` and `__delete__`. If any of those methods are defined for an object, it is said to be a **descriptor**.
# * Only one of the methods *needs* to be implemented in order to be considered a descriptor, but any number of them *can* be implemented.
# * There are two types of descriptors based on which sets of these methods are implemented: **data** and **non-data** descriptors.
# 1. A **data** descriptor implements at least `__set__` or `__delete__`, but can include both. They also often include `__get__`, since it's rare to want to set something without also being able to get it too.
# 2. A **non-data** descriptor only implements `__get__`. If it adds `__set__` or `__delete__`to its method list, it becomes a data descriptor.
# # `__get__(self, instance, owner)`
#
# 1. `self` is the descriptor instance.
# 2. `owner` is the class the descriptor is accessed *from*.
# * When you call `A.x`, where `x` is a descriptor object with `__get__`, it's called with `A` as owner and `instance` as `None`.
# * This lets the descriptor know that `__get__` is being called from a *class*, not an *instance*.
# * `A.x` is translated to `A.__dict__['x'].__get__(None, A)`.
# 3. `instance` is the instance that the descriptor is accessed *from*.
# * If the discriptor is accessed from an *instance* it receives it as `instance` and the class of the instance as `owner`.
# * `a.x` is translated to `type(a).__dict__['x'].__get__(a, type(a))`
# * Note that the call starts with `type(a)`, not just `a`, because descriptors are stored on *classes* not *instances*.
#
# Two important points:
# 1. In order to be able to apply per-instance as well as per-class functionality, descriptors are given `instance` and `owner` (the class of the instance).
# 2. It is not the *instance* that the descriptor is being called from, but instead, the `instance` *parameter* is the instance the descriptor is being called from. It is actually being called from the instance class.
# # `__set__(self, instance, value)`
#
# 1. `__set__` does not have an owner parameter that accepts a class and does not need it, since data descriptors are generally designed for storing per-instance data.
# 2. `A.x = value` does not get translated to anything; `value` replaces the descriptor object stored in `x` (however, see note below).
# 3. `a.x = value` is translated to `type(a).__dict__['x'].__set__(a, value)`
# # `__delete__(self, instance)`
#
# 1. invoked when `del a.x` is called.
# 2. `del a.x` is translated to `type(a).__dict__['x'].__delete__(a)`
#
# **Note**: If we want a descriptor's `__set__` or `__delete__` methods to work from the *class* level, the descriptor must be created on the class's *metaclass*. When doing so, everything that refers to `owner` is referring to the *metaclass*, while a reference to `instance` refers to the *class*. After all, classes are just instances of metaclasses.
# # Instance & class attribute access
# * Descriptors are invoked by the `__getattribute__` method.
# * Overriding `__getattribute__` prevents automatic descriptor calls.
# * Class attribute access still uses `__getattribute__`, but it's the one defined on its *metaclass*.
# * Priorities when an *instance* attribute is looked up:
# 1. Data descriptors in its class (up the MRO).
# 2. Instance attributes.
# 3. Non-data descriptors in its class / class attributes (up the MRO).
# 4. The `__getattr__` method.
# * Priorities when an *class* attribute is looked up:
# 1. Data descriptors in its metaclass (up the MRO).
# 2. Class attributes (up the MRO).
# 3. Non-data descriptors in its metaclass / metaclass attributes (up the MRO).
# 4. The `__getattr__` method.
# # Instance attribute access priority (`a.x`)
# 1. Look in the class `__dict__`, working up the MRO.
# * If found, check if it's a data descriptor.
# * If it has a `__get__` method, call it and return the result.
# 2. Look in the instance `__dict__`.
# * If found, return the value in `__dict__`.
# 3. Check class `__dict__` again, working up the MRO.
# * If found, check if it's a descriptor.
# * If it has a `__get__` method, call it and return the result.
# * If it doesn't have a `__get__` method, return the descriptor object itself.
# * If found and not a descriptor, return the value in `__dict__`.
# 4. Call `__getattr__` if it exists and return the result.
# 5. If everything up to this point has failed, raise `AttributeError`.
# # Class attribute access priority (`A.x`)
# 1. Look in the metaclass `__dict__`, working up the MRO.
# * If found, check if it's a data descriptor.
# * If it has a `__get__` method, call it and return the result.
# 2. Look in the class `__dict__`, working up the MRO.
# * If found, check if it's a descriptor.
# * If it has a `__get__` method, call it and return the result.
# * If it doesn't have a `__get__` method, return the descriptor object itself.
# * If found and not a descriptor, return the value in `__dict__`.
# 3. Check metaclass `__dict__` again, working up the MRO.
# * If found, check if it's a descriptor.
# * If it has a `__get__` method, call it and return the result.
# * If it doesn't have a `__get__` method, return the descriptor object itself.
# * If found and not a descriptor, return the value in `__dict__`.
# 4. Call `__getattr__` if it exists and return the result.
# 5. If everything up to this point has failed, raise `AttributeError`.
# # Instance attribute access priority:
# ## `__set__` & `__delete__` (`a.x = value` & `del a.x`)
#
# 1. Look in the class `__dict__`, working up the MRO.
# * If found, check if it's a data descriptor.
# * If it has a `__set__` or `__delete__` method, call `__set__` or `__delete__`.
# * If it doesn't have the corresponding method, raise `AttributeError`.
# 2. Look in the instance `__dict__`.
# * `a.x = value`
# * Set attribute to value.
# * `del a.x`
# * If found, delete attribute.
# * If not found, raise `AttributeError`.
# # Class attribute access priority:
# ## `__set__` & `__delete__` (`A.x = value` & `del A.x`)
#
# 1. Look in the metaclass `__dict__`, working up the MRO.
# * If found, check if it's a data descriptor.
# * If it has a `__set__` or `__delete__` method, call `__set__` or `__delete__`.
# * If it doesn't have the corresponding method, raise `AttributeError`.
# 2. Look in the class `__dict__`.
# * `A.x = value`
# * Set attribute to value.
# * `del A.x`
# * If found, delete attribute.
# * If not found, raise `AttributeError`.
#
# In[4]:
# implementation of classmethod and staticmethod, equivalent to the standard library
class MyClassmethod:
def __init__(self, func):
self.func = func
# ignore the instance, provide the class as first argument (usually named cls) so the
# returned function can be called with the arguments the user wants to explicitly provide
def __get__(self, instance, owner):
def cls_wrapper(*args, **kwargs):
return self.func(owner, *args, **kwargs) # what if I put cls=owner?
return cls_wrapper
class MyStaticmethod:
def __init__(self, func):
self.func = func
# essentially just accepts a function and then returns it when __get__ is called
def __get__(self, instance, owner):
return self.func
# In[5]:
class A:
def foo(self):
print(self)
@MyClassmethod # same as: bar = MyClassmethod(bar)
def bar(cls):
print(cls)
@MyStaticmethod # same as: baz = MyStaticmethod(baz)
def baz():
print('static method')
# both methods are accessed through their respective descriptors
# In[6]:
a = A()
# instance method, business as always
a.foo()
print()
# access the method object, descriptor is called and returns the cls_wrapper method object
print(A.bar)
# call the method, instance in __get__ is None (don't care), owner is A
A.bar()
# run it on the instance, instance in __get__ is a (don't care), owner is A
a.bar()
print()
# access the method object, descriptor returns a function with no arguments
print(A.baz)
# call the method without any instance (self) or class (cls) object
# of course same result if we call it in the instance
A.baz()
a.baz()
# In[7]:
# implementation of property, equivalent to the standard library
class MyProperty:
def __init__(self, fget=None, fset=None, fdel=None):
self.fget = fget
self.fset = fset
self.fdel = fdel
def __get__(self, instance, owner):
if instance is None: # this was called from the class, not the instance
return self
elif self.fget is None:
raise AttributeError("unreadable attribute")
else:
return self.fget(instance)
def __set__(self, instance, value):
if self.fset is None:
raise AttributeError("can't set attribute")
else:
self.fset(instance, value)
def __delete__(self, instance):
if self.fdel is None:
raise AttributeError("can't delete attribute")
else:
self.fdel(instance)
def getter(self, fget):
return type(self)(fget, self.fset, self.fdel)
def setter(self, fset):
return type(self)(self.fget, fset, self.fdel)
def deleter(self, fdel):
return type(self)(self.fget, self.fset, fdel)
# In[8]:
class A:
def __init__(self, x):
self._x = x
@MyProperty
def x(self):
print("returning _x: {}".format(self._x))
return self._x
@x.setter
def x(self, value):
print("setting _x to {}".format(value))
self._x = value
# In[9]:
a = A(1)
print(A.x) # call the descriptor from the class, returns the descriptor object
print(a.x)
print()
a.x = 2
print(a.x)
print()
A.x = 'bye bye descriptor'
print(a.x) # descriptor is gone from class, instance gets attribute from class
#
# # Is this useful?
# * Why do I need to know this? Can't I just use `property`?
# * No problem. `property` is awesome! Use `property` for greater good!
# * However there are times where logic needs to be repeated in properties.
# * This can lead to code duplication.
# * We can try to fix this by writing helper methods.
# * But then in each property, code for these method calls will be duplicated.
# * Descriptors allow us to **capture the logic** for attribute access and **re-use** it for different attributes.
# In[10]:
class BasketballGame:
def __init__(self, points, rebounds, steals):
self.points = points
self.rebounds = rebounds
self.steals = steals
@property
def points(self):
return self._points
@points.setter
def points(self, value):
if value < 0:
raise ValueError('Positive values only!')
self._points = value
@property
def rebounds(self):
return self._rebounds
@rebounds.setter
def rebounds(self, value):
if value < 0:
raise ValueError('Positive values only!')
self._rebounds = value
@property
def steals(self):
return self._steals
@steals.setter
def steals(self, value):
if value < 0:
raise ValueError('Positive values only!')
self._steals = value
# In[11]:
class NonNegativeField:
def __init__(self, name=''):
# need to store the field name on the descriptor object itself
# as descriptors are defined on the class level
self.name = name
def __get__(self, instance, owner):
return instance.__dict__[self.name]
def __set__(self, instance, value):
if value < 0:
raise ValueError('Positive values only!')
instance.__dict__[self.name] = value
# In[12]:
class BasketballGame:
# is there a better way, so that we don't have to repeat the field name?
points = NonNegativeField('points')
rebounds = NonNegativeField('rebounds')
steals = NonNegativeField('steals')
def __init__(self, points, rebounds, steals):
self.points = points
self.rebounds = rebounds
self.steals = steals
# In[13]:
a = BasketballGame(points=100, rebounds=30, steals=10)
print("points: {}, rebounds: {}".format(a.points, a.rebounds))
try:
a.points = -5
except ValueError as e:
print("Error! {}".format(e))
# In[14]:
def named_descriptors(cls):
for name, attr in cls.__dict__.items():
if isinstance(attr, NonNegativeField):
attr.name = name
return cls
@named_descriptors
class BasketballGame:
points = NonNegativeField()
rebounds = NonNegativeField()
steals = NonNegativeField()
def __init__(self, points, rebounds, steals):
self.points = points
self.rebounds = rebounds
self.steals = steals
# In[15]:
a = BasketballGame(points=100, rebounds=30, steals=10)
print("points: {}, rebounds: {}".format(a.points, a.rebounds))
try:
a.points = -5
except ValueError as e:
print("Error! {}".format(e.args))
# In[16]:
class NonNegativeField:
def __get__(self, instance, owner):
return instance.__dict__[self.name]
def __set__(self, instance, value):
if value < 0:
raise ValueError('Positive values only!')
instance.__dict__[self.name] = value
# new in Python 3.6
def __set_name__(self, owner, name):
self.name = name
class BasketballGame:
points = NonNegativeField()
rebounds = NonNegativeField()
steals = NonNegativeField()
def __init__(self, points, rebounds, steals):
self.points = points
self.rebounds = rebounds
self.steals = steals
# In[17]:
a = BasketballGame(points=100, rebounds=30, steals=10)
print("points: {}, rebounds: {}".format(a.points, a.rebounds))
try:
a.points = -5
except ValueError as e:
print("Error! {}".format(e))
#
# # References
# 1. [StackOverflow - What is encapsulation? How does it actually hide data?](http://stackoverflow.com/questions/5673829/what-is-encapsulation-how-does-it-actually-hide-data)
# * [Shahriar Tajbakhsh - Underscores in Python](https://shahriar.svbtle.com/underscores-in-python)
# * [Shalabh Chaturvedi - Python Attributes and Methods](http://www.cafepy.com/article/python_attributes_and_methods/python_attributes_and_methods.html)
# * [Raymond Hettinger - Descriptor HowTo Guide](https://docs.python.org/3/howto/descriptor.html)
# * Simeon Franklin - Python Descriptors [video](https://www.youtube.com/watch?v=ZdvpNaWwx24) & [presentation](http://simeonfranklin.com/talk/descriptors.html)
# * [Laura Rupprecht - Describing Descriptors - PyCon 2015](https://www.youtube.com/watch?v=h2-WPwGnHqE)
# * Jacob Zimmerman - Python Descriptors, Apress Publishing (2006)
# * [Dan Sackett - An introduction to Python descriptors](http://programeveryday.com/post/an-introduction-to-python-descriptors/)