#!/usr/bin/env python
# coding: utf-8

# # Exposing Python 3.6's Private Dict Version
# *This notebook originally appeared as a [post](http://jakevdp.github.io/blog/2017/05/26/exposing-private-dict-version/) on the blog [Pythonic Perambulations](http://jakevdp.github.io). The content is MIT licensed.*
#
#
# I just got home from my sixth PyCon, and it was wonderful as usual. If you weren't able to attend, or even if you were, you'll find a wealth of entertaining and informative talks on the [PyCon 2017 YouTube channel](https://www.youtube.com/channel/UCrJhliKNQ8g0qoE_zvL8eVg/videos?sort=p&view=0&flow=grid).
#
# Two of my favorites this year were a complementary pair of talks on Python dictionaries by two PyCon regulars: Raymond Hettinger's [Modern Python Dictionaries: A confluence of a dozen great ideas](https://www.youtube.com/watch?v=npw4s1QTmPg) and Brandon Rhodes' [The Dictionary Even Mightier](https://www.youtube.com/watch?v=66P5FMkWoVU) (a follow-up to his PyCon 2010 talk, [The Mighty Dictionary](https://www.youtube.com/watch?v=C4Kc8xzcA68)).
#
# Raymond's is a fascinating dive into the guts of the CPython dict implementation, while Brandon's focuses more on recent improvements in the dict's user-facing API. One piece both mention is the addition in Python 3.6 of a private dictionary version to aid CPython optimization efforts. In Brandon's words:
#
# > "[PEP509](https://www.python.org/dev/peps/pep-0509/) added a private version number... every dictionary has a version number, and elsewhere in memory a master version counter. And when you go and change a dictionary the master counter is incremented from a million to a million and one, and that value a million and one is written into the version number of that dictionary. So what this means is that you can come back later and know if it's been modified, without reading maybe its hundreds of keys and values: you just look and see if the version has increased since the last time you were there."
#
# He later went on to say,
#
# > "[The version number] is internal; I haven't seen an interface for users to get to it..."
#
# which, of course, I saw as an implicit challenge. So let's expose it!
#
#
# ## Exposing CPython's Internals
#
# In a [post a few years ago](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/), I showed how to use the ``ctypes`` module to muck around in the internals of CPython's implementation at runtime, and I'll use a similar strategy here.
#
# Briefly, the approach is to define a ``ctypes.Structure`` object that mirrors the structure CPython uses to implement the type in question.
# We can start with the base structure that [underlies every Python object](https://github.com/python/cpython/blob/3.6/Include/object.h#L106-L110):
#
# ```C
# typedef struct _object {
#     _PyObject_HEAD_EXTRA
#     Py_ssize_t ob_refcnt;
#     struct _typeobject *ob_type;
# } PyObject;
# ```
#
# A ``ctypes`` wrapper might look like this:

# In[1]:


import sys
assert (3, 6) <= sys.version_info < (3, 7)  # Valid only in Python 3.6

import ctypes

py_ssize_t = ctypes.c_ssize_t  # Almost always the case

# Assumes a non-debug build, where _PyObject_HEAD_EXTRA expands to nothing
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]


# Next, let's look at the Python 3.6 [``PyDictObject`` definition](https://github.com/python/cpython/blob/3.6/Include/dictobject.h#L23-L41), which boils down to this:
#
# ```C
# typedef struct {
#     PyObject_HEAD
#     Py_ssize_t ma_used;
#     uint64_t ma_version_tag;
#     PyDictKeysObject *ma_keys;
#     PyObject **ma_values;
# } PyDictObject;
# ```
#
# We can mirror the structure behind the ``dict`` this way, plus add some methods that will be useful later:

# In[2]:


class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
               ]

    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")

    @classmethod
    def wrap(cls, obj):
        # View the memory of an existing dict through this structure
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))


# As a sanity check, let's make sure our structures match the size in memory of the types they are meant to wrap:

# In[3]:


assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)


# With this setup, we can now wrap any dict object to get a look at its internal properties.
# Here's what this gives for a simple dict:

# In[4]:


D = dict(a=1, b=2, c=3)
DictStruct.wrap(D)


# To convince ourselves further that we're properly wrapping the object, let's make two more explicit references to this dict, add a new key, and make sure the size and reference count reflect this:

# In[5]:


D2 = D
D3 = D2
D3['d'] = 5
DictStruct.wrap(D)


# It seems this is working correctly!

# ## Exploring the Version Number
#
# So what does the version number do? As Brandon explained in his talk, every dict in CPython 3.6 now has a version number that is
#
# 1. globally unique
# 2. updated locally whenever a dict is modified
# 3. incremented globally whenever *any* dict is modified
#
# This global value is stored in the [``pydict_global_version``](https://github.com/python/cpython/blob/3.6/Objects/dictobject.c#L243) variable in the CPython source.
# So if we create a bunch of new dicts, we should expect each to have a higher version number than the last:

# In[6]:


for i in range(10):
    dct = {}
    print(DictStruct.wrap(dct))


# You might expect these versions to increment by one each time, but the version numbers are affected by the fact that Python uses many dictionaries in the background: among other things, local variables, global variables, and object attributes are all stored as dicts, and creating or modifying any of these results in the global version number being incremented.
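#
# To make the shared-counter behavior concrete, here is a minimal pure-Python sketch of the bookkeeping described above. It is a toy model, not CPython's implementation: ``VersionedDict`` and ``_global_version`` are hypothetical names I made up for illustration, and the toy bumps the version on every assignment, which is simpler than the rules spelled out in PEP 509.

```python
import itertools

# Hypothetical stand-in for CPython's global counter (pydict_global_version)
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Toy dict that mimics the version-tag bookkeeping (illustration only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # A new dict takes the next value of the shared counter
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Any modification also takes the next value of the shared counter
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


background = VersionedDict()  # plays the role of an interpreter-internal dict
first = VersionedDict()
background['x'] = 1           # an unrelated change still advances the counter
second = VersionedDict()

gap = second.version - first.version
print(first.version, second.version, gap)  # prints: 2 4 2
```

# Even though ``first`` and ``second`` are created back to back, their versions differ by two, because the unrelated write to ``background`` consumed a counter value in between. The same effect, at a much larger scale, is why the real version numbers jump around.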
#
# Similarly, any time we modify our dict it gets a higher version number:

# In[7]:


D = {}
Dwrap = DictStruct.wrap(D)
for i in range(10):
    D[i] = i
    print(Dwrap)


# ## Monkey-patching Dict
#
# Let's go a step further and monkey-patch the dict object itself with a method that accesses the version directly.
# Basically, we want to add a ``get_version`` method to the ``dict`` class that accesses this value.
#
# Our first attempt might look something like this:

# In[8]:


dict.get_version = lambda obj: DictStruct.wrap(obj).ma_version_tag


# This raises a ``TypeError``, because Python protects the attributes of built-in types from this kind of mucking.
# But never fear! We can get around this with (you guessed it) ``ctypes``!
#
# The attributes and methods of any Python class are stored in its ``__dict__`` attribute, which in Python 3.6 is not a plain dictionary but a ``mappingproxy`` object, which you can think of as a read-only wrapper of the underlying dictionary:

# In[9]:


class Foo:
    bar = 4

Foo.__dict__


# In fact, looking at the Python 3.6 [``mappingproxyobject`` implementation](https://github.com/python/cpython/blob/fff9a31a91283c39c363af219e595eab7d4da6f7/Objects/descrobject.c#L794-L797), we see that it's simply an object with a pointer to an underlying dict.
#
# ```C
# typedef struct {
#     PyObject_HEAD
#     PyObject *mapping;
# } mappingproxyobject;
# ```
#
# Let's write a ``ctypes`` structure that exposes this:

# In[10]:


import types

class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]

    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))

# Sanity check
assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)


# Now we can use this to get a C-level handle for the underlying dict of any mapping proxy:

# In[11]:


proxy = MappingProxyStruct.wrap(dict.__dict__)
proxy.mapping


# And we can pass this handle to functions in the C API in order to modify the dictionary wrapped by a read-only mapping proxy:

# In[12]:


def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy"""
    proxy = MappingProxyStruct.wrap(obj)
    # Write straight into the dict that the proxy points to
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))


# In[13]:


mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)


# Once this is executed, we can call ``get_version()`` as a method on *any* Python dictionary to get the version number:

# In[15]:


{}.get_version()


# This kind of monkey patching could be used for any built-in type; for example, we could add a ``scramble`` method to strings that randomly chooses upper or lower case for its contents:

# In[16]:


import random
mappingproxy_setitem(str.__dict__,
                     'scramble',
                     lambda self: ''.join(random.choice([c.lower(), c.upper()]) for c in self))


# In[17]:


'hello world'.scramble()


# The possibilities are endless, but be warned that any time you muck around with the CPython internals at runtime, there are likely to be strange side-effects.
# This is definitely not code you should use for any purpose beyond simply having fun exploring the language.
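#
# For contrast with the ``ctypes`` workaround, here is a small sketch of the protections it sidesteps, using only supported Python and no internals. The names ``direct_patch_error`` and ``proxy_error`` are just illustrative variables; ``Foo`` is the same toy class as before.

```python
import types

# Direct assignment on a built-in type is rejected by the interpreter
try:
    dict.get_version = lambda self: None
    direct_patch_error = None
except TypeError as exc:
    direct_patch_error = exc


class Foo:
    bar = 4


# A class __dict__ is a mappingproxy, which refuses item assignment
proxy = Foo.__dict__
try:
    proxy['baz'] = 5
    proxy_error = None
except TypeError as exc:
    proxy_error = exc

# For user-defined classes, setattr is the supported way to add an attribute,
# and the proxy reflects the change because it is a live view of the dict
setattr(Foo, 'baz', 5)
print(type(proxy).__name__, proxy['baz'])  # prints: mappingproxy 5
```

# Both failed attempts raise ``TypeError``, while ``setattr`` on a class you own works fine, so the ``ctypes`` route is only needed (and only tempting) for built-in types.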
#
# If you're curious about other ways you can modify the CPython runtime, you might be interested in my post from a few years ago, [Why Python is Slow: Looking Under the Hood](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/).

# ## So... Why?
#
# Now we have easy access to the dict version number, and you might wonder what we can do with it.
#
# The answer is, currently, not so much. In the CPython source, the only time the version tag is referenced aside from its definition is [in a unit test](https://github.com/python/cpython/search?utf8=%E2%9C%93&q=ma_version_tag).
# Various Python optimization projects will in the future be able to use this feature to better optimize Python code, but to my knowledge none do yet (for example, here's a relevant [Numba issue](https://github.com/numba/numba/issues/2242) and [FATpython discussion](http://faster-cpython.readthedocs.io/fat_python.html)).
#
# So for the time being, access to the dictionary version number is, as they say, purely academic.
# But I hope that some time in the near future, a web search will land someone on this page who will find this code useful in more than a purely academic sense.
#
# Happy hacking!

# *This post was written entirely in the IPython notebook. You can
# [download](http://jakevdp.github.io/downloads/notebooks/DictVersion.ipynb)
# this notebook, or see a static view
# [here](http://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/DictVersion.ipynb).*