Lessons learnt while automating docstring checking - Saurabh Hirani

Original blog post link just for comments: here

The aim of this post is to describe the evolution of a program which automates docstring testing of a module and its components (functions, classes, nested classes and their respective methods). It starts out simple and we keep on making incremental changes to address limitations.

Although I am using an ipython notebook, I don't want the end user to have to copy, paste and run the cell source code to try the examples outside of the notebook. So I am using the %load ipython magic to load the source files from my ipython directory instead of typing code directly in the cells. These files are present in this github repo. To use the code separately, clone the repo and you are good to run t1.py, t2.py and so on. Read from here, play from there.

For the sake of simplicity, let us assume that the module which we want to test for docstrings has the structure defined in mod1.py:

In [1]:
%load mod1.py
In [2]:
class SpecFile(object):
    class Section(object):
        def __init__(self): pass
        def validate(self): pass
        
    class DynamicSection(Section):
        def __init__(self): pass
        
    class StaticSection(Section):
        def __init__(self): pass
        
    class Section1(StaticSection):
        def __init__(self): pass
        def validate(self): pass
        
    class Section2(DynamicSection):
        def __init__(self): pass
        def validate(self): pass
        
def main():
    pass

This represents the structure of a spec file which contains various sections. Some sections have static, i.e. pre-defined, keys and some have dynamic keys, i.e. key-value pairs that are not known in advance. Section1 and Section2 are just examples of two such sections. The main function is called when the user executes the program from the command line. We are not concerned with the internal logic, as I am only trying to automate docstring testing.

nbcommon.py contains all the common operations.

In [3]:
%load nbcommon.py
In [ ]:
# custom exception class for no docstring
class NoDocstrError(Exception): pass

def has_docstr(entity):
    """ Check whether this entity has a docstring """
    docstr = entity.__doc__
    return docstr is not None and docstr.strip() != ''

import inspect

def get_entity_type(entity):
    """ Check whether entity is supported for docstring check """
    for entity_type in ['module', 'function', 'class', 'method'] :
        # inspect module has inspect.ismodule, inspect.isfunction - leverage that
        inspect_func = getattr(inspect, 'is' + entity_type)
        if inspect_func(entity): return entity_type
    raise ValueError('Invalid entity: %s passed' % entity)

The inspect module is used to get the members of an entity (a class, module, etc.). get_entity_type restricts the docstring tests to the member types it supports: modules, functions, classes and methods. For anything else it raises ValueError, and the caller skips that member.
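To make the dispatch on the inspect predicates concrete, here is a minimal sketch. It repeats get_entity_type from nbcommon.py so that it runs on its own; Demo and helper are hypothetical names used only for illustration.

```python
import inspect

def get_entity_type(entity):
    """Return the first supported type name that matches entity."""
    for entity_type in ['module', 'function', 'class', 'method']:
        # 'is' + 'module' -> inspect.ismodule, and so on
        if getattr(inspect, 'is' + entity_type)(entity):
            return entity_type
    raise ValueError('Invalid entity: %s passed' % entity)

class Demo(object):          # hypothetical class, not part of mod1.py
    def greet(self):
        pass

def helper():                # hypothetical function, not part of mod1.py
    pass

print(get_entity_type(inspect))       # module
print(get_entity_type(helper))        # function
print(get_entity_type(Demo))          # class
print(get_entity_type(Demo().greet))  # method (a bound method)

try:
    get_entity_type(42)
except ValueError as err:
    print(err)                        # Invalid entity: 42 passed
```

An integer is not a module, function, class or method, so it falls through to the ValueError branch.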

I don't know how to run py.test code in a notebook. So to keep things simple, I will use self-contained examples to test for docstrings.

Also, because the ipython notebook does not support namespaces as far as I know, I will use classes like TestDocstr1, TestDocstr2, etc. to show the evolution of code.

For the first cut, we will use the code in t1.py which has the following structure:

  1. Class Modstruct1, which is passed an entity and has a method get_all_members that returns the entity's members.
  2. Class TestDocstr1, whose class method test_docstr uses Modstruct1 to get an entity's members and raises an exception if any of them does not have a docstring.
In [4]:
%load t1.py
In [5]:
from nbcommon import *
import mod1

import inspect
from collections import defaultdict

class Modstruct1(object):
    """ Return a data structure representing all members of the passed
    entity """

    def __init__(self, base_entity):
        self.base_entity = base_entity

    def get_all_members(self):
        """ Get all the members (nested also) of the passed entity """
        return inspect.getmembers(self.base_entity)


class TestDocstr1(object):

    @classmethod
    def test_docstr(self, entity):
        """ Test whether the passed in entity and its children have docstring """
        entity_type = None
        non_docstr_entities = defaultdict(list)
        all_members = Modstruct1(entity).get_all_members()

        # get all the members of the passed entity
        for member in all_members:
            ref = member[1]
            try:
                entity_type = get_entity_type(ref)
                if not has_docstr(ref):
                    non_docstr_entities[entity_type].append(ref)
            except ValueError:
                # invalid entity type - skip it
                continue

        # if any entities without docstring - consolidate and raise error
        if non_docstr_entities.keys():
            errors = []
            for entity_type, refs in non_docstr_entities.iteritems():
                for ref in refs:
                    errors.append('%s %s does not have docstr' % (entity_type,
                                                                  ref.__name__))
            raise NoDocstrError('\n'.join(errors))

        return True

TestDocstr1.test_docstr(mod1)
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-5-4f05e337cab5> in <module>()
     48         return True
     49 
---> 50 TestDocstr1.test_docstr(mod1)

<ipython-input-5-4f05e337cab5> in test_docstr(self, entity)
     44                     errors.append('%s %s does not have docstr' % (entity_type,
     45                                                                   ref.__name__))
---> 46             raise NoDocstrError('\n'.join(errors))
     47 
     48         return True

NoDocstrError: function main does not have docstr
class SpecFile does not have docstr

As you can see t1.py has the following limitations:

  1. Passing mod1 resulted in checks being performed only on its immediate children - main and SpecFile - we should've tested mod1 itself and the members of SpecFile as well.
  2. It does not prefix a member's name with its parent's name i.e. main should be printed as mod1.main and SpecFile as mod1.SpecFile.

So in order to address these limitations we need to:

  1. Inspect the module also, and if a member is a class - drill down into the class and get its nested classes and their respective methods.
  2. Prefix each member's name with its parent's name.

Using the above pointers, we evolve t1.py to t2.py which has the following structure:

  1. Same as t1.py but with Modstruct2 and TestDocstr2 instead of Modstruct1 and TestDocstr1.
  2. get_all_members gets a little more intelligence: it drills down when a member is a class, and each member now has the following structure:
{
  'name': member_name,
  'ref': object_ref,
  'type': module|function|class|method,
  'parent_ref': ref_to_parent,
  'parent_name': parent_name,
}

Also, an additional check we can do at this stage is to filter out only those members which are defined in this module i.e. if someone does a 'from somemodule import *' we don't want to test the members which polluted the module name space.
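The "defined in this module" filter can be sketched as follows. This is only an illustration under assumptions: demo_mod and local_func are hypothetical names, and the throwaway module is built in memory so the example runs without extra files.

```python
import inspect
import types

# Build a throwaway module (hypothetical name 'demo_mod') that imports one
# function from elsewhere and defines one function of its own.
demo_mod = types.ModuleType('demo_mod')
source = (
    "from os.path import join\n"
    "def local_func():\n"
    "    pass\n"
)
exec(source, demo_mod.__dict__)

def defined_here(module):
    """Names of functions whose __module__ matches the module's own name."""
    names = []
    for name, ref in inspect.getmembers(module, inspect.isfunction):
        # imported functions keep the __module__ of the module that defined them
        if ref.__module__ == module.__name__:
            names.append(name)
    return names

print(defined_here(demo_mod))   # ['local_func']
```

join is visible in the module namespace, but its __module__ points at the module that defined it, so it is filtered out.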

In [6]:
%load t2.py
In [7]:
from nbcommon import *
import mod1

import inspect
import sys
from collections import defaultdict

class Modstruct2(object):
    """ Return a data structure representing all members of the passed
    entity """

    def __init__(self, base_entity):
        self.base_entity_type = get_entity_type(base_entity)
        self.base_entity = base_entity
        self.base_module = base_entity
        if self.base_entity_type != 'module':
            # if entity_type is class - know which module it belongs to
            self.base_module = sys.modules[base_entity.__module__]

    def get_entity_members(self, entity):
        """ Get first level members of the passed entity """
        members = []
        parent_name = entity.__name__
        for member in inspect.getmembers(entity):
            ref = member[1]
            # member has to be of supported entity type
            try:
                ref_type = get_entity_type(ref)
            except ValueError:
                continue

            # we will not inspect modules imported in base module
            if inspect.ismodule(ref): continue

            # member has to be defined in base module
            if ref.__module__ != self.base_module.__name__: continue

            # valid member - construct member data
            member_data = {
                'type': ref_type, 
                'ref': ref, 
                'name': entity.__name__ + '.' + ref.__name__,
                'parent_ref': entity,
                'parent_name': parent_name,
            }
            members.append(member_data)
        return members

    def get_all_members(self):
        """ Get all the members (nested also) of the passed entity """
        # add base module as the first element
        all_members = [{'type': 'module',
                        'ref': self.base_module, 
                        'name': self.base_module.__name__,
                        'parent_ref': None,
                        'parent_name': None}]

        # get first level members of the main entity
        nested_members = self.get_entity_members(self.base_entity)
        all_members.extend(nested_members)

        # call get_entity_members repetitively till you reach a stage where 
        # there are no nested members
        while nested_members:
            curr_nested_members = []
            for member_data in nested_members:
                if member_data['type'] == 'class':
                    # drill nested members only in a class
                    members = self.get_entity_members(member_data['ref'])
                    curr_nested_members.extend(members)
            nested_members = curr_nested_members
            all_members.extend(nested_members)

        return all_members

class TestDocstr2(object):

    @classmethod
    def test_docstr(self, entity):
        all_members = Modstruct2(entity).get_all_members()

        non_docstr_entities = defaultdict(list)

        # get all the nested members of root entity
        for member_data in all_members:
            # consolidate members based on type
            if not has_docstr(member_data['ref']):
                member_name = member_data['name']
                non_docstr_entities[member_data['type']].append(member_name)

        if non_docstr_entities.keys():
            errors = []
            # create error string
            for entity_type, refs in non_docstr_entities.iteritems():
                for refname in refs:
                    errors.append('%s: %s does not have docstr' % (entity_type,
                                                                   refname))
            raise NoDocstrError('\n' + '\n'.join(errors))
        return True

TestDocstr2.test_docstr(mod1)
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-7-a2b07a2dae29> in <module>()
     99         return True
    100 
--> 101 TestDocstr2.test_docstr(mod1)

<ipython-input-7-a2b07a2dae29> in test_docstr(self, entity)
     96                     errors.append('%s: %s does not have docstr' % (entity_type,
     97                                                                    refname))
---> 98             raise NoDocstrError('\n' + '\n'.join(errors))
     99         return True
    100 

NoDocstrError: 
function: mod1.main does not have docstr
class: mod1.SpecFile does not have docstr
class: SpecFile.DynamicSection does not have docstr
class: SpecFile.Section does not have docstr
class: SpecFile.Section1 does not have docstr
class: SpecFile.Section2 does not have docstr
class: SpecFile.StaticSection does not have docstr
module: mod1 does not have docstr
method: DynamicSection.__init__ does not have docstr
method: DynamicSection.validate does not have docstr
method: Section.__init__ does not have docstr
method: Section.validate does not have docstr
method: Section1.__init__ does not have docstr
method: Section1.validate does not have docstr
method: Section2.__init__ does not have docstr
method: Section2.validate does not have docstr
method: StaticSection.__init__ does not have docstr
method: StaticSection.validate does not have docstr

Although this code prints more info than t1.py, it has two glaring limitations:

  1. Section.validate should have been named mod1.SpecFile.Section.validate - names of all members should be fully qualified - parent name prefix is not enough - we need the entire ancestry starting from the module name.
  2. Sometimes we just need to check the docstrings of a specific subset of the module instead of the whole module e.g. check whether mod1.SpecFile.Section1 and its members have docstrings.
Addressing limitation 1 - get fully qualified name:

If we were using Python 3.3 or later, we wouldn't need to write code to fix this: as per this SO answer, an additional attribute __qualname__ was added to functions and classes in Python 3.3. It gives the dotted path of the object within its module (e.g. SpecFile.Section.validate), so prefixing it with __module__ yields the fully qualified name.
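For readers on Python 3.3 or later, a minimal sketch of what __qualname__ reports. The class names are borrowed from mod1.py but defined inline here so that the snippet is self-contained; it will not run on Python 2.7.

```python
# Python 3.3+ only: __qualname__ does not exist on Python 2.7.
class SpecFile(object):
    class Section(object):
        def validate(self):
            pass

section = SpecFile.Section

# When the classes are defined at module top level:
print(section.__qualname__)            # SpecFile.Section
print(section.validate.__qualname__)   # SpecFile.Section.validate

# __qualname__ stops at the module boundary, so add __module__ for the
# full dotted name (the module part depends on where this code is run).
print(section.__module__ + '.' + section.__qualname__)
```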

Python Tutor provides an online shell for executing Python 2.7/3.3 code - this link runs a __qualname__ checking code snippet in Python 3.3 - try running it in 2.7 also to see the difference.

Fortunately I am using Python 2.7.3 - so we march forth :)

One way to address this limitation is to create a dictionary - id_name_map - which maps an object's id to its name, and to have each newly added entry look up its parent's name before it is added.

To illustrate, the various stages of id_name_map can be:

# add mod1
{
  id_of_mod1: 'mod1',
}

# inspect mod1 - returns SpecFile, main
# class SpecFile - is the parent present in id_name_map - if yes - prefix it with its parent name
# function main - is the parent present in id_name_map - if yes - prefix it with its parent name
{
  id_of_mod1: 'mod1',
  id_of_main: 'mod1.main',
  id_of_SpecFile: 'mod1.SpecFile',
}

# inspect SpecFile - returns Section, StaticSection, DynamicSection, Section1, Section2
# for each of the above - is the parent present in id_name_map - if yes - prefix it with its parent name
{
  id_of_mod1: 'mod1',
  id_of_main: 'mod1.main',
  id_of_SpecFile: 'mod1.SpecFile',
  id_of_Section: 'mod1.SpecFile.Section',
  id_of_StaticSection: 'mod1.SpecFile.StaticSection',
  id_of_DynamicSection: 'mod1.SpecFile.DynamicSection',
  id_of_Section1: 'mod1.SpecFile.Section1',
  id_of_Section2: 'mod1.SpecFile.Section2',
}

and so on .... 
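The stages above can be sketched in a few lines. This is a simplified stand-in, not the code in t3.py: it walks vars() of each class rather than inspect.getmembers, the helper build_id_name_map here is a free function, and the root name 'mod1.SpecFile' is passed in by hand because the class is defined inline.

```python
import inspect

class SpecFile(object):
    class Section(object):
        def validate(self):
            pass

def build_id_name_map(root, root_name):
    """Map id(obj) -> dotted name for root and everything nested in it."""
    id_name_map = {id(root): root_name}
    pending = [root]
    while pending:
        parent = pending.pop()
        parent_name = id_name_map[id(parent)]
        # vars() lists only what the class body itself defines
        for name, ref in vars(parent).items():
            if inspect.isclass(ref) or inspect.isfunction(ref):
                # the parent is already in the map, so prefix with its name
                id_name_map[id(ref)] = parent_name + '.' + name
                if inspect.isclass(ref):
                    pending.append(ref)
    return id_name_map

id_name_map = build_id_name_map(SpecFile, 'mod1.SpecFile')
for name in sorted(id_name_map.values()):
    print(name)
# mod1.SpecFile
# mod1.SpecFile.Section
# mod1.SpecFile.Section.validate
```

Because every parent is added before its children are visited, the parent lookup always succeeds and each name carries its full ancestry.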
Addressing limitation 2 - extract members of a specific subset of the module:

Overcoming this limitation is merely a matter of selecting a subset of members from all_members (created by get_all_members).

We cycle through all_members and extract out only those members whose name starts with the fully qualified name of the user specified entity i.e. beginning with mod1.SpecFile or mod1.SpecFile.Section1 and so on.
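One thing to watch with a plain startswith check is that sibling names sharing a prefix also match, e.g. mod1.SpecFile.Section1 starts with mod1.SpecFile.Section. The sketch below contrasts the plain check with a stricter variant; select_by_prefix and select_subtree are hypothetical helper names, and the stricter variant is a suggestion rather than what the t3.py code does.

```python
names = [
    'mod1.SpecFile',
    'mod1.SpecFile.Section',
    'mod1.SpecFile.Section.validate',
    'mod1.SpecFile.Section1',
    'mod1.SpecFile.Section1.validate',
]

def select_by_prefix(names, base_name):
    """Plain prefix match, as described above."""
    return [n for n in names if n.startswith(base_name)]

def select_subtree(names, base_name):
    """Match the entity itself or names that continue with a dot."""
    dotted = base_name + '.'
    return [n for n in names if n == base_name or n.startswith(dotted)]

print(select_by_prefix(names, 'mod1.SpecFile.Section'))
# ['mod1.SpecFile.Section', 'mod1.SpecFile.Section.validate',
#  'mod1.SpecFile.Section1', 'mod1.SpecFile.Section1.validate']

print(select_subtree(names, 'mod1.SpecFile.Section'))
# ['mod1.SpecFile.Section', 'mod1.SpecFile.Section.validate']
```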

And so we come up with t3.py:

In [8]:
%load t3.py
In [9]:
from nbcommon import *
import mod1

import inspect
import sys
from collections import defaultdict

class Modstruct3(object):
    """ Return a data structure representing all members of the passed
    entity """

    def __init__(self, base_entity):
        self.base_entity_type = get_entity_type(base_entity)
        self.base_entity = base_entity
        self.base_module = base_entity
        self.id_name_map = {}
        self.all_members = []
        if self.base_entity_type != 'module':
            # if entity_type is class - know which module it belongs to
            self.base_module = sys.modules[base_entity.__module__]

    def get_entity_name(self, entity):
        """ Return fully qualified name of entity """
        return self.id_name_map.get(id(entity), None)

    def build_id_name_map(self, entity, parent=None):
        """ Map entity id to its fully qualified name """
        entity_name = entity.__name__
        if parent is not None:
            id_parent = id(parent)
            if id_parent in self.id_name_map:
                parent_name = self.id_name_map[id_parent]
                entity_name = '.'.join([parent_name, entity.__name__])
        self.id_name_map[id(entity)] = entity_name

    def extract_entity_members(self):
        """ From all the members extract out member tree of the base 
        entity """
        if self.base_entity_type == 'module':
            self.base_entity_members = self.all_members
            return self.base_entity_members

        base_entity_name = self.get_entity_name(self.base_entity)

        base_entity_members = []
        for member in self.all_members:
            if member['name'].startswith(base_entity_name):
                base_entity_members.append(member)
        self.base_entity_members = base_entity_members

    def get_entity_members(self, entity):
        """ Get first level members of the passed entity """
        members = []
        parent_name = self.get_entity_name(entity)
        for member in inspect.getmembers(entity):
            ref = member[1]
            # member has to be of supported entity type
            try:
                ref_type = get_entity_type(ref)
            except ValueError:
                continue

            # we will not inspect modules imported in base module
            if inspect.ismodule(ref): continue

            # member has to be defined in base module
            if ref.__module__ != self.base_module.__name__: continue

            # valid member - construct member data
            member_data = {
                'type': ref_type, 
                'ref': ref, 
                'name': parent_name + '.' + ref.__name__,
                'parent_ref': entity,
                'parent_name': parent_name
            }
            members.append(member_data)
            self.build_id_name_map(ref, entity)
        return members

    def get_all_members(self):
        """ Get all the members (nested also) of the passed entity """

        # add base module as the first element
        all_members = [{'type': 'module',
                        'ref': self.base_module, 
                        'name': self.base_module.__name__,
                        'parent_ref': None,
                        'parent_name': None}]

        # add base module as first entry to id_name_map - root of all names
        self.build_id_name_map(self.base_module, None)

        # get first level members of the module
        nested_members = self.get_entity_members(self.base_module)
        all_members.extend(nested_members)

        # call get_entity_members repetitively till you reach a stage where 
        # there are no nested members
        while nested_members:
            curr_nested_members = []
            # for member_type, member_ref, member_name in nested_members:
            for member_data in nested_members:
                if member_data['type'] == 'class':
                    # drill nested members only in a class
                    members = self.get_entity_members(member_data['ref'])
                    curr_nested_members.extend(members)
            nested_members = curr_nested_members
            all_members.extend(nested_members)

        self.all_members = all_members

        # extract subset of members in case base_entity is not a module
        self.extract_entity_members()

        return self.base_entity_members

class TestDocstr3(object):

    @classmethod
    def test_docstr(self, entity):
        all_members = Modstruct3(entity).get_all_members()

        non_docstr_entities = defaultdict(list)

        # get all the nested members of root entity
        for member_data in all_members:
            # consolidate members based on type
            if not has_docstr(member_data['ref']):
                member_name = member_data['name']
                non_docstr_entities[member_data['type']].append(member_name)

        if non_docstr_entities.keys():
            errors = []
            # create error string
            for entity_type, refs in non_docstr_entities.iteritems():
                for refname in refs:
                    errors.append('%s: %s does not have docstr' % (entity_type,
                                                                   refname))
            raise NoDocstrError('\n' + '\n'.join(errors))
        return True

TestDocstr3.test_docstr(mod1)
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-9-faf23dd8dd0b> in <module>()
    141         return True
    142 
--> 143 TestDocstr3.test_docstr(mod1)

<ipython-input-9-faf23dd8dd0b> in test_docstr(self, entity)
    138                     errors.append('%s: %s does not have docstr' % (entity_type,
    139                                                                    refname))
--> 140             raise NoDocstrError('\n' + '\n'.join(errors))
    141         return True
    142 

NoDocstrError: 
function: mod1.main does not have docstr
class: mod1.SpecFile does not have docstr
class: mod1.SpecFile.DynamicSection does not have docstr
class: mod1.SpecFile.Section does not have docstr
class: mod1.SpecFile.Section1 does not have docstr
class: mod1.SpecFile.Section2 does not have docstr
class: mod1.SpecFile.StaticSection does not have docstr
module: mod1 does not have docstr
method: mod1.SpecFile.DynamicSection.__init__ does not have docstr
method: mod1.SpecFile.DynamicSection.validate does not have docstr
method: mod1.SpecFile.Section.__init__ does not have docstr
method: mod1.SpecFile.Section.validate does not have docstr
method: mod1.SpecFile.Section1.__init__ does not have docstr
method: mod1.SpecFile.Section1.validate does not have docstr
method: mod1.SpecFile.Section2.__init__ does not have docstr
method: mod1.SpecFile.Section2.validate does not have docstr
method: mod1.SpecFile.StaticSection.__init__ does not have docstr
method: mod1.SpecFile.StaticSection.validate does not have docstr
In [10]:
TestDocstr3.test_docstr(mod1.main) # Testing if the function has docstring
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-10-acc9704b548d> in <module>()
----> 1 TestDocstr3.test_docstr(mod1.main) # Testing if the function has docstring

<ipython-input-9-faf23dd8dd0b> in test_docstr(self, entity)
    138                     errors.append('%s: %s does not have docstr' % (entity_type,
    139                                                                    refname))
--> 140             raise NoDocstrError('\n' + '\n'.join(errors))
    141         return True
    142 

NoDocstrError: 
function: mod1.main does not have docstr
In [11]:
TestDocstr3.test_docstr(mod1.SpecFile) # Testing if the class and its members have docstring
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-11-67ece24d1001> in <module>()
----> 1 TestDocstr3.test_docstr(mod1.SpecFile) # Testing if the class and its members have docstring

<ipython-input-9-faf23dd8dd0b> in test_docstr(self, entity)
    138                     errors.append('%s: %s does not have docstr' % (entity_type,
    139                                                                    refname))
--> 140             raise NoDocstrError('\n' + '\n'.join(errors))
    141         return True
    142 

NoDocstrError: 
class: mod1.SpecFile does not have docstr
class: mod1.SpecFile.DynamicSection does not have docstr
class: mod1.SpecFile.Section does not have docstr
class: mod1.SpecFile.Section1 does not have docstr
class: mod1.SpecFile.Section2 does not have docstr
class: mod1.SpecFile.StaticSection does not have docstr
method: mod1.SpecFile.DynamicSection.__init__ does not have docstr
method: mod1.SpecFile.DynamicSection.validate does not have docstr
method: mod1.SpecFile.Section.__init__ does not have docstr
method: mod1.SpecFile.Section.validate does not have docstr
method: mod1.SpecFile.Section1.__init__ does not have docstr
method: mod1.SpecFile.Section1.validate does not have docstr
method: mod1.SpecFile.Section2.__init__ does not have docstr
method: mod1.SpecFile.Section2.validate does not have docstr
method: mod1.SpecFile.StaticSection.__init__ does not have docstr
method: mod1.SpecFile.StaticSection.validate does not have docstr

But see what happens when we try to run the docstring test on a method:

In [12]:
TestDocstr3.test_docstr(mod1.SpecFile.Section.validate)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-91e4aba87ed2> in <module>()
----> 1 TestDocstr3.test_docstr(mod1.SpecFile.Section.validate)

<ipython-input-9-faf23dd8dd0b> in test_docstr(self, entity)
    120     @classmethod
    121     def test_docstr(self, entity):
--> 122         all_members = Modstruct3(entity).get_all_members()
    123 
    124         non_docstr_entities = defaultdict(list)

<ipython-input-9-faf23dd8dd0b> in get_all_members(self)
    112 
    113         # extract subset of members in case base_entity is not a module
--> 114         self.extract_entity_members()
    115 
    116         return self.base_entity_members

<ipython-input-9-faf23dd8dd0b> in extract_entity_members(self)
     45         base_entity_members = []
     46         for member in self.all_members:
---> 47             if member['name'].startswith(base_entity_name):
     48                 base_entity_members.append(member)
     49         self.base_entity_members = base_entity_members

TypeError: startswith first arg must be str, unicode, or tuple, not NoneType

As per the error, the argument passed to startswith - base_entity_name - is None, which implies that we could not find mod1.SpecFile.Section.validate's id key in id_name_map. But why did it fail only for the method, and not while checking docstrings for a module, class or function? To know more, let's run the Modstruct3 instance method get_all_members separately.

In [13]:
m1 = Modstruct3(mod1)
members = m1.get_all_members()
target_keys = ['mod1', 'mod1.main', 'mod1.SpecFile.Section', 'mod1.SpecFile.Section.validate']
target_members = {}
for member in members:
    # map id => name for target_keys members
    if member['name'] in target_keys:
        target_members[member['name']] = id(member['ref'])
In [14]:
id(mod1), target_members['mod1'] # same
Out[14]:
(149022228, 149022228)
In [15]:
id(mod1.main), target_members['mod1.main'] # same
Out[15]:
(149019812, 149019812)
In [16]:
id(mod1.SpecFile.Section), target_members['mod1.SpecFile.Section'] # same
Out[16]:
(149522052, 149522052)
In [17]:
id(mod1.SpecFile.Section.validate), target_members['mod1.SpecFile.Section.validate'] # diff
Out[17]:
(148864572, 148861892)

As you can see, the id of mod1.SpecFile.Section.validate and the id of the reference stored in the members returned by the get_all_members instance method are different. This is because, as per this SO answer, whenever you look up a method via class.name or instance.name, the method object is created anew. We can illustrate this with a simple example:

In [18]:
%load test_meth_id.py
In [19]:
def f1(): pass

class A():
    def m1(self): pass

x = f1
y = f1
z = f1
print "\n==== id for function f1 ===="
print 'id(f1) = ' + str(id(f1))
print 'id(x) = ' + str(id(x))
print 'id(y) = ' + str(id(y))
print 'id(z) = ' + str(id(z))

x = A
y = A
z = A
print "\n==== id for class A ===="
print 'id(A) = ' + str(id(A))
print 'id(x) = ' + str(id(x))
print 'id(y) = ' + str(id(y))
print 'id(z) = ' + str(id(z))

x = A.m1
y = A.m1
z = A.m1
print "\n==== id for method A.m1 ===="
print 'id(A.m1) = ' + str(id(A.m1))
print 'id(x) = ' + str(id(x))
print 'id(y) = ' + str(id(y))
print 'id(z) = ' + str(id(z))

print 'x is y ' + str(x is y)
print 'x == y ' + str(x == y)
==== id for function f1 ====
id(f1) = 150503124
id(x) = 150503124
id(y) = 150503124
id(z) = 150503124

==== id for class A ====
id(A) = 150346556
id(x) = 150346556
id(y) = 150346556
id(z) = 150346556

==== id for method A.m1 ====
id(A.m1) = 148863252
id(x) = 148864572
id(y) = 148862892
id(z) = 148861852
x is y False
x == y True
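The output above is from Python 2.7, where A.m1 is an unbound method. As a hedged cross-check that also runs on Python 3, the same behaviour can be seen with bound methods looked up on an instance. The snippet compares objects directly instead of printing ids, since the id of a short-lived object can be reused.

```python
class A(object):
    def m1(self):
        pass

a = A()
x = a.m1   # each attribute lookup builds a new bound method object
y = a.m1

print(x is y)                     # False: two distinct method objects
print(x == y)                     # True: same function bound to the same instance
print(x.__func__ is y.__func__)   # True: the underlying function is shared
```

Equality holds even though identity does not, which is what makes an == comparison usable where an id lookup is not.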

As you can see, the class/function id stays the same no matter how many references are created to it, but the method id changes on each lookup. Hence if the entity is a method, then instead of looking it up via its id, we can cycle through all members and do a member == method check. This is done by the new method get_base_entity_name in t4.py:

In [22]:
%load t4.py
In [23]:
from nbcommon import *
import mod1

import inspect
import sys
from collections import defaultdict

class Modstruct4(object):
    """ Return a data structure representing all members of the passed
    entity """

    def __init__(self, base_entity, **options):
        self.base_entity_type = get_entity_type(base_entity)
        self.base_entity = base_entity
        self.base_module = base_entity
        self.id_name_map = {}
        self.all_members = []
        self.options = {'categorize': False}
        self.options.update(options)
        if self.base_entity_type != 'module':
            # if entity_type is class - know which module it belongs to
            self.base_module = sys.modules[base_entity.__module__]

    def get_entity_name(self, entity):
        """ Return fully qualified name of entity """
        return self.id_name_map.get(id(entity), None)

    def get_base_entity_name(self):
        """ Return the name of the base entity passed in by the user """
        # if base entity is not a method - just look up its id
        if self.base_entity_type != 'method':
            return self.get_entity_name(self.base_entity)

        # else as method id does not stay constant, cycle through all members 
        # and return the member matching the base entity ref
        for member in self.all_members:
            if self.base_entity == member['ref']:
                return self.get_entity_name(member['ref'])

    def build_id_name_map(self, entity, parent=None):
        """ Map entity id to its fully qualified name """
        entity_name = entity.__name__
        if parent is not None:
            id_parent = id(parent)
            if id_parent in self.id_name_map:
                parent_name = self.id_name_map[id_parent]
                entity_name = '.'.join([parent_name, entity.__name__])
        self.id_name_map[id(entity)] = entity_name

    def extract_entity_members(self):
        """ From all the members extract out member tree of the base 
        entity """
        if self.base_entity_type == 'module':
            self.base_entity_members = self.all_members
            return self.base_entity_members

        base_entity_name = self.get_base_entity_name()

        base_entity_members = []
        for member in self.all_members:
            if member['name'].startswith(base_entity_name):
                base_entity_members.append(member)
        self.base_entity_members = base_entity_members

    def get_entity_members(self, entity):
        """ Get first level members of the passed entity """
        members = []
        parent_name = self.get_entity_name(entity)
        for member in inspect.getmembers(entity):
            ref = member[1]
            # member has to be of supported entity type
            try:
                ref_type = get_entity_type(ref)
            except ValueError:
                continue

            # we will not inspect modules imported in base module
            if inspect.ismodule(ref): continue

            # member has to be defined in base module
            if ref.__module__ != self.base_module.__name__: continue

            # valid member - construct member data
            member_data = {
                'type': ref_type, 
                'ref': ref, 
                'name': parent_name + '.' + ref.__name__,
                'parent_ref': entity,
                'parent_name': parent_name
            }
            members.append(member_data)
            self.build_id_name_map(ref, entity)
        return members

    def get_all_members(self):
        """ Get all the members (nested also) of the passed entity """

        # add base module as the first element
        all_members = [{'type': 'module',
                        'ref': self.base_module, 
                        'name': self.base_module.__name__,
                        'parent_ref': None,
                        'parent_name': None}]

        # add base module as first entry to id_name_map - root of all names
        self.build_id_name_map(self.base_module, None)

        # get first level members of the module
        nested_members = self.get_entity_members(self.base_module)
        all_members.extend(nested_members)

        # call get_entity_members repeatedly until there are no more
        # nested members left to drill into
        while nested_members:
            curr_nested_members = []
            for member_data in nested_members:
                if member_data['type'] == 'class':
                    # drill nested members only in a class
                    members = self.get_entity_members(member_data['ref'])
                    curr_nested_members.extend(members)
            nested_members = curr_nested_members
            all_members.extend(nested_members)

        self.all_members = all_members

        # extract subset of members in case base_entity is not a module
        self.extract_entity_members()

        # categorize members if required
        if self.options['categorize']:
            return self.categorize()

        return self.base_entity_members

class TestDocstr4(object):

    @classmethod
    def test_docstr(self, entity):
        all_members = Modstruct4(entity).get_all_members()

        non_docstr_entities = defaultdict(list)

        # group the members that lack a docstring by entity type
        for member_data in all_members:
            if not has_docstr(member_data['ref']):
                member_name = member_data['name']
                non_docstr_entities[member_data['type']].append(member_name)

        if non_docstr_entities:
            errors = []
            # create error string
            for entity_type, refs in non_docstr_entities.iteritems():
                for refname in refs:
                    errors.append('%s: %s does not have docstr' % (entity_type,
                                                                   refname))
            raise NoDocstrError('\n' + '\n'.join(errors))
        return True

TestDocstr4.test_docstr(mod1.SpecFile.Section1.validate)
---------------------------------------------------------------------------
NoDocstrError                             Traceback (most recent call last)
<ipython-input-23-691f6c3a5810> in <module>()
    159         return True
    160 
--> 161 TestDocstr4.test_docstr(mod1.SpecFile.Section1.validate)

<ipython-input-23-691f6c3a5810> in test_docstr(self, entity)
    156                     errors.append('%s: %s does not have docstr' % (entity_type,
    157                                                                    refname))
--> 158             raise NoDocstrError('\n' + '\n'.join(errors))
    159         return True
    160 

NoDocstrError: 
method: mod1.SpecFile.Section1.validate does not have docstr

And that works. So that's about it: Modstruct can now be used to find all the members of an entity, and various checks can be performed on them.
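
To illustrate the idea of running different checks over one list of members, here is a minimal, self-contained sketch. It does not use Modstruct itself: `Sample`, `walk`, `check_members`, `has_docstr` (a two-argument variant) and `is_lowercase` are hypothetical names made up for this example, and the traversal is simplified (it skips all double-underscore attributes, including `__init__`, and does not filter by module as Modstruct does).

```python
import inspect


class Sample(object):
    """Hypothetical stand-in for a class such as SpecFile."""

    class Inner(object):
        def validate(self):
            """Validate the section."""
            pass

        def BadName(self):
            pass

    def run(self):
        pass


def walk(entity, prefix):
    """Yield (qualified name, ref) pairs for everything nested in a class."""
    for name, ref in inspect.getmembers(entity):
        # simplification: skip dunder attributes such as __class__, __init__
        if name.startswith('__'):
            continue
        qualified = prefix + '.' + name
        if inspect.isclass(ref):
            yield qualified, ref
            # drill down only into classes
            for item in walk(ref, qualified):
                yield item
        elif inspect.isfunction(ref) or inspect.ismethod(ref):
            yield qualified, ref


def check_members(members, check):
    """Return the names of members for which check(name, ref) is false."""
    return [name for name, ref in members if not check(name, ref)]


def has_docstr(name, ref):
    """Check 1: the member has a non-empty docstring."""
    return bool(getattr(ref, '__doc__', None))


def is_lowercase(name, ref):
    """Check 2: function and method names are lowercase (classes exempt)."""
    short_name = name.rsplit('.', 1)[-1]
    return inspect.isclass(ref) or short_name == short_name.lower()


members = list(walk(Sample, 'Sample'))
print(check_members(members, has_docstr))
# ['Sample.Inner', 'Sample.Inner.BadName', 'Sample.run']
print(check_members(members, is_lowercase))
# ['Sample.Inner.BadName']
```

The point of the sketch is only the split between collecting members once and passing in whichever check is needed. With Modstruct, the list returned by get_all_members would play the role of `members` here.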

As Modstruct provides more generic functionality, it can be abstracted out as a separate utility. I've made some changes to improve its usability and uploaded the code to the mod_struct GitHub repo.

Hope this post was useful to you. Do share your thoughts and insights in the blog comment section, as described initially.