Demo on collections module

Reference: python doc

In [1]:
import collections

ChainMap

ChainMap is a faster way to merge some dicts, compared to dict_old.update(dict_new)

In [3]:
dict1 = {"John": True, "Bob": False}
dict2 = {"Alice": True}
cm = collections.ChainMap(dict1, dict2) # this is first than merging two dicts by updates
cm.maps
Out[3]:
[{'Bob': False, 'John': True}, {'Alice': True}]
In [5]:
dict2["Lucy"]=False
cm.maps ## the edit on the input can be reflected by chainmap automatically
Out[5]:
[{'Bob': False, 'John': True}, {'Alice': True, 'Lucy': False}]
In [8]:
cm["Mike"]=False
cm.maps ## all the add on cm itself is reflected on the first map
Out[8]:
[{'Bob': False, 'John': True, 'Mike': False}, {'Alice': True, 'Lucy': False}]
In [10]:
cm['Alice'] ## the search is go through the maps list, slower than one dict! 
Out[10]:
True
In [11]:
dict2['John']=False
cm['John'] # for the same key, the previous one will override the latter one when query
Out[11]:
True
In [ ]:
## a deep search version of ChainMap, the ordinary chainmap only update the first dict in the map list
## below is a hack subclass where update can also happen in other dicts

class DeepChainMap(ChainMap):
    'Variant of ChainMap that allows direct updates to inner scopes'

    def __setitem__(self, key, value):
        for mapping in self.maps:
            if key in mapping:
                mapping[key] = value
                return
        self.maps[0][key] = value

    def __delitem__(self, key):
        for mapping in self.maps:
            if key in mapping:
                del mapping[key]
                return
        raise KeyError(key)
In [13]:
cm_c = cm.new_child() ## support for nested context
cm_c.maps 
Out[13]:
[{},
 {'Bob': False, 'John': True, 'Mike': False},
 {'Alice': True, 'John': False, 'Lucy': False}]
In [14]:
cm_c.parents
Out[14]:
ChainMap({'John': True, 'Bob': False, 'Mike': False}, {'Alice': True, 'Lucy': False, 'John': False})

Counter

counter is ... just a counter, similar with the implmentation via dict, actually a subclass of dict which can directly count iterable input

In [16]:
ct = collections.Counter()
for w in [1,2,3,1,2,1]:
    ct[w] += 1
ct
Out[16]:
Counter({1: 3, 2: 2, 3: 1})
In [17]:
## better way to count
collections.Counter([1,2,3,1,2,1])
Out[17]:
Counter({1: 3, 2: 2, 3: 1})
In [19]:
wd=collections.Counter("hello world!")
wd
Out[19]:
Counter({' ': 1,
         '!': 1,
         'd': 1,
         'e': 1,
         'h': 1,
         'l': 3,
         'o': 2,
         'r': 1,
         'w': 1})
In [20]:
wd["z"] # for missing value, return 0 instead of rasing error
Out[20]:
0
In [23]:
list(wd.elements()) ## same things together
Out[23]:
['h', 'e', 'l', 'l', 'l', 'o', 'o', ' ', 'w', 'r', 'd', '!']
In [24]:
wd.most_common(2) ## the 2 most common char
Out[24]:
[('l', 3), ('o', 2)]
In [25]:
wd2 = {"l":10}
wd.update(wd2)
wd # the value is added instead of updated by dict.update
Out[25]:
Counter({' ': 1,
         '!': 1,
         'd': 1,
         'e': 1,
         'h': 1,
         'l': 13,
         'o': 2,
         'r': 1,
         'w': 1})
In [29]:
wd2 = collections.Counter(wd2)
wd2 ## counter can also directly initialized with a dict
Out[29]:
Counter({'l': 10})
In [30]:
wd+wd2, wd-wd2, wd&wd2, wd|wd2 ## overload for +, - (keep only positive ones), min, max
Out[30]:
(Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 23,
          'o': 2,
          'r': 1,
          'w': 1}),
 Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 3,
          'o': 2,
          'r': 1,
          'w': 1}),
 Counter({'l': 10}),
 Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 13,
          'o': 2,
          'r': 1,
          'w': 1}))

deque

deque is a double-ended queue, it is thread safe, and O(1) for pop and append on both side of the queue

In [34]:
dq = collections.deque([1,2,3,4],maxlen=5) # the maxlen gives the restriction on the queue, new append must 
# come with pop from the other end when maxlen is reached 
In [35]:
dq.append(5)
list(dq)
Out[35]:
[1, 2, 3, 4, 5]
In [36]:
dq.appendleft(0)
list(dq)
Out[36]:
[0, 1, 2, 3, 4]
In [37]:
dq.count(1) # count how many 1 in the queue
Out[37]:
1
In [38]:
dq.extend([6,7,8])
list(dq)
Out[38]:
[3, 4, 6, 7, 8]
In [39]:
dq.popleft()
Out[39]:
3
In [40]:
list(dq)
Out[40]:
[4, 6, 7, 8]
In [41]:
dq.rotate(2)
In [42]:
list(dq) ## rotate the linked list 2 steps to the right
Out[42]:
[7, 8, 4, 6]

defaultdict

of course, defaultdict is a subclass of dict. with a new input for constructor called default_factory. It is designed for a default value of the dict when key is missing. Namely everytime the accessed key is missing, the default_factory() is called.

In [45]:
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
[d[k].append(v) for k,v in s]
d
Out[45]:
defaultdict(list, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})
In [46]:
## to understand the above code, just recall that
list() ## the is the function called each time a missing key is accessed
Out[46]:
[]
In [47]:
## the usage above is equivalent to
d = {}
for k, v in s:
    d.setdefault(k, []).append(v)
d
Out[47]:
{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}
In [49]:
int(),set(),tuple() ## lots of interesting default factory can be used and explored!
Out[49]:
(0, set(), ())

namedtuple

assign name to tuple, which is better understood compared to access by index pos

In [52]:
Point = collections.namedtuple('Point','x y') # for the field name ['x', 'y'], 'x y', 'x,y' are all legal formats
p=Point(11,22)
In [53]:
p.x, p.y
Out[53]:
(11, 22)
In [54]:
p[0], p[1] ## the old fashioned way in tuple also works
Out[54]:
(11, 22)
In [57]:
p = p._replace(x=33)  ## tuple finally can be changed
p.x
Out[57]:
33
In [59]:
p._fields ## view field names
Out[59]:
('x', 'y')

OrderedDict

A dict in python is implemented by hash to obtain O(1) speed with the cost that the keys cannot be sorted. An orderdict, well, is a dict remembering order. (order of keys that first inserted)

In [61]:
od = collections.OrderedDict([(1,2),(3,4)])
In [64]:
od[1], od[3]
Out[64]:
(2, 4)
In [67]:
od.popitem(last=True) # pop in LIFO fashion if last is True (by default), otherwise pop in FIFO order
Out[67]:
(3, 4)
In [68]:
od
Out[68]:
OrderedDict([(1, 2)])
In [ ]: