Demo on collections module¶

Reference: python doc

In [1]:

import collections

ChainMap¶

ChainMap is a faster way to merge some dicts, compared to dict_old.update(dict_new)

In [3]:

dict1 = {"John": True, "Bob": False}
dict2 = {"Alice": True}
cm = collections.ChainMap(dict1, dict2) # this is first than merging two dicts by updates
cm.maps

Out[3]:

[{'Bob': False, 'John': True}, {'Alice': True}]

In [5]:

dict2["Lucy"]=False
cm.maps ## the edit on the input can be reflected by chainmap automatically

Out[5]:

[{'Bob': False, 'John': True}, {'Alice': True, 'Lucy': False}]

In [8]:

cm["Mike"]=False
cm.maps ## all the add on cm itself is reflected on the first map

Out[8]:

[{'Bob': False, 'John': True, 'Mike': False}, {'Alice': True, 'Lucy': False}]

In [10]:

cm['Alice'] ## the search is go through the maps list, slower than one dict! 

Out[10]:

True

In [11]:

dict2['John']=False
cm['John'] # for the same key, the previous one will override the latter one when query

Out[11]:

True

In [ ]:

## a deep search version of ChainMap, the ordinary chainmap only update the first dict in the map list
## below is a hack subclass where update can also happen in other dicts

class DeepChainMap(ChainMap):
    'Variant of ChainMap that allows direct updates to inner scopes'

    def __setitem__(self, key, value):
        for mapping in self.maps:
            if key in mapping:
                mapping[key] = value
                return
        self.maps[0][key] = value

    def __delitem__(self, key):
        for mapping in self.maps:
            if key in mapping:
                del mapping[key]
                return
        raise KeyError(key)

In [13]:

cm_c = cm.new_child() ## support for nested context
cm_c.maps 

Out[13]:

[{},
 {'Bob': False, 'John': True, 'Mike': False},
 {'Alice': True, 'John': False, 'Lucy': False}]

In [14]:

cm_c.parents

Out[14]:

ChainMap({'John': True, 'Bob': False, 'Mike': False}, {'Alice': True, 'Lucy': False, 'John': False})

Counter¶

counter is ... just a counter, similar with the implmentation via dict, actually a subclass of dict which can directly count iterable input

In [16]:

ct = collections.Counter()
for w in [1,2,3,1,2,1]:
    ct[w] += 1
ct

Out[16]:

Counter({1: 3, 2: 2, 3: 1})

In [17]:

## better way to count
collections.Counter([1,2,3,1,2,1])

Out[17]:

Counter({1: 3, 2: 2, 3: 1})

In [19]:

wd=collections.Counter("hello world!")
wd

Out[19]:

Counter({' ': 1,
         '!': 1,
         'd': 1,
         'e': 1,
         'h': 1,
         'l': 3,
         'o': 2,
         'r': 1,
         'w': 1})

In [20]:

wd["z"] # for missing value, return 0 instead of rasing error

Out[20]:

In [23]:

list(wd.elements()) ## same things together

Out[23]:

['h', 'e', 'l', 'l', 'l', 'o', 'o', ' ', 'w', 'r', 'd', '!']

In [24]:

wd.most_common(2) ## the 2 most common char

Out[24]:

[('l', 3), ('o', 2)]

In [25]:

wd2 = {"l":10}
wd.update(wd2)
wd # the value is added instead of updated by dict.update

Out[25]:

Counter({' ': 1,
         '!': 1,
         'd': 1,
         'e': 1,
         'h': 1,
         'l': 13,
         'o': 2,
         'r': 1,
         'w': 1})

In [29]:

wd2 = collections.Counter(wd2)
wd2 ## counter can also directly initialized with a dict

Out[29]:

Counter({'l': 10})

In [30]:

wd+wd2, wd-wd2, wd&wd2, wd|wd2 ## overload for +, - (keep only positive ones), min, max

Out[30]:

(Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 23,
          'o': 2,
          'r': 1,
          'w': 1}),
 Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 3,
          'o': 2,
          'r': 1,
          'w': 1}),
 Counter({'l': 10}),
 Counter({' ': 1,
          '!': 1,
          'd': 1,
          'e': 1,
          'h': 1,
          'l': 13,
          'o': 2,
          'r': 1,
          'w': 1}))

deque¶

deque is a double-ended queue, it is thread safe, and O(1) for pop and append on both side of the queue

In [34]:

dq = collections.deque([1,2,3,4],maxlen=5) # the maxlen gives the restriction on the queue, new append must 
# come with pop from the other end when maxlen is reached 

In [35]:

dq.append(5)
list(dq)

Out[35]:

[1, 2, 3, 4, 5]

In [36]:

dq.appendleft(0)
list(dq)

Out[36]:

[0, 1, 2, 3, 4]

In [37]:

dq.count(1) # count how many 1 in the queue

Out[37]:

In [38]:

dq.extend([6,7,8])
list(dq)

Out[38]:

[3, 4, 6, 7, 8]

In [39]:

dq.popleft()

Out[39]:

In [40]:

list(dq)

Out[40]:

[4, 6, 7, 8]

In [41]:

dq.rotate(2)

In [42]:

list(dq) ## rotate the linked list 2 steps to the right

Out[42]:

[7, 8, 4, 6]

defaultdict¶

of course, defaultdict is a subclass of dict. with a new input for constructor called default_factory. It is designed for a default value of the dict when key is missing. Namely everytime the accessed key is missing, the default_factory() is called.

In [45]:

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
[d[k].append(v) for k,v in s]
d

Out[45]:

defaultdict(list, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})

In [46]:

## to understand the above code, just recall that
list() ## the is the function called each time a missing key is accessed

Out[46]:

[]

In [47]:

## the usage above is equivalent to
d = {}
for k, v in s:
    d.setdefault(k, []).append(v)
d

Out[47]:

{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}

In [49]:

int(),set(),tuple() ## lots of interesting default factory can be used and explored!

Out[49]:

(0, set(), ())

namedtuple¶

assign name to tuple, which is better understood compared to access by index pos

In [52]:

Point = collections.namedtuple('Point','x y') # for the field name ['x', 'y'], 'x y', 'x,y' are all legal formats
p=Point(11,22)

In [53]:

p.x, p.y

Out[53]:

(11, 22)

In [54]:

p[0], p[1] ## the old fashioned way in tuple also works

Out[54]:

(11, 22)

In [57]:

p = p._replace(x=33)  ## tuple finally can be changed
p.x

Out[57]:

In [59]:

p._fields ## view field names

Out[59]:

('x', 'y')

OrderedDict¶

A dict in python is implemented by hash to obtain O(1) speed with the cost that the keys cannot be sorted. An orderdict, well, is a dict remembering order. (order of keys that first inserted)

In [61]:

od = collections.OrderedDict([(1,2),(3,4)])

In [64]:

od[1], od[3]

Out[64]:

(2, 4)

In [67]:

od.popitem(last=True) # pop in LIFO fashion if last is True (by default), otherwise pop in FIFO order

Out[67]:

(3, 4)

In [68]:

od

Out[68]:

OrderedDict([(1, 2)])

In [ ]: