Exploring Collections module in Python

python datastructures

Posted by Vipin Ragashetti on October 28, 2019 · 7 mins read

Collections is a built-in Python module that implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers such as dict, list, set, and tuple.

Some of the useful data structures present in this module are:

namedtuple

The data stored in a plain tuple can only be accessed through indexes as can be seen in the example below:
plain_tuple = (10,11,12,13)
plain_tuple[0]
10
plain_tuple[3]
13

We can’t give names to individual elements stored in a tuple. Now, this might not be needed in simple cases. However, in case a tuple has many fields this might be kind of a necessity and will also impact the code’s readability. It is here that namedtuple’s functionality comes into the picture. It is a function for tuples with Named Fields and can be seen as an extension of the built-in tuple data type. Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. Each object stored in them can be accessed through a unique (human-readable) identifier and this frees us from having to remember integer indexes. Let’s see its implementation.

from collections import namedtuple
fruit = namedtuple('fruit','number variety color')
guava = fruit(number=2,variety='HoneyCrisp',color='green')
apple = fruit(number=5,variety='Granny Smith',color='red')

We construct the namedtuple by first passing the object type name (fruit) and then passing a string with the variety of fields as a string with spaces between the field names. We can then call on the various attributes:

guava.color
'green'
apple.variety
'Granny Smith'

counter

Counter is a dict subclass which helps to count hashable objects. The elements are stored as dictionary keys while the object counts are stored as the value. Let’s work through a few examples with Counter.
# Importing Counter from collections
from collections import Counter

# With Strings
c = Counter('abcacdabcacd')
print(c)
Counter('a': 4, 'c': 4, 'b': 2, 'd': 2)

# With lists
lst = [5,6,7,1,3,9,9,1,2,5,5,7,7]
c = Counter(lst)
print(c)
Counter('a': 4, 'c': 4, 'b': 2, 'd': 2)

# With Sentence
s = 'the lazy dog attacked over another lazy dog'
words = s.split()
Counter(words)
Counter('another': 1, 'dog': 2, 'attacked': 1, 'lazy': 2, 'over': 1, 'the': 1)
Common functions on counter object
sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary c.items()# convert to a list like (elem, cnt)
Counter(dict(list_of_pairs))    # convert from a list of(elem, cnt)
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts

defaultdict

Dictionaries are an efficient way to store data for later retrieval having an unordered set of key: value pairs. Keys must be unique and immutable objects.



from collections import defaultdict
d = defaultdict(object)
print(d['A'])
object object at 0x7fc9bed4cb00

The defaultdict in contrast will simply create any items that you try to access (provided of course they do not exist yet).The defaultdict is also a dictionary-like object and provides all methods provided by a dictionary. However, the point of difference is that it takes the first argument (default_factory) as a default data type for the dictionary.

OrderedDict

An OrderedDict is a dictionary subclass that remembers the order in which that keys were first inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added. Since an ordered dictionary remembers its insertion order, it can be used in conjunction with sorting to make a sorted dictionary

Dictionary ordered by key

OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

Dictionary ordered by value

OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

Dictionary ordered by length of key string

OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])

deque

deque is the list optimised for inserting and removing items.
from collections import deque

list = ["a","b","c"]
deq = deque(list)
print(deq)

deque(['a', 'b', 'c'])

Inserting elements into deque

You can easily insert an element to the deq we created at either of the ends. To add an element to the right of the deque, you have to use append() method.

If you want to add an element to the start of the deque, you have to use appendleft() method.

deq.append("d")
deq.appendleft("e")
print(deq)

deque(['e', 'a', 'b', 'c', 'd'])

Removing elements from deque

Removing elements is similar to inserting elements. You can remove an element the similar way you insert elements. To remove an element from the right end, you can use pop() function and to remove an element from left, you can use popleft().

deq.pop()
deq.popleft()
print(deq)

deque(['a', 'b', 'c'])

Clearing deque and counting each element in deque

# Clearing the deque

list = ["a","b","c"]
deq = deque(list)
print(deq)
print(deq.clear())

# Counting each element
list = ["a","b","c"]
deq = deque(list)
print(deq.count("a"))

ChainMap

ChainMap is used to combine several dictionaries or mappings. It returns a list of dictionaries.

Creating the ChainMap

To create a chainmap we can use ChainMap() constructor. We have to pass the dictionaries we are going to combine as an argument set.

dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print(chain_map.maps)
[{'b': 2, 'a': 1}, {'c': 3, 'b': 4}]

Another important point is ChainMap updates its values when its associated dictionaries are updated. For example, if you change the value of 'c' in dict2 to '5', you will notice the change in ChainMap as well.

dict2['c'] = 5
print(chain_map.maps)

[{'a': 1, 'b': 2}, {'c': 5, 'b': 4}]

Getting keys and values from ChainMap

dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
print (list(chain_map.keys()))
print (list(chain_map.values()))

Adding new dictionary to ChainMap

dict3 = {'e' : 5, 'f' : 6}
new_chain_map = chain_map.new_child(dict3)
print(new_chain_map)

ChainMap({'f': 6, 'e': 5}, {'a': 1, 'b': 2}, {'b': 4, 'c': 3})
Python collection module still needs improvements and features as compared to Java collection library. We can except changes in upcoming python releases.