Just as there are atomic algorithms, there are also atomic data structures. These essentially amount to different objects to store data in, each object following a conveniently designed protocol. In this post I’m going to go through the most common ones that every coder should just sort of know. Namely: linked lists, hash tables, stacks, queues and rooted trees.
This may sound like a lot for a single post, but as we’ll see, a lot of the protocols kind of boil down to a single use case - storing data, and accessing it conveniently in order to implement different algorithms. Implementation of algorithms is key here: the design of data structures is intrinsically interesting from a maths perspective, but it is heavily influenced by the problems you can solve by using them.
Similar to the previous post on algorithms, I’m going to go through the above data structures one by one, and write a simple Python implementation of each protocol. Furthermore, as a Python programmer I’m also going to write a little about how Python offers these data structures as language primitives (i.e. for free!). Python doesn’t necessarily adhere strictly to the protocols as described in CS textbooks, but it does offer some convenient extras that are special to the language. My aim here is simply to make visible all the moving parts.
Stacks and Queues
Stacks and queues are well named, and would behave as you would expect for such objects. Stacks operate under ‘Last In First Out’ (LIFO), like a stack of plates - you would get to the bottom plate by first removing the plates on top of it. Whereas queues operate under ‘First In First Out’ (FIFO), much like standing in a (well functioning) queue.
Below I’ve implemented a stack object using the Python list. Stacks implement two methods to add (push) and delete (pop) data. The pop method usually doesn’t accept any arguments, as stacks by their nature follow LIFO.
class Stack:
    def __init__(self):
        """Implement using a list"""
        self._stack = []

    def __repr__(self):
        return str(self._stack)

    def empty(self):
        """Check whether stack is empty"""
        return not bool(self._stack)

    def push(self, v):
        """Add an item to the top of the stack"""
        self._stack += [v]

    def pop(self):
        """Remove and return the item on top of the stack"""
        v = self._stack[-1]
        self._stack = self._stack[:-1]
        return v
Python doesn’t offer stack primitives, but they do encourage the use of Python lists as stacks, as you can perform operations in a LIFO manner:
l = list([1, 2, 3, 4])

# add to top of stack
l.append(5)

# pop from top of stack
l.pop()

# lists are truthy, so can check if empty:
not l   # False
not []  # True
Again, Python doesn’t offer a primitive queue object, so I’ve implemented the basic protocol below using a list. Additionally, the delete and insert operations are called dequeue and enqueue by convention for queues.
class Queue:
    def __init__(self):
        """Implement using a list"""
        self._queue = []

    def empty(self):
        """Check if queue is empty"""
        return not bool(self._queue)

    def enqueue(self, v):
        """Add an item to the back of the queue"""
        self._queue += [v]

    def dequeue(self):
        """Remove and return the item at the front of the queue"""
        v = self._queue[0]
        self._queue = self._queue[1:]
        return v
Although Python doesn’t offer queue or stack primitives, it does offer a built-in implementation of FIFO/LIFO objects through the collections.deque class, which ships with the standard library. It fulfils both of the simple protocols above, with some extras. In general it would be wise to use this rather than implement your own, as its authors have taken care to optimise the performance of the object in a way the ones I’ve implemented above are not.
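A quick sketch of how deque covers both protocols (its method names differ from the push/pop/enqueue/dequeue methods I used above):

```python
from collections import deque

d = deque()

# used as a stack (LIFO): append and pop both operate on the right end
d.append(1)
d.append(2)
d.pop()       # removes 2

# used as a queue (FIFO): append on the right, popleft from the left
d.append(3)
d.popleft()   # removes 1, leaving just 3
```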
What kind of problems can we use this technology for? Pretty much anything where you want to access data in a consistent, pre-defined way. The stack and queue operations to insert and delete data are both $O(1)$; this is a result of the fact that data access occurs in a pre-defined manner (LIFO/FIFO). Therefore, when implementing an algorithm that relies on data access in some pre-defined order, one should attempt to use a stack/queue. For example, I made use of a queue to implement breadth first search, where I needed to store the next nodes to consider in order of consideration.
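As a sketch of that breadth first search use case (the graph here is a made-up adjacency dict, not taken from the earlier post):

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in the order breadth first search visits them."""
    seen = {start}
    queue = deque([start])   # FIFO: next nodes to consider, in order
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
bfs_order(graph, "a")   # ['a', 'b', 'c', 'd']
```

Because the queue only ever grows at the back and shrinks at the front, nodes come out in exactly the order they were discovered.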
Linked Lists (Dynamic Arrays)
In most low level programming languages, you have to declare how much stuff can go in an array before putting anything in, by setting aside some memory when you create the array object. In doing this, though, you quickly run into problems. Imagine you create an algorithm to process the results of some mathematical calculation, and each time you get a result you add it to an array; setting your calculation off, you quickly eat up all the memory you’ve allocated to your initial array, and you have to create a new array object as part of your algorithm. This could quickly get messy, with references to new arrays being created dynamically. (Singly) linked lists, or dynamic arrays, get around this problem by allowing you to extend arrays as you go. One does so by modelling each datum in the array as being stored in a node, with a pointer to the next datum.
This model is pretty good, though it is a little fiddly to deal with inserting and deleting data. The basic methods for a linked list, which I’ve implemented below, are again to do with the insertion and removal of data.
class Node:
    """
    Node to store data, and a pointer to the next node in the linked
    list, as well as a pointer to the previous node.
    """

    def __init__(self, data):
        self._next = None
        self.prev = None
        self.data = data


class LinkedList:
    """Implement using Nodes and Pointers"""

    def __init__(self):
        self.head = Node(None)

    def __repr__(self):
        """Some Python sugar to print a representation"""
        x = self.head
        ll = []
        while x.data is not None:
            ll.append(x.data)
            x = x._next
        return str(ll)

    def insert(self, data):
        """Insert an element at the head of the list"""
        node = Node(data)
        node._next = self.head
        self.head.prev = node
        self.head = node
        self.head.prev = Node(None)

    def search(self, data):
        """Return the node that holds a given piece of data"""
        x = self.head
        while x.data != data:
            x = x._next
        return x

    def remove(self, data):
        """Splice out a given node"""
        x = self.search(data)
        x.prev._next = x._next
        x._next.prev = x.prev
        if x.data == self.head.data:
            self.head = x._next
This is all well and good, but luckily you will probably never have to fiddle around with a similar implementation of your own, as Python has you covered with the ever-helpful list. Python lists amount to dynamic arrays which you can (within reason) keep appending data to. They are fantastic, versatile objects, with many more features than my little linked list above. Specifically, my remove method fails to account for nodes with duplicate data, and my insert method can’t place a node anywhere but at the head of the linked list - which is the basic protocol for (singly) linked lists. Python lists, by contrast, allow you to place items in a list by index, which is much more versatile.
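For instance, the built-in list lets you insert and delete at arbitrary positions, which the protocol above can’t do:

```python
l = [1, 2, 4]
l.insert(2, 3)   # place 3 at index 2: [1, 2, 3, 4]
l.remove(2)      # delete the first occurrence of the value 2: [1, 3, 4]
l.pop(0)         # or delete by index instead: [3, 4]
```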
Linked lists are ‘linear’, i.e. searching through them happens linearly - one node at a time. Therefore we can reason that the search method, and therefore the remove method, are both $O(N)$, whereas the insert method is much cheaper, as we always insert at the head of the list, and is $O(1)$. Therefore, these data structures are most useful when our algorithm needs to store data much more often than it needs to look it up.
Rooted Trees
We can use a similar node-pointer model to represent rooted trees; in the context of this post we’ll only consider binary trees and binary search trees. As with linked lists, we’ll assume each node contains some data; the remaining attributes are the pointers to the other nodes, which will inevitably vary with the type of rooted tree.
The nodes of binary trees can point to at most two other nodes, hence binary. We refer to nodes, and the ones they point to, as ‘parents’ and ‘children’, respectively.
In terms of implementation, it’s a fairly common pattern I’ve seen elsewhere to specify binary (search) trees in an abstract manner, using just a Node object with pointers to a given node’s parent and two children.
class Node:
    """Implement a tree by specifying relations between nodes."""

    def __init__(self, data):
        self.parent = None
        self.left_child = None
        self.right_child = None
        self.data = data
In terms of manipulating this structure to view/insert/delete data we can implement common tree-traversal methods. Below, I’ve implemented a lookup method to find a given value in a binary (search) tree using a recursive, depth-first traversal.
def lookup(node, value):
    """
    Look up a given value in a binary tree, starting the search at a
    specified root node. Returns True if the value is contained within
    the tree, False otherwise.
    """
    if node:
        if (
            node.data == value
            or lookup(node.left_child, value)
            or lookup(node.right_child, value)
        ):
            return True
    return False
Binary Search Trees
Binary search trees have the additional property that all child nodes to the left of a given node contain data of smaller or equal value to the datum in that node, and all nodes to its right contain data of greater value. For integers this is obvious, but less so for things like strings, where you may have to map the data you’re storing back to unique integers in order to store it. Using this we can write a simple method to insert data into binary search trees.
# construct a simple BST with empty leaf nodes
n2 = Node(2)
n1 = Node(1)
n3 = Node(3)
n2.left_child = n1
n2.right_child = n3
n1.parent = n2
n3.parent = n2
n3.left_child = Node(None)
n3.right_child = Node(None)
n3.left_child.parent = n3
n3.right_child.parent = n3
n1.right_child = Node(None)
n1.left_child = Node(None)
n1.left_child.parent = n1
n1.right_child.parent = n1


def insert(node, value, pointer=None):
    """
    Insert data into a Binary Search Tree.

    Recurse down the tree, comparing the value against each node's
    data, until an empty leaf node is found to store the value in.
    """
    if node and node.data:
        if value <= node.data:
            pointer = node.left_child
            insert(node.left_child, value, pointer)
        else:
            pointer = node.right_child
            insert(node.right_child, value, pointer)
    else:
        pointer.data = value
        # preserve the empty-leaf invariant so later inserts still work
        pointer.left_child = Node(None)
        pointer.right_child = Node(None)
        pointer.left_child.parent = pointer
        pointer.right_child.parent = pointer
I’ve used a similar recursive strategy to implement insert above. This is because recursion pops out fairly naturally when manipulating trees, mostly because we’ll want to consider each node of a tree in a linear manner, often based on a condition, while the data structure itself is clearly non-linear.
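To illustrate, here’s a sketch of another naturally recursive method: an in-order traversal, which on a binary search tree yields the stored values in sorted order. The three-node tree here is hypothetical, and the Node class is a minimal stand-in for the one defined above:

```python
class Node:
    # minimal stand-in for the Node class defined earlier
    def __init__(self, data):
        self.left_child = None
        self.right_child = None
        self.data = data

def in_order(node):
    """Recursively collect values: left subtree, then node, then right."""
    if node is None or node.data is None:
        return []
    return in_order(node.left_child) + [node.data] + in_order(node.right_child)

# a hypothetical BST: 2 at the root, 1 to its left, 3 to its right
root, left, right = Node(2), Node(1), Node(3)
root.left_child, root.right_child = left, right
in_order(root)   # [1, 2, 3]
```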
Python doesn’t offer any tree primitives, so are they really that useful? I think a lot of people underestimate the importance of trees. Famously, the author of Homebrew complained about being rejected by Google for ‘failing to invert a binary tree’, which he thought was BS, as he had created software that pretty much all Mac coders use. Initially I was inclined to side with him, but I read this blog post that really changed my mind on the issue.
So why doesn’t Python implement tree primitives despite them being so important in Computer Science? The truth is, it does: a more complex abstraction such as a Python dictionary, or a JSON object, is built on top of a tree-like structure. In terms of raw trees, much like my implementation of a linked list earlier, Python’s creators probably thought it was more useful in terms of solving problems to wrap the basic abstraction inside a higher-level concept, which when you think about it makes sense. Do you really want to implement a tree-traversal method every time you want to look up an item in a dict-like structure?
My recursive tree methods above go through nodes one by one, operating on a constructed tree in place. Therefore, they are $O(1)$ in space (ignoring the recursion stack), and $O(log(N))$ in time. This comes from the number of levels in a binary tree, which can be found by using a similar argument to that of merge sort. But as a binary tree could degenerate into a linked list, the worst-case time complexity is $O(N)$.
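To make the level-counting argument concrete, a perfectly balanced binary tree with $N$ nodes has about $log_2(N)$ levels, so a root-to-leaf path stays short even for large $N$:

```python
import math

def balanced_height(n):
    """Number of levels in a perfectly balanced binary tree of n nodes."""
    return math.ceil(math.log2(n + 1))

balanced_height(7)          # 3: nodes split 1 + 2 + 4 across three levels
balanced_height(1_000_000)  # 20: a million items, only ~20 levels deep
```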
Hash Tables
Hash tables are the most effective way of representing an abstraction for dictionary-like objects, i.e. where we want to read, update, and insert data by reference to some name. In theory, unique names (or hashes) are difficult to guarantee (search for hash collisions), but in practice most hashing algorithms are sound enough to make collisions very rare indeed. This makes lookup very efficient, $O(1)$, as long as we can keep collisions rare.
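We can see a collision directly by deliberately truncating the hash space. In CPython, the hash of a small integer is the integer itself, so reducing modulo 100 forces 5 and 105 into the same bucket:

```python
def small_hash(value):
    # deliberately tiny hash space: only 100 possible buckets
    return hash(value) % 100

small_hash(5) == small_hash(105)   # True: a collision
small_hash(5) == small_hash(6)     # False: different buckets
```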
I’ve built a fairly contrived implementation below, using Python lists as the underlying data structure. I’ve used lists to handle potential hash collisions: if the hash for a particular value coincides with the hash for some other value, storing items in a list under each hash lets you keep both.
def small_hash(value):
    return hash(value) % 100


class HashTable:
    """Simple Hashtable implementation using Python lists"""

    def __init__(self):
        # buffer space: small_hash only addresses 100 buckets
        self._table = [None for i in range(100)]
        self._values = []

    def __repr__(self):
        l = []
        for v in self._values:
            l.append((v, self._table[small_hash(v)]))
        return str(l)

    def __setitem__(self, value, item):
        key = small_hash(value)
        if value not in self._values:
            self._values.append(value)
        if not self._table[key]:
            self._table[key] = []
        self._table[key].append(item)

    def __getitem__(self, value):
        key = small_hash(value)
        return self._table[key]
In reality you would never bother implementing the above code in Python as dictionaries are primitives in the language, and in many other languages both interpreted and compiled.
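The built-in dict covers the same protocol directly (the keys and values here are made up for illustration):

```python
d = {}
d["pi"] = 3.14159   # insert, like HashTable.__setitem__
d["pi"] = 3.14      # update in place, no duplicate-key bookkeeping needed
d["pi"]             # read, like HashTable.__getitem__: returns 3.14
"e" in d            # membership check, without a manual list of values
```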
If your hashing algorithm is decent, you should be able to perform lookups/inserts in $O(1)$ on average.
Data structures are a very deep topic, and I’ve only scratched the surface here with simple implementations of fairly simple data structures. For example, in the hash table I’ve implemented, I’ve not optimised the hash function to reduce collisions, nor really considered the type of problem I’d wish to solve (and hence the underlying data structure it should rely on). However, it’s a good start for anyone interested in studying the topic, and CLRS is really the place to supplement this post.