Iterators in Python – Part 1

This is the first in a series of Python instructables I hope to continue, and is aimed at intermediate learners who have already completed something like CodeAcademy's Python Course, but are looking for more information about programming effectively. Hopefully you can get something out of this article and the articles that follow.

In this first installment, I'll explain iteration, and I'll show you how to make your objects iterable by implementing the Python Iterator Protocol. In the next installment, I'll show you nifty things you can do with iterators, often in a way that's more intuitive, compact, and beautiful than approaches you've already learned.

All examples (with more complete code I've omitted here for simplicity) can be found at https://gitlab.matevi.sh/blog-posts/iterators-1/.

Prerequisite Knowledge
  • Basic-to-intermediate knowledge of Python
  • The final example in this article uses concepts that those who understand recursion will be familiar with. Knowledge of recursion is not necessary for understanding iterators, but if you want to learn more about Computer Science as well as practical programming, you might as well learn it sooner rather than later. If you want a thorough introduction to recursion, check out this page.

Iteration

What is iteration? Simply put, iteration is accessing (and generally processing) the elements of some collection of items, one after the other, in some orderly manner. Most programming languages have a concept of iteration, even if the language doesn't have any niceties to make it particularly easy.

The snippet below iterates over an array of strings. It should be somewhat clear what it's doing, even if you don't know C.

// Define an array of strings
char myStrings[][10] = {"one", "two", "three", "four", "five"};

// Print strings in array, from 0 -> |length of myStrings (4)|
size_t i = 0;  
for (i = 0; i < 5; i++) {  
    printf("%s\n", myStrings[i]);
}

In this example, the C language isn't providing us with any means to easily get at the elements in myStrings – the person writing the code needs to know the length of the array, and the order they want to access elements. However we are still iterating over the elements, as we're accessing many (in this case, all) of them in order.

More modern languages than C, especially those which are Object Oriented often have a API for returning iterables so that a person who might want to use your code (or, of course, you) can make a simple call such as:

myIterator = iter(somePersonsObject)  
# do stuff with myIterator

In Python iterable objects can be iterated over using a variety of methods. Some like for loops and list comprehensions you're probably already familiar with, and are a core part of Python. However it's using other methods such as map and filter (along with everything in the itertools module) that we can really do some awesome things when chained together.

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle [streams], because that is a universal interface.

– Doug McIlroy

How's it Done?

In Python (both 2 and 3) there are two types of objects you need to know about to start iterating over your own objects.

Iterables

First, there are iterables, the objects capable of being iterated over. Generally your object will have some elements that you want to perform some operation on.

There are two methods by which an object can "indicate" to Python it is iterable (that is, when iter(my_obj) is called, the existance of these methods will be checked)

  1. It can define an __iter__(self) method which should return an iterator (more on this in a second).
  2. It can define __getitem__(self, index), which should return the object corresponding to index, eventually raising an IndexError when the index no longer references an element (i.e. if it's out of bounds).

The preferred method is through __iter__, and if you have an object that defines both, __iter__ is the method that will be called. For example:

>>> class Iterable(object):
       def __init__(self):
        self.arr = range(10)

      def __iter__(self):
        return iter(self.arr)

      def __getitem__(self, index):
        """ If Python tried to iterate with this method we'd be stuck in an infinite loop (because `__getitem__` never raises an `IndexError`) """
        return self.arr[0]

>>> print [i for i in Iterable()]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Iterators

An iterator (in Python) is an object that defines a next() method (thankfully changed to __next__() in Python 3 for consistency with Python's other magic methods), which should return the next element, and raising a StopIteration exception if there are no more elements.

All Together Now

Your iterable, which defines a __iter__ method, returns an iterator, which defines a __next__ method. The iterator generally has a small bit of state (variables) to keep track of where it is. For instance:

>>> class Iterable(object):
       def __init__(self):
        self.arr = range(10)

      def __iter__(self):
        # This iterable returns *itself* as the iterator, and keeps track of
        # where it is with a private variable (indicated by the underscore)
        self._current_index = 0
        return self

      def __next__(self):
         # Get item and update which index we're returning next
         item = self.arr[self._current_index]
         self._current_index += 1
         return item

What about an example?

Sure thing: Maybe you've made a binary tree which has nodes (the circles) which contain a numerical value, and a left and right pointer, each of which can point to another node.
A binary tree

A simple implementation of such a tree might look like this:

class BinaryTreeNode(object):

  def __init__(self, value):
    self.value = value
    self.left = None
    self.right = None

  def set_left(self, node):
    self.left = node
    return self  # return self so we can chain calls

  def set_right(self, node):
    self.right = node
    return self  # return self so we can chain calls

  def __str__(self):
    return "<%d: %s, %s>" % (self.value, str(self.left), str(self.right))

I could then build a simple tree as such:

>>> tree = BinaryTreeNode(1).set_left(BinaryTreeNode(0)).set_right(BinaryTreeNode(2))
>>> print tree
<1: <0: None, None>, <2: None, None>>  

But what if I want to iterate over each nodes values, from left to right? For instance, in the image above, I'd want to have the output be: 2, 7, 5, 6, 11, 2, 5, 4, 9. This is what we'd call tree traversal, and it is a form of iteration.

Let's get started

First, let's implement an 'in-order' iterator over the tree. You can think of in-order iteration as the order you'd get if you 'flattened' the tree. The ordering I mentioned (2, 7, 5, 6, 11, 2, 5, 4, 9) is the in-order traversal of the tree above.

Let's think about how we might implement this ordering with our BinaryTreeNode. When you're trying to solve a problem with trees, often you can break down the problem into smaller problems. We want to be able to pass in the root node, and have the iteration take place over every node in the tree, in the correct order.

Let's look at the first few numbers of the in-order traversal for the tree: 2, 7, 5, 6, 11 ... what do we notice? They're all from the left side of the tree! Then we have 2, which is the root node, then we have 5, 4, 9, all from the right. So if we recursively call iter on the left and right, and chain the results together, maybe we could make that work.

We can use itertools.chain which chains iterators together into one, taking from the first until it runs out of items (what exception would it raise to indicate this?), and then starts consuming the second iterator until it runs out, and so on.

>>> from itertools import chain  # basically
>>> class IteratingBinaryTreeNode(BinaryTreeNode):

      def __iter__(self):
        left = iter(self.left or [])
        right = iter(self.right or [])
        return chain(left, right)

>>>> tree = (IteratingBinaryTreeNode(4) # Root
              .set_left(IteratingBinaryTreeNode(2) # L
                .set_left(IteratingBinaryTreeNode(1)) # LL
                .set_right(IteratingBinaryTreeNode(3))) # LR
              .set_right(IteratingBinaryTreeNode(5) # RL
                .set_right(IteratingBinaryTreeNode(6)))) # RR
>>> print list(iter(iter_tree))
[]

Wait, what? Somethings missing. Let's step through what we've told the interpreter to do.

First we have left = iter(self.left or []) – inside the parenthesis, we first see if self.left is None, and if not we just use an empty list. That way when we call iter, we know that the item is either a list or an IteratingBinaryTreeNode, both of which we know we can call iter on. Then we do the same for right.

But what does calling iter on the left child node do? It returns the result of calling __iter__! But what happens when we have a node with no children? Both left and right would be empty iterators, but the value of the node is never returned!

We can fix this by just making sure we return the value of the current node at the right time, between the nodes that are before it (the nodes on the left), and nodes that are after it (the nodes on the right). So now we have:

>>> class IteratingBinaryTreeNode(BinaryTreeNode):

      def __iter__(self):
        left = iter(self.left or [])
        right = iter(self.right or [])
        return chain(left, iter([self.value]), right)

# This is the same as before, but you'll need to re-run this code so that tree is built from our fixed node class
>>> tree = (IteratingBinaryTreeNode(4) # Root
              .set_left(IteratingBinaryTreeNode(2) # L
                .set_left(IteratingBinaryTreeNode(1)) # LL
                .set_right(IteratingBinaryTreeNode(3))) # LR
              .set_right(IteratingBinaryTreeNode(5) # RL
                .set_right(IteratingBinaryTreeNode(6)))) # RR
>>> list(iter(tree))
[1, 2, 3, 4, 5, 6]

What can I do with it?

Take a look at the itertools documentation for a cookbook full of reusable tools and paradigms you can build that can consume the output iterators.