Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Functional programming in Python

Programming Paradigms

There are a number of programming paradigms or "styles of code" that can be categorized in a number of ways. The most popular, most commonly used ones are the following:

  • Procedural
  • Object Oriented
  • Declarative
  • Functional

Procedural programming is the most common one, putting one statement after the other. Using functions (that are called procedures in some languages). Maybe splitting out some code to libraries (aka. modules). C is probably the most popular procedural language.

Object Oriented is when you divide your code into classes. Create instances of the classes. Internally, inside the classes, you still have functions (procedures), though these are now called methods. Java is probably the most popular object oriented language.

Declarative is very different. SQL is probably the most well-known example. Here you describe the type of result you are looking for and the engine behind figures out how to go about to get what you asked for.

Functional is a style in which you basically write lots of functions. Haskell might be the most well-known example.


In Python we can use Procedural, Object Oriented, and Functional paradigms. It is also rather different from the other languages. You don't pick one paradigm, you just write code in hopefully the most appropriate way for the given task and given environment.

When you write len(text) this is procedural. You call the len function (procedure) on a variable called text.

When you write text.lower() this is basically Object Oriented, even though you don't yet defined your own classes and instances. You are using the fact that text is an instance object and it has a method called lower.

When using map, filter, or list-comprehension you use the functional paradigm of Python.

So you see you probably have already use some of the functional aspects of Python even without realizing it.

In the next few pages we'll show a lot of examples in Python using the functional paradigm and by the end of this booklet you'll understand how things work much better and you'll feel a lot more comfortable writing these constructs.

Functional programming

Functional programming has several key aspects or we might call them "preferences".

  • Immutability: Variables, despite their name, don't change.
  • Separation of data and functions.
  • Pure functions: Functions have no side-effects.
  • First-class functions: You can assign a function to another name. You can pass function to other functions and return them as well. You can also manipulate functions.
  • Higher order functions: A functions that either takes a function as a parameter or returns a function as a parameter.

In Python functional programming just blends in the rest of the code. You already used some functional constructs without probably realizing that they functional.

It also seems that Object Oriented programming and Functional programming are two very different styles. One could say they are ortoghonal to each other, but in Python you can combine them.

Let's start with some simple things.

Iterators (Iterables)

You already know that there are all kinds of objects in Python that you can iterate over using the for-in construct. For example, you can iterate over the characters of a string, or the elements of a list, or whatever range() returns. You can also iterate over the lines of a file and you have probably seen the for in construct in other cases as well. The objects that can be iterated over are collectively called iterables. You can do all kind of interesting things on such iterables. We'll see a few now.

  • There are several data types that we can iterate over using the for ... in ... construct. For example, strings, lists, tuples, filehandles, and range:

string

We can iterate over the "elements" of a string that are the characters. We can also print the whole string.

name = "Learn 🐍 Python"
for cr in name:
    print(cr)
print(name)

Output:

L
e
a
r
n
 
🐍
 
P
y
t
h
o
n
Learn 🐍 Python

list

We can iterate over the elements of a list. We can also print the whole list at once.


numbers = [101, 2, 3, 42]
for num in numbers:
    print(num)
print(numbers)

Output:

101
2
3
42
[101, 2, 3, 42]

tuple

We can iterate over the elements of a tuple. We can also print the whole tuple at once.


planets = ("Mars", "Earth", "Venus")
for planet in planets:
    print(planet)
print(planets)

Output:

Mars
Earth
Venus
('Mars', 'Earth', 'Venus')

filehandle

We can iterate over a filehandle. On each iteration we'll get one line from the file.

import sys

file = sys.argv[0]

with open(file) as fh:
    for row in fh:
        print(row, end="")

Running this program will print itself.

We can also print the filehandle: print(fh), but the output will be:

<_io.TextIOWrapper name='iterable_fh.py' mode='r' encoding='UTF-8'>

not the content of the file.

So the "thing" that the open function returned is iterable, but it is different from the previous 3 types.

range

The range function returns something, we can use the for-in loop to iterate over it and we'll get back the expected numbers.

However, if we print the value that we got back from the range function it looks strange. It is the range object.

Unlike in Python 2, here in Python 3 the range function does not return a list of numbers. It returns an object that allows us to iterate over the numbers, but it does not hold the numbers.


rng = range(3, 9, 2)
for num in rng:
    print(num)
print(rng)

Output:

3
5
7
range(3, 9, 2)

range is interesting. We are going to take a closer look in the next few pages.

range

So what does range really return?

Instead of returning the list of numbers (as it used to do in Python 2), now it returns a range object that provides "the opportunity to go over the specific series of numbers" without actually creating the list of numbers. Getting an object instead of the whole list has a number of advantages. One is space in the memory. In a later example we'll see how much memory is needed for the object returned by the range function and how much would it take to have the corresponding list of numbers in memory. For now let's see how we can use it. range can have 1, 2, or 3 parameters.

  • range(start, end, step) - Go from start (included) to end (not included) every step value.
  • range(start, end) - Where step defaults to 1.
  • range(end) - Where start defaults to 0, step defaults to 1.

range with 3 parameters

Go from start (included) to end (not included) every step value.

rng = range(3, 11, 2)

print(rng)
print()

for num in rng:
    print(num)

Output:

range(3, 11, 2)

3
5
7
9

range with 2 parameters

Go from start (included) to end (not included) every step=1.

rng = range(3, 7)

for num in rng:
    print(num)

Output:

3
4
5
6

range with 1 parameter

Go from start=0 (included) to end (not included) every step=1.

rng = range(3)

for num in rng:
    print(num)

Output:

0
1
2

range index and len

Similar to strings, lists, and tuple, but unlike filehandles, we can use the len function on the range object and we can also access a specific element using the square-bracket indexing without actually iterating over all the elements.

rng = range(10, 10000, 2)

print(rng[2])
print(len(rng))

Output:

14
4995

filehandle index and len

On a filehandle (returned by the open function) we cannot call the len function, nor can we access arbitrary element (row) using index in a square bracket []. We can only use it as an iterator.

len

import sys
file = sys.argv[0]

with open(file) as fh:
    print(len(fh))

Error:

Traceback (most recent call last):
  File ".../fh_len.py", line 5, in <module>
    print(len(fh))
          ~~~^^^^
TypeError: object of type '_io.TextIOWrapper' has no len()

index

import sys
file = sys.argv[0]

with open(file) as fh:
    print(fh[2])

Error:

Traceback (most recent call last):
  File ".../fh_index.py", line 5, in <module>
    print(fh[2])
          ~~^^^
TypeError: '_io.TextIOWrapper' object is not subscriptable

range with list

Using the list function we can tell the range object to generate the whole list immediately. Either using the variable that holds the range object, or wrapping the range() call in a list() call.

The sys module provides function called sys.getsizeof() that returns the size of a Python object in the memory. As you can see the size of the range object is 48 bytes while the size of the 3-element list is 112 bytes. It seems the range object uses less memory than even such a short lists. On the next page we'll see a more detailed analyzis.

import sys

rng = range(3, 9, 2)
numbers = list(rng)
print(rng)        # range(3, 9, 2)
print(numbers)    # [3, 5, 7]

others = list(range(2, 11, 3))
print(others)   # [2, 5, 8]

print(sys.getsizeof(rng))     # 48
print(sys.getsizeof(numbers)) # 112
  • range
  • list
  • getsizeof

range vs. list size using getsizeof

Showing that the range object remains the same size regardless of the size of the range, but if we convert it into a list then its memory footprint is proportional to its size. To the number of elements in it.

In this example we have a loop iterating over range(21), but that's only for the convenience, the interesting part is inside the loop. On every iteration we call range() with the current number, then we convert the resulting object into a list of numbers. Finally we print out the current number and the size of both the object returned by range() and the list generated from the object. As you can see the memory usage of the range object remains the same 48 bytes, while the memory usage of the list grows as the list gets longer.

The code we use to demonstrate the memory usage of a range object vs. the list of numbers generated using it.

import sys

for ix in range(21):
    rng = range(ix)
    numbers = list(rng)
    print("{:>3} {:>3} {:>4}".format(
        ix,
        sys.getsizeof(rng),
        sys.getsizeof(numbers)))

Output:

  0  48   64
  1  48   96
  2  48  104
  3  48  112
  4  48  120
  5  48  128
  6  48  136
  7  48  144
  8  48  160
  9  48  192
 10  48  200
 11  48  208
 12  48  216
 13  48  224
 14  48  232
 15  48  240
 16  48  256
 17  48  264
 18  48  272
 19  48  280
 20  48  288

for loop with transformation

There are many cases when we have a list of some values and we need to apply some transformation to each value. At the end we would like to have the list of the resulting values.

A very simple such transformation would be to double each value. Other, more interesting examples might be reversing each string, computing some more complex function on each number, etc.)

In this example we just double the values and use append to add each value to the list containing the results.

def double(n):
    return 2 * n

numbers = [7, 2, 4, 1]

double_numbers = []
for num in numbers:
    double_numbers.append( double(num) )
print(double_numbers)

Output:

[14, 4, 8, 2]

There are better ways to do this.

map

  • map(function, iterable, ...)

The map function of Python applies a function to every item in an iterable and returns an iterator that can be used to iterate over the results. Wow, how many times I repeated the word iter...something. Instead of trying to untangle that sentence, let's look at the following example:

We have a list of numbers in the brilliantly named variable numbers with 7, 2, 4, 1 as the content. We would like to create a list of all the doubles (so that would be 14, 4, 8, 2 in this case) and then iterate over them printing them on the screen. Sure, you probably have some more complex operation to do on the numbers than to simply double them, but in this example I did not want to complicate that part. Suffice to say that you have some computation to do on every element of the original list.

So you encapsulate your computation in a regular Python function (in our case the function is called double). Then you call map and pass to it two parameters. The first parameter is the double function itself, the second parameter is the list of the values you would like to work on. map will go over all the values in the numbers list, call the double function with each number and return you somthing that will allow you to iterate over the results. Something like this:

double_numbers = [ double(7), double(2), double(4), double(1)]

Except, that the above is not true.

When Python executes the double_numbers = map(double, numbers) line, no computation happens and no resulting list is created. Python only prepars "the possibility to do the computations". In the upcoming examples we'll see what does this sentence really mean, for now let's see what do we have in this example: double_numbers contains a map object, but when you iterate over it using the for num in double_numbers expression you get the expected values.

def double(n):
    return 2 * n

numbers = [7, 2, 4, 1]

double_iterable = map(double, numbers)
print(double_iterable)
for num in double_iterable:
    print(num)

Output:

<map object at 0x76a59d19e560>
14
4
8
2

map with list

Just as with the range object, here with the map object too you can use the list function to convert all the values at once.

However, there are two advantages of keeping it as a map object and not using list to flatten it into a list.

  • Memory usage.
  • Delayed (lazy) execution of code.
def double(num):
    return 2 * num

numbers = [7, 2, 4, 1]

double_numbers = list(map(double, numbers))
print(double_numbers)

Output:

[14, 4, 8, 2]

size of map

One advantage of keeping the result of map as it is is the memory footprint of the map object vs. the generated list.

The map function returns a map object that takes up a fixed amount of memory, regardless of the size of the original list.

If we flatten it using the list function, the size of the generated list will be similar to the size of the original list.

import sys

def double(num):
    return 2 * num

numbers = [7, 2, 4, 1]

double_iterable = map(double, numbers)
double_numbers = list(map(double, numbers))

print(sys.getsizeof(double_iterable))
print(sys.getsizeof(double_numbers))

Output:

48
120

map delaying function call

Not only the size that we already saw with the range case, but also the processing time saved by not calculating the results till you actually need it.

Let's see how it works, before getting in more explanation.

def double(n):
    print(f"double {n}")
    return 2 * n

numbers = [1, 4, 2, -1]

double_numbers = map(double, numbers)
print(double_numbers)

for num in double_numbers:
    print(num)

Output:

<map object at 0x7f90df760f98>
double 1
2
double 4
8
double 2
4
double -1
-2

Imagine a case where you apply several expensive (time consuming) transformations to some original list and then you iterate over the end-results looking for the first value that matches some condition. What if you find the value you were looking for after only a few iteration. Then making all that expensive calculations to the whole list was a waste of time.

This lazy evaluation can help you save both memory and time and you always have the option to force the immediate calculation by calling the list function.

In this example we have added a call to print in the double function in order to see when is it really executed. You can see that the first output comes from the print statement that was after the map call. Then on each iteration we see the output from inside the "double" function and then the result from the loop. In a nutshell Python does not execute the "double" function at the point where we called map. It only executes it when we iterate over the resulting object.

map on many values

Now imagine you have a very long list. I know this is not such a long list, but I trust you can imagin a long list of numbers. We would like to run some function on each element and then iterate over the results, but what if at one point in the iteration we decide to break out of the loop?

import sys

def double(n):
    print(f"double {n}")
    return 2 * n

numbers = [1, 4, 2, -1, 23, 12, 5, 6, 34, 143123, 98, 213]

double_numbers = map(double, numbers)
print(double_numbers)
for num in double_numbers:
    print(num)
    if num > 42:
        break

print()
print(sys.getsizeof(numbers))
print(sys.getsizeof(double_numbers))
<map object at 0x7fe5c5270d68>
double 1
2
double 4
8
double 2
4
double -1
-2
double 23
46

160
56

You can see that it did not need to waste time calculating the doubles of all the values, as it was calculating on-demand. You can also see that the object returned from map takes up only 56 bytes. Regardless of the size of the original array.

double with lambda

There are many other cases besides map where we need to pass a function as a parameter to some other function. Many cases the function we pass is some almost trivial function with a single operation in it. In those cases creating a named function like the "double" function in the previous examples is an overkill.

In this example we also used the list function to force the full evaluation of the map object to make it easier to show the results. Normally you probably would not use the list function here.

numbers = [1, 2, 3, 4]

double_numbers = list( map( lambda n: n * 2, numbers) )
print(double_numbers)

Output:

[2, 4, 6, 8]

What is lambda in Python?

Lambda creates simple anonymous function. It is simple because it can only have one statement in its body. It is anonymous because usually it does not have a name.

The usual use is as we saw earlier when we passed it as a parameter to the map function. However, in the next example we show that you can assign the lambda-function to a name and then you could used that name just as any other function you would define using def.

def dbl(n):
    return 2*n
print(dbl(3))

double = lambda n: 2*n
print(double(3))

Output:

6
6

lambda returning tuple

A lambda function can return complex data structures as well. e.g. a tuple.

dbl = lambda n: (n, 2*n)

ret = dbl(12)

print(ret)

Output:

(12, 24)

map returning tuples

numbers = [1, 2, 3, 4]

pairs = map(lambda n: (n, 2*n), numbers)
print(pairs)

for pair in pairs:
    print(pair)

Output:

<map object at 0x7fcd264a15f8>
(1, 2)
(2, 4)
(3, 6)
(4, 8)

lambda with two parameters

A lambda-function can have more than one parameters:

add = lambda x,y: x+y
print(add(2, 3))

Output:

5

map for more than one iterable

Lets "add" together two lists of numbers. Using + will just join the two lists together, but we can use the "map" function to add the values pair-wise.

v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8]

v3 = v1 + v2
print(v3)

sums = map(lambda x,y: x+y, v1, v2)
print(sums)
print(list(sums))

Output:

[1, 3, 5, 9, 2, 6, 4, 8]
<map object at 0x7fcbecc8c668>
[3, 9, 9, 17]

map on uneven lists

In Python 3 the iterator stops when the shortest iterable is exhausted.

In Python 2 it used to extend the shorter lists by None values.

v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]

sums = map(lambda x,y: x+y, v1, v2)
print(sums)

print(list(sums))

Output:

<map object at 0x7ff9469a8da0>
[3, 9, 9, 17]

replace None (for Python 2)

In Python 2 map used to extend the shorter lists by None values. So to avoid exceptions, we had some exra code replacing the None values by 0, using the ternary operator.

v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]

print(map(lambda x,y: (0 if x is None else x) + (0 if y is None else y), v1, v2))
# [3, 9, 9, 17, 10]

Output:

map on uneven lists - fixed (for Python 2)

A nicer fix was this:

v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]

print(map(lambda x,y: (x or 0) + (y or 0), v1, v2))
# [3, 9, 9, 17, 10]

map mixed iterators

map works on any iterable, so we might end up passing one list and one string to it.

v1 = ['foo', 'bar', 'baz']
v2 = 'abc'

result = map(lambda x,y: x+y, v1, v2)
print(result)
print( list(result) )

Output:

<map object at 0x7fc5e9ff4e80>
['fooa', 'barb', 'bazc']

map fetch value from dictionary

people = [
    {
        'name': 'Foo',
        'phone': '123',
    },
    {
        'name': 'Bar',
        'phone': '456',
    },
    {
        'name': 'SnowWhite',
        'phone': '7-dwarfs',
    }
]

names = map(lambda d: d['name'], people)

print(names)
print(list(names))

Output:

<map object at 0x7f5afffaeb00>
['Foo', 'Bar', 'SnowWhite']

Exercise: string to length

  • Given a list of strings, create an iterable that will provide the length of each string.
  • If the input is ['moon', venus', 'jupyter'] then the resulting iterable should return 4, 5, and 7.

Exercise: row to length

  • Given a file, create an iterable that will provide the length of each row.
  • Can you do it without reading the file?

Exercise: compare rows

  • Create an iterable that given two files will return true for each line where the first space in the first file is earlier than the first space in the second file. So

  • given: "ab cd" vs "abc d" the value is true

  • given: "ab cd" vs "ab cd" the value is false

  • given: "ab cd" vs "a bcd" the value is false

Solution: string to length

animals = ['chicken', 'cow', 'snail', 'elephant', 'pig', 'zebra', 'gnu', 'praying mantiss', 'snake']

length = map(len, animals)
print(length)
#print(list(length))
for ln in length:
    print(ln)

Solution: row to length

import sys
if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")
filename = sys.argv[1]

with open(filename) as fh:
    length = map(len, fh)
    print(length)
    for ln in length:
        print(ln)
        # if ln > 10:
        #     break

Solution: compare rows

import sys
if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} FILE FILE")

file_a, file_b = sys.argv[1:]

def compare(row_a, row_b):
    a = row_a.find(' ')
    b = row_b.find(' ')
    return a < b

with open(file_a) as fh_a, open(file_b) as fh_b:
    results = map(compare, fh_a, fh_b)
    print(results)

    for res in results:
        print(res)
<map object at 0x7f0858d3f8d0>
56
[False, True, False, True, True]
128

filter in a loop

numbers = [1, 3, 27, 10, 38]
def big(n):
    return n > 10

big_numbers = []
for num in numbers:
    if big(num):
        big_numbers.append(num)
print(big_numbers)

Output:

[27, 38]

filter

  • filter(function, iterable)

filter will return an iterable object that will return all the items of the original iterable that evaluate the function to True. This can have only one iterable!

numbers = [1, 3, 27, 10, 38]
def big(n):
    return n > 10

big_numbers = filter(big, numbers)
print(big_numbers)
#print(list(big_numbers))
for num in big_numbers:
    print(num)

Output:

<filter object at 0x7f4bc37355c0>
[27, 38]

filter with lambda

filter grep

numbers = [1, 3, 27, 10, 38]

big_numbers = filter(lambda n: n > 10, numbers)
print(big_numbers)
print(list(big_numbers))

Output:

<filter object at 0x7faed0fe57b8>
[27, 38]

filter - map example

numbers = [1, 7, 19, 5, 57,  23, 8]

def big(x):
    print(f"filtering {x}")
    return x > 10

def double(y):
    print(f"double {y}")
    return 2*y

big_numbers = filter(big, numbers)
print(big_numbers)

doubles = map(double,  big_numbers)
print(doubles)

for num in doubles:
    print(num)

Output:

<filter object at 0x7ffad9f82f28>
<map object at 0x7ffad9f829e8>
filtering 1
filtering 7
filtering 19
double 19
38
filtering 5
filtering 57
double 57
114
filtering 23
double 23
46
filtering 8

filter - map in one expression

numbers = [1, 7, 19, 5, 57,  23, 8]

def big(x):
    print(f"filtering {x}")
    return x > 10

def double(y):
    print(f"double {y}")
    return 2*y


for num in map(double,  filter(big, numbers)):
    print(num)

Output:

filtering 1
filtering 7
filtering 19
double 19
38
filtering 5
filtering 57
double 57
114
filtering 23
double 23
46
filtering 8

filter a dictionary using dict comprehension

We have a dictionary, we would like to create another dictionary based on this one that contains only key-value pairs that meet a certain condition.

def main():
    grades = {
        "math":  100,
        "biology": 53,
        "chemistry": 62,
        "art": 78,
        "history": 20,
    }
    print(grades)

    good_grades = {key: value for (key, value) in grades.items() if value >= 60}
    print(good_grades)

    bad_grades = {key: grades[key] for key in grades if grades[key] < 60}
    print(bad_grades)


main()

Output:

{'math': 100, 'biology': 53, 'chemistry': 62, 'art': 78, 'history': 20}
{'math': 100, 'chemistry': 62, 'art': 78}
{'biology': 53, 'history': 20}

Get indices of values

filter can help us get a sublist of values from an iterable, eg. from a list that match some condition. In this example we see how to get all the names that are exactly 3 characters long.

What if, however if instead of the values themselves, you would like to know their location? The indexes of the places where these value can be found. In that case, you would run the filter on the indexes from 0 till the last valid index of the list. You can do that using the range function.

Finally there is another example that shows how to get the indexes of all the names that have an "e" in them. Just to show you that we can use any arbitray condition there.

names = ["Helen", "Ann", "Mary", "Harry", "Joe", "Peter"]
names3 = filter(lambda w: len(w) == 3, names)
print( list(names3) )

loc3 = filter(lambda i: len(names[i]) == 3, range(len(names)))
print( list(loc3) )


has_e = filter(lambda i: "e" in names[i], range(len(names)))

print( list(has_e) )

Output:

['Ann', 'Joe']
[1, 4]
[0, 4, 5]

reduce

In Python 2 it was still part of the language.

reduce(function, iterable[, initializer])

from functools import reduce

numbers = [1, 2, 3, 4]

print(reduce(lambda x,y: x+y, numbers))    # 10  = ((1+2)+3)+4
print(reduce(lambda x,y: x*y, numbers))    # 24  = ((1*2)*3)*4
print(reduce(lambda x,y: x/y, [8, 4, 2]))  # 1.0 = (8/4)/2

print(reduce(lambda x,y: x+y, [2]))        # 2
print(reduce(lambda x,y: x*y, [2]))        # 2

Output:

10
24
1.0
2
2

The initializer is used as the 0th element returned by the iterable. It is mostly interesting in case the iterable is empty.

reduce empty list

from functools import reduce

numbers = []

print(reduce(lambda x,y: x+y, numbers))

# TypeError: reduce() of empty sequence with no initial value

Output:

reduce with default

from functools import reduce

print(reduce(lambda x,y: x+y, [], 0))      # 0
print(reduce(lambda x,y: x+y, [1, 2], 0))  # 3
print(reduce(lambda x,y: x+y, [3, 4], 2))  # 9

print( reduce(lambda x,y: x*y, [1, 2], 0) )  # 0
print( reduce(lambda x,y: x*y, [2, 3], 1) )  # 6
print( reduce(lambda x,y: x*y, [], 0) )      # 0
print( reduce(lambda x,y: x*y, [], 1) )      # 1

Output:

0
18
0
3
0
6
0

reduce list of dictionaries

  • Combining map and filter
  • The reduce-only solution requires the default value
from functools import reduce

grades = [
    {
        'subject': 'math',
        'grade': 97,
    },
    {
        'subject': 'chemistry',
        'grade': 60,
    },
    {
        'subject': 'grammar',
        'grade': 75,
    }
]

print(reduce(lambda x,y: x+y, map(lambda s: s['grade'], grades))) # 232

print(reduce(lambda acc,y: acc+y['grade'], grades, 0)) #232

Output:

zip

zip can iterate over a number of lists (or iterables in general) in parallel. The iteration will stop when the first iterator stops.

fname = ['Graham',          'Eric',            'Terry',
         'Terry',           'John',            'Michael']
lname = ['Chapman',         'Idle',            'Gilliam',
         'Jones',           'Cleese',          'Palin']
born  = ['8 January 1941',  '29 March 1943',   '22 November 1940',
         '1 February 1942', '27 October 1939', '5 May 1943']

for f_name, l_name, b_date in zip(fname, lname, born):
    print(f"{f_name:7} {l_name:7} was born {b_date}")

Output:

Graham  Chapman was born 8 January 1941
Eric    Idle    was born 29 March 1943
Terry   Gilliam was born 22 November 1940
Terry   Jones   was born 1 February 1942
John    Cleese  was born 27 October 1939
Michael Palin   was born 5 May 1943

Monty Python

Combining two lists using zip

names = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
days =  [31, 28, 31, 30]

zipped = zip(names, days)
print(zipped)

pairs = list(zipped)
print(pairs)

Output:

<zip object at 0x7fc1b4464d40>
[('Jan', 31), ('Feb', 28), ('Mar', 31), ('Apr', 30)]

Creating dictionary from two lists using zip

names = ['Jan', 'Feb', 'Mar', 'Apr']
days =  [31, 28, 31, 30]

zipped = zip(names, days)
print(zipped)
print(list(zipped))

zipped = zip(names, days)
month = dict(zipped)
print(month)

Output:

{'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30}

all, any

all any

  • all(iterable) - returns True if all the elements of iterable return True
  • any(iterable) - returns True if any of the elements in iterable return True
a = [True, True]
b = [True, False]
c = [False, False]

print(all(a))   # True
print(all(b))   # False
print(all(c))   # False
print()
print(any(a))   # True
print(any(b))   # True
print(any(c))   # False

Output:

Compare elements of list with scalar

print(2 > 1) # True
print(0 > 1) # False
print()

numbers = [2, 4]
# Comparing different types does not make sense, but nevertheless Python 2 would still do it.
# Python 3 raises exception:
# TypeError: '>' not supported between instances of 'list' and 'int'
# print(numbers > 1) # True
# print(numbers > 7) # True
# print()

# compare each element with the scalar and then check if 'all' were True
print(all(map(lambda x: x > 1, numbers)))  # True
print(all(map(lambda x: x > 2, numbers)))  # False

Output:

List comprehension - double

[]

We take the original example where we had a function called double, and this time we write a different expression to run the function on every element of an iterable.

def double(n):
    return 2*n

numbers = [1, 2, 3, 4]
name = "FooBar"

double_numbers = [double(n) for n in numbers]
print(double_numbers) # [2, 4, 6, 8]


double_chars = [double(n) for n in name]
print(double_chars)   # ['FF', 'oo', 'oo', 'BB', 'aa', 'rr']

Output:

List comprehension - simple expression

import sys

numbers  = [0, 1, 2, 3]

sqrs = map(lambda n: n*n, numbers)
print(sqrs)         # <map object at 0x7fdcab2f5940>
print(list(sqrs))   # [0, 1, 4, 9]
print(sys.getsizeof(sqrs))
print()

squares = [n*n for n in numbers]
print(squares)   # [0, 1, 4, 9]
print(sys.getsizeof(squares))

Output:

<map object at 0x7fa9cf2eb9e8>
[0, 1, 4, 9]
56

[0, 1, 4, 9]
96

List generator

()

Going over the values of the generator will empty the generator.

import sys

numbers  = [0, 1, 2, 3, 4, 5, 6]

gn = (n*n for n in numbers)
print(gn)
print(sys.getsizeof(gn))
print()

for num in gn:
    print(num)
print()

gn = (n*n for n in numbers)
squares = list(gn)
print(sys.getsizeof(squares))
print(squares)

print(list(gn)) # the generator was already exhausted

Output:

<generator object <genexpr> at 0x7f8c0bda2930>
120

0
1
4
9
16
25
36

160
[0, 1, 4, 9, 16, 25, 36]
[]

List comprehension

mapcar

text = ['aaaa', 'bb', 'ccc ccc']

length_1 = map(lambda x: len(x), text)
print(length_1)       # <map object at 0x7f60ceb90f98>
print(list(length_1)) # [4, 2, 7]


length_2 = map(len, text)
print(length_2)       # <map object at 0x7f60ceb90fd0>
print(list(length_2)) # [4, 2, 7]


length_3 = [ len(s)  for s in text ]
print(length_3) # [4, 2, 7]

Output:

In LISP this would be a mapcar.

Dict comprehension

people = {
    'Foo':       123,
    'Bar':       456,
    'SnowWhite': 7,
}

doubles = { k:v*2 for (k, v) in people.items() }
print(doubles)  # {'Foo': 246, 'Bar': 912, 'SnowWhite': 14}

Output:

Lookup table with lambda

lambda

import sys

table = {
    "cat"  : lambda : print("miau"),
    "dog"  : lambda : print("hauhau"),
    "duck" : lambda : print("hap hap"),
}


def main():
    if len(sys.argv) != 2:
        exit(f"Usage: {sys.argv[0]} NAME")

    animal = sys.argv[1]
    if animal in table:
        table[animal]()

main()

Output:

Read lines without newlines

readlines map lambda

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]}")

filename = sys.argv[1]

with open(filename) as fh:
    rows = map(lambda s: s.rstrip("\n"), fh.readlines())

print("SIZE:", sys.getsizeof(rows))

for row in rows:
    print(row)


with open(filename) as fh:
    rows = map(lambda s: s.rstrip("\n"), fh)
    print("SIZE:", sys.getsizeof(rows))

    for row in rows:
        print(row)



#with open(filename) as fh:
#    #for row in fh:
#    #    ...
#    #rows = map(lambda s: s.rstrip("\n"), fh.readlines())
#    rows = map(lambda s: s.rstrip("\n"), fh)
#
#    for row in rows:
#        print(row)

Output:

Read key-value pairs

  • readlines
  • map
  • lambda
  • split
  • dict
name=Foo Bar
email=foo@bar.com
address=Foo street 42

Output:

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]}")

filename = sys.argv[1]

with open(filename) as fh:
    pairs = dict(map(lambda x: x.split('='), map(lambda s: s.rstrip("\n"), fh.readlines())))

print(pairs)
{'name': 'Foo Bar', 'email': 'foo@bar.com', 'address': 'Foo street 42'}

Create index-to-value mapping in a dictionary based on a list of values

  • zip
  • dict
planned_order = ('b', 'c', 'd', 'a')
plan = dict(zip(range(len(planned_order)), planned_order))
print(plan)

Output:

{0: 'b', 1: 'c', 2: 'd', 3: 'a'}

Exercise: min, max, factorial

  • Implement an expression to calculate "min", and another expression to calculate "max" of lists.
  • Implement an expression that will calculate factorial. f(n) should return the value of n! (n! = n * (n-1) * (n-2) * ... * 1)
  • Implement an expression that given 2 lists will return a new list in which each element is the max() for each pair from the input lists. E.g. given [1, 3, 6] and [2, 4, 5] the result is [2, 4, 6]
  • Use reduce, map, lambda

Exercise: Prime numbers

Calculate and print the prime numbers between 2 and N. Use filter.

Exercise: Many validator functions

Given several validator functions (that get a parameter and return True or False), and given a list of values, return a sublist of values that pass all the validation checks. See the sekeleton:

def is_big(x):
    return x > 100

def is_even(x):
    return not x % 2

numbers = [90, 102, 101, 104]

cond = [is_big, is_even]

# z = ...
print(z) # [102, 104]

Exercise: Calculator using lookup table

Write a script that will accept a math expression such as python calc.py 2 + 3 and will print the result. Use lookup tables select the implementation of the actual computation. (supporting +, - , *, /) is enought

Exercise: parse file

In the following file we have lines:

SOURCE/FILENAME.json,TARGET

read in the file and create

  • a single dictionary where the SOURCE/FILENAME.json is the key and the TARGET is the value.
  • list of dictionaries in which the keys are 'source', 'filename', and 'target' and the values are from the respective columns (SOURCE, FILENAME.json, and TARGET)

You can solve this for-loop or with map and list-comprehensions. Do it in both ways.

{% embed include file="src/examples/functional/books.txt)

Solution: min, max, factorial

from functools import reduce

numbers = [2, 1, 4, 3]

# min
print(reduce(lambda x,y: x if x < y else y, numbers))  # 1
# max
print(reduce(lambda x,y: x if x > y else y, numbers))  # 4

# factorial
n = 4
print(reduce(lambda x,y: x*y, range(1, n+1), 1))   # 24
# The 1 at the end is the initializor of reduce to provide
# correct results for n = 0.

a = [1, 3, 6]
b = [2, 4, 5]
c = map(lambda x,y: x if x > y else y, a, b)
print(list(c))  # [2, 4, 6]

Solution: Prime numbers

Calculating the prime numbers

n = 50

nums = range(2, n)
for i in range(2, 1+int(n ** 0.5)):
    nums = filter(lambda x: x == i or x % i, nums)

print(nums)

Solution: Many validator functions

def is_big(x):
    return x > 100

def is_even(x):
    return not x % 2

numbers = [90, 102, 101, 104]

cond = [is_big, is_even]

z = filter( lambda n: all([f(n) for f in cond]),   numbers)
print(z) # [102, 104]

Solution: Calculator using lookup table

import sys

table = {
    "+" : lambda x, y: x+y,
    "-" : lambda x, y: x-y,
    "*" : lambda x, y: x*y,
    "/" : lambda x, y: x/y,
}


def main():
    if len(sys.argv) != 4:
        exit(f"Usage: {sys.argv[0]} NUMBER OP NUMBER")
    action = table[sys.argv[2]]
    print( action(int(sys.argv[1]), int(sys.argv[3])) )

main()


map with condtion

  • map

The conversion function can do anything. It can have a condition inside.

numbers = [1, 2, 3, 4]

def cond_double(n):
   if n % 2:
      return 2*n
   else:
      return n

cd = map(cond_double, numbers)
print(cd)                        # [2, 2, 6, 4]

Output:

map with lambda

numbers = [1, 2, 3, 4]

def dbl(x):
   return 2*x
d1 = map(dbl, numbers)
print(d1)  # [2, 4, 6, 8]

double = lambda x: 2*x
d2 = map(double, numbers)
print(d2)  # [2, 4, 6, 8]

d3 = map(lambda n: 2*n, numbers)
print(d3)   # [2, 4, 6, 8]

Output:

map with lambda with condition

numbers = [1, 2, 3, 4]

a = map(lambda n: 2*n if n % 2 else n, numbers)
print(a)   # [2, 2, 6, 4]

Output:

List comprehension - complex

numbers = [1, 3, 2, 4]

t = filter(lambda n: n > 2, numbers)
print(t)  # [3, 4]

n1 = map(lambda n: n*n, t)
print(n1) # [9, 16]


n2 = map(lambda n: n*n, filter(lambda n: n > 2, numbers))
print(n2)  # [9, 16]



n3 = [ n*n for n in numbers if n > 2 ]
print(n3) # [9, 16]

Output:

Change list before iteration over map object

A small warning. Because map only connects to our iterator, but does not iterate over it till it is called, if we change the content of the underlying iterator, that will be reflected in the results of the iteration over the map object.

def double(n):
    return 2 * n

numbers = [1, 4, 2]

double_numbers = map(double, numbers)

numbers.append(5)
for num in double_numbers:
    print(num)
2
8
4
10

Replace list before iteration over map object

2
8
4
def double(n):
    return 2 * n

numbers = [1, 4, 2]

double_numbers = map(double, numbers)

numbers = [3, 7]
for num in double_numbers:
    print(num)

Iterators - with and without Itertools

Advantages of iterators and generators

  • Lazy evaluation
  • Save processing (or at least delay the use)
  • Save memory
  • Handle an infinite series of information
  • Turn complex operations into a simple matter of for loop.

The Fibonacci research institute

  • We have a bunch of mathematicians who research the Fibonacci series.

  • We have a bunch of people who research a series of DNA sequences.

  • ???

Fibonacci plain

  • We don't call this as this has an infinite loop
def fibonacci():
    a, b = 0, 1
    while True:
        a, b = b, a+b

# fibonacci()

Fibonacci copy-paste

def fibonacci():
    a, b = 0, 1
    while True:
        a, b = b, a+b

        print(a)
        if a % 17 == 0:
            print('found')
            break

        if a > 200:
            print('not found')
            break

fibonacci()

Iterators Glossary

What are iterators and iterables?

  • All of them are iterables
  • A filehandle and the map object are also iterators. (Side note: You should always open files using the with statement and not like this.)
  • iter() would return the iterator from an iterable. We don't need this.
from collections.abc import Iterator, Iterable

a_string = "Hello World"
a_list   = ["Tiger", "Mouse"]
a_tuple  = ("Blue", "Red")
a_range  = range(10)
a_fh     = open(__file__)
a_map    = map(lambda x: x*2, a_list)

for thing in [a_string, a_list, a_tuple, a_range, a_map, a_fh]:
    print(thing.__class__.__name__)
    print(issubclass(thing.__class__, Iterator))
    print(issubclass(thing.__class__, Iterable))
    zorg = iter(thing)
    print(zorg.__class__.__name__)
    print(issubclass(zorg.__class__, Iterator))

    print()

a_fh.close()

Output:

str
False
True
str_iterator
True

list
False
True
list_iterator
True

tuple
False
True
tuple_iterator
True

range
False
True
range_iterator
True

TextIOWrapper
True
True
TextIOWrapper
True

A file-handle is an iterator

  • collections
  • Iterator
  • Iterable
  • io
  • TextIOWrapper
  • issubclass

This slightly a repetition of the previous statement, that filehandles are iterators.

from collections.abc import Iterator, Iterable
from io import TextIOWrapper

with open(__file__) as fh:
    print(fh.__class__.__name__)
    print(issubclass(fh.__class__, TextIOWrapper))
    print(issubclass(fh.__class__, Iterator))
    print(issubclass(fh.__class__, Iterable))

    for line in fh:
        pass
        #print(line, end="")

Output:

TextIOWrapper
True
True
True

range is iterable but it is not an iterator

Just as a string or a list, the range function in Python is also an "iterable" but it is not an "iterator". In many aspects it behaves as an iterator. Specifically it allows us to iterate over numbers. Range Is Not An Iterator

for n in range(2, 12, 3):
    print(n)
print()

for n in range(3):
    print(n)
print()

for n in range(2, 5):
    print(n)
print()

from collections.abc import Iterator, Iterable
rng = range(2, 5)
print(issubclass(rng.__class__, Iterator))
print(issubclass(rng.__class__, Iterable))

Output:

2
5
8
11

0
1
2

2
3
4

False
True

Range with floating point steps

import it

r = it.Range(1, 6, 2.2)
for n in r:
    print(n)

Output:

class Range():
    def __init__(self, start, end, step):
        self.current = start
        self.end = end
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        v = self.current
        self.current += self.step
        return v
1
3.2
5.4
import it

def test_range():
    r = it.Range(1, 6, 2.2)
    result = list(r)
    assert result == [1, 3.2, 5.4]

Iterator: a counter

  • StopIteration
  • next
  • iter
  • next

We can create a iterator using a class. We are required to implement the __iter__ method that returns the iterator object and the __next__ method that returns the next element in our iteration. We can indicated that the iteration was exhaused by raising a StopIteration exception.

The instance-object that is created from this class-object is the iterator, not the class-object itself!

  • __iter__
  • __next__ (in Python 2 this used to called next)
  • raise StopIteration
class Counter():
   def __init__(self):
       self.count = 0

   def __iter__(self):
       return self

   def __next__(self):
       self.count += 1
       if self.count > 3:
           raise StopIteration
       return self.count

Using iterator

The class returned an iterator, we could use a for loop to iterate over the element. We tried to run through the iterator again, but it did not print anything. It was exhausted.

from counter import Counter

cnt = Counter()
for c in cnt:
   print(c)

for c in cnt:
   print(c)

Output:

1
2
3

Iterator without temporary variable

from counter import Counter

for c in Counter():
   print(c)

Output:

1
2
3

The type of the iterator

How can we know it is an iterator? We check it.

from collections.abc import Iterator, Iterable
from counter import Counter

cnt = Counter()
print(cnt.__class__.__name__)
print(issubclass(cnt.__class__, Iterator))
print(issubclass(cnt.__class__, Iterable))

Output:

Counter
True
True

Using iterator with next

A feature of any iterator is that we could iterate over it using the next call.

from counter import Counter

cnt = Counter()

while True:
    try:
        a = next(cnt)
        print(a)
    except Exception as ex:
        print(ex.__class__.__name__)
        break

Output:

1
2
3
StopIteration

Mixing for and next

You can even use next inside a for loop, but then you will have to handle the StopIteration exception that migh happen during your call of next.

I am not really sure when would we want to use this.

from counter import Counter

cnt = Counter()

for i in cnt:
    print(f"i: {i}")
    try:
        n = next(cnt)
        print(f"n: {n}")
    except Exception as ex:
        print(ex.__class__.__name__)
        break

Output:

i: 1
n: 2
i: 3
StopIteration

Iterable which is not an iterator

from counter import Counter

class GetMyIterable():
    def __init__(self):
        pass
    def __iter__(self):
        return Counter()


thing = GetMyIterable()

from collections.abc import Iterator, Iterable
print(issubclass(thing.__class__, Iterator))
print(issubclass(thing.__class__, Iterable))

for i in thing:
    print(i)

Output:

False
True
1
2
3

Iterator returning multiple values

class SquareCounter():
   def __init__(self):
       self.count = 0

   def __iter__(self):
       return self

   def __next__(self):
       self.count += 1
       if self.count > 5:
           raise StopIteration
       return self.count, self.count ** 2

for cnt, sqr in SquareCounter():
   print(f"{cnt}  {sqr}")

Output:

1  1
2  4
3  9
4  16
5  25

Range-like iterator

class Range():
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        v = self.current
        self.current += 1
        return v
import it

r = it.Range(1, 4)
for n in r:
    print(n)

print('---')

for n in it.Range(2, 5):
    print(n)

Output:

1
2
3
---
2
3
4

Unbound or infinite iterator

So far each iterator had a beginning and an end. However we can also create infinte or unbounded iterators. The nice thing about them is that we can pass them around as we do with any other object and we can execute operations on them without burning our CPU.

Of course the user will have to be carefull not to try to flatten the iterator, not to try to get all the values from it, as that will only create an infinite loop or a never ending operation.

In this very simple example we count from 0 and we never stop.

When we use the Counter in the for loop we need to include a stop-condition, otherwise our loop will never end.

class Counter():
   def __init__(self):
       self.count = 0

   def __iter__(self):
       return self

   def __next__(self):
       self.count += 1
       return self.count

for c in Counter():
   print(c)
   if c > 10:
       break

Output:

1
2
3
4
5
6
7
8
9
10
11

Unbound iterator Fibonacci

Now we can get back to our original problem, the slightly more complex Fibonacci series. In this example we created an unbounded iterator that on every iteration will return the next element of the Fibonacci series.

class Fibonacci():
    def __init__(self):
        self.values = []

    def __iter__(self):
        return self

    def __next__(self):
        if len(self.values) == 0:
            self.values.append(1)
            return 1

        if len(self.values) == 1:
            self.values.append(1)
            return 1

        self.values.append(self.values[-1] + self.values[-2])
        self.values.pop(0)

        return self.values[-1]
from fibonacci import Fibonacci
for v in Fibonacci():
    print(v)
    if v > 10:
        break

Output:

1
1
2
3
5
8
13

Operations on Unbound iterator

from fibonacci import Fibonacci

fib = Fibonacci()

#odd = [x for x in fib if x % 2 == 1]
odd = filter(lambda x: x % 2 == 1, fib)

print("Let's see")

for v in odd:
    print(v)
    if v > 10:
        break

Output:

Let's see
1
1
3
5
13

itertools

itertools yield

itertools is a standard Python library that provides a number of interesting iterators. We are going to see a few examples here:

itertools - count

  • Unbound counter: Count from N to infinity.
import itertools

for c in itertools.count(start=19, step=1):
    print(c)
    if c > 23:
        break

# 19
# 20
# 21
# 22
# 23
# 24

itertools - cycle

import itertools

ix = 0
for c in itertools.cycle(['A', 'B', 'C']):
    print(c)
    ix += 1
    if ix >= 5:
        break

print('')

ix = 0
for c in itertools.cycle('DEF'):
    print(c)
    ix += 1
    if ix >= 5:
        break

Output:

A
B
C
A
B

D
E
F
D
E

Exercise: iterators - reimplement the range function

In one of the first slides of this chapter we saw a partial implementation of the range function. Change that code to have a full implementation, that can accept 1, 2, or 3 parameters.

Exercise: iterators - cycle

  • Reimplement the cycle functions of itertools using iterator class.

Exercise: iterators - alter

  • Implement the alter functions as an iterator that will return
1
-2
3
-4
5
-6
...
  • Optionally provide a start and end parameters
  • start defaults to 1
  • end defaults to unlimited

Exercise: iterators - limit Fibonacci

Change the Iterator version of the Fibonacci series so optionally you will be able to provide a parameter called "limit" to the constructor. If the limit is provided, the iterator should stop when the value passes the limit.

Exercise: iterators - Fibonacci less memory

Change the Iterator version of the Fibonacci series so it will NOT hold the previous values in memory.

Exercise: read char

Create an iterator that given a filename will return an object that on every iteration will return a single character. As an option let the user skip newlines, or maybe any pre-defined character.

Exercise: read section

  • Create an iterator that given the name of a file like the following, will return once section at a time.
  • It will return a list one each iteration and each element of the list will be a line from the current section.
  • Other ideas what should be returned on each iteration?
name     = Mercury
distance = 0.4
mass     = 0.055


name     = Venus
distance = 0.7
mass     = 0.815


name     = Earth
distance = 1
mass     = 1

name     = Mars
distance = 1.5
mass     = 0.107

Exercise: collect packets

  • You get a series of packets (e.g. lines in a file)
  • In each line you have several fields: id, seqid, maxseq, content
  • id is a unique identifier of a series of packets (lines)
  • seqid is the seuence id of a packet in a series. (an integer)
  • maxseq is the length of the sequence.
  • content is the actual content.

In each iteration return a message that is built up from all the packages in the given sequence.

12,1,5,First of Twelve
12,2,5,Second of Twelve
12,3,5,Third of Twelve
12,4,5,Fourth of Twelve
12,5,5,Fifth of Twelve

9,1,4,First of Nine
9,2,4,Second of Nine
9,3,4,Third of Nine
9,4,4,Fourth of Nine

11,1,3,First of Eleven
11,2,3,Second of Eleven
11,3,3,Third of Eleven

Output:

['First of Twelve', 'Second of Twelve', 'Third of Twelve', 'Fourth of Twelve', 'Fifth of Twelve']
['First of Nine', 'Second of Nine', 'Third of Nine', 'Fourth of Nine']
['First of Eleven', 'Second of Eleven', 'Third of Eleven']
12,1,5,First of Twelve
11,1,3,First of Eleven
9,1,4,First of Nine
12,2,5,Second of Twelve
9,2,4,Second of Nine
11,2,3,Second of Eleven
12,3,5,Third of Twelve
9,3,4,Third of Nine
12,4,5,Fourth of Twelve
12,5,5,Fifth of Twelve
9,4,4,Fourth of Nine
11,3,3,Third of Eleven
11,2,3,Second of Eleven
11,1,3,First of Eleven
9,1,4,First of Nine
12,1,5,First of Twelve
9,3,4,Third of Nine
9,2,4,Second of Nine
12,3,5,Third of Twelve
12,4,5,Fourth of Twelve
12,2,5,Second of Twelve

12,5,5,Fifth of Twelve
9,4,4,Fourth of Nine
11,3,3,Third of Eleven

Exercise: compare files

Compare two files line-by-line, and create a 3rd file listing the lines that are different.

One
Two
Three
Four
Five
One
Two
Tree
Four
Five

Expected output:

2,Three,Tree

Solution: iterators - limit Fibonacci

class Fibonacci:
    def __init__(self, limit=0):
        self.values = [] 
        self.limit = limit
    def __iter__(self):
        return self
    def next(self):
        if self.limit and len(self.values) >= self.limit:
            raise StopIteration 
        if len(self.values) == 0:
            self.values.append(1)
            return 1
        if len(self.values) == 1:
            self.values.append(1)
            return 1
        self.values.append(self.values[-1] + self.values[-2])
        return self.values[-1]
import fibonacci
f = fibonacci.Fibonacci(limit = 10)
print(f)
for v in f:
    print(v)

print('-----')
f = fibonacci.Fibonacci()
for v in f:
    print(v)
    if v > 30:
        break

Solution: iterators - Fibonacci less memory

class Fibonacci:
    def __init__(self, limit=0):
        self.values = ()
        self.limit = limit
    def __iter__(self):
        return self
    def next(self):
        if self.limit and len(self.values) and self.values[-1] >= self.limit:
            raise StopIteration 
        if len(self.values) == 0:
            self.values = (1,)
            return 1
        if len(self.values) == 1:
            self.values = (1, 1)
            return 1
        self.values = (self.values[-1], self.values[-1] + self.values[-2])
        return self.values[-1]
import fibonacci
f = fibonacci.Fibonacci(limit = 10)
print(f)
for v in f:
    print(v)

print('-----')
f = fibonacci.Fibonacci()
for v in f:
    print(v)
    if v > 30:
        break

Solution: read section

import re

class SectionReader():
    def __init__(self, filename):
        self.filename = filename
        self.fh       = open(filename)

    def __iter__(self):
        return self

    def __next__(self):
        self.section = []
        while True:
            line = self.fh.readline()
            if not line:
                if self.section:
                    return self.section
                else:
                    self.fh.close()
                    raise StopIteration
            if re.search(r'\A\s*\Z', line):
                if self.section:
                    return self.section
                else:
                    continue
            self.section.append(line)


filename = 'planets.txt'
for sec in SectionReader(filename):
    print(sec)

Solution: compare files

import sys

def main():
    if len(sys.argv) != 4:
        exit(f"Usage: {sys.argv[0]} IN_FILE IN_FILE OUT_FILE")
    infile_a, infile_b = sys.argv[1:3]
    outfile = sys.argv[3]

    with open(outfile, 'w') as out_fh, open(infile_a) as in_a, open(infile_b) as in_b:
        cnt = 0
        for lines in zip(in_a, in_b):
            #print(lines)
            lines = list(map(lambda s: s.rstrip('\n'), lines))
            #print(lines)
            if lines[0] != lines[1]:
                out_fh.write(f"{cnt},{lines[0]},{lines[1]}\n")
            cnt += 1

main()
python diff.py first.txt second.txt diff.txt

Solution: collect packets

The implementation

class Packets():
    def __init__(self, filename):
        self.filename = filename
        self.fh = open(filename)
        self.packets = {}
        self.max = {}

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            line = self.fh.readline()
            #print(f"line: {line}")
            if line == '':
                raise StopIteration

            line = line.rstrip("\n")
            if line == '':
                continue

            pid, seqid, maxseq, content = line.split(",")
            pid = int(pid)
            seqid = int(seqid)
            maxseq = int(maxseq)
            if pid not in self.packets:
                self.packets[pid] = {}
                self.max[pid] = maxseq
            if seqid in self.packets[pid]:
                raise Exception("pid arrived twice")
            if maxseq != self.max[pid]:
                raise Exception("maxseq changed")
            self.packets[pid][seqid] = content
            if len(self.packets[pid].keys()) == self.max[pid]:
                content = list(map(lambda i: self.packets[pid][i+1], range(self.max[pid])))
                del(self.max[pid])
                del(self.packets[pid])
                return content


The use:

import sys
from packets import Packets

if len(sys.argv) < 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

for packet in Packets(sys.argv[1]):
    print(packet)

The test to verify it

import os
import json
import pytest

from packets import Packets

root = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(root, 'packets.json')) as fh:
    expected_results = json.load(fh)

@pytest.mark.parametrize('filename', ['packets.txt', 'packets1.txt', 'packets2.txt'])
def test_packetes(filename):
    filepath = os.path.join(root, filename)

    results = []
    for packet in Packets(filepath):
        results.append(packet)
    assert results == expected_results

Expected result:

[["First of Twelve", "Second of Twelve", "Third of Twelve", "Fourth of Twelve", "Fifth of Twelve"], ["First of Nine", "Second of Nine", "Third of Nine", "Fourth of Nine"], ["First of Eleven", "Second of Eleven", "Third of Eleven"]]

Generators and Generator Expressions

Generators Glossary

Iterators vs Generators

  • a generator is an iterator
  • an iterator is an iterable
from collections.abc import Iterator, Iterable
from types import GeneratorType

print( issubclass(GeneratorType, Iterator) )  # True
print( issubclass(Iterator, Iterable) )       # True

  • Genarators are a simpler way to create an iterable object than iterators, but iterators allow for more complex iterables.
  • To create an iterator we need a class with two methods: __iter__ and __next__, and a raise StopIteration.
  • To create a generator we only need a single function with `yield .

List comprehension and Generator Expression

However, before learning about yield let's see an even simpler way to create a generator. What we call a generator expression.

You are probably already familiar with "list comprehensions" where you have a for expression inside square brackets. That returns a list of values.

If you replace the square brackets with parentheses then you get a generator expression.

You can iterate over either of those. So what's the difference?

The generator expression delays the execution of the loop and the expression inside to the point where you actually need the value and it is only an object, not the full list.

So you also save memory.

a_list = [i*2 for i in range(3)]
print(a_list)
for x in a_list:
   print(x)
print()

a_generator = (i*2 for i in range(3))
print(a_generator)
for x in a_generator:
   print(x)

Output:

[0, 2, 4]
0
2
4

<generator object <genexpr> at 0x7f0af6f97a50>
0
2
4

List comprehension vs Generator Expression - less memory

Let's use a bigger range of numbers and create the corresponding list and generator. Then check the size of both of them. You can see the list is much bigger. That's becuse the list already contains all the elements, while the generator contains only the promise to give you all the elements.

As we could see in the previous example, this is not an empty promise, you can indeed iterate over the elements of a generator just as you can iterate over the elements of a list.

However, you cannot access an arbitrary element of a generator because the generator is not subscriptable.

import sys

lst = [n*2 for n in range(1000)] # List comprehension
gen = (n*2 for n in range(1000)) # Generator expression

print(sys.getsizeof(lst))
print(sys.getsizeof(gen))
print()

print(type(lst))
print(type(gen))
print()

print(lst[4])
print()

print(gen[4])

Output:

9016
112

<class 'list'>
<class 'generator'>

8

Traceback (most recent call last):
  File "generator_expression.py", line 17, in <module>
    print(gen[4])
TypeError: 'generator' object is not subscriptable

List Comprehension vs Generator Expressions

getsizeof

List comprehension vs Generator Expression - lazy evaluation

The second big difference between list comprehension and generator expressions is that the latter has lazy evaluation. In this example you can see that once we assign to list comprehension to a variable the sqr function is called on each element.

In the case of the generator expression, only when we iterate over the elements will Python call the sqr function. If we exit from the loop before we go over all the values than we saved time by not executing the expression on every element up-front. If the computation is complex and if our list is long, this can have a substantial impact.

def sqr(n):
    print(f"sqr {n}")
    return n ** 2

numbers = [1, 3, 7]

# list comprehension
n1 = [ sqr(n) for n in numbers ]
print("we have the list")
for i in n1:
    print(i)
print("-------")

# generator expression
n2 = ( sqr(n) for n in numbers )
print("we have the generator")
for i in n2:
    print(i)

Output:

sqr 1
sqr 3
sqr 7
we have the list
1
9
49
-------
we have the generator
sqr 1
1
sqr 3
9
sqr 7
49

Generator: function with yield - call next

  • next
  • yield
  • StopIteration

We can create a function that has multiple yield expressions inside. We call the function and what we get back is a generator. A generator is also an iterator so we can call the next function on it and it will give us the next yield value.

If we call it one too many times we get a StopIteration exception.

def number():
    yield 42
    yield 19
    yield 23

num = number()
print(type(num))
print(next(num))
print(next(num))
print(next(num))
print(next(num))

Output:

<class 'generator'>
42
19
23
Traceback (most recent call last):
  File "simple_generator_next.py", line 11, in <module>
    print(next(num))
StopIteration

Generators - call next

We can also use a for loop on the generator and then we don't need to worry about the exception.

def number():
    yield 42
    yield 19
    yield 23

num = number()
print(type(num))
for n in num:
    print(n)

Output:

<class 'generator'>
42
19
23

Generator with yield

We don't even need to use a temporary variable for it.

def number():
    yield 42
    yield 19
    yield 23

for n in number():
  print(n)

Output:

42
19
23

Generators - fixed counter

def counter():
    n = 1
    yield n

    n += 1
    yield n

    n += 1
    yield n

for c in counter():
    print(c)

Output:

1
2
3

Generators - counter

def counter():
    n = 1
    while True:
        yield n
        n += 1

for c in counter():
    print(c)
    if c >= 10:
        break

Output:

1
2
3
4
5
6
7
8
9
10

Generators - counter with parameter

def counter(n = 1):
    while True:
        yield n
        n += 1

for c in counter():
    print(c)
    if c >= 4:
        break
print()

for c in counter(8):
    print(c)
    if c >= 12:
        break

Output:

1
2
3
4

8
9
10
11
12

Generators - my_range

import sys

def my_range(limit = 1):
    n = 0
    while n < limit:
        yield n
        n += 1

for i in my_range(5):
    print(i)
print()

print(sum(my_range(10)))
print()

x = my_range(10000)
print(x)
print(sys.getsizeof(x))

Output:

0
1
2
3
4

45

<generator object my_range at 0x7f36f6089930>
120

Fibonacci - generator

def fibonacci():
    a, b = 0, 1
    while True:
        a, b = b, a+b
        yield a


for a in fibonacci():
    print(a)
    if a % 17 == 0:
        print('found')
        break

    if a > 20000:
        print('not found')
        break

Output:

1
1
2
3
5
8
13
21
34
found

When we enter the for loop the fibonacci() function is called and returns a generator object. Then next is called on this generator object which executes the content of the function up to the call of yield and return the value. Then ever time the for-loop calls next the function continues running till it reaches yield again or till it reaches the end of the function. (Which will never happen in this case.)

Infinite series

  • The Fibonacci was already infinite, let's see a few more.

Integers

from series import integers

for i in integers():
    print(i)
    if i >= 10:
        break

Output:

1
2
3
4
5
6
7
8
9
10

Integers + 3

from series import integers

n3  = (n+3 for n in integers())
# n3 = integers(3)
for i in n3:
    print(i)
    if i >= 10:
        break

Output:

4
5
6
7
8
9
10

Integers + Integers

from series import integers

def mysum(nums):
    print(nums)
    total = 0
    for n in nums:
        total += n
    return total

n3 = integers(3)
n7 = integers(7)
d = ( mysum(p) for p in zip(n3, n7) )

print("start")
for i in d:
    print(i)
    if i >= 20:
        break

Output:

start
(3, 7)
10
(4, 8)
12
(5, 9)
14
(6, 10)
16
(7, 11)
18
(8, 12)
20
  • itertools
  • zip

Filtered Fibonacci

from series import fibonacci

even = ( fib for fib in fibonacci() if fib % 2 == 0 )
for e in even:
    print(e)
    if e > 40:
        break

Output:

0
2
8
34
144

The series.py

This is the module behind the previous examples.

def integers(n = 1):
   while True:
       yield n
       n += 1

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b


def gfibonacci(size = 2):
    """Generalized Fibonacci. """
    values = [0]
    while True:
        yield values[-1]
        if len(values) < size:
            values.append(1)
        else:
            values.append(sum(values))
            values = values[1:]

def pascal():
    values = [1]
    while True:
        yield values
        new = [1]
        for i in range(0, len(values)-1):
            new.append(values[i] + values[i+1])
        new.append(1)
        values = new

generator - unbound count (with yield)

def count(start=0, step=1):
    n = start
    while True:
        yield n
        n += step


for c in count(start=19, step=1):
    print(c)
    if c > 23:
        break

Output:

19
20
21
22
23
24

iterator - cycle

def cycle(values=[]):
    my_values = []
    for v in values:
        my_values.append(v)
        yield v
    while True:
        for v in my_values:
            yield v

i = 0
for c in cycle(['A', 'B', 'C']):
    print(c)
    i += 1
    if i >= 4:
        break

Output:

A
B
C
A

Exercise: Alternator

Create a generator for the following number series: 1, -2, 3, -4, 5, -6, ...

Exercise: Prime number generator

Create a generator that will return the prime numbers: 2, 3, 5, 7, 11, 13, 17, ...

Exercise: generator

Take the two generator examples (increment number and Fibonacci) and change them to provide infinite iterations. Then try to run them in a for loop. Just make sure you have some other condition to leave the for-loop.

Exercise: Binary file reader

Create a generator that given a filename and a number n will return the content of the file in chunks of n characters.

Exercise: File reader with records

In a file we have "records" of data. Each record starts with three bytes in which we have the length of the record. Then the content.

8 ABCDEFGH 5 XYZQR

Given this source file

First line
Second record
Third row of the records
Fourth
5
END

using this code

filename = "rows.txt"
records  = "records.txt"

with open(filename) as in_fh:
    with open(records, 'w') as out_fh:
        for line in in_fh:
            line = line.rstrip("\n")
            out_fh.write("{:>3}{}".format(len(line), line))

we can create this file:

 10First line 13Second record 24Third row of the records  6Fourth  15  3END

The exercise is to create an iterator/generator that can read such a file record-by-record.

yield outside of functions

  • It has no meaning there
print("one")
yield "ok";
print("two")

Output:

  File "examples/generators/yield_outside_any_function.py", line 2
    yield "ok";
    ^
SyntaxError: 'yield' outside function

Operations on infinite lists

In this example we can say that the fibonacci() function returns the infinite list of Fibonacci numbers. In normal circumstances only mathematicians can work with "infinite lists", programmers are constrained by memory and time. However generators in Python are lazy so they only pretend to be infinite lists. They are only the promise of having an infinite list. So you can do all kinds of interesting operations on them.

In this example we multiply each value by 2. (I know it is not the most sophisticated mathematical problem, but it will work for this example.)

The variable double_fibonacci holds the values of the Fibonacci series multiplied by 2. More precisely it holds the possibility to iterate over that infinite list.

So in reality we don't operate on the infinite lists, only on the "potential of the lists", but the former sounds cooler.

Try running it without the if and break statements and see what happens. (Ctrl-C will stop the program.)

def fibonacci():
    a, b = 0, 1
    while True:
        a, b = b, a+b
        yield a

double_fibonacci = (value * 2 for value in fibonacci())

for a in double_fibonacci:
    print(a)

    if a > 200:
        break

Output:

2
2
4
6
10
16
26
42
68
110
178
288

Infinite random number generator

This example was created to show that iterators (generators) don't necessarily return a fixed list of values.

import random

def miranda():
    while True:
        yield random.random()

total = 0
for val in miranda():
    print(val)
    total += val
    if total > 10:
        break

Output:

Infinite random set generator

This example was created to show that iterators (generators) don't necessarily return a fixed list of values.

import random

def choices(values, **kw):
    while True:
        yield random.choices(values, **kw)

count = 0
for val in choices(['a', 'b', 'c', 'd', 'e', 'f'], k=2):
    print(val)
    count += 1
    if count > 10:
        break

Output:

Simple function (before generators)

TODO: probably not that interesting

def number():
    return 42

print(number())    # 42
print(number())    # 42
print(number())    # 42
def number():
    return 42
    return 19
    return 23

print(number()) # 42
print(number()) # 42
print(number()) # 42

Filtered Fibonacci with ifilter

itertools ifilter

from series import fibonacci
from itertools import ifilter

even = ifilter(  lambda f: f % 2 == 0, fibonacci()  )
for e in even:
    print(e)
    if e > 200:
        break