Functional programming in Python
Programming Paradigms
There are a number of programming paradigms or "styles of code" that can be categorized in a number of ways. The most popular, most commonly used ones are the following:
- Procedural
- Object Oriented
- Declarative
- Functional
Procedural programming is the most common one, putting one statement after the other. Using functions (that are called procedures in some languages). Maybe splitting out some code to libraries (aka. modules). C is probably the most popular procedural language.
Object Oriented is when you divide your code into classes. Create instances of the classes. Internally, inside the classes, you still have functions (procedures), though these are now called methods. Java is probably the most popular object oriented language.
Declarative is very different. SQL is probably the most well-known example. Here you describe the type of result you are looking for and the engine behind figures out how to go about to get what you asked for.
Functional is a style in which you basically write lots of functions. Haskell might be the most well-known example.
In Python we can use Procedural, Object Oriented, and Functional paradigms. It is also rather different from the other languages. You don't pick one paradigm, you just write code in hopefully the most appropriate way for the given task and given environment.
When you write len(text)
this is procedural. You call the len
function (procedure) on a variable called text
.
When you write text.lower()
this is basically Object Oriented, even though you don't yet defined your own classes and instances. You are using the fact that text
is an instance object and it has a method called lower
.
When using map
, filter
, or list-comprehension you use the functional paradigm of Python.
So you see you probably have already use some of the functional aspects of Python even without realizing it.
In the next few pages we'll show a lot of examples in Python using the functional paradigm and by the end of this booklet you'll understand how things work much better and you'll feel a lot more comfortable writing these constructs.
Functional programming
Functional programming has several key aspects or we might call them "preferences".
- Immutability: Variables, despite their name, don't change.
- Separation of data and functions.
- Pure functions: Functions have no side-effects.
- First-class functions: You can assign a function to another name. You can pass function to other functions and return them as well. You can also manipulate functions.
- Higher order functions: A functions that either takes a function as a parameter or returns a function as a parameter.
In Python functional programming just blends in the rest of the code. You already used some functional constructs without probably realizing that they functional.
It also seems that Object Oriented programming and Functional programming are two very different styles. One could say they are ortoghonal to each other, but in Python you can combine them.
Let's start with some simple things.
Iterators (Iterables)
You already know that there are all kinds of objects in Python that you can iterate over using the for-in construct. For example, you can iterate over the characters of a string, or the elements of a list, or whatever range() returns. You can also iterate over the lines of a file and you have probably seen the for in construct in other cases as well. The objects that can be iterated over are collectively called iterables. You can do all kind of interesting things on such iterables. We'll see a few now.
- There are several data types that we can iterate over using the for ... in ... construct. For example, strings, lists, tuples, filehandles, and range:
string
We can iterate over the "elements" of a string that are the characters. We can also print the whole string.
name = "Learn 🐍 Python"
for cr in name:
print(cr)
print(name)
Output:
L
e
a
r
n
🐍
P
y
t
h
o
n
Learn 🐍 Python
list
We can iterate over the elements of a list. We can also print the whole list at once.
numbers = [101, 2, 3, 42]
for num in numbers:
print(num)
print(numbers)
Output:
101
2
3
42
[101, 2, 3, 42]
tuple
We can iterate over the elements of a tuple. We can also print the whole tuple at once.
planets = ("Mars", "Earth", "Venus")
for planet in planets:
print(planet)
print(planets)
Output:
Mars
Earth
Venus
('Mars', 'Earth', 'Venus')
filehandle
We can iterate over a filehandle. On each iteration we'll get one line from the file.
import sys
file = sys.argv[0]
with open(file) as fh:
for row in fh:
print(row, end="")
Running this program will print itself.
We can also print the filehandle: print(fh)
, but the output will be:
<_io.TextIOWrapper name='iterable_fh.py' mode='r' encoding='UTF-8'>
not the content of the file.
So the "thing" that the open
function returned is iterable, but it is different from the previous 3 types.
range
The range
function returns something, we can use the for-in loop to iterate over it and we'll get back the expected numbers.
However, if we print the value that we got back from the range
function it looks strange. It is the range
object.
Unlike in Python 2, here in Python 3 the range
function does not return a list of numbers.
It returns an object that allows us to iterate over the numbers, but it does not hold the numbers.
rng = range(3, 9, 2)
for num in rng:
print(num)
print(rng)
Output:
3
5
7
range(3, 9, 2)
range
is interesting. We are going to take a closer look in the next few pages.
range
So what does range really return?
Instead of returning the list of numbers (as it used to do in Python 2), now it returns a range object that provides "the opportunity to go over
the specific series of numbers" without actually creating the list of numbers. Getting an object instead of the whole list has a number of advantages.
One is space in the memory. In a later example we'll see how much memory is needed for the object returned by the range function and
how much would it take to have the corresponding list of numbers in memory. For now let's see how we can use it. range
can have 1, 2, or 3 parameters.
range(start, end, step)
- Go fromstart
(included) toend
(not included) everystep
value.range(start, end)
- Wherestep
defaults to 1.range(end)
- Wherestart
defaults to 0,step
defaults to 1.
range with 3 parameters
Go from start
(included) to end
(not included) every step
value.
rng = range(3, 11, 2)
print(rng)
print()
for num in rng:
print(num)
Output:
range(3, 11, 2)
3
5
7
9
range with 2 parameters
Go from start
(included) to end
(not included) every step=1
.
rng = range(3, 7)
for num in rng:
print(num)
Output:
3
4
5
6
range with 1 parameter
Go from start=0
(included) to end
(not included) every step=1
.
rng = range(3)
for num in rng:
print(num)
Output:
0
1
2
range index and len
Similar to strings, lists, and tuple, but unlike filehandles, we can use the len
function on the range
object and we can also access a specific element
using the square-bracket indexing without actually iterating over all the elements.
rng = range(10, 10000, 2)
print(rng[2])
print(len(rng))
Output:
14
4995
filehandle index and len
On a filehandle (returned by the open
function) we cannot call the len
function, nor can we access arbitrary element (row) using index in a square bracket []
.
We can only use it as an iterator.
len
import sys
file = sys.argv[0]
with open(file) as fh:
print(len(fh))
Error:
Traceback (most recent call last):
File ".../fh_len.py", line 5, in <module>
print(len(fh))
~~~^^^^
TypeError: object of type '_io.TextIOWrapper' has no len()
index
import sys
file = sys.argv[0]
with open(file) as fh:
print(fh[2])
Error:
Traceback (most recent call last):
File ".../fh_index.py", line 5, in <module>
print(fh[2])
~~^^^
TypeError: '_io.TextIOWrapper' object is not subscriptable
range with list
Using the list function we can tell the range object to generate the whole list immediately. Either using the variable that holds the range object, or wrapping the range() call in a list() call.
The sys
module provides function called sys.getsizeof() that returns the size of a Python object in the memory. As you can see the size of the range object is 48 bytes
while the size of the 3-element list is 112 bytes. It seems the range object uses less memory than even such a short lists.
On the next page we'll see a more detailed analyzis.
import sys
rng = range(3, 9, 2)
numbers = list(rng)
print(rng) # range(3, 9, 2)
print(numbers) # [3, 5, 7]
others = list(range(2, 11, 3))
print(others) # [2, 5, 8]
print(sys.getsizeof(rng)) # 48
print(sys.getsizeof(numbers)) # 112
- range
- list
- getsizeof
range vs. list size using getsizeof
Showing that the range object remains the same size regardless of the size of the range, but if we convert it into a list then its memory footprint is proportional to its size. To the number of elements in it.
In this example we have a loop iterating over range(21), but that's only for the convenience, the interesting part is inside the loop. On every iteration we call range() with the current number, then we convert the resulting object into a list of numbers. Finally we print out the current number and the size of both the object returned by range() and the list generated from the object. As you can see the memory usage of the range object remains the same 48 bytes, while the memory usage of the list grows as the list gets longer.
The code we use to demonstrate the memory usage of a range
object vs. the list of numbers generated using it.
import sys
for ix in range(21):
rng = range(ix)
numbers = list(rng)
print("{:>3} {:>3} {:>4}".format(
ix,
sys.getsizeof(rng),
sys.getsizeof(numbers)))
Output:
0 48 64
1 48 96
2 48 104
3 48 112
4 48 120
5 48 128
6 48 136
7 48 144
8 48 160
9 48 192
10 48 200
11 48 208
12 48 216
13 48 224
14 48 232
15 48 240
16 48 256
17 48 264
18 48 272
19 48 280
20 48 288
for loop with transformation
There are many cases when we have a list of some values and we need to apply some transformation to each value. At the end we would like to have the list of the resulting values.
A very simple such transformation would be to double each value. Other, more interesting examples might be reversing each string, computing some more complex function on each number, etc.)
In this example we just double the values and use append to add each value to the list containing the results.
def double(n):
return 2 * n
numbers = [7, 2, 4, 1]
double_numbers = []
for num in numbers:
double_numbers.append( double(num) )
print(double_numbers)
Output:
[14, 4, 8, 2]
There are better ways to do this.
map
map(function, iterable, ...)
The map function of Python applies a function to every item in an iterable and returns an iterator that can be used to iterate over the results. Wow, how many times I repeated the word iter...something. Instead of trying to untangle that sentence, let's look at the following example:
We have a list of numbers in the brilliantly named variable numbers
with 7, 2, 4, 1 as the content. We would like to create a list of all the doubles (so that would be 14, 4, 8, 2 in this case)
and then iterate over them printing them on the screen. Sure, you probably have some more complex operation to do on the numbers than to simply double them, but in this example I did not want to complicate
that part. Suffice to say that you have some computation to do on every element of the original list.
So you encapsulate your computation in a regular Python function (in our case the function is called double). Then you call map and pass to it two parameters. The first parameter is the double function itself, the second parameter is the list of the values you would like to work on. map
will go over all the values in the numbers list, call the double function with
each number and return you somthing that will allow you to iterate over the results. Something like this:
double_numbers = [ double(7), double(2), double(4), double(1)]
Except, that the above is not true.
When Python executes the double_numbers = map(double, numbers)
line, no computation happens and no resulting list is created. Python only prepars "the possibility to do the computations". In the upcoming examples we'll see what does this sentence really mean, for now let's see what do we have in this example: double_numbers
contains a map object, but when you iterate
over it using the for num in double_numbers
expression you get the expected values.
def double(n):
return 2 * n
numbers = [7, 2, 4, 1]
double_iterable = map(double, numbers)
print(double_iterable)
for num in double_iterable:
print(num)
Output:
<map object at 0x76a59d19e560>
14
4
8
2
map with list
Just as with the range
object, here with the map
object too you can use the list
function to convert all the values at once.
However, there are two advantages of keeping it as a map object and not using list
to flatten it into a list
.
- Memory usage.
- Delayed (lazy) execution of code.
def double(num):
return 2 * num
numbers = [7, 2, 4, 1]
double_numbers = list(map(double, numbers))
print(double_numbers)
Output:
[14, 4, 8, 2]
size of map
One advantage of keeping the result of map
as it is is the memory footprint of the map object vs. the generated list.
The map
function returns a map
object that takes up a fixed amount of memory, regardless of the size of the original list.
If we flatten it using the list
function, the size of the generated list will be similar to the size of the original list.
import sys
def double(num):
return 2 * num
numbers = [7, 2, 4, 1]
double_iterable = map(double, numbers)
double_numbers = list(map(double, numbers))
print(sys.getsizeof(double_iterable))
print(sys.getsizeof(double_numbers))
Output:
48
120
map delaying function call
Not only the size that we already saw with the range case, but also the processing time saved by not calculating the results till you actually need it.
Let's see how it works, before getting in more explanation.
def double(n):
print(f"double {n}")
return 2 * n
numbers = [1, 4, 2, -1]
double_numbers = map(double, numbers)
print(double_numbers)
for num in double_numbers:
print(num)
Output:
<map object at 0x7f90df760f98>
double 1
2
double 4
8
double 2
4
double -1
-2
Imagine a case where you apply several expensive (time consuming) transformations to some original list and then you iterate over the end-results looking for the first value that matches some condition. What if you find the value you were looking for after only a few iteration. Then making all that expensive calculations to the whole list was a waste of time.
This lazy evaluation can help you save both memory and time and you always have the option to force the immediate calculation by calling the list function.
In this example we have added a call to print
in the double function in order to see when is it really executed. You can see that the first output
comes from the print
statement that was after the map
call. Then on each iteration we see the output from inside the "double" function and then the
result from the loop. In a nutshell Python does not execute the "double" function at the point where we called map
. It only executes it when we iterate over
the resulting object.
map on many values
Now imagine you have a very long list. I know this is not such a long list, but I trust you can imagin a long list of numbers. We would like to run
some function on each element and then iterate over the results, but what if at one point in the iteration we decide to break
out of the loop?
import sys
def double(n):
print(f"double {n}")
return 2 * n
numbers = [1, 4, 2, -1, 23, 12, 5, 6, 34, 143123, 98, 213]
double_numbers = map(double, numbers)
print(double_numbers)
for num in double_numbers:
print(num)
if num > 42:
break
print()
print(sys.getsizeof(numbers))
print(sys.getsizeof(double_numbers))
<map object at 0x7fe5c5270d68>
double 1
2
double 4
8
double 2
4
double -1
-2
double 23
46
160
56
You can see that it did not need to waste time calculating the doubles of all the values, as it was calculating on-demand. You can also see that the object returned
from map
takes up only 56 bytes. Regardless of the size of the original array.
double with lambda
There are many other cases besides map where we need to pass a function as a parameter to some other function. Many cases the function we pass is some almost trivial function with a single operation in it. In those cases creating a named function like the "double" function in the previous examples is an overkill.
In this example we also used the list function to force the full evaluation of the map object to make it easier to show the results. Normally you probably would not use the list function here.
numbers = [1, 2, 3, 4]
double_numbers = list( map( lambda n: n * 2, numbers) )
print(double_numbers)
Output:
[2, 4, 6, 8]
What is lambda in Python?
Lambda creates simple anonymous function. It is simple because it can only have one statement in its body. It is anonymous because usually it does not have a name.
The usual use is as we saw earlier when we passed it as a parameter to the map
function. However, in the next example we show that you can assign the
lambda-function to a name and then you could used that name just as any other function you would define using def.
def dbl(n):
return 2*n
print(dbl(3))
double = lambda n: 2*n
print(double(3))
Output:
6
6
lambda returning tuple
A lambda function can return complex data structures as well. e.g. a tuple.
dbl = lambda n: (n, 2*n)
ret = dbl(12)
print(ret)
Output:
(12, 24)
map returning tuples
numbers = [1, 2, 3, 4]
pairs = map(lambda n: (n, 2*n), numbers)
print(pairs)
for pair in pairs:
print(pair)
Output:
<map object at 0x7fcd264a15f8>
(1, 2)
(2, 4)
(3, 6)
(4, 8)
lambda with two parameters
A lambda-function can have more than one parameters:
add = lambda x,y: x+y
print(add(2, 3))
Output:
5
map for more than one iterable
Lets "add" together two lists of numbers. Using + will just join the two lists together, but we can use the "map" function to add the values pair-wise.
v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8]
v3 = v1 + v2
print(v3)
sums = map(lambda x,y: x+y, v1, v2)
print(sums)
print(list(sums))
Output:
[1, 3, 5, 9, 2, 6, 4, 8]
<map object at 0x7fcbecc8c668>
[3, 9, 9, 17]
map on uneven lists
In Python 3 the iterator stops when the shortest iterable is exhausted.
In Python 2 it used to extend the shorter lists by None values.
v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]
sums = map(lambda x,y: x+y, v1, v2)
print(sums)
print(list(sums))
Output:
<map object at 0x7ff9469a8da0>
[3, 9, 9, 17]
replace None (for Python 2)
In Python 2 map used to extend the shorter lists by None values. So to avoid exceptions, we had some exra code replacing the None values by 0, using the ternary operator.
v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]
print(map(lambda x,y: (0 if x is None else x) + (0 if y is None else y), v1, v2))
# [3, 9, 9, 17, 10]
Output:
map on uneven lists - fixed (for Python 2)
A nicer fix was this:
v1 = [1, 3, 5, 9]
v2 = [2, 6, 4, 8, 10]
print(map(lambda x,y: (x or 0) + (y or 0), v1, v2))
# [3, 9, 9, 17, 10]
map mixed iterators
map works on any iterable, so we might end up passing one list and one string to it.
v1 = ['foo', 'bar', 'baz']
v2 = 'abc'
result = map(lambda x,y: x+y, v1, v2)
print(result)
print( list(result) )
Output:
<map object at 0x7fc5e9ff4e80>
['fooa', 'barb', 'bazc']
map fetch value from dictionary
people = [
{
'name': 'Foo',
'phone': '123',
},
{
'name': 'Bar',
'phone': '456',
},
{
'name': 'SnowWhite',
'phone': '7-dwarfs',
}
]
names = map(lambda d: d['name'], people)
print(names)
print(list(names))
Output:
<map object at 0x7f5afffaeb00>
['Foo', 'Bar', 'SnowWhite']
Exercise: string to length
- Given a list of strings, create an iterable that will provide the length of each string.
- If the input is
['moon', venus', 'jupyter']
then the resulting iterable should return 4, 5, and 7.
Exercise: row to length
- Given a file, create an iterable that will provide the length of each row.
- Can you do it without reading the file?
Exercise: compare rows
-
Create an iterable that given two files will return true for each line where the first space in the first file is earlier than the first space in the second file. So
-
given: "ab cd" vs "abc d" the value is true
-
given: "ab cd" vs "ab cd" the value is false
-
given: "ab cd" vs "a bcd" the value is false
Solution: string to length
animals = ['chicken', 'cow', 'snail', 'elephant', 'pig', 'zebra', 'gnu', 'praying mantiss', 'snake']
length = map(len, animals)
print(length)
#print(list(length))
for ln in length:
print(ln)
Solution: row to length
import sys
if len(sys.argv) != 2:
exit(f"Usage: {sys.argv[0]} FILENAME")
filename = sys.argv[1]
with open(filename) as fh:
length = map(len, fh)
print(length)
for ln in length:
print(ln)
# if ln > 10:
# break
Solution: compare rows
import sys
if len(sys.argv) != 3:
exit(f"Usage: {sys.argv[0]} FILE FILE")
file_a, file_b = sys.argv[1:]
def compare(row_a, row_b):
a = row_a.find(' ')
b = row_b.find(' ')
return a < b
with open(file_a) as fh_a, open(file_b) as fh_b:
results = map(compare, fh_a, fh_b)
print(results)
for res in results:
print(res)
<map object at 0x7f0858d3f8d0>
56
[False, True, False, True, True]
128
filter in a loop
numbers = [1, 3, 27, 10, 38]
def big(n):
return n > 10
big_numbers = []
for num in numbers:
if big(num):
big_numbers.append(num)
print(big_numbers)
Output:
[27, 38]
filter
filter(function, iterable)
filter
will return an iterable object that will return all the items of the original iterable that evaluate the function to True.
This can have only one iterable!
numbers = [1, 3, 27, 10, 38]
def big(n):
return n > 10
big_numbers = filter(big, numbers)
print(big_numbers)
#print(list(big_numbers))
for num in big_numbers:
print(num)
Output:
<filter object at 0x7f4bc37355c0>
[27, 38]
filter with lambda
filter grep
numbers = [1, 3, 27, 10, 38]
big_numbers = filter(lambda n: n > 10, numbers)
print(big_numbers)
print(list(big_numbers))
Output:
<filter object at 0x7faed0fe57b8>
[27, 38]
filter - map example
numbers = [1, 7, 19, 5, 57, 23, 8]
def big(x):
print(f"filtering {x}")
return x > 10
def double(y):
print(f"double {y}")
return 2*y
big_numbers = filter(big, numbers)
print(big_numbers)
doubles = map(double, big_numbers)
print(doubles)
for num in doubles:
print(num)
Output:
<filter object at 0x7ffad9f82f28>
<map object at 0x7ffad9f829e8>
filtering 1
filtering 7
filtering 19
double 19
38
filtering 5
filtering 57
double 57
114
filtering 23
double 23
46
filtering 8
filter - map in one expression
numbers = [1, 7, 19, 5, 57, 23, 8]
def big(x):
print(f"filtering {x}")
return x > 10
def double(y):
print(f"double {y}")
return 2*y
for num in map(double, filter(big, numbers)):
print(num)
Output:
filtering 1
filtering 7
filtering 19
double 19
38
filtering 5
filtering 57
double 57
114
filtering 23
double 23
46
filtering 8
filter a dictionary using dict comprehension
We have a dictionary, we would like to create another dictionary based on this one that contains only key-value pairs that meet a certain condition.
def main():
grades = {
"math": 100,
"biology": 53,
"chemistry": 62,
"art": 78,
"history": 20,
}
print(grades)
good_grades = {key: value for (key, value) in grades.items() if value >= 60}
print(good_grades)
bad_grades = {key: grades[key] for key in grades if grades[key] < 60}
print(bad_grades)
main()
Output:
{'math': 100, 'biology': 53, 'chemistry': 62, 'art': 78, 'history': 20}
{'math': 100, 'chemistry': 62, 'art': 78}
{'biology': 53, 'history': 20}
Get indices of values
filter
can help us get a sublist of values from an iterable, eg. from a list that match some condition.
In this example we see how to get all the names that are exactly 3 characters long.
What if, however if instead of the values themselves, you would like to know their location? The indexes of the
places where these value can be found. In that case, you would run the filter
on the indexes from 0 till the last
valid index of the list. You can do that using the range
function.
Finally there is another example that shows how to get the indexes of all the names that have an "e" in them. Just to show you that we can use any arbitray condition there.
names = ["Helen", "Ann", "Mary", "Harry", "Joe", "Peter"]
names3 = filter(lambda w: len(w) == 3, names)
print( list(names3) )
loc3 = filter(lambda i: len(names[i]) == 3, range(len(names)))
print( list(loc3) )
has_e = filter(lambda i: "e" in names[i], range(len(names)))
print( list(has_e) )
Output:
['Ann', 'Joe']
[1, 4]
[0, 4, 5]
reduce
In Python 2 it was still part of the language.
reduce(function, iterable[, initializer])
from functools import reduce
numbers = [1, 2, 3, 4]
print(reduce(lambda x,y: x+y, numbers)) # 10 = ((1+2)+3)+4
print(reduce(lambda x,y: x*y, numbers)) # 24 = ((1*2)*3)*4
print(reduce(lambda x,y: x/y, [8, 4, 2])) # 1.0 = (8/4)/2
print(reduce(lambda x,y: x+y, [2])) # 2
print(reduce(lambda x,y: x*y, [2])) # 2
Output:
10
24
1.0
2
2
The initializer is used as the 0th element returned by the iterable. It is mostly interesting in case the iterable is empty.
reduce empty list
from functools import reduce
numbers = []
print(reduce(lambda x,y: x+y, numbers))
# TypeError: reduce() of empty sequence with no initial value
Output:
reduce with default
from functools import reduce
print(reduce(lambda x,y: x+y, [], 0)) # 0
print(reduce(lambda x,y: x+y, [1, 2], 0)) # 3
print(reduce(lambda x,y: x+y, [3, 4], 2)) # 9
print( reduce(lambda x,y: x*y, [1, 2], 0) ) # 0
print( reduce(lambda x,y: x*y, [2, 3], 1) ) # 6
print( reduce(lambda x,y: x*y, [], 0) ) # 0
print( reduce(lambda x,y: x*y, [], 1) ) # 1
Output:
0
18
0
3
0
6
0
reduce list of dictionaries
- Combining
map
andfilter
- The reduce-only solution requires the default value
from functools import reduce
grades = [
{
'subject': 'math',
'grade': 97,
},
{
'subject': 'chemistry',
'grade': 60,
},
{
'subject': 'grammar',
'grade': 75,
}
]
print(reduce(lambda x,y: x+y, map(lambda s: s['grade'], grades))) # 232
print(reduce(lambda acc,y: acc+y['grade'], grades, 0)) #232
Output:
zip
zip
can iterate over a number of lists (or iterables in general) in parallel. The iteration will stop when the first iterator stops.
fname = ['Graham', 'Eric', 'Terry',
'Terry', 'John', 'Michael']
lname = ['Chapman', 'Idle', 'Gilliam',
'Jones', 'Cleese', 'Palin']
born = ['8 January 1941', '29 March 1943', '22 November 1940',
'1 February 1942', '27 October 1939', '5 May 1943']
for f_name, l_name, b_date in zip(fname, lname, born):
print(f"{f_name:7} {l_name:7} was born {b_date}")
Output:
Graham Chapman was born 8 January 1941
Eric Idle was born 29 March 1943
Terry Gilliam was born 22 November 1940
Terry Jones was born 1 February 1942
John Cleese was born 27 October 1939
Michael Palin was born 5 May 1943
Combining two lists using zip
names = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
days = [31, 28, 31, 30]
zipped = zip(names, days)
print(zipped)
pairs = list(zipped)
print(pairs)
Output:
<zip object at 0x7fc1b4464d40>
[('Jan', 31), ('Feb', 28), ('Mar', 31), ('Apr', 30)]
Creating dictionary from two lists using zip
names = ['Jan', 'Feb', 'Mar', 'Apr']
days = [31, 28, 31, 30]
zipped = zip(names, days)
print(zipped)
print(list(zipped))
zipped = zip(names, days)
month = dict(zipped)
print(month)
Output:
{'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30}
all, any
all any
all(iterable)
- returns True if all the elements of iterable return Trueany(iterable)
- returns True if any of the elements in iterable return True
a = [True, True]
b = [True, False]
c = [False, False]
print(all(a)) # True
print(all(b)) # False
print(all(c)) # False
print()
print(any(a)) # True
print(any(b)) # True
print(any(c)) # False
Output:
Compare elements of list with scalar
print(2 > 1) # True
print(0 > 1) # False
print()
numbers = [2, 4]
# Comparing different types does not make sense, but nevertheless Python 2 would still do it.
# Python 3 raises exception:
# TypeError: '>' not supported between instances of 'list' and 'int'
# print(numbers > 1) # True
# print(numbers > 7) # True
# print()
# compare each element with the scalar and then check if 'all' were True
print(all(map(lambda x: x > 1, numbers))) # True
print(all(map(lambda x: x > 2, numbers))) # False
Output:
List comprehension - double
[]
We take the original example where we had a function called double, and this time we write a different expression to run the function on every element of an iterable.
def double(n):
return 2*n
numbers = [1, 2, 3, 4]
name = "FooBar"
double_numbers = [double(n) for n in numbers]
print(double_numbers) # [2, 4, 6, 8]
double_chars = [double(n) for n in name]
print(double_chars) # ['FF', 'oo', 'oo', 'BB', 'aa', 'rr']
Output:
List comprehension - simple expression
import sys
numbers = [0, 1, 2, 3]
sqrs = map(lambda n: n*n, numbers)
print(sqrs) # <map object at 0x7fdcab2f5940>
print(list(sqrs)) # [0, 1, 4, 9]
print(sys.getsizeof(sqrs))
print()
squares = [n*n for n in numbers]
print(squares) # [0, 1, 4, 9]
print(sys.getsizeof(squares))
Output:
<map object at 0x7fa9cf2eb9e8>
[0, 1, 4, 9]
56
[0, 1, 4, 9]
96
List generator
()
Going over the values of the generator will empty the generator.
import sys
numbers = [0, 1, 2, 3, 4, 5, 6]
gn = (n*n for n in numbers)
print(gn)
print(sys.getsizeof(gn))
print()
for num in gn:
print(num)
print()
gn = (n*n for n in numbers)
squares = list(gn)
print(sys.getsizeof(squares))
print(squares)
print(list(gn)) # the generator was already exhausted
Output:
<generator object <genexpr> at 0x7f8c0bda2930>
120
0
1
4
9
16
25
36
160
[0, 1, 4, 9, 16, 25, 36]
[]
List comprehension
mapcar
text = ['aaaa', 'bb', 'ccc ccc']
length_1 = map(lambda x: len(x), text)
print(length_1) # <map object at 0x7f60ceb90f98>
print(list(length_1)) # [4, 2, 7]
length_2 = map(len, text)
print(length_2) # <map object at 0x7f60ceb90fd0>
print(list(length_2)) # [4, 2, 7]
length_3 = [ len(s) for s in text ]
print(length_3) # [4, 2, 7]
Output:
In LISP this would be a mapcar.
Dict comprehension
people = {
'Foo': 123,
'Bar': 456,
'SnowWhite': 7,
}
doubles = { k:v*2 for (k, v) in people.items() }
print(doubles) # {'Foo': 246, 'Bar': 912, 'SnowWhite': 14}
Output:
Lookup table with lambda
lambda
import sys
table = {
"cat" : lambda : print("miau"),
"dog" : lambda : print("hauhau"),
"duck" : lambda : print("hap hap"),
}
def main():
if len(sys.argv) != 2:
exit(f"Usage: {sys.argv[0]} NAME")
animal = sys.argv[1]
if animal in table:
table[animal]()
main()
Output:
Read lines without newlines
readlines map lambda
import sys
if len(sys.argv) != 2:
exit(f"Usage: {sys.argv[0]}")
filename = sys.argv[1]
with open(filename) as fh:
rows = map(lambda s: s.rstrip("\n"), fh.readlines())
print("SIZE:", sys.getsizeof(rows))
for row in rows:
print(row)
with open(filename) as fh:
rows = map(lambda s: s.rstrip("\n"), fh)
print("SIZE:", sys.getsizeof(rows))
for row in rows:
print(row)
#with open(filename) as fh:
# #for row in fh:
# # ...
# #rows = map(lambda s: s.rstrip("\n"), fh.readlines())
# rows = map(lambda s: s.rstrip("\n"), fh)
#
# for row in rows:
# print(row)
Output:
Read key-value pairs
- readlines
- map
- lambda
- split
- dict
name=Foo Bar
email=foo@bar.com
address=Foo street 42
Output:
import sys
if len(sys.argv) != 2:
exit(f"Usage: {sys.argv[0]}")
filename = sys.argv[1]
with open(filename) as fh:
pairs = dict(map(lambda x: x.split('='), map(lambda s: s.rstrip("\n"), fh.readlines())))
print(pairs)
{'name': 'Foo Bar', 'email': 'foo@bar.com', 'address': 'Foo street 42'}
Create index-to-value mapping in a dictionary based on a list of values
- zip
- dict
planned_order = ('b', 'c', 'd', 'a')
plan = dict(zip(range(len(planned_order)), planned_order))
print(plan)
Output:
{0: 'b', 1: 'c', 2: 'd', 3: 'a'}
Exercise: min, max, factorial
- Implement an expression to calculate "min", and another expression to calculate "max" of lists.
- Implement an expression that will calculate factorial. f(n) should return the value of n! (n! = n * (n-1) * (n-2) * ... * 1)
- Implement an expression that given 2 lists will return a new list in which each element is the max() for each pair from the input lists. E.g. given [1, 3, 6] and [2, 4, 5] the result is [2, 4, 6]
- Use reduce, map, lambda
Exercise: Prime numbers
Calculate and print the prime numbers between 2 and N. Use filter.
Exercise: Many validator functions
Given several validator functions (that get a parameter and return True or False), and given a list of values, return a sublist of values that pass all the validation checks. See the sekeleton:
def is_big(x):
return x > 100
def is_even(x):
return not x % 2
numbers = [90, 102, 101, 104]
cond = [is_big, is_even]
# z = ...
print(z) # [102, 104]
Exercise: Calculator using lookup table
Write a script that will accept a math expression such as python calc.py 2 + 3
and will print the result.
Use lookup tables select the implementation of the actual computation. (supporting +, - , *, /) is enought
Exercise: parse file
In the following file we have lines:
SOURCE/FILENAME.json,TARGET
read in the file and create
- a single dictionary where the
SOURCE/FILENAME.json
is the key and the TARGET is the value. - list of dictionaries in which the keys are 'source', 'filename', and 'target' and the values are from the respective columns (SOURCE, FILENAME.json, and TARGET)
You can solve this for
-loop or with map
and list-comprehensions. Do it in both ways.
{% embed include file="src/examples/functional/books.txt)
Solution: min, max, factorial
from functools import reduce
numbers = [2, 1, 4, 3]
# min
print(reduce(lambda x,y: x if x < y else y, numbers)) # 1
# max
print(reduce(lambda x,y: x if x > y else y, numbers)) # 4
# factorial
n = 4
print(reduce(lambda x,y: x*y, range(1, n+1), 1)) # 24
# The 1 at the end is the initializor of reduce to provide
# correct results for n = 0.
a = [1, 3, 6]
b = [2, 4, 5]
c = map(lambda x,y: x if x > y else y, a, b)
print(list(c)) # [2, 4, 6]
Solution: Prime numbers
Calculating the prime numbers
n = 50
nums = range(2, n)
for i in range(2, 1+int(n ** 0.5)):
nums = filter(lambda x: x == i or x % i, nums)
print(nums)
Solution: Many validator functions
def is_big(x):
return x > 100
def is_even(x):
return not x % 2
numbers = [90, 102, 101, 104]
cond = [is_big, is_even]
z = filter( lambda n: all([f(n) for f in cond]), numbers)
print(z) # [102, 104]
Solution: Calculator using lookup table
import sys
table = {
"+" : lambda x, y: x+y,
"-" : lambda x, y: x-y,
"*" : lambda x, y: x*y,
"/" : lambda x, y: x/y,
}
def main():
if len(sys.argv) != 4:
exit(f"Usage: {sys.argv[0]} NUMBER OP NUMBER")
action = table[sys.argv[2]]
print( action(int(sys.argv[1]), int(sys.argv[3])) )
main()
map with condtion
- map
The conversion function can do anything. It can have a condition inside.
numbers = [1, 2, 3, 4]
def cond_double(n):
if n % 2:
return 2*n
else:
return n
cd = map(cond_double, numbers)
print(cd) # [2, 2, 6, 4]
Output:
map with lambda
numbers = [1, 2, 3, 4]
def dbl(x):
return 2*x
d1 = map(dbl, numbers)
print(d1) # [2, 4, 6, 8]
double = lambda x: 2*x
d2 = map(double, numbers)
print(d2) # [2, 4, 6, 8]
d3 = map(lambda n: 2*n, numbers)
print(d3) # [2, 4, 6, 8]
Output:
map with lambda with condition
numbers = [1, 2, 3, 4]
a = map(lambda n: 2*n if n % 2 else n, numbers)
print(a) # [2, 2, 6, 4]
Output:
List comprehension - complex
numbers = [1, 3, 2, 4]
t = filter(lambda n: n > 2, numbers)
print(t) # [3, 4]
n1 = map(lambda n: n*n, t)
print(n1) # [9, 16]
n2 = map(lambda n: n*n, filter(lambda n: n > 2, numbers))
print(n2) # [9, 16]
n3 = [ n*n for n in numbers if n > 2 ]
print(n3) # [9, 16]
Output:
Change list before iteration over map object
A small warning. Because map only connects to our iterator, but does not iterate over it till it is called, if we change the content of the underlying iterator, that will be reflected in the results of the iteration over the map object.
def double(n):
return 2 * n
numbers = [1, 4, 2]
double_numbers = map(double, numbers)
numbers.append(5)
for num in double_numbers:
print(num)
2
8
4
10
Replace list before iteration over map object
2
8
4
def double(n):
return 2 * n
numbers = [1, 4, 2]
double_numbers = map(double, numbers)
numbers = [3, 7]
for num in double_numbers:
print(num)
Iterators - with and without Itertools
Advantages of iterators and generators
- Lazy evaluation
- Save processing (or at least delay the use)
- Save memory
- Handle an infinite series of information
- Turn complex operations into a simple matter of
for
loop.
The Fibonacci research institute
-
We have a bunch of mathematicians who research the Fibonacci series.
-
We have a bunch of people who research a series of DNA sequences.
-
???
Fibonacci plain
- We don't call this as this has an infinite loop
def fibonacci():
a, b = 0, 1
while True:
a, b = b, a+b
# fibonacci()
Fibonacci copy-paste
def fibonacci():
a, b = 0, 1
while True:
a, b = b, a+b
print(a)
if a % 17 == 0:
print('found')
break
if a > 200:
print('not found')
break
fibonacci()
Iterators Glossary
- iterable (Can be iterated over using a
for
loop.) - iterator
- Every iterator is also iterable
- Iterators (and iterables) are not necessarily addressable like lists with the
thing[index]
construct. - Iterator Types
- The standard type hierarchy
What are iterators and iterables?
- All of them are iterables
- A filehandle and the map object are also iterators. (Side note: You should always
open
files using thewith
statement and not like this.) iter()
would return the iterator from an iterable. We don't need this.
from collections.abc import Iterator, Iterable
a_string = "Hello World"
a_list = ["Tiger", "Mouse"]
a_tuple = ("Blue", "Red")
a_range = range(10)
a_fh = open(__file__)
a_map = map(lambda x: x*2, a_list)
for thing in [a_string, a_list, a_tuple, a_range, a_map, a_fh]:
print(thing.__class__.__name__)
print(issubclass(thing.__class__, Iterator))
print(issubclass(thing.__class__, Iterable))
zorg = iter(thing)
print(zorg.__class__.__name__)
print(issubclass(zorg.__class__, Iterator))
print()
a_fh.close()
Output:
str
False
True
str_iterator
True
list
False
True
list_iterator
True
tuple
False
True
tuple_iterator
True
range
False
True
range_iterator
True
TextIOWrapper
True
True
TextIOWrapper
True
A file-handle is an iterator
- collections
- Iterator
- Iterable
- io
- TextIOWrapper
- issubclass
This slightly a repetition of the previous statement, that filehandles are iterators.
from collections.abc import Iterator, Iterable
from io import TextIOWrapper
with open(__file__) as fh:
print(fh.__class__.__name__)
print(issubclass(fh.__class__, TextIOWrapper))
print(issubclass(fh.__class__, Iterator))
print(issubclass(fh.__class__, Iterable))
for line in fh:
pass
#print(line, end="")
Output:
TextIOWrapper
True
True
True
range is iterable but it is not an iterator
Just as a string or a list, the range
function in Python is also an "iterable" but it is not an "iterator".
In many aspects it behaves as an iterator. Specifically it allows us to iterate over numbers.
Range Is Not An Iterator
for n in range(2, 12, 3):
print(n)
print()
for n in range(3):
print(n)
print()
for n in range(2, 5):
print(n)
print()
from collections.abc import Iterator, Iterable
rng = range(2, 5)
print(issubclass(rng.__class__, Iterator))
print(issubclass(rng.__class__, Iterable))
Output:
2
5
8
11
0
1
2
2
3
4
False
True
Range with floating point steps
import it
r = it.Range(1, 6, 2.2)
for n in r:
print(n)
Output:
class Range():
def __init__(self, start, end, step):
self.current = start
self.end = end
self.step = step
def __iter__(self):
return self
def __next__(self):
if self.current >= self.end:
raise StopIteration
v = self.current
self.current += self.step
return v
1
3.2
5.4
import it
def test_range():
r = it.Range(1, 6, 2.2)
result = list(r)
assert result == [1, 3.2, 5.4]
Iterator: a counter
- StopIteration
- next
- iter
- next
We can create a iterator using a class. We are required to implement the __iter__
method that returns the iterator object
and the __next__
method that returns the next element in our iteration. We can indicated that the iteration was exhaused
by raising a StopIteration
exception.
The instance-object that is created from this class-object is the iterator, not the class-object itself!
__iter__
__next__
(in Python 2 this used to callednext
)raise StopIteration
class Counter():
def __init__(self):
self.count = 0
def __iter__(self):
return self
def __next__(self):
self.count += 1
if self.count > 3:
raise StopIteration
return self.count
Using iterator
The class returned an iterator, we could use a for
loop to iterate over the element.
We tried to run through the iterator again, but it did not print anything. It was exhausted.
from counter import Counter
cnt = Counter()
for c in cnt:
print(c)
for c in cnt:
print(c)
Output:
1
2
3
Iterator without temporary variable
from counter import Counter
for c in Counter():
print(c)
Output:
1
2
3
The type of the iterator
How can we know it is an iterator? We check it.
from collections.abc import Iterator, Iterable
from counter import Counter
cnt = Counter()
print(cnt.__class__.__name__)
print(issubclass(cnt.__class__, Iterator))
print(issubclass(cnt.__class__, Iterable))
Output:
Counter
True
True
Using iterator with next
A feature of any iterator is that we could iterate over it using the next
call.
from counter import Counter
cnt = Counter()
while True:
try:
a = next(cnt)
print(a)
except Exception as ex:
print(ex.__class__.__name__)
break
Output:
1
2
3
StopIteration
Mixing for and next
You can even use next
inside a for
loop, but then you will have to handle the StopIteration
exception
that migh happen during your call of next
.
I am not really sure when would we want to use this.
from counter import Counter
cnt = Counter()
for i in cnt:
print(f"i: {i}")
try:
n = next(cnt)
print(f"n: {n}")
except Exception as ex:
print(ex.__class__.__name__)
break
Output:
i: 1
n: 2
i: 3
StopIteration
Iterable which is not an iterator
from counter import Counter
class GetMyIterable():
def __init__(self):
pass
def __iter__(self):
return Counter()
thing = GetMyIterable()
from collections.abc import Iterator, Iterable
print(issubclass(thing.__class__, Iterator))
print(issubclass(thing.__class__, Iterable))
for i in thing:
print(i)
Output:
False
True
1
2
3
Iterator returning multiple values
class SquareCounter():
def __init__(self):
self.count = 0
def __iter__(self):
return self
def __next__(self):
self.count += 1
if self.count > 5:
raise StopIteration
return self.count, self.count ** 2
for cnt, sqr in SquareCounter():
print(f"{cnt} {sqr}")
Output:
1 1
2 4
3 9
4 16
5 25
Range-like iterator
class Range():
def __init__(self, start, end):
self.current = start
self.end = end
def __iter__(self):
return self
def __next__(self):
if self.current >= self.end:
raise StopIteration
v = self.current
self.current += 1
return v
import it
r = it.Range(1, 4)
for n in r:
print(n)
print('---')
for n in it.Range(2, 5):
print(n)
Output:
1
2
3
---
2
3
4
Unbound or infinite iterator
So far each iterator had a beginning and an end. However we can also create infinte or unbounded iterators. The nice thing about them is that we can pass them around as we do with any other object and we can execute operations on them without burning our CPU.
Of course the user will have to be carefull not to try to flatten the iterator, not to try to get all the values from it, as that will only create an infinite loop or a never ending operation.
In this very simple example we count from 0 and we never stop.
When we use the Counter
in the for
loop we need to include a stop-condition, otherwise our loop will never end.
class Counter():
def __init__(self):
self.count = 0
def __iter__(self):
return self
def __next__(self):
self.count += 1
return self.count
for c in Counter():
print(c)
if c > 10:
break
Output:
1
2
3
4
5
6
7
8
9
10
11
Unbound iterator Fibonacci
Now we can get back to our original problem, the slightly more complex Fibonacci series. In this example we created an unbounded iterator that on every iteration will return the next element of the Fibonacci series.
class Fibonacci():
def __init__(self):
self.values = []
def __iter__(self):
return self
def __next__(self):
if len(self.values) == 0:
self.values.append(1)
return 1
if len(self.values) == 1:
self.values.append(1)
return 1
self.values.append(self.values[-1] + self.values[-2])
self.values.pop(0)
return self.values[-1]
from fibonacci import Fibonacci
for v in Fibonacci():
print(v)
if v > 10:
break
Output:
1
1
2
3
5
8
13
Operations on Unbound iterator
from fibonacci import Fibonacci
fib = Fibonacci()
#odd = [x for x in fib if x % 2 == 1]
odd = filter(lambda x: x % 2 == 1, fib)
print("Let's see")
for v in odd:
print(v)
if v > 10:
break
Output:
Let's see
1
1
3
5
13
itertools
itertools yield
itertools is a standard Python library that provides a number of interesting iterators. We are going to see a few examples here:
itertools - count
- Unbound counter: Count from N to infinity.
import itertools
for c in itertools.count(start=19, step=1):
print(c)
if c > 23:
break
# 19
# 20
# 21
# 22
# 23
# 24
itertools - cycle
import itertools
ix = 0
for c in itertools.cycle(['A', 'B', 'C']):
print(c)
ix += 1
if ix >= 5:
break
print('')
ix = 0
for c in itertools.cycle('DEF'):
print(c)
ix += 1
if ix >= 5:
break
Output:
A
B
C
A
B
D
E
F
D
E
Exercise: iterators - reimplement the range function
In one of the first slides of this chapter we saw a partial implementation of the range
function.
Change that code to have a full implementation, that can accept 1, 2, or 3 parameters.
Exercise: iterators - cycle
- Reimplement the cycle functions of itertools using iterator class.
Exercise: iterators - alter
- Implement the alter functions as an iterator that will return
1
-2
3
-4
5
-6
...
- Optionally provide a start and end parameters
- start defaults to 1
- end defaults to unlimited
Exercise: iterators - limit Fibonacci
Change the Iterator version of the Fibonacci series so optionally you will be able to provide a parameter called "limit" to the constructor. If the limit is provided, the iterator should stop when the value passes the limit.
Exercise: iterators - Fibonacci less memory
Change the Iterator version of the Fibonacci series so it will NOT hold the previous values in memory.
Exercise: read char
Create an iterator that given a filename will return an object that on every iteration will return a single character. As an option let the user skip newlines, or maybe any pre-defined character.
Exercise: read section
- Create an iterator that given the name of a file like the following, will return once section at a time.
- It will return a list one each iteration and each element of the list will be a line from the current section.
- Other ideas what should be returned on each iteration?
name = Mercury
distance = 0.4
mass = 0.055
name = Venus
distance = 0.7
mass = 0.815
name = Earth
distance = 1
mass = 1
name = Mars
distance = 1.5
mass = 0.107
Exercise: collect packets
- You get a series of packets (e.g. lines in a file)
- In each line you have several fields: id, seqid, maxseq, content
- id is a unique identifier of a series of packets (lines)
- seqid is the seuence id of a packet in a series. (an integer)
- maxseq is the length of the sequence.
- content is the actual content.
In each iteration return a message that is built up from all the packages in the given sequence.
12,1,5,First of Twelve
12,2,5,Second of Twelve
12,3,5,Third of Twelve
12,4,5,Fourth of Twelve
12,5,5,Fifth of Twelve
9,1,4,First of Nine
9,2,4,Second of Nine
9,3,4,Third of Nine
9,4,4,Fourth of Nine
11,1,3,First of Eleven
11,2,3,Second of Eleven
11,3,3,Third of Eleven
Output:
['First of Twelve', 'Second of Twelve', 'Third of Twelve', 'Fourth of Twelve', 'Fifth of Twelve']
['First of Nine', 'Second of Nine', 'Third of Nine', 'Fourth of Nine']
['First of Eleven', 'Second of Eleven', 'Third of Eleven']
12,1,5,First of Twelve
11,1,3,First of Eleven
9,1,4,First of Nine
12,2,5,Second of Twelve
9,2,4,Second of Nine
11,2,3,Second of Eleven
12,3,5,Third of Twelve
9,3,4,Third of Nine
12,4,5,Fourth of Twelve
12,5,5,Fifth of Twelve
9,4,4,Fourth of Nine
11,3,3,Third of Eleven
11,2,3,Second of Eleven
11,1,3,First of Eleven
9,1,4,First of Nine
12,1,5,First of Twelve
9,3,4,Third of Nine
9,2,4,Second of Nine
12,3,5,Third of Twelve
12,4,5,Fourth of Twelve
12,2,5,Second of Twelve
12,5,5,Fifth of Twelve
9,4,4,Fourth of Nine
11,3,3,Third of Eleven
Exercise: compare files
Compare two files line-by-line, and create a 3rd file listing the lines that are different.
One
Two
Three
Four
Five
One
Two
Tree
Four
Five
Expected output:
2,Three,Tree
Solution: iterators - limit Fibonacci
class Fibonacci:
def __init__(self, limit=0):
self.values = []
self.limit = limit
def __iter__(self):
return self
def next(self):
if self.limit and len(self.values) >= self.limit:
raise StopIteration
if len(self.values) == 0:
self.values.append(1)
return 1
if len(self.values) == 1:
self.values.append(1)
return 1
self.values.append(self.values[-1] + self.values[-2])
return self.values[-1]
import fibonacci
f = fibonacci.Fibonacci(limit = 10)
print(f)
for v in f:
print(v)
print('-----')
f = fibonacci.Fibonacci()
for v in f:
print(v)
if v > 30:
break
Solution: iterators - Fibonacci less memory
class Fibonacci:
def __init__(self, limit=0):
self.values = ()
self.limit = limit
def __iter__(self):
return self
def next(self):
if self.limit and len(self.values) and self.values[-1] >= self.limit:
raise StopIteration
if len(self.values) == 0:
self.values = (1,)
return 1
if len(self.values) == 1:
self.values = (1, 1)
return 1
self.values = (self.values[-1], self.values[-1] + self.values[-2])
return self.values[-1]
import fibonacci
f = fibonacci.Fibonacci(limit = 10)
print(f)
for v in f:
print(v)
print('-----')
f = fibonacci.Fibonacci()
for v in f:
print(v)
if v > 30:
break
Solution: read section
import re
class SectionReader():
def __init__(self, filename):
self.filename = filename
self.fh = open(filename)
def __iter__(self):
return self
def __next__(self):
self.section = []
while True:
line = self.fh.readline()
if not line:
if self.section:
return self.section
else:
self.fh.close()
raise StopIteration
if re.search(r'\A\s*\Z', line):
if self.section:
return self.section
else:
continue
self.section.append(line)
filename = 'planets.txt'
for sec in SectionReader(filename):
print(sec)
Solution: compare files
import sys
def main():
if len(sys.argv) != 4:
exit(f"Usage: {sys.argv[0]} IN_FILE IN_FILE OUT_FILE")
infile_a, infile_b = sys.argv[1:3]
outfile = sys.argv[3]
with open(outfile, 'w') as out_fh, open(infile_a) as in_a, open(infile_b) as in_b:
cnt = 0
for lines in zip(in_a, in_b):
#print(lines)
lines = list(map(lambda s: s.rstrip('\n'), lines))
#print(lines)
if lines[0] != lines[1]:
out_fh.write(f"{cnt},{lines[0]},{lines[1]}\n")
cnt += 1
main()
python diff.py first.txt second.txt diff.txt
Solution: collect packets
The implementation
class Packets():
def __init__(self, filename):
self.filename = filename
self.fh = open(filename)
self.packets = {}
self.max = {}
def __iter__(self):
return self
def __next__(self):
while True:
line = self.fh.readline()
#print(f"line: {line}")
if line == '':
raise StopIteration
line = line.rstrip("\n")
if line == '':
continue
pid, seqid, maxseq, content = line.split(",")
pid = int(pid)
seqid = int(seqid)
maxseq = int(maxseq)
if pid not in self.packets:
self.packets[pid] = {}
self.max[pid] = maxseq
if seqid in self.packets[pid]:
raise Exception("pid arrived twice")
if maxseq != self.max[pid]:
raise Exception("maxseq changed")
self.packets[pid][seqid] = content
if len(self.packets[pid].keys()) == self.max[pid]:
content = list(map(lambda i: self.packets[pid][i+1], range(self.max[pid])))
del(self.max[pid])
del(self.packets[pid])
return content
The use:
import sys
from packets import Packets
if len(sys.argv) < 2:
exit(f"Usage: {sys.argv[0]} FILENAME")
for packet in Packets(sys.argv[1]):
print(packet)
The test to verify it
import os
import json
import pytest
from packets import Packets
root = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(root, 'packets.json')) as fh:
expected_results = json.load(fh)
@pytest.mark.parametrize('filename', ['packets.txt', 'packets1.txt', 'packets2.txt'])
def test_packetes(filename):
filepath = os.path.join(root, filename)
results = []
for packet in Packets(filepath):
results.append(packet)
assert results == expected_results
Expected result:
[["First of Twelve", "Second of Twelve", "Third of Twelve", "Fourth of Twelve", "Fifth of Twelve"], ["First of Nine", "Second of Nine", "Third of Nine", "Fourth of Nine"], ["First of Eleven", "Second of Eleven", "Third of Eleven"]]
Generators and Generator Expressions
Generators Glossary
-
generator (a function that returns a "generator iterator")
-
generator-iterator (an object created by a generator)
-
Generators are basically a way to create iterators without a class.
Iterators vs Generators
- a generator is an iterator
- an iterator is an iterable
from collections.abc import Iterator, Iterable
from types import GeneratorType
print( issubclass(GeneratorType, Iterator) ) # True
print( issubclass(Iterator, Iterable) ) # True
- Genarators are a simpler way to create an iterable object than iterators, but iterators allow for more complex iterables.
- To create an iterator we need a class with two methods:
__iter__
and__next__
, and araise StopIteration
. - To create a generator we only need a single function with `yield .
List comprehension and Generator Expression
However, before learning about yield
let's see an even simpler way to create a generator. What we call a generator expression.
You are probably already familiar with "list comprehensions" where you have a for
expression inside square brackets. That returns a list
of values.
If you replace the square brackets with parentheses then you get a generator expression.
You can iterate over either of those. So what's the difference?
The generator expression delays the execution of the loop and the expression inside to the point where you actually need the value and it is only an object, not the full list.
So you also save memory.
a_list = [i*2 for i in range(3)]
print(a_list)
for x in a_list:
print(x)
print()
a_generator = (i*2 for i in range(3))
print(a_generator)
for x in a_generator:
print(x)
Output:
[0, 2, 4]
0
2
4
<generator object <genexpr> at 0x7f0af6f97a50>
0
2
4
List comprehension vs Generator Expression - less memory
Let's use a bigger range of numbers and create the corresponding list and generator. Then check the size of both of them. You can see the list is much bigger. That's becuse the list already contains all the elements, while the generator contains only the promise to give you all the elements.
As we could see in the previous example, this is not an empty promise, you can indeed iterate over the elements of a generator just as you can iterate over the elements of a list.
However, you cannot access an arbitrary element of a generator because the generator is not subscriptable.
import sys
lst = [n*2 for n in range(1000)] # List comprehension
gen = (n*2 for n in range(1000)) # Generator expression
print(sys.getsizeof(lst))
print(sys.getsizeof(gen))
print()
print(type(lst))
print(type(gen))
print()
print(lst[4])
print()
print(gen[4])
Output:
9016
112
<class 'list'>
<class 'generator'>
8
Traceback (most recent call last):
File "generator_expression.py", line 17, in <module>
print(gen[4])
TypeError: 'generator' object is not subscriptable
List Comprehension vs Generator Expressions
getsizeof
List comprehension vs Generator Expression - lazy evaluation
The second big difference between list comprehension and generator expressions is that the latter has lazy evaluation.
In this example you can see that once we assign to list comprehension to a variable the sqr
function is called on each element.
In the case of the generator expression, only when we iterate over the elements will Python call the sqr
function.
If we exit from the loop before we go over all the values than we saved time by not executing the expression on every
element up-front. If the computation is complex and if our list is long, this can have a substantial impact.
def sqr(n):
print(f"sqr {n}")
return n ** 2
numbers = [1, 3, 7]
# list comprehension
n1 = [ sqr(n) for n in numbers ]
print("we have the list")
for i in n1:
print(i)
print("-------")
# generator expression
n2 = ( sqr(n) for n in numbers )
print("we have the generator")
for i in n2:
print(i)
Output:
sqr 1
sqr 3
sqr 7
we have the list
1
9
49
-------
we have the generator
sqr 1
1
sqr 3
9
sqr 7
49
Generator: function with yield - call next
- next
- yield
- StopIteration
We can create a function that has multiple yield
expressions inside.
We call the function and what we get back is a generator
.
A generator
is also an iterator
so we can call the next
function on it and it will give us the next yield
value.
If we call it one too many times we get a StopIteration
exception.
def number():
yield 42
yield 19
yield 23
num = number()
print(type(num))
print(next(num))
print(next(num))
print(next(num))
print(next(num))
Output:
<class 'generator'>
42
19
23
Traceback (most recent call last):
File "simple_generator_next.py", line 11, in <module>
print(next(num))
StopIteration
Generators - call next
We can also use a for
loop on the generator
and then we don't need to worry about the exception.
def number():
yield 42
yield 19
yield 23
num = number()
print(type(num))
for n in num:
print(n)
Output:
<class 'generator'>
42
19
23
Generator with yield
We don't even need to use a temporary variable for it.
def number():
yield 42
yield 19
yield 23
for n in number():
print(n)
Output:
42
19
23
Generators - fixed counter
def counter():
n = 1
yield n
n += 1
yield n
n += 1
yield n
for c in counter():
print(c)
Output:
1
2
3
Generators - counter
def counter():
n = 1
while True:
yield n
n += 1
for c in counter():
print(c)
if c >= 10:
break
Output:
1
2
3
4
5
6
7
8
9
10
Generators - counter with parameter
def counter(n = 1):
while True:
yield n
n += 1
for c in counter():
print(c)
if c >= 4:
break
print()
for c in counter(8):
print(c)
if c >= 12:
break
Output:
1
2
3
4
8
9
10
11
12
Generators - my_range
import sys
def my_range(limit = 1):
n = 0
while n < limit:
yield n
n += 1
for i in my_range(5):
print(i)
print()
print(sum(my_range(10)))
print()
x = my_range(10000)
print(x)
print(sys.getsizeof(x))
Output:
0
1
2
3
4
45
<generator object my_range at 0x7f36f6089930>
120
Fibonacci - generator
def fibonacci():
a, b = 0, 1
while True:
a, b = b, a+b
yield a
for a in fibonacci():
print(a)
if a % 17 == 0:
print('found')
break
if a > 20000:
print('not found')
break
Output:
1
1
2
3
5
8
13
21
34
found
When we enter the for loop the fibonacci() function is called and returns a generator object.
Then next
is called on this generator object which executes the content of the function up to the call of yield
and return the value. Then ever time the for-loop calls next
the function continues running till it reaches yield
again or till it reaches the end of the function. (Which will never happen in this case.)
Infinite series
- The Fibonacci was already infinite, let's see a few more.
Integers
from series import integers
for i in integers():
print(i)
if i >= 10:
break
Output:
1
2
3
4
5
6
7
8
9
10
Integers + 3
from series import integers
n3 = (n+3 for n in integers())
# n3 = integers(3)
for i in n3:
print(i)
if i >= 10:
break
Output:
4
5
6
7
8
9
10
Integers + Integers
from series import integers
def mysum(nums):
print(nums)
total = 0
for n in nums:
total += n
return total
n3 = integers(3)
n7 = integers(7)
d = ( mysum(p) for p in zip(n3, n7) )
print("start")
for i in d:
print(i)
if i >= 20:
break
Output:
start
(3, 7)
10
(4, 8)
12
(5, 9)
14
(6, 10)
16
(7, 11)
18
(8, 12)
20
- itertools
- zip
Filtered Fibonacci
from series import fibonacci
even = ( fib for fib in fibonacci() if fib % 2 == 0 )
for e in even:
print(e)
if e > 40:
break
Output:
0
2
8
34
144
The series.py
This is the module behind the previous examples.
def integers(n = 1):
while True:
yield n
n += 1
def fibonacci():
a, b = 0, 1
while True:
yield a
a, b = b, a+b
def gfibonacci(size = 2):
"""Generalized Fibonacci. """
values = [0]
while True:
yield values[-1]
if len(values) < size:
values.append(1)
else:
values.append(sum(values))
values = values[1:]
def pascal():
values = [1]
while True:
yield values
new = [1]
for i in range(0, len(values)-1):
new.append(values[i] + values[i+1])
new.append(1)
values = new
generator - unbound count (with yield)
def count(start=0, step=1):
n = start
while True:
yield n
n += step
for c in count(start=19, step=1):
print(c)
if c > 23:
break
Output:
19
20
21
22
23
24
iterator - cycle
def cycle(values=[]):
my_values = []
for v in values:
my_values.append(v)
yield v
while True:
for v in my_values:
yield v
i = 0
for c in cycle(['A', 'B', 'C']):
print(c)
i += 1
if i >= 4:
break
Output:
A
B
C
A
Exercise: Alternator
Create a generator for the following number series: 1, -2, 3, -4, 5, -6, ...
Exercise: Prime number generator
Create a generator that will return the prime numbers: 2, 3, 5, 7, 11, 13, 17, ...
Exercise: generator
Take the two generator examples (increment number and Fibonacci) and change them to provide infinite iterations. Then try to run them in a for loop. Just make sure you have some other condition to leave the for-loop.
Exercise: Binary file reader
Create a generator that given a filename and a number n will return the content of the file in chunks of n characters.
Exercise: File reader with records
In a file we have "records" of data. Each record starts with three bytes in which we have the length of the record. Then the content.
8 ABCDEFGH 5 XYZQR
Given this source file
First line
Second record
Third row of the records
Fourth
5
END
using this code
filename = "rows.txt"
records = "records.txt"
with open(filename) as in_fh:
with open(records, 'w') as out_fh:
for line in in_fh:
line = line.rstrip("\n")
out_fh.write("{:>3}{}".format(len(line), line))
we can create this file:
10First line 13Second record 24Third row of the records 6Fourth 15 3END
The exercise is to create an iterator/generator that can read such a file record-by-record.
yield outside of functions
- It has no meaning there
print("one")
yield "ok";
print("two")
Output:
File "examples/generators/yield_outside_any_function.py", line 2
yield "ok";
^
SyntaxError: 'yield' outside function
Operations on infinite lists
In this example we can say that the fibonacci()
function returns the infinite list of Fibonacci numbers.
In normal circumstances only mathematicians can work with "infinite lists", programmers are constrained by memory and time.
However generators in Python are lazy so they only pretend to be infinite lists. They are only the promise of having an
infinite list. So you can do all kinds of interesting operations on them.
In this example we multiply each value by 2. (I know it is not the most sophisticated mathematical problem, but it will work for this example.)
The variable double_fibonacci
holds the values of the Fibonacci series multiplied by 2. More precisely it holds the
possibility to iterate over that infinite list.
So in reality we don't operate on the infinite lists, only on the "potential of the lists", but the former sounds cooler.
Try running it without the if
and break
statements and see what happens. (Ctrl-C will stop the program.)
def fibonacci():
a, b = 0, 1
while True:
a, b = b, a+b
yield a
double_fibonacci = (value * 2 for value in fibonacci())
for a in double_fibonacci:
print(a)
if a > 200:
break
Output:
2
2
4
6
10
16
26
42
68
110
178
288
Infinite random number generator
This example was created to show that iterators (generators) don't necessarily return a fixed list of values.
import random
def miranda():
while True:
yield random.random()
total = 0
for val in miranda():
print(val)
total += val
if total > 10:
break
Output:
Infinite random set generator
This example was created to show that iterators (generators) don't necessarily return a fixed list of values.
import random
def choices(values, **kw):
while True:
yield random.choices(values, **kw)
count = 0
for val in choices(['a', 'b', 'c', 'd', 'e', 'f'], k=2):
print(val)
count += 1
if count > 10:
break
Output:
Simple function (before generators)
TODO: probably not that interesting
def number():
return 42
print(number()) # 42
print(number()) # 42
print(number()) # 42
def number():
return 42
return 19
return 23
print(number()) # 42
print(number()) # 42
print(number()) # 42
Filtered Fibonacci with ifilter
itertools ifilter
from series import fibonacci
from itertools import ifilter
even = ifilter( lambda f: f % 2 == 0, fibonacci() )
for e in even:
print(e)
if e > 200:
break