Python Programming

First steps

What is Python?

  • A snake.
  • A British comedy group called Monty Python.
  • A programming language. The definition of the language: words, punctuation (operators) and grammar (syntax).
  • The compiler/interpreter of the Python programming language. (aka. CPython).

When people say "Python" in relation to programming, they mean either the Python programming language or the tool that translates text (code) written in the Python programming language into a language the computer can actually understand. On MS Windows this is the python.exe you need to install. On Linux/Mac it is usually called python or python3. The generic name for a tool that translates a programming language for the computer is a compiler or an interpreter. We'll talk about this later on.

What is needed to write a program?

  • An editor where we can write in a language.
  • A compiler or interpreter that can translate our text to the language of the computer.

In order to write and run a program you basically need two things: a text editor in which you can write the program, and a compiler or interpreter that can translate this program for the computer.

The source (code) of Python

Python 2 vs. Python 3

  • Python 2.x - old, legacy code at companies, answers on the Internet. Reached its end of life on January 1, 2020.

  • Python 3.x - the one that you should use (not fully backward compatible). Available since December 3, 2008.

  • Releases of Python versions

Python has two major lines: version 2.x and version 3.x. In a nutshell, you should always use Python 3 if possible.

Unfortunately you can still encounter many companies, and many projects within companies, that are stuck on Python 2. In such cases you will probably have to write Python 2.

In addition, when you search for solutions on the Internet, in many cases you'll encounter solutions that were written for Python 2. Luckily, in most cases it is almost trivial to convert these small examples to work on Python 3. You just need to be able to recognize that the code was originally written for Python 2 and make the adjustments.

For this reason, while the majority of these pages cover Python 3, we are going to point out the places where it might be useful to know how Python 2 works.

You are free to skip these parts and come back to them when the need arises.

Installation

  • MS Windows
  • Linux
  • Apple/Mac OSX

We are going to cover how to install Python on all 3 major operating systems.

Installation on Linux

  • On older Linux distributions you usually have Python 2 installed in /usr/bin/python
  • Python 3 in /usr/bin/python3.
  • If they are not installed, you can install them with the appropriate yum or apt-get command of your distribution.
  • An alternative is to install Anaconda with Python 3.x
$ which python3

$ sudo apt-get install python3
$ sudo yum install python3

Installation on Apple Mac OSX

  • On Mac OSX you can have Python 2 installed in /usr/bin/python and Python 3 installed as /usr/bin/python3.
  • Homebrew
  • An alternative is to install Anaconda with Python 3.x
$ which python3

$ brew install python3

Installation on MS Windows

  • Download.

  • Make sure the "Add Python 3.10 to PATH" check-box is checked.

Alternatively, if Python was installed without that checkbox, one can re-run the installation, select "Modify installation" and then check the box on "Add Python to environment variables".

Installation of Anaconda

Anaconda is a package that includes Python and a bunch of other tools. I used to recommend it, but these days I prefer a plain installation of Python from python.org.

Editors, IDEs

Basically you can use any text editor to write Python code. At a minimum I recommend one with proper syntax highlighting. IDEs also provide intellisense: in most cases they can understand what kind of objects you have in your code and show you the available methods and their parameters. Even better, they provide powerful debuggers.

PyCharm seems to be the most popular IDE. It has a free version called community edition.

Linux

Windows

Mac

All platforms

IDEs

Documentation

Program types

  • Desktop applications (MS Word, MS Excel, calculator, Firefox, Chrome, ...)
  • Mobile applications - whatever runs on your phone.
  • Embedded applications - software in your car or in your shoelace.
  • Web applications - they run on the web server and send you HTML that your browser can show.
  • Command Line Applications
  • Scripts and programs are the same for our purposes
  • ...

Python on the command line

  • -V|options
  • -c|options

More or less the only thing I do on the command line with python is to check the version number:

python -V
python --version

You can run some Python code without creating a file, but I don't remember ever needing this. If you insist (the first line uses Python 2 syntax, the second Python 3 syntax):

python -c "print 42"
python3 -c "print(42)"

Type the following to get the details:

man python

cmdline

First script - hello world

print("Hello World")
  • Create a file called hello.py with the above content.
  • Open your terminal (on MS Windows the Command Prompt, or the Anaconda Prompt if you use Anaconda).
  • Change to the directory (folder) where you saved the file.
  • Run it by typing python hello.py or python3 hello.py
  • The extension is .py - mostly for the editor (but also for modules).
  • Parentheses after print() are required in Python 3, but use them even if you are stuck on Python 2.

Examples

The examples are in a Git repository. Clone it with:

git clone https://github.com/szabgab/slides.git

Sometimes people get an error:

Cloning into 'slides'...
fatal: unable to access 'https://github.com/szabgab/slides.git/': SSL certificate problem: self signed certificate in certificate chain

The solution is then to do the following (on Windows). Be aware that this turns off SSL certificate verification for Git, so only do it on a network you trust.

set GIT_SSL_NO_VERIFY=true
git clone https://github.com/szabgab/slides.git

Later, after I update the slides, you can update your local copy of the files by running:

cd slides
git pull

Comments

# marks single line comments.

There are no real multi-line comments in Python, but we will see a way to have them anyway.

print("hello")

# Comments for other developers

print("world") # more comments

# print("This is not printed")

Variables

greeting = "Hello World!"
print(greeting)

Exercise: Hello world

Try your environment:

  • Make sure you have access to the right version of Python.
  • Install Python if needed.
  • Check if you have a good editor with syntax highlighting.
  • Write a simple script called hello.py that prints Hello Foo Bar! replacing Foo Bar with your own name.
  • Add some comments to your code.
  • Create a variable, assign some text to it and then print out the content of the variable.

What is programming?

  • Use some language to tell the computer what to do.
  • Like a cooking recipe it has step-by-step instructions.
  • Taking a complex problem and dividing it into small steps a computer can do.

What are programming languages?

  • A computer's CPU is built from transistors that work with 1 and 0 values (a.k.a. bits).
  • Its language consists of numbers (e.g. a number such as 37 might mean "move the content of the ax register to the bx register").
  • English? Too complex, too ambiguous.
  • Programming languages are in-between.

A written human language

  • Words
  • Punctuation: - . , ! ?
  • Grammar
  • ...

A programming language

Words and punctuation matter!

  • What did you chose? (Correctly: choose, but people will usually understand.)

  • Lets do the homework. (Correctly: Let's, but most people will understand.)

  • Let's eat, grandpa!

  • Let's eat grandpa!

  • see more

  • Programming languages have far fewer words, but they are very strict about grammar (syntax).

  • A missing comma can break your code.

  • A missing space will change the meaning of your code.

  • An incorrect word can ruin your day.

Types matter to Python (a bit)

  • Python differentiates between strings, integers, and floating point numbers.
  • "2" is not the same as 2
  • "3.14" is not the same as 3.14

String vs int


x = 2
y = "2"

print(x)
print(y)

print(x + 1)
print(y + 1)

Output:

2
2
3
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/basics/str_int.py", line 9, in <module>
    print(y + 1)
TypeError: can only concatenate str (not "int") to str
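If you really do want to combine the two values, you have to convert one of them explicitly. Here is a minimal sketch: int() turns the string into a number, and str() turns the number into a string. The + operator then adds or concatenates, depending on which conversion you chose. Both functions are covered in more detail later.

```python
x = 2
y = "2"

print(x + int(y))   # 4: both operands are int, so + adds
print(str(x) + y)   # 22: both operands are str, so + concatenates
```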

String vs float


x = 3.14
y = "3.14"

print(x)
print(y)

print(x + 1.1)
print(y + 1.1)

Output:

3.14
3.14
4.24
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/basics/str_float.py", line 9, in <module>
    print(y + 1.1)
TypeError: can only concatenate str (not "float") to str

int and float


x = 2
y = 3.14

print(x + 1.5)
print(y + 1)

Output:

3.5
4.140000000000001

Literals, Value Types in Python

  • int
  • str
  • float
  • bool
print( type(23) )        # int
print( type(3.14) )      # float
print( type("hello") )   # str

print( type("23") )      # str
print( type("3.24") )    # str

print( type(None) )      # NoneType
print( type(True) )      # bool
print( type(False) )     # bool

print( type([]) )        # list
print( type({}) )        # dict

print( type(hello) )     #  NameError: name 'hello' is not defined
print("Still running")

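Output:

<class 'int'>
<class 'float'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'NoneType'>
<class 'bool'>
<class 'bool'>
<class 'list'>
<class 'dict'>
Traceback (most recent call last):
  File "python/examples/basics/types.py", line 15, in <module>
    print( type(hello) )     #  NameError: name 'hello' is not defined
NameError: name 'hello' is not defined

"Still running" is never printed, because the exception stops the program.

  • Strings must be enclosed in quotes.
  • Numbers must NOT be enclosed in quotes.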

Floating point limitation

print(0.1 + 0.2)   # 0.30000000000000004

x = 0.1 + 0.2
y = 0.3
print(x)   # 0.30000000000000004
print(y)   # 0.3

if x == y:
    print("They are equal")
else:
    print("They are NOT equal")

Floating point - compare using round

  • round
x = 0.1 + 0.2
y = 0.3

print(x)   # 0.30000000000000004
print(y)   # 0.3

print(round(x, 10))   # 0.3
if round(x, 10) == round(y, 10):
    print("They are equal")
else:
    print("They are NOT equal")
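Another option is math.isclose() from the standard library. It compares two floats within a tolerance instead of rounding both of them to an arbitrary number of digits. The following is a sketch. The tolerance values are only examples, so pick ones that fit your data.

```python
import math

x = 0.1 + 0.2
y = 0.3

# isclose() uses a relative tolerance (rel_tol, default 1e-09)
if math.isclose(x, y):
    print("They are equal")      # this branch runs
else:
    print("They are NOT equal")

# Near zero a relative tolerance is too strict,
# so also pass an absolute tolerance.
print(math.isclose(1e-12, 0.0))                 # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))   # True
```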

round

  • round
pi = 3.141592653589793

print(pi)             # 3.141592653589793
print(round(pi, 10))  # 3.1415926536
print(round(pi, 5))   # 3.14159
print(round(pi, 2))   # 3.14
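A detail of round() that often surprises people: in Python 3, exact halves are rounded to the nearest even number ("banker's rounding"), not always up. Called without the second argument, round() returns an int. The last line also shows that a float such as 2.675 cannot be stored exactly, so it is rounded from a value slightly below 2.675.

```python
print(round(0.5))          # 0
print(round(1.5))          # 2
print(round(2.5))          # 2  (halves go to the even neighbour)
print(round(3.5))          # 4
print(type(round(2.5)))    # <class 'int'>

# 2.675 is stored as a value slightly below 2.675
print(round(2.675, 2))     # 2.67
```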

Value Types in Numpy

NumPy, and also some other programming languages, have more specific value types, for example:

  • int8
  • int32
  • float32
  • float64
  • ...

Rectangle (numerical operations)

  • =

In this example we create two variables width and height containing the numbers 23 and 17 respectively.

Unlike in math, in programming a single equal sign = generally means assignment: we want the value on the right-hand side to be stored in the variable on the left-hand side.

Others might say that it makes the name on the left-hand side of the = sign refer to the value on the right-hand side.

In either case this is not a mathematical statement of truth, nor an equation, but a statement of an action.

On the next line we multiply the values of two already existing variables and assign the result to a third variable called area.

At the end we use the print function, which we have already seen, to print the result on the screen.

A simple mathematical operation.

width = 23
height = 17

area = width * height
print(area)    # 391
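Because = is an action and not an equation, a line that would be nonsense in math is perfectly valid Python. The right-hand side is computed from the current value, and the result is then stored back in the same variable. The name counter is only an example.

```python
counter = 10
counter = counter + 1   # read the old value, add 1, store the result back
print(counter)          # 11

counter += 1            # shorthand for counter = counter + 1
print(counter)          # 12
```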

Multiply string

What if we put the two numbers in quotation marks and thus make them strings? Strings that look like numbers to the naked eye, but are nevertheless strings for Python.

If we try to multiply them we get a nasty exception, also known as a runtime error. The program stops running.

These exceptions might look nasty, but they are our friends. They tell us what went wrong and exactly where it happened.

You just need to remember that, at least in Python, you read the whole thing from bottom to top. The last line holds the error message. Above that you can usually see the content of the line where the problem was found, and one line above that you'll see the name of the file and the line number where the problem occurred.

I strongly urge you to read the error message. If it is not yet clear what the problem is, copy it into your favorite search engine and read the explanations you find.

Eventually you'll learn to recognize these messages much faster and it will be much easier to fix the problems.

What this current error message means is we tried to multiply two strings and Python cannot do that.

width = "23"
height = "17"
area = width * height
print(area)

Output:

Traceback (most recent call last):
  File "python/examples/basics/rectangular_strings.py", line 3, in <module>
    area = width * height
TypeError: can't multiply sequence by non-int of type 'str'
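The error message says "non-int", and that is a hint: a string can be multiplied by an integer, and the result is the string repeated. To get the numeric area, convert both strings with int() first. The values below just reuse the ones from the example.

```python
width = "23"
height = "17"

print(width * 3)                  # 232323  (string repeated 3 times)
print("-" * 10)                   # ----------

print(int(width) * int(height))   # 391  (numeric multiplication)
```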

Add numbers

OK, so we know how to multiply two numbers. Let's now take a giant leap and try to add two numbers together.

It works as expected. We can move on to the next challenge.

a = 19
b = 23
c = a + b
print(c)    # 42

Add strings

  • concatenation

You guessed right: now we wrap the numbers in quotes and try to add them together.

Surprisingly, it works, though the result looks a bit strange at first, as if Python put one string after the other.

Indeed the + operator is defined when we have two strings on the two sides. It is then called concatenation.

In general you'll have to learn what the mathematical operators do when they are applied to values other than numbers. Usually the operation they do is quite logical. You just need to find the right logic.

a = "19"
b = "23"

c = a + b
print(c)    # 1923

d = b + a
print(d)    # 2319

Exercise: Calculations

  • Extend the examples/basics/rectangle_basic.py file from the earlier example to print both the area and the circumference of the rectangle.
  • Write a script called basic_circle.py that has a variable holding the radius of a circle and prints out the area of the circle and the circumference of the circle.
  • Write a script called basic_calc.py that has two numbers a and b and prints out the results of a+b, a-b, a*b, a/b

Solution: Calculations

  • math
  • pi

In order to have the math operation work properly we had to put the addition in parentheses. Just as you would in math class.

width = 23
height = 17
area = width * height
print("The area is ", area)    # 391
circumference = 2 * (width + height)
print("The circumference is ", circumference)    # 80

In order to calculate the area and the circumference of a circle we need PI, so we created a variable called pi and put 3.14 in it, which is a very rough approximation. You might want a more exact value of PI.

r = 7
pi = 3.14
print("The area is ", r * r * pi)           # 153.86
print("The circumference is ", 2 * r * pi)  # 43.96

Python has lots of modules (aka. libraries, aka. extensions), extra code that you can import and start using. For example it has a module called math that provides all kinds of math-related functions and attributes.

A function does something; an attribute just holds some value. More about this later.

Specifically, it has an attribute called math.pi with the value 3.141592653589793, a much better approximation of PI.

In the following solution we used that.

  • The documentation of the math module.
import math

r = 7
print("The area is ", r * r * math.pi)           # 153.9380400258998
print("The circumference is ", 2 * r * math.pi)  # 43.982297150257104

The expression r * r might also have bothered your eyes. Don't worry: Python has an operator for exponentiation, the double star **. This is how we can use it to say r squared: r ** 2.

r = 7
pi = 3.14
print("The area is ", r ** 2 * pi)           # 153.86
print("The circumference is ", 2 * r * pi)  # 43.96

I don't have much to say about the calculator. I think it is quite straightforward.

a = 3
b = 2

print(a+b)   # 5
print(a-b)   # 1
print(a*b)   # 6
print(a/b)   # 1.5
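Python has a few more arithmetic operators that are useful in a calculator. In Python 3 the / operator always returns a float, even when the result is a whole number. This is one of the changes from Python 2, where / between two integers truncated the result. The // operator does floor division, % gives the remainder, and ** is exponentiation.

```python
a = 3
b = 2

print(a / b)    # 1.5  true division, always a float in Python 3
print(4 / 2)    # 2.0  still a float, even though the result is whole
print(a // b)   # 1    floor division
print(a % b)    # 1    remainder (modulo)
print(a ** b)   # 9    exponentiation
print(-7 // 2)  # -4   floor division rounds toward negative infinity
```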

Second steps

Modules

When we program in Python we basically have three main pieces: the base language itself, a set of standard modules, and a set of 3rd-party modules.

All the modules provide additional functionality to the base language, and without them we would not be able to do much. The standard modules come installed with Python; the 3rd-party modules we need to install. Once installed, however, they behave in the same way: we need to import them and then we can use them. We'll discuss them in more detail later, but we would already like to use some, so let's see the basic ideas.

I know we already used the math module in the solution of the earlier exercises, but some people might have missed those.

In this example we import the sys module that contains various attributes and operations related to the Python system. (There is another module called os that provides functionality related to the Operating System.)

A few examples:

The executable attribute holds the location of the currently running Python executable. On MS Windows this will be a path to a python.exe file.

platform is win32 on any Windows machine, even on 64-bit Windows.

We are going to discuss sys.argv a lot more, but for now note that sys.argv[0] contains the path to the current Python file.

sys.version_info contains the version information of the currently running Python. Specifically, sys.version_info.major contains the major version number, which is 3 for Python 3 and 2 for Python 2. If really needed, you could use this to recognize when someone tries to run your program on an unsupported version of Python.

These were all attributes that contain some fixed value.

There is also the getsizeof function that comes with the sys module. You know it is a function because you see a pair of parentheses at the end. The attributes above did not have parentheses. Functions do something. This specific function calculates the number of bytes being used by an object.

The numbers below are from a 64-bit CPython. The exact values can differ between Python versions and platforms.

You can see an integer (both 1 and 42) uses 28 bytes.

A floating point number uses 24 bytes.

An empty string uses 49 bytes.

Then each character takes another byte. (Actually this is only true for ASCII characters, but let's not get ahead of ourselves.)

  • The documentation of the sys module.
import sys

print( sys.executable )                 # /home/gabor/venv3/bin/python
print( sys.platform )                   # linux
print( sys.argv[0] )                    # examples/basics/modules.py
print( sys.version_info.major )         # 3

print( sys.getsizeof( 1 ) )             # 28
print( sys.getsizeof( 42 ) )            # 28
print( sys.getsizeof( 1.0 ) )           # 24

print( sys.getsizeof( "" ) )            # 49
print( sys.getsizeof( "a" ) )           # 50
print( sys.getsizeof( "ab" ) )          # 51
print( sys.getsizeof( "abcdefghij" ) )  # 59

A main function

  • main
  • def

You could write your code in the main body of your Python file, but using functions and passing arguments to them makes your code easier to maintain and understand. Therefore I recommend that you always write every script with a function called "main".

  • Function definition starts with the def keyword, followed by the name of the new function ("main" in this case), followed by the list of parameters in parentheses (nothing in this case).
  • The content or body of the function is then indented to the right.
  • The function definition ends when the indentation stops.

If you execute this file you might be surprised that nothing happens. This is so because we only defined the function, we never used it. We'll do that next.

def main():
    print("Hello")
    print("World")

Running this file prints nothing, because the main function is defined but never called (invoked).

The main function - called

  • main
  • def

In this example I added 3 lines to the previous file. The line main() calls the main function. We sometimes also say "runs the function" or "invokes the function". In this context they mean the same.

The two print calls around main() are not needed to call the function. I only added them so the order of the operations is easy to see in the output.

def main():
    print("Hello")
    print("World")

print("before")
main()
print("after")
before
Hello
World
after
  • Use a main function to avoid globals and better structure your code.
  • Python uses indentation for blocks instead of curly braces, and it uses the colon : to start a block.

Indentation

  • indentation

  • Standard recommendations: 4 spaces on every level.

Conditional main

  • main
  • name

def main():
    print("Hello World")

if __name__ == "__main__":
    main()
  • We'll cover this later but in case you'd like, you can include this conditional execution of the main function.

Input - Output I/O

Input

  • Keyboard (Standard Input, Command line, GUI)
  • Mouse (Touch pad)
  • Touch screen
  • Files, Filesystem
  • Network (e.g. in Web applications)

Output

  • Screen
  • File
  • Network

print in Python 2

print is one of the keywords that changed between Python 2 and Python 3. In Python 2 it is a statement and does not need parentheses; in Python 3 it is a function and needs them.

print "hello"
print "world"
print "Foo", "Bar"
hello
world
Foo Bar
print "hello",
print "world"
print "Foo", "Bar"
hello world
Foo Bar

The trailing comma suppresses the newline. Instead, a space separates this output from the next printed value, just as a space is added between values.

import sys

sys.stdout.write("hello")
sys.stdout.write("world")
helloworld

write takes exactly one parameter (a string) and adds neither a newline nor spaces.

print in Python 3

  • print
  • end
  • sep
print("hello")
print("world")
print("Foo", "Bar")
hello
world
Foo Bar
print("hello", end=" ")
print("world")
print("Foo", "Bar")

print("hello", end="")
print("world")


print("hello", end="-")
print("world")
hello world
Foo Bar
helloworld
hello-world

end sets the string printed at the end of the output of each print call. By default it is a newline.

print("hello", end="")
print("world")

print("Foo", "Bar", sep="")
print("END")
helloworld
FooBar
END

sep sets the string printed between the values. By default it is a single space.
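Both sep and end accept any string, not just a single character. The sketch below also uses print's file= parameter, which is added here so the output goes into an io.StringIO object (an in-memory text buffer) instead of the screen. That makes the exact characters easy to inspect. The sample values are made up.

```python
import io

buffer = io.StringIO()

# Multi-character separator and custom ending
print("2024", "01", "31", sep="-", end="!\n", file=buffer)
print("a", "b", "c", sep=", ", file=buffer)

# repr() makes the newlines visible
print(repr(buffer.getvalue()))   # '2024-01-31!\na, b, c\n'
```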

print in Python 2 as if it was Python 3

  • future
  • print_function
from __future__ import print_function
print("hello", end="")
print("world")
helloworld

Exception: SyntaxError: Missing parentheses in call

What if we run some code with print "hello" using Python 3?

  File "examples/basics/print.py", line 1
    print "hello"
                ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("hello")?

Prompting for user input in Python 2

  • raw_input
  • prompt
  • STDIN
from __future__ import print_function

def main():
    print("We have a question!")
    name = raw_input('Your name: ')
    print('Hello', name, ', how are you?')
    print('Hello ' + name + ', how are you?')

main()
/usr/bin/python2 prompt2.py

We have a question!
Your name: Foo Bar
Hello Foo Bar , how are you?
Hello Foo Bar, how are you?

What happens if you run this with Python 3 ?

/usr/bin/python3 prompt2.py
We have a question!
Traceback (most recent call last):
  File "prompt2.py", line 7, in <module>
    main()
  File "prompt2.py", line 4, in main
    name = raw_input('Your name: ')
NameError: name 'raw_input' is not defined

Prompting for user input in Python 3

  • input
  • prompt
  • STDIN

In Python 3 the raw_input() function was replaced by the input() function.

def main():
    print("We have a question!")
    name = input('Your name: ')
    print('Hello ' + name + ', how are you?')

main()

What happens if you run this using Python 2 ?

/usr/bin/python2 prompt3.py
  • What happens if we type in "Foo Bar"
We have a question!
Your name: Foo Bar
Your name: Traceback (most recent call last):
  File "prompt3.py", line 5, in <module>
    main()
  File "prompt3.py", line 2, in main
    name = input('Your name: ')
  File "<string>", line 1
    Foo Bar
          ^
SyntaxError: unexpected EOF while parsing
  • What happens if we type in just "Foo" - no spaces:
We have a question!
Your name: Foo
Your name: Traceback (most recent call last):
  File "prompt3.py", line 5, in <module>
    main()
  File "prompt3.py", line 2, in main
    name = input('Your name: ')
  File "<string>", line 1, in <module>
NameError: name 'Foo' is not defined
  • The next example shows a way to exploit the input function in Python 2 to delete the currently running script. You know, just for fun.
We have a question!
Your name: __import__("os").unlink(__file__) or "Hudini"
Hello Hudini, how are you?

Python2 input or raw_input?

In Python 2 always use raw_input() and never input().

Prompting both Python 2 and Python 3

  • raw_input
  • input
from __future__ import print_function
import sys

def main():
    if sys.version_info.major < 3:
        name = raw_input('Your name: ')
    else:
        name = input('Your name: ')
    print('Hello ' + name + ', how are you?')

main()

Add numbers entered by the user (oops)

def main():
    a = input('First number: ')
    b = input('Second number: ')
    print(a + b)

main()
First number: 2
Second number: 3
23

When reading from standard input using input(), the resulting value is a string, even if you only typed in digits. Therefore the addition operator + concatenates the strings.

Add numbers entered by the user (fixed)

def main():
    a = input("First number: ")
    b = input("Second number: ")
    print(int(a) + int(b))
    print(a + b)


main()
First number: 2
Second number: 3
5
23

To convert the strings to numbers, use the int() or the float() function, whichever is appropriate in your situation.

Can we convert a string to int or float?

  • isdigit
  • int
  • float

for var in ["23", "2.3", "a", "2.3.4", "2x"]:
    print(var)
    if var.isdigit():
        print(f"{var} can be converted to int:", int(var))
    if var.replace(".", "", 1).isdigit():
        print(f"{var} can be converted to float:", float(var))
    print('-----')
23
23 can be converted to int: 23
23 can be converted to float: 23.0
-----
2.3
2.3 can be converted to float: 2.3
-----
a
-----
2.3.4
-----
2x
-----

How can I check if a string can be converted to a number?

  • isdecimal

  • isnumeric

  • This solution only works for integers. Not for floating point numbers.

  • We'll talk about this later. For now assume that the user enters something that can be converted to a number.

  • Wrap the code in try-except block to catch any exception raised during the conversion.

  • Use Regular Expressions (regexes) to verify that the input string looks like a number.

  • Unicode Characters in the 'Number, Decimal Digit' Category

  • isdecimal Decimal numbers (digits) (not floating point)

  • isnumeric Numeric character in the Unicode set (but not floating point number)

  • In your spare time you might want to check out the standard types of Python at stdtypes.

val = input("Type in a number: ")
print(val)
print(val.isdecimal())
print(val.isnumeric())

if val.isdecimal():
    num = int(val)
    print(num)
Type in a number: 42
42
True
True
42
Type in a number: 4.2
4.2
False
False

val = '11'
print(val.isdecimal()) # True
print(val.isnumeric()) # True

val = '1.1'
print(val.isdecimal()) # False
print(val.isnumeric()) # False

val = '٣'  # arabic 3
print(val.isdecimal()) # True
print(val.isnumeric()) # True
print(val)
print(int(val))  # 3

val = '½' # unicode 1/2
print(val.isdecimal()) # False
print(val.isnumeric()) # True
# print(float(val))  # ValueError: could not convert string to float: '½'

val = '②' # unicode circled 2
print(val.isdecimal()) # False
print(val.isnumeric()) # True
# print(int(val)) # ValueError: invalid literal for int() with base 10: '②'

Converting string to int

  • int
a = "23"
print(a)          # 23
print( type(a) )  # <class 'str'>


b = int(a)
print(b)          # 23
print( type(b) )  # <class 'int'>
a = "42 for life"
print(a)                # 42 for life
print( type(a) )        # <class 'str'>

b = int(a)
print(b)
print( type(b) )

# Traceback (most recent call last):
#   File "converting_string_to_int.py", line 5, in <module>
#     b = int(a)
# ValueError: invalid literal for int() with base 10: '42 for life'

Converting float to int

a = 2.1
print( type(a) )   # <class 'float'>
print(a)           # 2.1

b = int(a)
print( type(b) )   # <class 'int'>
print(b)           # 2
a = "2.1"
print(a)          # 2.1
print( type(a) )  # <class 'str'>

b = int(a)
print(b) 
print( type(b) )

# Traceback (most recent call last):
#   File "converting_floating_string_to_int.py", line 5, in <module>
#     b = int(a)
# ValueError: invalid literal for int() with base 10: '2.1'
a = "2.1"
b = float(a)
c = int(b)
print(c)                   # 2
print( type(a) )           # <class 'str'>
print( type(b) )           # <class 'float'>
print( type(c) )           # <class 'int'>

d = int( float(a) )
print(d)                   # 2
print( type(d) )           # <class 'int'>

print( int( float(2.1) ))  # 2
print( int( float("2") ))  # 2
print( int( float(2) ))    # 2
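When converting a float to an int, note that int() does not round. It simply drops the fractional part, so it moves toward zero, and for negative numbers that is not the same as rounding down. If you need a specific behaviour, use math.floor(), math.ceil() or round(). The values below are only examples.

```python
import math

print(int(2.7))          # 2   fractional part dropped
print(int(-2.7))         # -2  toward zero, also for negative numbers
print(math.floor(-2.7))  # -3  always rounds down
print(math.ceil(2.1))    # 3   always rounds up
print(round(2.7))        # 3   nearest integer
```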

How can I check if a string can be converted to a number?

  • int
  • float
  • is_int
  • is_float

Python has no built-in is_int or is_float function. We write our own: we try the conversion and catch the exception, if there is one.

def is_float(val):
    try:
        float(val)        # raises ValueError if val is not a valid float
    except ValueError:
        return False
    return True

def is_int(val):
    try:
        int(val)          # raises ValueError if val is not a valid int
    except ValueError:
        return False
    return True

print( is_float("23") )      # True
print( is_float("23.2") )    # True
print( is_float("23x") )     # False
print( '-----' )             # -----
print( is_int("23") )        # True
print( is_int("23.2") )      # False
print( is_int("23x") )       # False
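One advantage of the try/except approach over str.isdigit() is that it accepts exactly what int() accepts. That includes negative numbers, surrounding whitespace, and underscores used as digit separators (Python 3.6+). The sample strings below are my own choice. is_int is repeated so the snippet runs on its own.

```python
def is_int(val):
    try:
        int(val)
    except ValueError:
        return False
    return True

for val in ["-3", " 42 ", "1_000", "4.2"]:
    # the string, what isdigit() says, what the conversion says
    print(repr(val), val.isdigit(), is_int(val))

# '-3' False True
# ' 42 ' False True
# '1_000' False True
# '4.2' False False
```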

Conditionals: if

  • if
def main():
    expected_answer = "42"
    inp = input('What is the answer? ')

    if inp == expected_answer:
        print("Welcome to the cabal!")
        print("Still here")

    print("This always happens")

main()

Conditionals: if - else

  • if
  • else
def main():
    expected_answer = "42"
    inp = input('What is the answer? ')

    if inp == expected_answer:
        print("Welcome to the cabal!")
    else:
        print("Read the Hitchhiker's guide to the galaxy!")

    print("This always happens")

main()

Divide by 0

  • ZeroDivisionError

  • Another use-case for if and else:


def main():
    a = input('First number: ')
    b = input('Second number: ')

    print("Dividing", a, "by",  b)
    print(int(a) / int(b))
    print("Still running")

main()
First number: 3
Second number: 0
Dividing 3 by 0
Traceback (most recent call last):
  File "examples/basics/divide_by_zero.py", line 9, in <module>
    main()
  File "examples/basics/divide_by_zero.py", line 7, in main
    print(int(a) / int(b))
ZeroDivisionError: division by zero

Conditionals: if - else (other example)

  • if
  • else
def main():
    a = input('First number: ')
    b = input('Second number: ')

    if int(b) == 0:
        print("Cannot divide by 0")
    else:
        print("Dividing", a, "by",  b)
        print(int(a) / int(b))

    print("Still running")


main()
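Instead of checking the divisor in advance with if/else, you can also attempt the division and catch the ZeroDivisionError, the same try/except idea used for is_int and is_float. The function name safe_divide and the choice to return None are hypothetical choices for this sketch.

```python
def safe_divide(a, b):
    """Return a / b, or None if b is zero (hypothetical helper)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

print(safe_divide(3, 2))   # 1.5
print(safe_divide(3, 0))   # None
```

Which one to prefer is mostly a matter of taste. The if/else version makes the special case explicit, while try/except also covers cases you did not think of in advance.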

Conditionals: else if

  • else if
def main():
    a = input('First number: ')
    b = input('Second number: ')

    if int(a) == int(b):
        print('They are equal')
    else:
        if int(a) < int(b):
            print(a + ' is smaller than ' + b)
        else:
            print(a + ' is bigger than ' + b)

main()

Conditionals: elif

  • elif
  • else if
def main():
    a = input("First number: ")
    b = input("Second number: ")

    if int(a) == int(b):
        print("They are equal")
    elif int(a) < int(b):
        print(f"{a} is smaller than {b}")
    else:
        print(f"{a} is bigger than {b}")


main()

Ternary operator (Conditional Operator)

  • ?:

x = 3
answer = 'positive' if x > 0 else 'negative or zero'
print(answer)   # positive

x = -3
answer = 'positive' if x > 0 else 'negative or zero'
print(answer)   # negative or zero
x = 3
if x > 0:
    answer = "positive"
else:
    answer = "negative or zero"
print(answer)  # positive

x = -3
if x > 0:
    answer = "positive"
else:
    answer = "negative or zero"
print(answer)  # negative or zero

In other languages this is the ?: construct.

Case or Switch in Python: match pattern matching

  • case
  • switch
  • match
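import sys

# Note: match/case requires Python 3.10 or newer.

if len(sys.argv) != 2:
    print("Usage: python switch.py <status_code>")
    sys.exit(1)

status_code = int(sys.argv[1])

match status_code:
    case 100:
        print("100")
    case 200:
        print("200")
    case 200:         # never reached: only the first matching case runs
        print("200 again")
    case 401 | 302:   # | means "or" inside a pattern
        print("401 or 302")
    case _:           # the wildcard _ matches anything else
        print("other")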

Exercise: Rectangle

  • Write a script called basic2_rectangle_input.py that will ask for the sides of a rectangle and print out the area.
  • Provide error messages if either of the sides is negative.
python basic2_rectangle_input.py
Side: 3
Side: 4
The area is 12

Exercise: Calculator

Create a script called basic2_calculator_input.py that accepts 2 numbers and an operator (+, -, *, /), and prints the result of the operation.

python basic2_calculator_input.py
Operand: 19
Operand: 23
Operator: +
Results: 42

Exercise: Age limit

  • Create a script called basic2_age_limit_input.py

  • Ask the user for their age.

  • If it is 18 or above, tell them they can legally drink alcohol.

  • If it is 21 or above, tell them they can also legally drink in the USA.

  • Extra:

  • Create a separate file basic2_age_limit_all_input.py

  • Ask the user for an age and a country name and tell them if they can legally drink alcohol there.

  • See the Legal drinking age list.

  • Don't worry if it seems too difficult to solve this nicely right now.

Exercise: What is this language?

  • Create a script called basic2_language.py

  • Ask the user the name of this programming language.

  • If they type in Python, welcome them.

  • If they type in something else, correct them.

Exercise: Standard Input

  • In the previous exercises we expected the user-input to come in on the "Standard Input" aka. STDIN.

  • If you would like more practice, come up with other ideas, try to solve them, and tell me about the task (in person or via e-mail).

  • (e.g. you could start building an interactive role-playing game.)

  • Name the file basic2_stdin.py

Solution: Area of rectangle

def main():
    length = int(input('Length: '))
    width = int(input('Width: '))

    if length <= 0:
        print("length is not positive")
        return

    if width <= 0:
        print("width is not positive")
        return

    area = length * width
    print("The area is", area)

main()
  • For historical reasons we also have the solution in Python 2
from __future__ import print_function

def main():
    length = int(raw_input('Length: '))
    width = int(raw_input('Width: '))

    if length <= 0:
        print("length is not positive")
        return

    if width <= 0:
        print("width is not positive")
        return

    area = length * width
    print("The area is", area)

main()

Solution: Calculator

In the Python 3 version I used an f-string to insert the value of op into the error message. The Python 2 version below uses the format method of strings to insert it into the {} placeholder. We'll learn about both later on.


def main():
    a = float(input("Number: "))
    b = float(input("Number: "))
    op = input("Operator (+-*/): ")

    if op == '+':
        res = a+b
    elif op == '-':
        res = a-b
    elif op == '*':
        res = a*b
    elif op == '/':
        res = a/b
    else:
        print(f"Invalid operator: '{op}'")
        return

    print(res)
    return


main()

  • For historical reasons we also have the solution in Python 2
from __future__ import print_function

a = float(raw_input("Number: "))
b = float(raw_input("Number: "))
op = raw_input("Operator (+-*/): ")

if op == '+':
    res = a+b
elif op == '-':
    res = a-b
elif op == '*':
    res = a*b
elif op == '/':
    res = a/b
else:
    print("Invalid operator: '{}'".format(op))
    exit()


print(res)


Solution: Calculator eval

import os  # not used directly; it only makes the attack shown below possible

def main():
    a = input("Number: ")
    b = input("Number: ")
    op = input("Operator (+-*/): ")

    command = a + op + b
    print(command)
    res = eval(command)
    print(res)

main()
$ python examples/basics/calculator_eval.py

Number: 2
Number: 3
Operator (+-*/): +
2+3
5

Try again, this time typing a piece of Python code instead of a number:

$ python examples/basics/calculator_eval.py

Number: os.system("ls -l")
Number:
Operator (+-*/):

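Instead of ls -l the user could run any other command, including destructive ones such as rm -rf / (do NOT try that one!). On Windows try os.system("dir") instead. A less harmful demonstration of the danger is os.system("rm -f calculator_eval.py"), or os.system("del calculator_eval.py") on Windows, which deletes calculator_eval.py from the current directory.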

  • Now forget this and don't use eval for the next few years!
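If you need a calculator that takes the operator as input, one safer approach is to convert the operands with float() and look up the operator in a dictionary of functions from the standard operator module, so the user's text is never executed. This is a sketch, not the solution above; the names OPERATORS and calculate are made up, and it uses dictionaries and the in operator, which we have not covered yet.

```python
import operator

# Map each allowed operator symbol to the function that implements it.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(a_text, op, b_text):
    """Compute a op b without eval (hypothetical helper)."""
    a = float(a_text)  # raises ValueError for text such as 'os.system("ls -l")'
    b = float(b_text)
    if op not in OPERATORS:
        raise ValueError(f"Invalid operator: '{op}'")
    return OPERATORS[op](a, b)


print(calculate("2", "+", "3"))  # 5.0
print(calculate("6", "/", "4"))  # 1.5
```

Any input that is not a number or one of the four operators ends in a ValueError instead of being run as code.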

Solution: Age limit

age = float(input('Please type in your age: '))
if 21 <= age:
    print('You can already drink alcohol. In the USA as well.')
elif 18 <= age:
    print('You can already drink alcohol. (But not in the USA.)')
else:
    print('You cannot legally drink alcohol.')

Solution: What is this language?

language = input('What is the name of this programming language? ')

if language == 'Python':
    print('Welcome!')
else:
    print(f'No. It is not "{language}", it is Python.')

STDIN vs Command line arguments

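Let's see a few ways to get two file names into a script: the name of an input file and the name of an output file.

The last version takes them from the command line. If we run it without exactly two command-line parameters, it prints usage information. Otherwise it treats the first parameter as the name of the input file and the second as the name of the output file.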

  • First try this version, then run it again. We have to type in the same paths again and again. Boring and error-prone.

input_file = input("Input file: ")
output_file = input("Output file: ")

print(f"This code will read {input_file}, analyze it and then create {output_file}")
...
  • We could use a Tk-based dialog:
  • Still boring (though maybe less error-prone)
import tkinter as tk
from tkinter import filedialog

# On recent versions of Ubuntu you might need to install python3-tk in addition to python3 using
# sudo apt-get install python3-tk

root = tk.Tk()
root.withdraw()  # hide the empty main window, we only need the dialogs

input_file = filedialog.askopenfilename(filetypes=(("Excel files", "*.xlsx"), ("CSV files", "*.csv"), ("Any file", "*")))
output_file = filedialog.asksaveasfilename(filetypes=(("Excel files", "*.xlsx"), ("CSV files", "*.csv"), ("Any file", "*")))

print(f"This code will read {input_file}, analyze it and then create {output_file}")
...

  • The command line has history: press the up-arrow key to get back the previous command.
import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} INPUT_FILE OUTPUT_FILE")

input_file = sys.argv[1]
output_file = sys.argv[2]

print(f"This code will read {input_file}, analyze it and then create {output_file}")
...

Command line arguments

  • sys
  • argv
import sys

def main():
    print(sys.argv)
    print(sys.argv[0])
    print(sys.argv[1])
    print(sys.argv[2])

main()
$ python examples/basics/cli.py one two
['examples/basics/cli.py', 'one', 'two']
examples/basics/cli.py
one
two
$ python examples/basics/cli.py
['examples/basics/cli.py']
examples/basics/cli.py
Traceback (most recent call last):
  File "examples/basics/cli.py", line 6, in <module>
    print(sys.argv[1])
IndexError: list index out of range

Command line arguments - len

  • len
import sys

def main():
    print(sys.argv)
    print(len(sys.argv))

main()

Command line arguments - exit

  • exit
  • !=
import sys

def main():
    if len(sys.argv) != 2:
        exit("Usage: " + sys.argv[0] + " VALUE")
    print("Hello " + sys.argv[1])

main()
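Run it without an argument, then check the exit code of the last command. On Windows:

echo %errorlevel%

On Linux/Mac:

echo $?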
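When exit (or sys.exit) receives a string, Python prints the string to STDERR and ends the program with exit code 1, which is the value the echo commands display. The sketch below verifies this by running a tiny throw-away program in a child process with the standard subprocess module; the inline program and its usage message are made up for the demonstration.

```python
import subprocess
import sys

# Run a tiny Python program in a separate process and capture what it reports.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit('Usage: cli.py VALUE')"],
    capture_output=True,
    text=True,
)

print(result.returncode)      # 1
print(result.stderr.strip())  # Usage: cli.py VALUE
```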

Exercise: Rectangle (argv)

  • Create a script called basic2_rectangle_argv.py
  • Change the earlier rectangle script so that it accepts the sides on the command line, like this: python basic2_rectangle_argv.py 2 4

Exercise: Calculator (argv)

  • Create a script called basic2_calculator_argv.py that accepts 2 numbers and an operator (+, -, *, /), on the command line and prints the result of the operation.
  • python basic2_calculator_argv.py 2 + 3
  • python basic2_calculator_argv.py 6 / 2
  • python basic2_calculator_argv.py 6 * 2

Solution: Area of rectangle (argv)

import sys

def main():
    if len(sys.argv) != 3:
        exit("Needs 2 arguments: width length")

    width = int(sys.argv[1])
    length = int(sys.argv[2])

    if length <= 0:
        exit("length is not positive")

    if width <= 0:
        exit("width is not positive")

    area = length * width
    print("The area is", area)

main()

Solution: Calculator (argv)

import sys


def main():
    if len(sys.argv) != 4:
        exit("Usage: " + sys.argv[0] + " OPERAND OPERATOR OPERAND")

    a = float(sys.argv[1])
    b = float(sys.argv[3])
    op = sys.argv[2]

    if op == '+':
        res = a + b
    elif op == '-':
        res = a - b
    elif op == '*':
        res = a * b
    elif op == '/':
        res = a / b
    else:
        exit(f"Invalid operator: '{op}'")

    print(res)

main()

The multiplication will probably not work, because the Unix/Linux shell replaces an unquoted * with the list of files in the current directory, so the Python script receives a list of file names instead of the *. This is not your fault as a programmer; it is a user error. The correct way to run the script is python basic2_calculator_argv.py 2 '*' 3.

Solution: Calculator eval

import sys

def main():
    if len(sys.argv) != 4:
        exit(f"Usage: {sys.argv[0]} NUMBER OPERATOR NUMBER")

    command = sys.argv[1] + sys.argv[2] + sys.argv[3]
    print(command)
    res = eval(command)
    print(res)

main()
$ python examples/basics/calculator_argv_eval.py 2 + 3
2+3
5

$ python examples/basics/calculator_argv_eval.py 2 '*' 3
2*3
6
  • Now forget this and don't use eval for the next few years!

Compilation vs. Interpretation

Compiled

  • Languages: C, C++
  • Development cycle: Edit, Compile (link), Run.
  • Strong syntax checking during compilation and linking.
  • Result: Stand-alone executable code.
  • Need to compile to each platform separately. (Windows, Linux, Mac, 32bit vs 64bit).

Interpreted

  • Languages: Shell, BASIC
  • Development cycle: Edit, Run.
  • Syntax check only during run-time.
  • Result: we distribute the source code.
  • Needs the right version of the interpreter on every target machine.

Both?

  • Java (running on JVM - Java Virtual Machine)
  • C# (running on CLR - Common Language Runtime)

Is Python compiled or interpreted?

Some syntax errors prevent your Python code from running at all. Not even the print(x) before the error is executed:

x = 2
print(x)

if x > 3

File "examples/other/syntax_error.py", line 4
    if x > 3
           ^
SyntaxError: invalid syntax

Other errors, which look like they could be caught up front, are only detected during execution. Here the first print(x) runs before Python notices that y is not defined:

x = 2
print(x)
print(y)
y = 13
print(42)

2
Traceback (most recent call last):
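  File "compile.py", line 3, in <module>
    print(y)
NameError: name 'y' is not defined

If a function creates the global variable y before we use it, the same print(y) works, because the name is only looked up when the line runs:

def f():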
    global y
    y = "hello y"
    print("in f")

x = 2
print(x)
f()
print(y)
y = 13
print(42)

2
in f
hello y
42
  • Python code is first compiled to bytecode and then interpreted.
  • CPython is both the compiler and the interpreter.
  • Jython and IronPython are mostly just compilers to the JVM and the CLR respectively.
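The built-in compile function lets you watch the two phases separately. It turns source code into bytecode (a code object) and reports syntax errors, but runs nothing; exec then runs the bytecode, and only at that point is the undefined y noticed. The sketch passes the two earlier examples in as strings; the file names are just labels used in error messages.

```python
# Compiling checks the syntax but does not execute anything.
try:
    compile("x = 2\nprint(x)\n\nif x > 3\n", "syntax_error.py", "exec")
except SyntaxError as err:
    syntax_error_line = err.lineno
    print("SyntaxError found before running, line", syntax_error_line)

# This source has valid syntax, so it compiles to bytecode without complaint.
code = compile("x = 2\nprint(x)\nprint(y)\n", "compile.py", "exec")
print("compile.py compiled fine")

# The undefined name is only noticed when the bytecode is executed.
try:
    exec(code, {})
except NameError as err:
    name_error_message = str(err)
    print("NameError while running:", name_error_message)
```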

Flake8 checking

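pip install flake8

Running flake8 on compile.py from the previous example reports the undefined name y without running the code:

flake8 --ignore= compile.py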
compile.py:3:7: F821 undefined name 'y'
compile.py:6:1: W391 blank line at end of file

If you use Anaconda, you can install it with:

conda install flake8

Pylint checking

pip install pylint

len = 42
print(len)
pylint bad.py
************* Module bad
bad.py:1:0: C0114: Missing module docstring (missing-module-docstring)
bad.py:2:0: W0622: Redefining built-in 'len' (redefined-builtin)
bad.py:2:0: C0103: Constant name "len" doesn't conform to UPPER_CASE naming style (invalid-name)

--------------------------------------------------------------------
Your code has been rated at -5.00/10 (previous run: -5.00/10, +0.00)

Numbers

Numbers

a = 42     # decimal
h = 0xA3C  # 2620 - hex            - starting with 0x
o = 0o171  # 121  - octal          - starting with 0o
           # 011 (without the o) works in Python 2.x but not in Python 3.x
           # 0o171 (with the o) also works in
           # (recent versions of) Python 2.x
b = 0b101  # 5    - binary numbers - starting with 0b

r = 2.3

print(a)  #   42
print(h)  # 2620
print(o)  #  121
print(b)  #    5
print(r)  #  2.3

Python displays numbers in decimal by default, but in the source code you can also write them in hexadecimal, octal, or binary notation. This is especially useful if the domain you are programming in uses those kinds of numbers. For example, hardware engineers often talk in hexadecimal values. In that case you don't need to constantly translate between the notation used in the domain and decimal numbers.
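The built-in functions hex, oct, and bin go the other way and return the notation of a number as a string, while int accepts an optional base for parsing such text. A short illustration using the values above:

```python
# From a number to a string in another notation (note the prefixes).
print(hex(2620))  # 0xa3c
print(oct(121))   # 0o171
print(bin(5))     # 0b101

# From a string in a given base back to an integer.
print(int("a3c", 16))  # 2620
print(int("171", 8))   # 121
print(int("101", 2))   # 5
```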

Operators for Numbers

  • +=
  • -=

  • ++
  • --
  • %
  • /
  • //
a = 2
b = 3
c = 2.3

d = a + b
print(d)       # 5
print(a + b)   # 5
print(a + c)   # 4.3
print(b / a)   # 1.5  # see the __future__
print(b // a)  # 1    # floor division
print(a * c)   # 4.6

print(a ** b)  # 8   (power)

print(17 % 3)  # 2   (modulus)

a += 7         # is the same as a = a + 7
print(a)       # 9

# a++          # SyntaxError: invalid syntax
# a--          # SyntaxError: invalid syntax

a += 1
print(a)       # 10
a -= 1
print(a)       # 9

There is no autoincrement (++) and autodecrement (--) in Python, because they can be expressed by += 1 and -= 1 respectively.
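Be careful, though: only the postfix forms a++ and a-- are syntax errors. Written in front, ++a is valid Python: it means +(+a), two unary plus operators, so it silently leaves the value unchanged. Likewise --a means -(-a):

```python
a = 9
b = ++a       # parsed as +(+a), no increment happens
print(a, b)   # 9 9

c = --a       # parsed as -(-a), no decrement happens
print(a, c)   # 9 9
```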

Integer division and the future

  • future
from __future__ import print_function

print(3/2)
$ python2 divide.py
1

$ python3 divide.py
1.5
from __future__ import print_function
from __future__ import division

print(3/2)     # 1.5

If you need to use Python 2, remember that by default dividing two integers is integer division, so 3/2 returns 1. Importing division from __future__ changes this to the behavior we usually expect: 3/2 is 1.5. This is also the behavior of Python 3. If you use Python 3 and would like the "old" behavior, that is, only the integer part of the division, use the floor division operator: b // a. It also matches Python 2 for negative results, where int(b/a) does not.
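The difference between int(b/a) and // only shows up when the result is negative: int() throws away the fractional part (truncates toward zero), while // rounds down toward negative infinity, just like integer division in Python 2. A quick check:

```python
print(int(-3 / 2))    # -1 - int() truncates toward zero
print(-3 // 2)        # -2 - floor division rounds down
print(-3 % 2)         # 1  - the result of % has the sign of the divisor
print(divmod(-3, 2))  # (-2, 1) - quotient and remainder at once
```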

Pseudo Random Number (uniform distribution)

  • random
import random

a = random.random()
print(a)  # e.g. 0.5648261676148922, a value x where 0.0 <= x < 1.0
print(random.random())
print(random.random())

Fixed random numbers

  • random
  • seed
import random

random.seed(37)

print(random.random()) # 0.6820045605879779
print(random.random()) # 0.09160260807956389
print(random.random()) # 0.6178163488614024
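Seeding is what makes the sequence repeatable: the same seed always leads to the same sequence of numbers. The sketch below checks this without relying on the exact values, which are specific to the seed and to the algorithm of the random module. It uses separate random.Random instances and also re-seeds the module-level generator.

```python
import random

# Two independent generators started from the same seed.
gen_a = random.Random(37)
gen_b = random.Random(37)

print(gen_a.random() == gen_b.random())  # True
print(gen_a.random() == gen_b.random())  # True

# Re-seeding the module-level generator restarts its sequence.
random.seed(37)
first = random.random()
random.seed(37)
print(random.random() == first)          # True
```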

Rolling dice - randrange

  • randrange
import random

print(1 + int(6 * random.random()))  # the manual way

print(random.randrange(1, 7))        # 7, the upper limit, is excluded

# Both print one of the following: 1, 2, 3, 4, 5, 6
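The random module also has randint(a, b), which includes the upper limit as well, so randint(1, 6) is another way to roll a die. A small sketch comparing it with randrange:

```python
import random

roll = random.randint(1, 6)           # 1 <= roll <= 6, both ends included
print(roll)

same_range = random.randrange(1, 7)   # 1 <= same_range <= 6, 7 is excluded
print(same_range)
```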

Random choice

  • choice
import random

letters = "abcdefghijklmno"
print(random.choice(letters))     # pick one of the letters

fruits = ["Apple", "Banana", "Peach", "Orange", "Durian", "Papaya"]
print(random.choice(fruits))      # pick one of the fruits

built-in method

  • A common mistake. Not calling the method.
import random

rnd = random.random
print(rnd)            # <built-in method random of Random object at 0x124b508>



y = rnd()
print(y)              # 0.7740737563564781

print(random.random)  # <built-in method random of Random object at 0x124b508>

x = rnd
print(x)              # <built-in method random of Random object at 0x124b508>
print(x())            # 0.5598791496813703

When you see a string like the "built-in method ..." above, you can be almost certain that you forgot the parentheses at the end of a method call.

Exception: TypeError: 'module' object is not callable

  • A common mistake: calling the module instead of the function in it.
import random

print("hello")
x = random()
print(x)
hello
Traceback (most recent call last):
  File "examples/numbers/rnd.py", line 3, in <module>
    x = random()
TypeError: 'module' object is not callable

Fixing the previous code

import random

x = random.random()
print(x)
from random import random

x = random()
print(x)

Exception: AttributeError: module 'random' has no attribute

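  • A common mistake: giving your file the same name as a module you want to import.

Suppose you create a file called random.py with this content. On its own it works fine:

print("Hello World")

But if a script called rnd.py in the same directory imports random, Python finds your random.py before the standard library module, and we get an error: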

import random
print(random.random())
Traceback (most recent call last):
  File "rnd.py", line 2, in <module>
    print(random.random())
AttributeError: module 'random' has no attribute 'random'

Make sure the names of your files are not the same as the names of any of the Python modules or packages you import (for example random.py).
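To find out which file an import actually loaded, print the module's __file__ attribute. If it shows the path of your own random.py instead of a file inside the Python installation, you have found the culprit. A minimal check:

```python
import random

# Prints the path of the file that `import random` actually loaded.
print(random.__file__)
```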

Exercise: Number guessing game - level 0

Level 0

  • Create a file called number_guessing_game_0.py
  • Using the random module the computer "thinks" about a whole number between 1 and 20.
  • The user has to guess the number. After the user types in the guess the computer tells if this was bigger or smaller than the number it generated, or if it was the same.
  • The game ends after just one guess.

Level 1-

  • Other levels in the next chapter.

Exercise: Fruit salad

  • Write a script called fruit_salad.py that picks 3 fruits from a list of fruits like the one we had in one of the earlier slides, and prints the 3 names.

  • Could you make sure the 3 fruits are different?

  • Use the following skeleton:

fruits = ["Apple", "Banana", "Peach", "Orange", "Durian", "Papaya"]

Solution: Number guessing game - level 0

import random

hidden = random.randrange(1, 21)
print("The hidden value is", hidden)  # only to make testing easier

user_input = input("Please enter your guess: ")
print(user_input)

guess = int(user_input)
if guess == hidden:
    print("Hit!")
elif guess < hidden:
    print("Your guess is too low")
else:
    print("Your guess is too high")

Solution: Fruit salad

  • random
  • sample
import random

fruits = ["Apple", "Banana", "Peach", "Orange", "Durian", "Papaya"]
salad = random.sample(fruits, 3)
print(salad)

Comparison and Boolean

if statement again

  • if
  • ==
x = 2

if x == 2:
    print("it is 2")
else:
    print("it is NOT 2")


if x == 3:
    print("it is 3")
else:
    print("it is NOT 3")

# it is 2
# it is NOT 3


Comparison operators

  • ==
  • !=
  • <
  • <=
  • >
  • >=

==             equal
!=             not equal

<              less than
<=             less than or equal
>              greater than
>=             greater than or equal

Compare numbers, compare strings

x = 2
y = 3

if x < y:
    print("x is less than y")

# x is less than y
x = "Snake"
y = "Stake"

if x < y:
    print("x is less than y")

# x is less than y

z = "מלון"
q = "בלון"

if z < q:
    print(f"{z} in z is less than {q}")
else:
    print(f"{q} in q is less than {z}")

print(x < z)


x = "👸"
y = "💂"

if x < y:
    print(f"{x} in x is less than {y}")
else:
    print(f"{y} in y is less than {x}")


print(1 < 2)      # True
print("1" < "2")  # True


print(2 < 11)      # True
print("2" < "11")  # False
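Strings are compared character by character using the Unicode code point of each character (the number returned by ord). The first differing character decides, regardless of the length, which explains the results above:

```python
print(ord("n"), ord("t"))  # 110 116 - so "Snake" < "Stake"
print(ord("2"), ord("1"))  # 50 49   - so "2" > "11"; the rest of "11" does not matter
print(ord("Z"), ord("a"))  # 90 97   - uppercase letters sort before lowercase ones
print("Zebra" < "apple")   # True
```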

Do NOT Compare different types!

x = 12
y = 3
result = "Yes" if x > y else "No"
print(result) # Yes

x = "12"
y = "3"
print("Yes" if x > y else "No") # No

x = "12"
y = 3
print("Yes" if x > y else "No") # Yes

x = 12
y = "3"
print("Yes" if x > y else "No")  # No

In Python 2 comparing different types does not raise an error, but the results look strange, so be careful to only compare values of the same type:

Yes
No
Yes
No

In Python 3, comparing a string with a number using < or > raises an exception:

Yes
No
Traceback (most recent call last):
  File "examples/other/compare.py", line 11, in <module>
    print("Yes" if x > y else "No") # Yes
TypeError: '>' not supported between instances of 'str' and 'int'

Complex if statement with boolean operators

  • Boolean operators or Logical operators
  • and
  • or
  • not
age = 16
name = "Foo"

if 0 < age and age <= 18:
    print("age is between 0 and 18")
else:
    print("age is NOT between 0 and 18")

if age < 18 or 65 < age:
    print("Young or old")
else:
    print("Working age")


if age < 18 and not name == "Foo":
    print("True")
else:
    print("False")

Chained expressions

age = 16
name = "Foo"

if 0 < age and age <= 18:
    print("age is between 0 and 18")
else:
    print("age is NOT between 0 and 18")

if 0 < age <= 18:
    print("age is between 0 and 18")
else:
    print("age is NOT between 0 and 18")
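A chained comparison such as a < b <= c means a < b and b <= c, except that b is evaluated only once. The sketch below makes this visible with a hypothetical get_age function that counts how often it is called:

```python
calls = 0


def get_age():
    """Hypothetical helper: returns a fixed age and counts its calls."""
    global calls
    calls += 1
    return 16


if 0 < get_age() <= 18:
    print("chained: age is between 0 and 18")
chained_calls = calls
print(chained_calls)  # 1

calls = 0
if 0 < get_age() and get_age() <= 18:
    print("and: age is between 0 and 18")
and_calls = calls
print(and_calls)      # 2
```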



Boolean operators

  • and

  • or

  • not

if COND:
    do something
else:
    do something other

if not COND:
    do something other

if COND1 and COND2:
    do something

if COND1 or COND2:
    do something

if COND1 and not COND2:
    do something

Boolean truth tables

COND1 and COND2     Result
True      True      True
True      False     False
False     True      False
False     False     False
COND1 or COND2      Result
True      True      True
True      False     True
False     True      True
False     False     False
not COND     Result
True         False
False        True
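Python can produce these tables itself. A small sketch that goes over every combination of two boolean values (itertools.product generates the pairs in the same order as the tables above):

```python
from itertools import product

rows = []
for cond1, cond2 in product([True, False], repeat=2):
    rows.append((cond1, cond2, cond1 and cond2, cond1 or cond2))
    print(cond1, cond2, cond1 and cond2, cond1 or cond2)

# True True True True
# True False False True
# False True False True
# False False False False
```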

Boolean values: True and False

  • True
  • False

In this chapter we are going to talk about boolean values and operations on boolean values.

Unlike in some other languages Python actually has 2 special symbols to represent True and False.

(In those languages 0 usually represents False and 1 represents True.)

  • True
  • False
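Under the hood, Python's booleans are still related to numbers: bool is a subclass of int, so True and False compare equal to 1 and 0 and even work in arithmetic:

```python
print(isinstance(True, int))     # True
print(True == 1, False == 0)     # True True
print(True + True)               # 2
print(sum([True, False, True]))  # 2 - a handy way to count True values
```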

Using True and False in variables

x = True
y = False


if x:
    print("X is True")
else:
    print("X is False")


if y:
    print("Y is True")
else:
    print("Y is False")


# X is True
# Y is False

Comparison returns True or False

a = "42"
b = 42

print(a)             # 42
print(b)             # 42
print(a == b)        # False
print(a != b)        # True
print(b == 42.0)     # True

print(None == None)  # True
print(None == False) # False

Assign comparisons to variables

  • True and False are real boolean values.

  • False

  • True

x = 2

v = x == 2
print(v)
if v:
    print(v, "is true - who would have thought?")

v = x == 3
print(v)
if v:
    print(v, "is true - who would have thought?")
else:
    print(v, "is false - who would have thought?")

# True
# True is true - who would have thought?
# False
# False is false - who would have thought?

Flag


correct = False

name = input("The name of this language: ")
if name == "Python":
    correct = True

if correct:
    print("The input was correct")

Use flag to skip first few lines

We have a series of rows that we might read from a file and would like to process the sections of rows that start with a well-defined row. Unfortunately the file does not always start with a row that matches the definition. In some cases there are a few lines at the beginning of the file that we need to throw away before we can start our processing.

In this example we use a series of numbers to represent the rows of that file, and the "well-defined condition" that starts a section is a number being "big" (greater than 10).

We can use a variable as a "flag" to indicate if we are still before the first good section or if the sections already started.


def print_series(series):
    started = False  # becomes True at the first "big" number
    for val in series:
        if val > 10:
            started = True
            print("start new series")
            print(val)

        # skip the rows before the first "big" number
        if not started:
            continue

        if val <= 10:
            print(val)


print_series([20, 2, 3, 30, 1, 7])
print()
print_series([1, 4, 20, 2, 3, 30, 1, 7])  # 1 and 4 are skipped

Toggle

  • not

machine_is_on = False
print(machine_is_on)   # False

# Instead of this:

if machine_is_on:
    machine_is_on = False
else:
    machine_is_on = True
print(machine_is_on)   # True

# Write this:

machine_is_on = not machine_is_on
print(machine_is_on)   # False

machine_is_on = not machine_is_on
print(machine_is_on)   # True

Short circuit

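money = 2000000
salary = 990


def check_money():
    return money > 1000000


def check_salary():
    global salary
    salary += 1    # side effect: a raise every time we check
    return salary >= 1000


for _ in range(3):
    # check_money() is already True, so `or` never calls check_salary()
    if check_money() or check_salary():
        print("I can live well")

print(salary)  # 990 - the salary never increased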

Short circuit fixed

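money = 2000000
salary = 990


def check_money():
    return money > 1000000


def check_salary():
    global salary
    salary += 1    # side effect: a raise every time we check
    return salary >= 1000


for _ in range(3):
    # call both functions first, so the side effect always happens
    has_good_money = check_money()
    has_good_salary = check_salary()

    if has_good_money or has_good_salary:
        print("I can live well")

print(salary)  # 993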

Does this value count as True or False?

x = 23

if x:
    print("23 is true")

if x != 0:
    print("23 is true")

y = 0
if y:
    print("0 is true")
else:
    print("0 is false")

if y != 0:
    print("0 is true")
else:
    print("0 is false")

# 23 is true
# 0 is false

True and False values in Python

  • None
  • 0
  • "" (empty string)
  • False
  • []
  • {}
  • ()

Everything else is true.

values = [None, 0, "", False, [], (), {}, "0", True]

for v in values:
    if v:
        print("True value:  ", v)
    else:
        print("False value: ", v)

# False value:  None
# False value:  0
# False value:
# False value:  False
# False value:  []
# False value:  ()
# False value:  {}
# True value:   0
# True value:   True

None is similar to undef, null, or nil in other languages.
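The built-in bool function shows how Python will treat any value in a condition, which is handy when you are not sure. A few examples, including some that often surprise people:

```python
print(bool(0.0))      # False - zero of any numeric type
print(bool(" "))      # True  - a string with a space is not empty
print(bool([0]))      # True  - a list with one element, even if it is 0
print(bool(set()))    # False - an empty set
print(bool("False"))  # True  - any non-empty string
```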

Incorrect use of conditions

In normal speech you could probably say something like "If status_code is 401 or 302, do something.", meaning status_code can be either 401 or 302.

If you tried to translate this into code directly you would write something like this:

if status_code == 401 or 302:
    pass

Python treats it as if we wrote:

if (status_code == 401) or 302:
    pass

However, this is incorrect. This condition is always true: Python compares status_code to 401, and if that comparison is False, it evaluates 302 on its own. Any number other than 0 counts as True, so the whole expression is always True.

What you probably meant is this:

if status_code == 401 or status_code == 302:
    pass

Alternative way:

An alternative way to achieve the same result uses the "in" operator and a list (comma-separated values in square brackets), although we have not learned about these yet:

if status_code in [401, 302]:
    pass
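To see the problem in action, look at what the incorrect condition evaluates to. The or operator returns its first true operand, or the last operand if none is true, so when status_code is not 401 the expression gives back 302, a true value:

```python
status_code = 200

wrong = status_code == 401 or 302
print(wrong)   # 302 - a true value, so the if block would run

right = status_code == 401 or status_code == 302
print(right)   # False
```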

Exercise: compare numbers

  • Create a file called bool_compare_numbers.py
  • Ask the user to enter two numbers and tell us which one is bigger.

Exercise: compare strings

  • Create a file called bool_compare_strings.py
  • You can use the len() function to get the length of the string.
  • Ask the user to enter two strings
  • Then ask the user to select if she wants to compare them based on Unicode or based on their length
  • Then tell us which one is bigger.
Input a string: (user types string and ENTER)
Input another string: (user types string and ENTER)
How to compare:
1) Unicode
2) Length
(user types 1 or 2 and ENTER)

Solution: compare numbers

a_in = input("Please type in a whole number: ")
b_in = input("Please type in another whole number: ")

if not a_in.isdecimal():
    exit("First input was not a whole number")
if not b_in.isdecimal():
    exit("Second input was not a whole number")


a_num = float(a_in)
b_num = float(b_in)

if a_num > b_num:
    print("First number is bigger")
elif a_num < b_num:
    print("First number is smaller")
else:
    print("They are equal")

Solution: compare strings

a_in = input("Please type in a string: ")
b_in = input("Please type in another string: ")
print("How to compare:")
print("1) Unicode")
print("2) Length")
how = input()

if how == '1':
    first_is_bigger = a_in > b_in
    second_is_bigger = a_in < b_in
elif how == '2':
    first_is_bigger = len(a_in) > len(b_in)
    second_is_bigger = len(a_in) < len(b_in)
else:
    exit("Invalid choice, please type 1 or 2")

if first_is_bigger:
    print("First string is bigger")
elif second_is_bigger:
    print("First string is smaller")
else:
    print("They are equal")

Strings

Single quoted and double quoted strings

In Python, just as in most of the programming languages you must put any free text inside a pair of quote characters. Otherwise Python will try to find meaning in the text.

These pieces of texts are called "strings".

In Python you can put a string between two single quotes: '' or between two double quotes: "". It does not matter which one you use.

soup = "Spiced carrot & lentil soup"
salad = 'Caesar salad'

print(soup)
print(salad)
Spiced carrot & lentil soup
Caesar salad

Long lines

text = "abc" "def"
print(text)

other = "abcdef"
print(other)


long_string = "one" "two" "three"
print(long_string)

short_rows = "one" \
    "two" \
    "three"
print(short_rows)

long_string = "first row second row third row"
print(long_string)

shorter = "first row \
second row \
third row"
print(shorter)
abcdef
abcdef
onetwothree
onetwothree
first row second row third row
first row second row third row
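Inside parentheses Python joins lines automatically, so you can split adjacent string literals over several lines without the backslashes. PEP 8, the Python style guide, prefers this style of line continuation. A short sketch:

```python
in_parens = (
    "first row "
    "second row "
    "third row"
)
print(in_parens)  # first row second row third row
```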

Multiline strings

  • We would like to print the names and numbers one under the other
text = "Joe:   23\nJane:   7 \nJacqueline  19\n"

print(text)
Joe:   23
Jane:   7 
Jacqueline  19

Triple quoted strings (multiline)

  • """
  • '''

If you would like to create a string that spreads over multiple lines, you can put the text between 3 quotes on both sides: either 3 single quotes or 3 double quotes.

text = """
Joe:        23
Jane:        7
Jacqueline  19
"""

print(text)

Joe:        23
Jane:        7
Jacqueline  19

Such a string can spread over multiple lines, for example:

first row
second row
third row

Triple quoted comments - documentation

"""
Documentation of the module
"""

def some_function():
    "Documentation of the function"
    pass

text = """first row
second row
third row"""

"a string"

"""another
   longer
string with code:
print("this is not printed")
"""


print("Hello World")
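A string that is the first statement of a module or a function becomes its docstring, available through the __doc__ attribute (and shown by help()). Other free-standing strings, like "a string" above, are evaluated and then thrown away, which is why they are sometimes used as multiline comments. A minimal sketch with a made-up area function:

```python
def area(width, length):
    """Return the area of a rectangle."""
    return width * length


print(area.__doc__)  # Return the area of a rectangle.
```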

String length (len)

  • len

The len function returns the length of the string in number of characters.

line = "Hello World"
hw = len(line)
print(hw)  # 11

text = """Hello 
World"""
print(len(text))  # 12 - "Hello " (with its trailing space) + newline + "World"

String repetition and concatenation

You might be used to the fact that you can only multiply numbers, but in python you can also "multiply" a string by a number. It is called repetition. In this example we have a string "Jar " that we repeat twice.

We can also add two strings to concatenate them together.

The repetition operator is probably not used very often, but it can come in handy, for example when you are writing a text report and would like to underline the title with a line of dashes exactly as long as the title.

name = 2 * 'Jar '
print(name)        # Jar Jar

full_name = name + 'Binks'
print(full_name)   # Jar Jar Binks


title = "We have some title"
print(title)
print('-' * len(title))

# We have some title
# ------------------

A character in a string

  • []
text = "Hello World"

a = text[0]
print(a)      # H

b = text[6]
print(b)      # W
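Negative indexes count from the end of the string, and an index past the end raises an IndexError. Using the same text:

```python
text = "Hello World"

print(text[-1])   # d
print(text[-5])   # W
print(len(text))  # 11

try:
    text[11]
except IndexError as err:
    print("IndexError:", err)  # IndexError: string index out of range
```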

String slice (instead of substr)

  • slice
  • substr
  • [:]
  • :
text = "Hello World"

b = text[1:4]
print(b)          # ell

print(text[2:])   # llo World
print(text[:2])   # He

start = 1
end = 4
print(text[start:end])  # ell
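A slice can take a third value, the step, and negative positions count from the end. Unlike indexing, slicing does not raise an error for positions beyond the end. A few examples with the same text:

```python
text = "Hello World"

print(text[-5:])     # World
print(text[::2])     # HloWrd  - every second character
print(text[::-1])    # dlroW olleH - reversed
print(text[3:100])   # lo World - no error past the end
```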

Change a string

  • immutable

In Python strings are "immutable", meaning you cannot change them. You can assign a new string to a variable, but you cannot change the string itself.

In the following example we wanted to replace the 3rd character (index 2) with "Y". This raised an exception:

text = "abcd"
print(text)     # abcd

text[2] = 'Y'

print("done")
print(text)
abcd
Traceback (most recent call last):
  File "string_change.py", line 4, in <module>
    text[2] = 'Y'
TypeError: 'str' object does not support item assignment

Replace part of a string

  • Strings in Python are immutable - they never change.

How to change a string

text = "abcd"
print(text)      # abcd

text = text[:2] + 'Y' + text[3:]
print(text)      # abYd
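Strings also have a replace method that returns a new string with a substring replaced. By default every occurrence is replaced; an optional third argument limits how many. The original string stays untouched:

```python
text = "abcd"
new_text = text.replace("c", "Y")
print(new_text)  # abYd
print(text)      # abcd - unchanged

print("banana".replace("a", "A"))     # bAnAnA
print("banana".replace("a", "A", 1))  # bAnana
```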

String copy

text = "abcd"
print(text)     # abcd

text = text + "ef"
print(text)     # abcdef

other = text
print(other)     # abcdef
text = "xyz"
print(text)     # xyz
print(other)     # abcdef

When we assign a variable that refers to a string to another variable, the new variable refers to the same string. If we then assign some other string to either of the variables, they will refer to two different strings.
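You can confirm that two variables refer to the very same string object with the is operator, which compares identity rather than value:

```python
text = "abcdef"
other = text
same_before = other is text
print(same_before)  # True - both names refer to the same string object

text = "xyz"
same_after = other is text
print(same_after)   # False - text now refers to a different string
print(other)        # abcdef
```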

String functions and methods (len, upper, lower)

  • len
  • upper
  • lower
a = "xYz"
print(len(a))     # 3

b = a.upper()
print(b)          # XYZ
print(a)          # xYz   - immutable!
print(a.lower())  # xyz

index in string

  • index
  • ValueError
text = "The black cat climbed the green tree."
print(text.index("bl"))     # 4
print(text.index("The"))    # 0
print(text.index("the"))    # 22
print(text.index("dog"))
4
0
22
Traceback (most recent call last):
  File "index.py", line 5, in <module>
    print(text.index("dog"))
ValueError: substring not found

index in string with range

  • index
text = "The black cat climbed the green tree."
print(text.index("c"))      # 7
print(text.index("c", 8))   # 10

print(text.index("gr", 8))      # 26
print(text.index("gr", 8, 16))
7
10
26
Traceback (most recent call last):
  File "examples/strings/index2.py", line 6, in <module>
    print(text.index("gr", 8, 16))
ValueError: substring not found

Find all in the string

We'll see how to find all the occurrences later, once we have learned about loops.

rindex in string with range

  • rindex
text = "The black cat climbed the green tree."
print(text.rindex("c"))         # 14
print(text.rindex("c", 8))      # 14
print(text.rindex("c", 8, 13))  # 10

print(text.rindex("gr", 8))     # 26
print(text.rindex("gr", 8, 16))
14
14
10
26
Traceback (most recent call last):
  File "examples/strings/rindex.py", line 7, in <module>
    print(text.rindex("gr", 8, 16))
ValueError: substring not found

find in string

  • find
  • rfind

Alternatively, use find and rfind, which return -1 instead of raising an exception.

text = "The black cat climbed the green tree."
print(text.find("bl"))     # 4
print(text.find("The"))    # 0
print(text.find("dog"))    # -1

print(text.find("c"))      # 7
print(text.find("c", 8))   # 10

print(text.find("gr", 8))      # 26
print(text.find("gr", 8, 16))  # -1


print(text.rfind("c", 8))   # 14

in string

  • in

Check whether a substring is in the string:

txt = "hello world"
if "wo" in txt:
    print('found wo')

if "x" in txt:
    print("found x")
else:
    print("NOT found x")
found wo
NOT found x

index if in string

  • index
  • in
sub = "cat"
txt = "The black cat climbed the green tree"

if sub in txt:
    loc = txt.index(sub)
    print(sub + " is at " + str(loc))

sub = "dog"
if sub in txt:
    loc = txt.index(sub)
    print(sub + " is at " + str(loc))
    
# cat is at 10
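Using in and then index may search the string twice. If you need the position anyway, a single call to find is enough, because it returns -1 when the substring is missing. A small sketch of the same example:

```python
txt = "The black cat climbed the green tree"

sub = "cat"
loc = txt.find(sub)        # -1 would mean "not found"
if loc != -1:
    print(sub + " is at " + str(loc))    # cat is at 10

sub = "dog"
loc = txt.find(sub)
if loc != -1:
    print(sub + " is at " + str(loc))    # not printed, loc is -1
```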

Encodings: ASCII, Windows-1255, Unicode

raw strings

  • r
# file_a = "c:\Users\Foobar\readme.txt"
# print(file_a)

# Python2:  eadme.txtFoobar
# Python3:
#   File "examples/strings/raw.py", line 6
#     file_a = "c:\Users\Foobar\readme.txt"
#             ^
# SyntaxError: (unicode error) 'unicodeescape' codec
#    can't decode bytes in position 2-3: truncated \UXXXXXXXX escape


file_b = "c:\\Users\\Foobar\\readme.txt"
print(file_b)  # c:\Users\Foobar\readme.txt

file_c = r"c:\Users\Foobar\readme.txt"
print(file_c)  # c:\Users\Foobar\readme.txt

text = r"text \n \d \s \ and more"
print(text)    # text \n \d \s \ and more

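In a raw string, backslash escape sequences are kept intact instead of being interpreted. Raw strings are commonly used for regular expressions and for Windows paths.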
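Comparing lengths shows the difference: in a normal string \n is a single newline character, while in a raw string it stays a backslash followed by an n.

```python
normal = "a\nb"    # \n becomes one newline character
raw = r"a\nb"      # backslash and n stay as two separate characters

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == "a\\nb")   # True: same as escaping the backslash by hand
```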

ord


print( ord('a') )   # 97
print( ord('=') )   # 61
print( ord('\r') )  # 13
print( ord('\n') )  # 10
print( ord(' ') )   # 32

print( ord('á') )   # 225    Hungarian
print( ord('ó') )   # 243
print( ord('א') )   # 1488   Hebrew alef
print( ord('أ') )   # 1571   Arabic/Farsi
print( ord('α') )   # 945    Greek
print( ord('ㅏ') )  # 12623  Korean
print( ord('😈') )  # 128520

chr - number to character


print( chr(33) )  # !
print( chr(48) )  # 0
print( chr(65) )  # A

print( chr(225) )    # á   Hungarian
print( chr(243) )    # ó   Hungarian
print( chr(1489) )   # ב   Hebrew bet
print( chr(1572) )   # ؤ   Arabic/Farsi
print( chr(945) )    # α   Greek
print( chr(959) )    # ο   Greek omicron
print( chr(937) )    # Ω   Greek omega
print( chr(931) )    # Σ   Greek sigma
print( chr(4632) )   # መ   Amharic
print( chr(12624) )  # ㅐ  Korean
print( chr(128519) ) # 😇
print( chr(128520) ) # 😈
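ord and chr are inverses of each other. Because the letters a-z (and A-Z) have consecutive code points, you can also do arithmetic on the codes, which is what the ROT13 exercise below relies on. A few quick checks:

```python
ch = 'x'
print(chr(ord(ch)) == ch)    # True: round trip back to the same character

print(ord('b') - ord('a'))   # 1: consecutive letters have consecutive codes
print(chr(ord('a') + 2))     # c
print(ord('a') - ord('A'))   # 32: distance between lower- and uppercase ASCII letters
```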

Exercise: one string in another string

  • Write a script called string_in_another_string.py that accepts two strings and tells whether one of them can be found in the other, and where.

Exercise: Character to Unicode - CLI

Write a script called char_to_unicode.py that gets a character on the command line and prints its Unicode code.

Maybe even:

Write a script that gets a string on the command line and prints the Unicode code of each character.

Exercise: from Unicode to character - CLI

Write a script called unicode_to_char.py that accepts a number on the command line and prints the character represented by that number.

Exercise: ROT13

  • rot13

  • Implement ROT13:

  • Create a script called rot13.py that, given a string on the command line, prints the ROT13 version of the string.

  • It should work like this:

$ python rot13.py "Hello World!"
Uryyb Jbeyq!

$ python rot13.py "Uryyb Jbeyq!"
Hello World!

Solution: one string in another string

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} short-STRING long-STRING")

string = sys.argv[1]
text   = sys.argv[2]

if string in text:
    loc = text.index(string)
    print(string, "can be found in", text, "at", loc)
else:
    print(string, "can NOT be found in", text)

Solution: compare strings

mode = input("Mode of comparison [length|ascii]: ")
if mode != "length" and mode != "ascii":
    print("Not good")
    exit()

str1 = input("String 1:")
str2 = input("String 2:")

if mode == "length":
    if len(str1) > len(str2):
        print("First is longer")
    elif len(str1) < len(str2):
        print("Second is longer")
    else:
        print("They are of equal length")
elif mode == "ascii":
    if str1 > str2:
        print("First is later in the ABC order")
    elif str1 < str2:
        print("Second is later in the ABC order")
    else:
        print("The strings are equal")

Solution: to Unicode CLI

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} CHARACTER")

print( ord( sys.argv[1]) )

The extended version prints the code of every character in the string:

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} STRING")

for cr in sys.argv[1]:
    print( ord( cr ) )

Solution: from Unicode CLI

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} NUMBER")

print( chr( int(sys.argv[1]) ) )

Solution: Show characters based on Unicode code-points

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} START END")

start, end = sys.argv[1:]
for decimal in range(int(start), int(end)+1):
    print(f"{decimal} {chr(decimal)}")

# Emojis:
# 127744 -
# 128506 - 128591

Solution: ROT13

  • rot13
  • codecs
  • encoding
import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} TEXT")

original = sys.argv[1]

encoded = ''
for char in original:
    code = ord(char)
    if 'a' <= char <= 'z':
        # shift within a-z by 13 places, wrapping around after 'z'
        new_char = chr((code - ord('a') + 13) % 26 + ord('a'))
    elif 'A' <= char <= 'Z':
        # the same for A-Z (ord('A') is 65)
        new_char = chr((code - ord('A') + 13) % 26 + ord('A'))
    else:
        # digits, spaces, punctuation, etc. stay unchanged
        new_char = char

    encoded += new_char

print(encoded)

Of course, instead of implementing all the calculations yourself, you can also rely on the codecs module that comes with Python:

import sys
import codecs

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} TEXT")

original = sys.argv[1]

encoded = codecs.encode(original, encoding='rot_13')

print(encoded)
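A third option, not used in the solutions above, is str.maketrans together with str.translate: build a translation table once and apply it to the whole string. The function name rot13 is only for illustration:

```python
import string

# Map each letter to the letter 13 places later, wrapping around the alphabet.
lower = string.ascii_lowercase
upper = string.ascii_uppercase
ROT13_TABLE = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13],
)

def rot13(text):
    # Characters missing from the table (digits, spaces, punctuation) are kept as-is.
    return text.translate(ROT13_TABLE)

print(rot13("Hello World!"))          # Uryyb Jbeyq!
print(rot13(rot13("Hello World!")))   # Hello World!
```

Applying ROT13 twice gives back the original text, which is why the same script both encodes and decodes.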

Infinite loop

i = 0
while True:
    i += 1
    print(i)

print("done")   # never reached; stop the program with Ctrl-C

break

  • break
i = 0
while True:
    print(i)
    i += 1
    if i >= 7:
        break

print("done")
0
1
2
3
4
5
6
done

continue

  • continue
i = 0
while True:
    i += 1

    if i > 3 and i < 8:
        continue

    if i > 10:
        break
    print(i)
1
2
3
8
9
10

While with many conditions

while (not found_error) and (not found_warning) and (not found_exit):
    do_the_real_stuff()

while True:
    line = get_next_line()

    if found_error:
        break

    if found_warning:
        break

    if found_exit:
        break

    do_the_real_stuff()

while loop with many conditions


# pseudocode: the helper names are placeholders
while True:
    line = get_next_line()

    if last_line:
        break

    if line_is_empty:
        continue

    if line_has_a_hash:  # at the beginning
        continue

    if line_has_two_slashes:  # // at the beginning
        continue

    do_the_real_stuff()

ord in a file

  • ord
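import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

filename = sys.argv[1]

# assumes the file is UTF-8 encoded
with open(filename, encoding="utf-8") as fh:
    content = fh.read()

for c in content:
    print(ord(c))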

Strings as Comments

  • '''

# marks single-line comments.

There are no real multi-line comments in Python, but we can use triple-quotes to create multi-line strings. If such a string is not part of another statement (it is not assigned, passed, or returned), the Python interpreter evaluates it and then discards it, effectively creating a multi-line comment.

print("hello")

'A string which is disregarded'

print(42)

'''
  Using three single-quotes on both ends (a triple-quoted string)
  can be used as a multi-line comment.
'''

print("world")

Loops

Loops: for-in and while

  • for in - to iterate over a well-defined list of values (characters, a range of numbers, a shopping list, etc.).
  • while - to repeat an action until some condition is met (or stops being met).

for-in loop on strings

  • for
txt = 'hello world'
for ch in txt:
    print(ch)
h
e
l
l
o
 
w
o
r
l
d

for-in loop on list

  • for
fruits = ["Apple", "Banana", "Peach", "Orange", "Durian", "Papaya"]
for fruit in fruits:
    print(fruit)
Apple
Banana
Peach
Orange
Durian
Papaya

for-in loop on range

  • range
for ix in range(3, 7):
    print(ix)
3
4
5
6
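range can also be called with one or with three arguments. With one argument it starts from 0; the optional third argument is the step. In every form the end value itself is excluded:

```python
for ix in range(4):         # one argument: from 0 up to (not including) 4
    print(ix)               # 0 1 2 3, one per line

for ix in range(3, 12, 4):  # three arguments: start, stop, step
    print(ix)               # 3 7 11, one per line
```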

Iterable, iterator

for in loop with early end using break

  • break
txt = 'hello world'
for ch in txt:
    if ch == ' ':
        break
    print(ch)

print("Here")
h
e
l
l
o
Here

for in loop skipping parts using continue

  • continue
txt = 'hello world'
for ch in txt:
    if ch == ' ':
        continue
    print(ch)

print("done")

h
e
l
l
o
w
o
r
l
d
done

for in loop with break and continue

txt = 'hello world'
for cr in txt:
    if cr == ' ':
        continue
    if cr == 'r':
        break
    print(cr)

print('done')
h
e
l
l
o
w
o
done

while loop

  • while
import random

total = 0
while total <= 100:
    print(total)
    total += random.randrange(20)

print("done")
0
10
22
29
45
54
66
71
77
82
93
done

Infinite while loop

  • while
import random

total = 0
while total >= 0:
    print(total)
    total += random.randrange(20)

print("done")
...
1304774
1304779
1304797
^C1304803
Traceback (most recent call last):
  File "while_infinite.py", line 5, in <module>
    print(total)
KeyboardInterrupt

  • Don't do this!
  • Make sure there is a proper end-condition. (exit-condition)
  • Use Ctrl-C to stop it

While with complex expression

import random

def random_loop():
    total = 0
    while (total < 10000000) and (total % 17 != 1) and (total ** 2 % 23 != 7):
        print(total)
        total += random.randrange(20)

        # do the real work here

    print("done")

if __name__ == '__main__':
    random_loop()
0
12
25
26
34
50
65
77
done

While with break

import random

def random_loop():
    total = 0
    while total < 10000000:
        if total % 17 == 1:
            break

        if total ** 2 % 23 == 7:
            break

        print(total)
        total += random.randrange(20)

        # do the real work here

    print("done")

if __name__ == '__main__':
    random_loop()
0
12
25
26
34
50
65
77
done

While True

import random

def random_loop():
    total = 0
    while True:
        if total >= 10000000:
            break

        if total % 17 == 1:
            break

        if total ** 2 % 23 == 7:
            break

        print(total)
        total += random.randrange(20)

        # do the real work here

    print("done")

if __name__ == '__main__':
    random_loop()
0
12
25
26
34
50
65
77
done

Testing the refactoring of the while loop

import while_break
import while_complex_condition
import while_true

import random
import pytest

@pytest.mark.parametrize('seed', [0, 7, 9, 21])
def test_random_loop(capsys, seed):
    random.seed(seed)
    while_complex_condition.random_loop()
    out_complex, _ = capsys.readouterr()

    random.seed(seed)
    while_break.random_loop()
    out_break, _ = capsys.readouterr()

    assert out_complex == out_break

    random.seed(seed)
    while_true.random_loop()
    out_true, _ = capsys.readouterr()
    assert out_complex == out_true

    print(out_true)

def test_newest_random_loop_0(capsys):
    expected = """0
12
25
26
34
50
65
77
done
"""
    random.seed(0)
    while_true.random_loop()
    out_true, _ = capsys.readouterr()
    assert out_true == expected

def test_newest_random_loop_7(capsys):
    expected = """0
10
14
26
27
29
46
49
60
78
79
95
101
102
104
117
130
132
139
141
158
done
"""
    random.seed(7)
    while_true.random_loop()
    out_true, _ = capsys.readouterr()
    assert out_true == expected


pytest test_random_loop.py
pytest -s test_random_loop.py

Duplicate input call

  • Ask the user what is their ID number.
  • Check if it is a valid ID number. (To keep our code simple, we only check the length of the string.)
  • Ask again if it was not a valid number.

id_str = input("Type in your ID: ")

if len(id_str) != 9:
    id_str = input("Type in your ID")

print("Your ID is " + id_str)
  • Notice that if the user types in an invalid string a second time, our code does not check it.

Duplicate input call with loop

  • A while loop would be a better solution.
  • This works, but now the input call is duplicated and the prompt text differs between the two calls. This violates DRY (Don't Repeat Yourself).
  • We can't remove the first call to input, because the condition of the while loop already needs the id_str variable.

id_str = input("Type in your ID: ")

while len(id_str) != 9:
    id_str = input("Type in your ID")

print("Your ID is " + id_str)

Eliminate duplicate input call

  • We can use the while True construct to avoid this duplication.
while True:
    id_str = input("Type in your ID: ")
    if len(id_str) == 9:
        break

print("Your ID is " + id_str)

do while loop

  • do while

  • There is no do ... while loop in Python, but we can write code like this to get a similar effect.


while True:
    answer = input("What is the meaning of life? ")
    if answer == '42':
        print("Yeeah, that's it!")
        break

print("done")

while with many continue calls

  • continue

while True:
    line = get_next_line()

    if last_line:
        break

    if line_is_empty:
        continue

    if line_has_a_hash_at_the_beginning:  # #
        continue

    if line_has_two_slashes_at_the_beginning:  # //
        continue

    do_the_real_stuff()
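Here is a runnable sketch of the same pattern. It assumes the lines come from a hypothetical list instead of get_next_line(), and that a line reading "END" marks the last line; both are stand-ins chosen for illustration:

```python
lines = ["first", "", "# a comment", "// another comment", "second", "END", "never seen"]

count = 0
ix = 0
while True:
    line = lines[ix]
    ix += 1

    if line == "END":          # stand-in for "last line"
        break

    if line == "":             # skip empty lines
        continue

    if line.startswith("#"):   # skip lines starting with #
        continue

    if line.startswith("//"):  # skip lines starting with //
        continue

    # "do the real stuff": here we just print and count the line
    print(line)
    count += 1
```

It prints first and second, and never reaches the line after END.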

Break out from multi-level loops

Python has no built-in way to break out of several nested loops at once. "If you feel the urge to do that, your code is probably too complex. Create functions!"

while external():
    while internal():
        if ...:
            break
        if ...:
            continue
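Following the "create functions" advice, one common approach is to move the nested loops into a function and leave both of them at once with return. The function find_pair and its task are hypothetical, chosen only for illustration:

```python
def find_pair(numbers, target):
    """Return the first (a, b) with a + b == target, or None (a and b may be the same element)."""
    for a in numbers:
        for b in numbers:
            if a + b == target:
                return a, b     # leaves both loops immediately
    return None                 # both loops finished without a match

print(find_pair([1, 4, 6, 9], 13))   # (4, 9)
print(find_pair([1, 4, 6, 9], 100))  # None
```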

For-else

The else part will be executed if the loop finished all the iterations without calling break.

found_number_bigger_than_10 = False

numbers = [2, 3, 4]
for num in numbers:
    if num > 10:
        found_number_bigger_than_10 = True
        break
    print(num)

if found_number_bigger_than_10:
    print("found number bigger than 10")

print('---------------------')

found_number_bigger_than_10 = False

numbers = [2, 3, 12, 4]
for num in numbers:
    if num > 10:
        found_number_bigger_than_10 = True
        break
    print(num)

if found_number_bigger_than_10:
    print("found number bigger than 10")

print('---------------------')



for num in [2, 3, 4]:
    if num > 10:
        break
    print(num)
else:
    print("in else - finished without calling break")
    print("not found number bigger than 10")

print('---------------------')

for num in [2, 3, 12, 4]:
    if num > 10:
        break
    print(num)
else:
    print("in else - finished after calling break")
    print("not found number bigger than 10")

2
3
4
---------------------
2
3
found number bigger than 10
---------------------
2
3
4
in else - finished without calling break
not found number bigger than 10
---------------------
2
3
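A typical use of for-else is a search, where the else block handles the "nothing found" case without a separate flag variable. This sketch looks for a divisor of a number (a simple, not particularly fast, primality check); the function name describe is hypothetical and it assumes n >= 2:

```python
def describe(n):
    # assumes n >= 2
    for div in range(2, n):
        if n % div == 0:
            result = f"{n} is divisible by {div}"
            break
    else:
        # only reached when the loop finished without calling break
        result = f"{n} is prime"
    return result

print(describe(91))  # 91 is divisible by 7
print(describe(97))  # 97 is prime
```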

Exercise: Print all the locations in a string

  • Create a file called location_in_string.py
  • Given a string like "The black cat climbed the green tree.", print out the location of every "c" character.

Expected:

7
10
14

Exercise: Number guessing game

  • Every level must include all the features from all the lower levels as well.

Level 0

  • Create a file called number_guessing_game_0.py
  • Using the random module the computer "thinks" about a whole number between 1 and 20.
  • The user has to guess the number. After the user types in the guess, the computer tells whether it was bigger or smaller than the number it generated, or whether it was the same.
  • The game ends after just one guess.

Level 1

  • Create a file called number_guessing_game_1.py
  • The user can guess several times. The game ends when the user guessed the right number.

Level 2

  • Create a file called number_guessing_game_2.py
  • If the user hits 'x', we leave the game without guessing the number.

Level 3

  • Create a file called number_guessing_game_3.py
  • If the user presses 's', show the hidden value (cheat)

Level 4

  • Create a file called number_guessing_game_4.py
  • Soon we'll have a level in which the hidden value changes after each guess. In order to make that mode easier to track and debug, first we would like to have a "debug mode".
  • If the user presses 'd' the game gets into "debug mode": the system starts to show the current number to guess every time, just before asking the user for new input.
  • Pressing 'd' again turns off debug mode. (It is a toggle: each press on 'd' switches the value to the other possible value.)

Level 5

  • Create a file called number_guessing_game_5.py
  • The 'm' button is another toggle. It is called 'move mode'. When it is 'on', the hidden number changes a little bit after every step (+/-2). That is, it changes by one of the following: -2, -1, 0, 1, 2. Pressing 'm' again will turn this feature off.

Level 6

  • Create a file called number_guessing_game_6.py
  • Let the user play several games.
  • Pressing 'n' will skip this game and start a new one with a new number to guess.

Exercise: Count unique characters

  • Create a file called count_unique_characters.py
  • Given a string on the command line, count how many different characters it has.
python count_unique_characters.py abcdaaa
4

Exercise: Convert for-loop to while-loop

  • Update the following file.

  • Given a for-loop as in the following code, convert it to be using a while-loop.

  • range with 3 parameters: range(from, to, step) starts at the first number, goes up to (but not including) the second number, and increases by the third number in each step.


for ix in range(3, 25, 4):
    print(ix)
3
7
11
15
19
23

Solution: Print all the locations in a string

When you start thinking about this exercise, you will probably call loc = text.find("c") and then wonder how you could find the next occurrence. After a while it might occur to you that the find method accepts a second parameter that sets the location where the search starts.

Basically you need to call loc = text.find("c", loc + 1), but that looks strange: how can you use loc as a parameter of the method and also assign to it? Programming languages don't have a problem with this, because the assignment happens only after the right-hand side has been fully evaluated.

The problem is that now you have two different calls to find: the first one and all the subsequent calls.

How could we merge the two calls?

The trick is that you need to have an initial value for the loc variable and it has to be -1, so when we call find for the first time, it will start from the first character (index 0).

text = "The black cat climbed the green tree."
loc = -1
while True:
    loc = text.find("c", loc+1)
    if loc == -1:
        break
    print(loc)

Using an additional variable might make the code easier to read:

text = "The black cat climbed the green tree."
start = 0
while True:
    loc = text.find("c", start)
    if loc == -1:
        break
    print(loc)
    start = loc + 1
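For a single character there is also an approach that avoids find altogether: walk over every index with range and len, and compare the character at that index. This is an alternative sketch, not the solution above:

```python
text = "The black cat climbed the green tree."
for ix in range(len(text)):
    if text[ix] == "c":
        print(ix)    # prints 7, 10, 14 (one per line)
```

This only works for single characters; for longer substrings such as "gr", the find-based loops are simpler.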

Solution 1 for Number Guessing

import random

hidden = random.randrange(1, 21)
while True:
    user_input = input("Please enter your guess: ")
    print(user_input)

    guess = int(user_input)
    if guess == hidden:
        print("Hit!")
        break

    if guess < hidden:
        print("Your guess is too low")
    else:
        print("Your guess is too high")

Solution 2 for Number Guessing (x)

The main trick is that you check for the input being "x" before you try to convert it to an integer.

import random

hidden = random.randrange(1, 201)
while True:
    user_input = input("Please enter your guess[x]: ")
    print(user_input)

    if user_input == 'x':
        print("Sad to see you leaving early")
        exit()

    guess = int(user_input)
    if guess == hidden:
        print("Hit!")
        break

    if guess < hidden:
        print("Your guess is too low")
    else:
        print("Your guess is too high")

Solution 3 for Number Guessing (s)

import random

hidden = random.randrange(1, 201)
while True:
    user_input = input("Please enter your guess [x|s]: ")
    print(user_input)

    if user_input == 'x':
        print("Sad to see you leaving early")
        exit()

    if user_input == 's':
        print("The hidden value is ", hidden)
        continue

    guess = int(user_input)
    if guess == hidden:
        print("Hit!")
        break

    if guess < hidden:
        print("Your guess is too low")
    else:
        print("Your guess is too high")

Solution for Number Guessing (debug)

One important thing to remember is that you can create a toggle by applying not to a boolean variable every time you'd like to flip the switch.

The other is that flipping the switch (pressing d) and printing the current value while debug mode is on are two separate operations that are not directly related, so they can be implemented separately.

import random

hidden = random.randrange(1, 201)
debug = False
while True:
    if debug:
        print("Debug: ", hidden)

    user_input = input("Please enter your guess [x|s|d]: ")
    print(user_input)

    if user_input == 'x':
        print("Sad to see you leaving early")
        exit()

    if user_input == 's':
        print("The hidden value is ", hidden)
        continue

    if user_input == 'd':
        debug = not debug
        continue

    guess = int(user_input)
    if guess == hidden:
        print("Hit!")
        break

    if guess < hidden:
        print("Your guess is too low")
    else:
        print("Your guess is too high")

Solution for Number Guessing (move)

import random

UPPER_LIMIT = 200

hidden = random.randrange(1, UPPER_LIMIT + 1)
debug = False
move = False
while True:
    if debug:
        print(f"Debug: {hidden}")
        print(f"Move: {move}")

    if move:
        mv = random.randrange(-2, 3)
        if 1 <= hidden + mv <= UPPER_LIMIT:
            hidden = hidden + mv

    user_input = input("Please enter your guess [x|s|d|m]: ")
    print(user_input)

    if user_input == 'x':
        print("Sad to see you leaving early")
        exit()

    if user_input == 's':
        print("The hidden value is ", hidden)
        continue

    if user_input == 'd':
        debug = not debug
        continue

    if user_input == 'm':
        move = not move
        continue

    guess = int(user_input)
    if guess == hidden:
        print("Hit!")
        break

    if guess < hidden:
        print("Your guess is too low")
    else:
        print("Your guess is too high")

Solution for Number Guessing (multi-game)

import random

debug = False
move = False
while True:
    print("\nWelcome to another Number Guessing game")
    hidden = random.randrange(1, 201)
    while True:
        if debug:
            print("Debug: ", hidden)

        if move:
            mv = random.randrange(-2, 3)
            # keep the hidden number within the 1-200 range
            if 1 <= hidden + mv <= 200:
                hidden = hidden + mv

        user_input = input("Please enter your guess [x|s|d|m|n]: ")
        print(user_input)

        if user_input == 'x':
            print("Sad to see you leaving early")
            exit()

        if user_input == 's':
            print("The hidden value is ", hidden)
            continue

        if user_input == 'd':
            debug = not debug
            continue

        if user_input == 'm':
            move = not move
            continue

        if user_input == 'n':
            print("Giving up, eh?")
            break

        guess = int(user_input)
        if guess == hidden:
            print("Hit!")
            break

        if guess < hidden:
            print("Your guess is too low")
        else:
            print("Your guess is too high")

Solution: Count unique characters

  • set
import sys

if len(sys.argv) != 2:
    exit("Need a string to count")

text = sys.argv[1]

unique = ''
for cr in text:
    if cr not in unique:
        unique += cr

print(len(unique))

The above solution works, but there is a better solution using sets that we have not learned yet. Nevertheless, let me show you that solution:

import sys

if len(sys.argv) != 2:
    exit("Need a string to count")

text = sys.argv[1]

set_of_chars = set(text)

print(len(set_of_chars))

Solution: Convert for-loop to while-loop


ix = 3
while ix < 25:
    print(ix)
    ix += 4

Formatted strings

format - sprintf

  • %
  • %s
  • f
  • {}
  • format
  • sprintf
age = 42.12
name = 'Foo Bar'

str_concatenate = "The user " + name + " was born " + str(age) + " years ago."
print(str_concatenate)

str_percentage = "The user %s was born %s years ago." % (name, age)
print(str_percentage)

str_format = "The user {} was born {} years ago.".format(name, age)
print(str_format)

str_f_string = f"The user {name} was born {age} years ago."
print(str_f_string)

The user Foo Bar was born 42.12 years ago.
The user Foo Bar was born 42.12 years ago.
The user Foo Bar was born 42.12 years ago.
The user Foo Bar was born 42.12 years ago.
  • When using % to print more than one value, put the values in parentheses forming a tuple.
  • In Python 2.6 you need to write {0}, {1}, etc. as placeholders in the format method; automatic numbering with {} arrived in Python 2.7 and 3.1.
  • f-strings were added in Python 3.6 (released on 2016-12-23)

printf using old %-syntax

  • printf

  • %

  • This slide is here only for historical reasons, so when you see the older style of formatting you'll know what it is.

  • It is recommended to use f-strings or, if those cannot be used for some reason, the format method.

v = 65
print("<%s>" % v)     # <65>
print("<%10s>" % v)   # <      65>
print("<%-10s>" % v)  # <65      >
print("<%c>" % v)     # <A>
print("<%d>" % v)     # <65>
print("<%0.5d>" % v)  # <00065>

Examples using format with names

txt = "Foo Bar"
num = 42.12

print("The user {name} was born {age} years ago.".format(name = txt, age = num))
The user Foo Bar was born 42.12 years ago.

Format columns

  • format

In this example we use a list of lists that we have not learned yet, but don't worry about that for now. Focus on the output of the two print statements.

data = [
    ["Foo Bar", 42],
    ["Bjorg", 12345],
    ["Roza", 7],
    ["Long Name Joe", 3],
    ["Joe", 12345677889],
]

for entry in data:
    print("{} {}".format(entry[0], entry[1]))

print('-' * 16)

for entry in data:
    print("{:<8}|{:>7}".format(entry[0], entry[1]))
Foo Bar 42
Bjorg 12345
Roza 7
Long Name Joe 3
Joe 12345677889
----------------
Foo Bar |     42
Bjorg   |  12345
Roza    |      7
Long Name Joe|      3
Joe     |12345677889
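As the output shows, the width is only a minimum: longer values are not cut, so the columns break. If you need strict columns, a precision on a string field (the number after the dot) truncates it:

```python
name = "Long Name Joe"
print("{:<8.8}|".format(name))   # Long Nam|
print(f"{name:<8.8}|")           # the same with an f-string
print("{:<8.8}|".format("Roza")) # Roza    |  (short values are still padded)
```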

Examples using format - alignment

  • format
txt = "Some text"

print("'{}'".format(txt))     #  as is:   'Some text'
print("'{:12}'".format(txt))  #  left:    'Some text   '
print("'{:<12}'".format(txt)) #  left:    'Some text   '
print("'{:>12}'".format(txt)) #  right:   '   Some text'
print("'{:^12}'".format(txt)) #  center:  ' Some text  '

Format - string

  • format
  • :s
name = "Foo Bar"

print("{:s}".format(name))
print("{}".format(name))
Foo Bar
Foo Bar

Format characters and types (binary, octal, hexa)

  • format
  • :b
  • :c
  • :d
  • :o
  • :x
  • :X
  • :n
val = 42

print("{:b}".format(val)) #  binary:    101010
print("{:c}".format(val)) #  character: *
print("{:d}".format(val)) #  decimal:   42      (default)
print("{:o}".format(val)) #  octal:     52
print("{:x}".format(val)) #  hexa:      2a
print("{:X}".format(val)) #  hexa:      2A
print("{:n}".format(val)) #  number:    42


print("{}".format(val))   # 42 (same as decimal)


# Zero padding
print("'{:2n}'".format(3))  # ' 3'
print("'{:02n}'".format(3)) # '03'
print("'{:02n}'".format(14)) # '14'


# Zero padding hexa
print("'{:2X}'".format(3))  # ' 3'
print("'{:02X}'".format(3)) # '03'
print("'{:02X}'".format(14)) # '0E'
print("'{:02X}'".format(70)) # '46'

Format floating point number

  • :e
  • :E
  • :f
  • :F
  • :g
  • :G
  • :n
x = 412.345678901

print("{:e}".format(x))   #  exponent:     4.123457e+02
print("{:E}".format(x))   #  Exponent:     4.123457E+02
print("{:f}".format(x))   #  fixed point:  412.345679 (default precision is 6)
print("{:.2f}".format(x)) #  fixed point:  412.35 (set precision to 2)
print("{:F}".format(x))   #  same as f.    412.345679
print("{:g}".format(x))   #  generic:      412.346    (default precision is 6)
print("{:G}".format(x))   #  generic:      412.346
print("{:n}".format(x))   #  number:       412.346


print("{}".format(x))     # same as str(x): 412.345678901

Examples using format - indexing

  • format
txt = "Foo Bar"
num = 42.12

print("The user {} was born {} years ago.".format(txt, num))
print("The user {0} was born {1} years ago.".format(txt, num))
print("The user {1} was born {0} years ago.".format(num, txt))


print("{0} is {0} and {1} years old.".format(txt, num))
The user Foo Bar was born 42.12 years ago.
The user Foo Bar was born 42.12 years ago.
The user Foo Bar was born 42.12 years ago.
Foo Bar is Foo Bar and 42.12 years old.

Format characters and types using f-format

val = 42

print(f"{val:b}") #  binary:    101010
print(f"{val:c}") #  character: *
print(f"{val:d}") #  decimal:   42      (default)
print(f"{val:o}") #  octal:     52
print(f"{val:x}") #  hexa:      2a
print(f"{val:X}") #  hexa:      2A
print(f"{val:n}") #  number:    42


print(f"{val}")   # 42 (same as decimal)

# Zero padding
val = 3
print(f"'{val:2n}'")  # ' 3'
print(f"'{val:02n}'") # '03'
val = 14
print(f"'{val:02n}'") # '14'


# Zero padding hexa
val = 3
print(f"'{val:2X}'")  # ' 3'
print(f"'{val:02X}'") # '03'
val = 14
print(f"'{val:02X}'") # '0E'
val = 70
print(f"'{val:02X}'") # '46'

f-format (formatted string literals)

  • f

Since Python 3.6

name = "Foo Bar"
age = 42.12
pi = 3.141592653589793
r = 2

print(f"The user {name} was born {age} years ago.")
print(f"The user {name:10} was born {age} years ago.")
print(f"The user {name:>10} was born {age} years ago.")
print(f"The user {name:>10} was born {age:>10} years ago.")

print(f"PI is '{pi:.3}'.")   # .3 without a type: 3 significant digits
print(f"PI is '{pi:.3f}'.")  # number of digits after decimal point

print(f"Area is {pi * r ** 2}")
print(f"Area is {pi * r ** 2:.3f}")

The user Foo Bar was born 42.12 years ago.
The user Foo Bar    was born 42.12 years ago.
The user    Foo Bar was born 42.12 years ago.
The user    Foo Bar was born      42.12 years ago.
PI is '3.14'.
PI is '3.142'.
Area is 12.566370614359172
Area is 12.566

Format floating point numbers using f-format

val = 412.345678901

print(f"{val:e}")   #  exponent:     4.123457e+02
print(f"{val:E}")   #  Exponent:     4.123457E+02
print(f"{val:f}")   #  fixed point:  412.345679 (default precision is 6)
print(f"{val:.2f}") #  fixed point:  412.35 (set precision to 2)
print(f"{val:F}")   #  same as f.    412.345679
print(f"{val:g}")   #  generic:      412.346    (default precision is 6)
print(f"{val:G}")   #  generic:      412.346
print(f"{val:n}")   #  number:       412.346


print(f"{val}")     # same as str(val): 412.345678901

Format braces, bracket, and parentheses

These are just some extreme special cases. Most people won't need to know about them.

  • To print { include {{.
  • To print } include }}.
print("{{{}}}".format(42))   # {42}

print("{{ {} }}".format(42))   # { 42 }

print("[{}] ({})".format(42, 42))   # [42] (42)

print("%{}".format(42))   # %42

Anything outside the curly braces is copied to the result as-is.

parameterized formatter



def formatter(value, filler, width):
    return "{var:{fill}>{width}}".format(var=value, fill=filler, width=width)

text = formatter(23, "0", 7)
print(text)


print(formatter(42, " ", 7))
print(formatter(1234567, " ", 7))
0000023
     42
1234567
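Nested replacement fields work in f-strings too, so the same formatter can be written more compactly. This is an f-string variant of the function above, not a replacement for it:

```python
def formatter(value, filler, width):
    # the fill character and the width come from variables
    return f"{value:{filler}>{width}}"

print(formatter(23, "0", 7))       # 0000023
print(formatter(42, " ", 7))       #      42
print(formatter(1234567, " ", 7))  # 1234567
```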

format binary, octal, hexa numbers

a = 42

text = "{:b}".format(a)
print(text)   # 101010

text = "{:#b}".format(a)
print(text)   # 0b101010

a = 42

text = "{:o}".format(a)
print(text)   # 52

text = "{:#o}".format(a)
print(text)   # 0o52

a = 42

text = "{:x}".format(a)
print(text)   # 2a 

text = "{:#x}".format(a)
print(text)   # 0x2a

text = "{:#X}".format(a)
print(text)   # 0X2A
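The built-in functions bin, oct and hex produce the same text as the '#' variants above, and int() with a base argument converts such text back to a number. A small sketch of the round trip:

```python
a = 42

# Built-ins that return the prefixed representations
print(bin(a))   # 0b101010
print(oct(a))   # 0o52
print(hex(a))   # 0x2a

# int(text, base) parses the text back into a number
print(int("101010", 2))  # 42
print(int("0o52", 8))    # 42  (the prefix is optional for the matching base)
print(int("2a", 16))     # 42
print(int("0x2a", 0))    # 42  (base 0: read the base from the prefix)
```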


Examples using format with attributes of objects

This is also a rather strange example, I don't think I'd use it in real code.

import sys

print("{0.executable}".format(sys))
print("{system.argv[0]}".format(system = sys))
/home/gabor/venv3/bin/python
formatted_attributes.py

raw f-format

  • f
  • r
name="foo"
print(r"a\nb {name}")
print(rf"a\nb {name}")
print(fr"a\nb {name}")  # same as rf; some editors (e.g. vim) highlight fr better

a\nb {name}
a\nb foo
a\nb foo

Format with conversion (stringification with str or repr)

Adding !s or !r in the place-holder tells format to call the __str__ or __repr__ method of the object (via str() or repr()), respectively.

  • __repr__ (repr): its goal is to be unambiguous
  • __str__ (str): its goal is to be readable
  • The default implementations of both are not very useful
  • Suggestion
  • Difference between str and repr
class Point:
    def __init__(self, a, b):
        self.x = a
        self.y = b

p = Point(2, 3)
print(p)                 # <__main__.Point object at 0x10369d750>
print("{}".format(p))    # <__main__.Point object at 0x10369d750>
print("{!s}".format(p))  # <__main__.Point object at 0x10369d750>
print("{!r}".format(p))  # <__main__.Point object at 0x10369d750>
class Point:
    def __init__(self, a, b):
        self.x = a
        self.y = b
    def __format__(self, spec):
        # spec is the part after ':' in the place-holder; here it is an empty string
        return "{{'x':{}, 'y':{}}}".format(self.x, self.y)
    def __str__(self):
        return "({},{})".format(self.x, self.y)
    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)

p = Point(2, 3)
print(p)                 # (2,3)
print("{}".format(p))    # {'x':2, 'y':3}
print("{!s}".format(p))  # (2,3)
print("{!r}".format(p))  # Point(2, 3)
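The same conversions work in f-strings. Without a __format__ method, a plain place-holder falls back to str(). Containers such as lists always show their elements with repr. A short sketch:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x},{self.y})"

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

p = Point(2, 3)
print(f"{p}")    # (2,3)        no __format__ here, so str() is used
print(f"{p!s}")  # (2,3)
print(f"{p!r}")  # Point(2, 3)
print([p])       # [Point(2, 3)]  lists use repr() for their elements
```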

Lists

Anything can be in a list

  • Comma separated values
  • In square brackets
  • Can be any value, and a mix of values: Integer, Float, Boolean, None, String, List, Dictionary, ...
  • But usually they are all of the same type, for example:
      • Distances of astronomical objects
      • Chemical Formulas
      • Filenames
      • Names of devices
      • Objects describing attributes of a network device.
      • Actions to do on your data.
stuff = [42, 3.14, True, None, "Foo Bar", ['another', 'list'], {'a': 'Dictionary', 'language' : 'Python'}]
print(stuff)

Output:

[42, 3.14, True, None, 'Foo Bar', ['another', 'list'], {'a': 'Dictionary', 'language': 'Python'}]

Any layout

  • Layout is flexible
  • Trailing comma is optional. It does not disturb us. Nor Python.
more_stuff = [
    42,
    3.14,
    True,
    None,
    "Foo Bar",
    ['another', 'list'],
    {
        'a': 'Dictionary',
        'language' : 'Python',
    },
]
print(more_stuff)

Output:

[42, 3.14, True, None, 'Foo Bar', ['another', 'list'], {'a': 'Dictionary', 'language': 'Python'}]

Access elements of a list

  • []

  • len

  • Access single element: [index]

  • Access a sublist: [start:end]

  • Creates a copy of that sublist

planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']

print(planets)            # ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
print(len(planets))       # 6
print(type(planets))      # <class 'list'>

print(planets[0])         # Mercury
print(type(planets[0]))   # <class 'str'>
print(planets[3])         # Mars

print(planets[0:2])       # ['Mercury', 'Venus']
print(planets[1:4])       # ['Venus', 'Earth', 'Mars']

print(planets[0:1])       # ['Mercury']
print(type(planets[0:1])) # <class 'list'>

print(planets[2:])        # ['Earth', 'Mars', 'Jupiter', 'Saturn']
print(planets[:3])        # ['Mercury', 'Venus', 'Earth']

print(planets[:])         # ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']

List slice with steps

  • List slice with step: [start:end:step]
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

print(letters[::])       # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

print(letters[::1])      # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

print(letters[::2])      # ['a', 'c', 'e', 'g', 'i']

print(letters[1::2])     # ['b', 'd', 'f', 'h', 'j']

print(letters[2:8:2])    # ['c', 'e', 'g']

print(letters[1:20:3])   # ['b', 'e', 'h']

print(letters[20:30:3])   # []

print(letters[8:3:-2])   # ['i', 'g', 'e']
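A negative step walks the list backwards, so [::-1] is the usual way to get a reversed copy. As with positive steps, the end index is excluded. A short sketch:

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[::-1])    # ['e', 'd', 'c', 'b', 'a']  the whole list, reversed
print(letters[::-2])    # ['e', 'c', 'a']
print(letters[3:0:-1])  # ['d', 'c', 'b']  the end index (0) is still excluded
print(letters)          # ['a', 'b', 'c', 'd', 'e']  the original is unchanged
```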

Change a List

  • :
fruits = ['apple', 'banana', 'peach', 'strawberry']
print(fruits)      # ['apple', 'banana', 'peach', 'strawberry']
fruits[0] = 'orange'
print(fruits)      # ['orange', 'banana', 'peach', 'strawberry']

print(fruits[1:3]) # ['banana', 'peach']
fruits[1:3] = ['grape', 'kiwi']
print(fruits)      #  ['orange', 'grape', 'kiwi', 'strawberry']

print(fruits[1:3]) # ['grape', 'kiwi']
fruits[1:3] = ['mango']
print(fruits)      #  ['orange', 'mango', 'strawberry']

print(fruits[1:2]) # ['mango']
fruits[1:2] = ["banana", "peach"]
print(fruits)      # ['orange', 'banana', 'peach', 'strawberry']

print(fruits[1:1]) # []
fruits[1:1] = ['apple', 'pineapple']
print(fruits)      # ['orange', 'apple', 'pineapple', 'banana', 'peach', 'strawberry']
  • Unlike strings, lists are mutable. You can change the content of a list by assigning values to its elements.
  • You can use the slice notation to change several elements at once.
  • The slice and the replacement can even have a different number of elements. This also changes the length of the list.

Change sublist vs change element of a list

fruits = ['orange', 'mango', 'strawberry']

print(fruits[1:2]) # ['mango']
fruits[1:2] = ["banana", "peach"]
print(fruits)      # ['orange', 'banana', 'peach', 'strawberry']
print(fruits[1])   # banana
print(fruits[2])   # peach
fruits = ['orange', 'mango', 'strawberry']

print(fruits[1]) # mango
fruits[1] = ["banana", "peach"]
print(fruits)    # ['orange', ['banana', 'peach'], 'strawberry']
print(fruits[1]) # ['banana', 'peach']
print(fruits[2]) # strawberry

print(fruits[1][0]) # banana

Change with steps

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

print(numbers[1::2])   # [2, 4, 6, 8, 10, 12]
numbers[1::2] = [0, 0, 0, 0, 0, 0]
print(numbers)  # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0, 11, 0]

numbers[1::2] = [42] * 6
print(numbers)  # [1, 42, 3, 42, 5, 42, 7, 42, 9, 42, 11, 42]
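Unlike a plain slice, a slice with a step cannot change the length of the list: the replacement must have exactly as many elements as the slice, otherwise Python raises a ValueError (the message below is the one CPython prints). A small sketch:

```python
numbers = [1, 2, 3, 4, 5, 6]

try:
    numbers[::2] = [0, 0]          # the slice has 3 elements, we give only 2
except ValueError as err:
    print(err)  # attempt to assign sequence of size 2 to extended slice of size 3

numbers[::2] = [0] * len(numbers[::2])   # the sizes match now
print(numbers)  # [0, 2, 0, 4, 0, 6]
```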

List assignment and list copy

  • [:]
fruits = ['apple', 'banana', 'peach', 'kiwi']
salad = fruits
fruits[0] = 'orange'
print(fruits)   # ['orange', 'banana', 'peach', 'kiwi']
print(salad)    # ['orange', 'banana', 'peach', 'kiwi']
  • There is only one list in memory and two names referring to it.
  • If you really want to make a copy, the pythonic way is to use the slice syntax.
  • It creates a shallow copy.
fruits = ['apple', 'banana', 'peach', 'kiwi']
salad = fruits[:]

fruits[0] = 'orange'

print(fruits)   # ['orange', 'banana', 'peach', 'kiwi']
print(salad)    # ['apple', 'banana', 'peach', 'kiwi']

Shallow vs. Deep copy of lists

  • copy
  • deepcopy

copy

copy.copy()     # shallow copy
copy.deepcopy() # deep copy
fruits = ['apple', ['banana', 'peach'], 'kiwi']
print(fruits)        # ['apple', ['banana', 'peach'], 'kiwi']
print(fruits[0])     # apple
print(fruits[1][0])  # banana

salad = fruits[:]

fruits[0] = 'orange'
fruits[1][0] = 'mango'

print(fruits)  # ['orange', ['mango', 'peach'], 'kiwi']
print(salad)   # ['apple', ['mango', 'peach'], 'kiwi']

from copy import deepcopy

fruits = ['apple', ['banana', 'peach'], 'kiwi']
print(fruits)        # ['apple', ['banana', 'peach'], 'kiwi']
print(fruits[0])     # apple
print(fruits[1][0])  # banana

salad = deepcopy(fruits)

fruits[0] = 'orange'
fruits[1][0] = 'mango'

print(fruits)  # ['orange', ['mango', 'peach'], 'kiwi']
print(salad)   # ['apple', ['banana', 'peach'], 'kiwi']
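The slice syntax, the list's own copy method and copy.copy all make the same kind of shallow copy: the outer list is new, but nested lists are shared. A short sketch showing all three side by side:

```python
import copy

fruits = ['apple', ['banana', 'peach'], 'kiwi']

# Three equivalent ways to make a shallow copy of a list
a = fruits[:]
b = fruits.copy()
c = copy.copy(fruits)

fruits[1][0] = 'mango'   # change the inner list, which all copies share

print(a)  # ['apple', ['mango', 'peach'], 'kiwi']
print(b)  # ['apple', ['mango', 'peach'], 'kiwi']
print(c)  # ['apple', ['mango', 'peach'], 'kiwi']
print(a is fruits, a[1] is fruits[1])  # False True
```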

join

  • join
fruits = ['apple', 'banana', 'peach', 'kiwi']

together = ':'.join(fruits)
print(together) # apple:banana:peach:kiwi

together = ' '.join(fruits)
print(together) # apple banana peach kiwi

mixed = ' -=<> '.join(fruits)
print(mixed) # apple -=<> banana -=<> peach -=<> kiwi

another = ''.join(fruits)
print(another)  # applebananapeachkiwi

csv_line = ','.join(fruits)
print(csv_line) # apple,banana,peach,kiwi
  • For real CSV files use the csv module.
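A plain ','.join produces a broken line as soon as a value contains a comma. The csv module quotes such values automatically. A minimal sketch; io.StringIO stands in for a real file (which you would open with newline='').

```python
import csv
import io

rows = [
    ['apple', 'red, or green', 3],
    ['banana', 'yellow', 12],
]

out = io.StringIO()          # in-memory "file" used only for the demo
writer = csv.writer(out)
writer.writerows(rows)       # numbers are converted with str()

print(out.getvalue())
# apple,"red, or green",3
# banana,yellow,12
```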

join list of numbers

a = ["x", "2", "y"]
b = ["x", 2, "y"]
print(":".join(a))    # x:2:y
# print(":".join(b))  # TypeError: sequence item 1: expected str instance, int found

# convert elements to string using map
print(":".join( map(str, b) ))        # x:2:y


# convert elements to string using a generator expression
print(":".join( str(x) for x in b ))  # x:2:y

split

  • split

  • list

  • Special case: To split a string to its characters: Use the list() function.

  • Split using more than one splitter: use re.split

words = "ab:cd::ef".split(':')
print(words)    # ['ab', 'cd', '', 'ef']

by_space = "foo   bar baz".split(' ')
print(by_space) # ['foo', '', '', 'bar', 'baz']

# special case: split by spaces
names = "foo   bar baz".split()
print(names)    # ['foo', 'bar', 'baz']

# special case: split to characters
chars = list("ab cd")
print(chars)    # ['a', 'b', ' ', 'c', 'd']
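For splitting on more than one kind of separator, re.split takes a regular expression. A short sketch that treats commas, semicolons and whitespace (and runs of them) as a single separator:

```python
import re

text = "apple, banana;peach  kiwi"

# [,;\s]+ matches one or more commas, semicolons or whitespace characters
fruits = re.split(r"[,;\s]+", text)
print(fruits)   # ['apple', 'banana', 'peach', 'kiwi']
```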

for loop on lists

  • for
  • in
things = ['apple', 'banana', 'peach', 42]
for var in things:
    print(var)

Output:

apple
banana
peach
42

in list

  • in

Check if a value is in the list:

words = ['apple', 'banana', 'peach', '42']
if 'apple' in words:
    print('found apple')

if 'a' in words:
    print('found a')
else:
    print('NOT found a')

if 42 in words:
    print('found 42')
else:
    print('NOT found 42')

# found apple
# NOT found a
# NOT found 42

Where is the element in the list

  • index
words = ['cat', 'dog', 'snake', 'camel']
print(words.index('snake'))

print(words.index('python'))

Output:

2
Traceback (most recent call last):
  File "examples/lists/index.py", line 6, in <module>
    print(words.index('python'))
ValueError: 'python' is not in list

Index improved

  • index
words = ['cat', 'dog', 'snake', 'camel']

name = 'snake'
if name in words:
    print(words.index(name))

name = 'python'
if name in words:
    print(words.index(name))

[].insert

  • insert
  • unshift
words = ['apple', 'banana', 'cat']
print(words)  # ['apple', 'banana', 'cat']

words.insert(2, 'zebra')
print(words)  # ['apple', 'banana', 'zebra', 'cat']

words.insert(0, 'dog')
print(words)  # ['dog', 'apple', 'banana', 'zebra', 'cat']

# Instead of this, use append (next slide)
words.insert(len(words), 'olifant')
print(words)  # ['dog', 'apple', 'banana', 'zebra', 'cat', 'olifant']

[].append

  • append
names = ['Foo', 'Bar', 'Zorg', 'Bambi']
print(names)  # ['Foo', 'Bar', 'Zorg', 'Bambi']

names.append('Qux')
print(names)  # ['Foo', 'Bar', 'Zorg', 'Bambi', 'Qux']

[].remove

  • remove
names = ['Joe', 'Kim', 'Jane', 'Bob', 'Kim']
print(names)                # ['Joe', 'Kim', 'Jane', 'Bob', 'Kim']

print(names.remove('Kim'))  # None
print(names)                # ['Joe', 'Jane', 'Bob', 'Kim']

print(names.remove('George'))
   # Traceback (most recent call last):
   #   File "examples/lists/remove.py", line 9, in <module>
   #     print(names.remove('George'))  # None
   # ValueError: list.remove(x): x not in list

remove() removes the first element with the given value and returns None. It raises a ValueError if there is no such element in the list.
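Since remove() deletes only the first match, one way to get rid of every occurrence is to build a new list that keeps all the other elements. A short sketch using a list comprehension:

```python
names = ['Joe', 'Kim', 'Jane', 'Bob', 'Kim']

# keep every element that is not 'Kim'
names = [name for name in names if name != 'Kim']
print(names)   # ['Joe', 'Jane', 'Bob']
```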

Remove element by index [].pop

  • pop
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter']
print(planets)          # ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter']

third = planets.pop(2)
print(third)            # Earth
print(planets)          # ['Mercury', 'Venus', 'Mars', 'Jupiter']

last = planets.pop()
print(last)             # Jupiter
print(planets)          # ['Mercury', 'Venus', 'Mars']

# planets.pop(4)          # IndexError: pop index out of range

jupyter_landers = []
# jupyter_landers.pop()   # IndexError: pop from empty list

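pop() removes and returns the last element of a list; pop(index) removes and returns the element at the given index. It raises an IndexError if the list is empty or the index is out of range.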

Remove first element of list

  • pop
  • shift

To remove the first element and get its value, use pop(0):

names = ['foo', 'bar', 'baz', 'moo']

first = names.pop(0)
print(first)    # foo
print(names)    # ['bar', 'baz', 'moo']

Remove several elements of list by index

  • slice

To remove several consecutive elements by their indexes, assign an empty list to the slice:

names = ['foo', 'bar', 'baz', 'moo', 'qux']

names[2:4] = []
print(names)    # ['foo', 'bar', 'qux']
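The del statement does the same job and also works with a single index. Unlike pop, it does not return the removed value. A short sketch:

```python
names = ['foo', 'bar', 'baz', 'moo', 'qux']

del names[2:4]     # same effect as names[2:4] = []
print(names)       # ['foo', 'bar', 'qux']

del names[0]       # delete one element by index
print(names)       # ['bar', 'qux']
```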

Use list as a queue - FIFO

a_queue = []
print(a_queue)

a_queue.append('Moo')
print(a_queue)

a_queue.append('Bar')
print(a_queue)

first = a_queue.pop(0)
print(first)
print(a_queue)

Output:

[]
['Moo']
['Moo', 'Bar']
Moo
['Bar']

Queue using deque from collections

  • collections
  • deque
  • append
  • popleft
from collections import deque

fruits = deque()

print(type(fruits))  # <class 'collections.deque'>
print(fruits)        # deque([])
print(len(fruits))   # 0

fruits.append('Apple')
print(fruits)        # deque(['Apple'])
print(len(fruits))   # 1

fruits.append('Banana')
fruits.append('Peach')
print(fruits)        # deque(['Apple', 'Banana', 'Peach'])
print(len(fruits))   # 3

nxt = fruits.popleft()
print(nxt)           # Apple
print(fruits)        # deque(['Banana', 'Peach'])
print(len(fruits))   # 2

if fruits:
    print("The queue has items")
else:
    print("The queue is empty")

nxt = fruits.popleft()
nxt = fruits.popleft()

if fruits:
    print("The queue has items")
else:
    print("The queue is empty")


Output:

<class 'collections.deque'>
deque([])
0
deque(['Apple'])
1
deque(['Apple', 'Banana', 'Peach'])
3
Apple
deque(['Banana', 'Peach'])
2
The queue has items
The queue is empty
  • .append
  • .popleft
  • len() number of elements
  • if q: to see if it has elements or if it is empty
  • dequeue

Fixed size queue

  • maxlen
from collections import deque

queue = deque([], maxlen = 3)
print(len(queue))     # 0
print(queue.maxlen)   # 3

queue.append("Foo")
queue.append("Bar")
queue.append("Baz")
print(queue)          # deque(['Foo', 'Bar', 'Baz'], maxlen=3)

queue.append("Zorg")  # Automatically removes the left-most (first) element
print(queue)          # deque(['Bar', 'Baz', 'Zorg'], maxlen=3)

List as a stack - LIFO

stack = []

stack.append("Joe")
print(stack)
stack.append("Jane")
print(stack)
stack.append("Bob")
print(stack)

while stack:
    name = stack.pop()
    print(name)
    print(stack)

Output:

['Joe']
['Joe', 'Jane']
['Joe', 'Jane', 'Bob']
Bob
['Joe', 'Jane']
Jane
['Joe']
Joe
[]

stack with deque

from collections import deque
stack = deque()

stack.append("Joe")
stack.append("Jane")
stack.append("Bob")

while stack:
    name = stack.pop()
    print(name)

# Bob
# Jane
# Joe

Exercise: Queue

  • Create file called queue_of_people.py

  • The application should manage a queue of people.

  • It will prompt the user for a new name by printing ":". The user can type in a name and press ENTER, and the app will add the name to the queue.

  • If the user types in "n" then the application will remove the first name from the queue and print it.

  • If the user types in "x" then the application will print the list of users who were left in the queue and it will exit.

  • If the user types in "s" then the application will show the current number of elements in the queue.

$ python queue_of_people.py

: Joe
: Jane
: Mary
: s
  3
: n
  next is Joe
: n
  next is Jane
: Peter
: n
  next is Mary
: n
  next is Peter
: n
  the queue is empty
: Bar
: Tal
: x
  Left in the queue: Bar, Tal
$

Exercise: Stack

  • Create file called reverse_polish_calculator.py
  • Implement a Reverse Polish Calculator
2
3
4
+
*
=
14
x = eXit, s = Show, [+-*/=]
:23
:19
:7
:8
:+
:3
:-
:/
:s
[23.0, 1.5833333333333333]
:+
:=
24.583333333333332
:s
[]
:x

Exercise: MasterMind

  • Create file called mastermind.py

  • Implement the Master Mind board game.

  • The computer "thinks" a number with 4 different digits.

  • The user guesses which digits.

  • For every digit that matches both in value and in location, the computer gives a *.

  • For every digit that matches in value, but not in location, the computer gives a +.

  • The user tries to guess the given number in as few guesses as possible.

Computer:
2153       (this is hidden)

User    Response
2467    *        (because 2 is in the right place but none of the other digits match)
2715    *++      (because 2 is in the right place. 1 and 5 are used but in the wrong place. 7 not in use)
  • Wordle is basically the same game, just with letters and the extra limitation that each guess must be a valid word.

Solution: Queue with list

queue = []

while True:
    inp = input(":")
    inp = inp.rstrip("\n")

    if inp == 'x':
        print("Left in the queue: " + ", ".join(queue))
        break

    if inp == 's':
        print(len(queue))
        continue

    if inp == 'n':
        if len(queue) > 0:
            print("next is {}".format(queue.pop(0)))
        else:
            print("the queue is empty")
        continue

    queue.append(inp)

Solution: Queue with deque

from collections import deque

queue = deque()

while True:
    inp = input(":")
    inp = inp.rstrip("\n")

    if inp == 'x':
        print("Left in the queue: " + ", ".join(queue))
        break

    if inp == 's':
        print(len(queue))
        continue

    if inp == 'n':
        if len(queue) > 0:
            print("next is {}".format(queue.popleft()))
        else:
            print("the queue is empty")
        continue

    queue.append(inp)

Solution: Reverse Polish calculator (stack) with lists

stack = []

print("x = eXit, s = Show, [+-*/=]")
while True:
    val = input(':')

    if val == 's':
        print(stack)
        continue

    if val == 'x':
        break

    if val == '+':
        a = stack.pop()
        b = stack.pop()
        stack.append(a+b)
        continue

    if val == '-':
        a = stack.pop()    # the top of the stack is the right-hand operand
        b = stack.pop()
        stack.append(b-a)
        continue

    if val == '*':
        a = stack.pop()
        b = stack.pop()
        stack.append(a*b)
        continue

    if val == '/':
        a = stack.pop()    # the top of the stack is the divisor
        b = stack.pop()
        stack.append(b/a)
        continue

    if val == '=':
        print(stack.pop())
        continue

    stack.append(float(val))

Solution: Reverse Polish calculator (stack) with deque

from collections import deque

stack = deque()

while True:
    val = input(':')

    if val == 'x':
        break

    if val == '+':
        a = stack.pop()
        b = stack.pop()
        stack.append(a+b)
        continue

    if val == '*':
        a = stack.pop()
        b = stack.pop()
        stack.append(a*b)
        continue


    if val == '=':
        print(stack.pop())
        continue

    stack.append(float(val))

Solution: MasterMind

import random

width = 4

def main():
    # `width` different digits, stored as a list of one-character strings
    hidden = list(map(str, random.sample(range(10), width)))
    print(f"Hidden numbers: {hidden}")   # for debugging; remove it to play for real
    while True:
        inp = input("Guess a number: (e.g. 1234) or x to eXit. ")
        if inp == 'x' or inp == 'X':
            return
        if len(inp) != width or not inp.isdecimal():
            print(f"Please type exactly {width} digits.")
            continue
        guess = list(inp)
        result = []
        for ix in range(width):
            if guess[ix] == hidden[ix]:
                result.append('*')
            elif guess[ix] in hidden:
                result.append('+')
        result.sort()   # '*' sorts before '+', so the order does not reveal positions
        print(''.join(result))
        if result == ['*'] * width:
            print("SUCCESS")
            break

main()

MasterMind to debug

Debug the following version of the MasterMind game.

import random


def number_generator():
    y = [0, 0, 0, 0]

    for i in range(0, 4):
        y[i] = random.randrange(0, 10)
        # print(y)
        if i:
            number += str(y[i])
        else:
            number = str(y[i])
    # print(number)
    return number


def user_input():
    x = input("Type in 4 digits number:")
    if len(x) == 4:
        return x
    else:
        print("wrong input")
        user_input()


def string_compare(x, y):
    r = 0
    q = 0
    for i in range(0, 4):
        if x[i] == y[i]:
            r += 1
            continue
        for j in range(0, 4):
            if x[i] == y[j]:
                if i == j:
                    continue
                else:
                    q += 1
                    break
    return r, q


def print_result(r):
    print("")
    for i in range(0, r[0]):
        print("*", end="")
    for i in range(0, r[1]):
        print("+", end="")
    print("\n")


def main():
    comp = number_generator()
    result = 0
    while True:
        user = user_input()
        result = string_compare(comp, user)
        print_result(result)
        # print(result)
        if result[0] == 4:
            print("Correct!")
            return


main()

Debugging Queue

The following implementation has a bug. Typing n is supposed to remove the first element, and the code seems to do so, yet we still see two items after removing the first one.

The question is how to debug this?

q = []

while True:
    name=input("your name: ")

    if name=="n":
        print(q.pop(0))

    if name=="x":
        print(q)
        exit()

    if name=="s":
        print(len(q))
        exit()
    else:
        q.append(name)
        continue

your name: Foo
your name: Bar
your name: n
Foo
your name: s
2

sort

  • sort
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
print(planets)     # ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
planets.sort()
print(planets)     # ['Earth', 'Jupiter', 'Mars', 'Mercury', 'Saturn', 'Venus']

planets.sort(reverse=True)
print(planets)     # ['Venus', 'Saturn', 'Mercury', 'Mars', 'Jupiter', 'Earth']

sort numbers

  • sort
  • key
  • abs
numbers = [7, 2, -4, 19, 8]
print(numbers)                      # [7, 2, -4, 19, 8]
numbers.sort()
print(numbers)                      # [-4, 2, 7, 8, 19]

numbers.sort(reverse=True)
print(numbers)                      # [19, 8, 7, 2, -4]

numbers.sort(key=abs, reverse=True)
print(numbers)                      # [19, 8, 7, -4, 2]

key sort of strings

  • key

  • len

  • Another example for using a key.

  • To sort the list according to length

animals = ['chicken', 'cow', 'snail', 'elephant']
print(animals)

animals.sort()
print(animals)

animals.sort(key=len)
print(animals)

animals.sort(key=len, reverse=True)
print(animals)

Output:

['chicken', 'cow', 'snail', 'elephant']
['chicken', 'cow', 'elephant', 'snail']
['cow', 'snail', 'chicken', 'elephant']
['elephant', 'chicken', 'snail', 'cow']

sort mixed values

mixed = [100, 'foo', 42, 'bar']
print(mixed)
mixed.sort()
print(mixed)

In Python 3 it throws an exception.

Output:

[100, 'foo', 42, 'bar']
Traceback (most recent call last):
  File "examples/lists/sort_mixed.py", line 5, in <module>
    mixed.sort()
TypeError: '<' not supported between instances of 'str' and 'int'

Python 2 puts the numbers first in numerical order and then the strings in ASCII order.

[100, 'foo', 42, 'bar']
[42, 100, 'bar', 'foo']

sort mixed values fixed with str

mixed = [100, 'foo', 42, 'bar']
print(mixed)   # [100, 'foo', 42, 'bar']

mixed.sort(key=str)
print(mixed)   # [100, 42, 'bar', 'foo']  (compared as strings, so '100' < '42')
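To get the Python 2 style order (numbers first, then strings) one option is a key that returns a tuple. Tuples are compared element by element, so the first element groups the values by type and the second is only ever compared with values of the same type. A sketch of this technique:

```python
mixed = [100, 'foo', 42, 'bar']

# key: (is it a string?, value); False (numbers) sorts before True (strings)
mixed.sort(key=lambda value: (isinstance(value, str), value))
print(mixed)   # [42, 100, 'bar', 'foo']
```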

sorting with sorted

  • sorted
animals = ['chicken', 'cow', 'snail', 'elephant']
print(animals)         # ['chicken', 'cow', 'snail', 'elephant']

srt = sorted(animals)
print(srt)             # ['chicken', 'cow', 'elephant', 'snail']
print(animals)         # ['chicken', 'cow', 'snail', 'elephant']

rev = sorted(animals, reverse=True, key=len)
print(rev)             # ['elephant', 'chicken', 'snail', 'cow']
print(animals)         # ['chicken', 'cow', 'snail', 'elephant']

sort vs. sorted

The sort() method will sort a list in-place and return None. The built-in sorted() function will return the sorted list and leave the original list intact.
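A common mistake is writing animals = animals.sort(), which replaces the list with None. Also, sort() exists only on lists, while sorted() accepts any iterable. A short sketch of both points:

```python
animals = ['cow', 'snail', 'chicken']

result = animals.sort()    # sorts the list in place ...
print(result)              # None  (... and returns None)
print(animals)             # ['chicken', 'cow', 'snail']

# a tuple has no sort method, but sorted() works and returns a new list
numbers = (3, 1, 2)
print(sorted(numbers))     # [1, 2, 3]
print(numbers)             # (3, 1, 2)
```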

Sorted and change - shallow copy

  • sorted() creates a shallow copy of the original list.

  • If the list elements are immutable values such as numbers or strings, that is effectively a full, independent copy.

planets = ["Mercury", "Venus", "Earth"]
other_planets = planets
sorted_planets = sorted(planets)
planets[0] = "Jupiter"
print(planets)
print(other_planets)
print(sorted_planets)

Output:

['Jupiter', 'Venus', 'Earth']
['Jupiter', 'Venus', 'Earth']
['Earth', 'Mercury', 'Venus']
  • If some of the elements are complex structures (list, dictionaries, etc.) then the internal structures are not copied.
  • One can use copy.deepcopy to make sure the whole structure is separated, if that's needed.
planets = [
    ["Mercury", 1],
    ["Venus", 2],
    ["Earth", 3],
    ["Earth", 2]
]
other_planets = planets
sorted_planets = sorted(planets)
print(sorted_planets)

planets[0][1] = 100
print(planets)
print(other_planets)
print(sorted_planets)

Output:

[['Earth', 2], ['Earth', 3], ['Mercury', 1], ['Venus', 2]]
[['Mercury', 100], ['Venus', 2], ['Earth', 3], ['Earth', 2]]
[['Mercury', 100], ['Venus', 2], ['Earth', 3], ['Earth', 2]]
[['Earth', 2], ['Earth', 3], ['Mercury', 100], ['Venus', 2]]

Sorting characters of a string

letters = 'axzb'
print(letters)         # 'axzb'

srt = sorted(letters)
print(srt)             # ['a', 'b', 'x', 'z']
print(letters)         # 'axzb'

text = ''.join(srt)
print(text)              # abxz

# in one statement:
text = ''.join(sorted(letters))
print(text)              # abxz

range

  • range
for ix in range(11, 19, 2):
    print(ix)
# 11
# 13
# 15
# 17

for ix in range(5, 7):
    print(ix)
# 5
# 6

for ix in range(3):
    print(ix)
# 0
# 1
# 2

for ix in range(19, 11, -2):
    print(ix)

# 19
# 17
# 15
# 13

Looping over index

planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
for var in planets:
    print(var)

Output:

Mercury
Venus
Earth
Mars
Jupiter
Saturn
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
for ix in range(len(planets)):
    print(ix, planets[ix])

Output:

0 Mercury
1 Venus
2 Earth
3 Mars
4 Jupiter
5 Saturn

Enumerate lists

  • enumerate
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
for idx, planet in enumerate(planets):
    print(idx, planet)

Output:

0 Mercury
1 Venus
2 Earth
3 Mars
4 Jupiter
5 Saturn
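enumerate also accepts a start value, which is handy when the numbers are shown to humans who count from 1 (as in a menu). A short sketch:

```python
planets = ['Mercury', 'Venus', 'Earth']
for idx, planet in enumerate(planets, start=1):
    print(idx, planet)
# 1 Mercury
# 2 Venus
# 3 Earth
```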

List operators

a = ['one', 'two']
b = ['three']

print(a)     # ['one', 'two']

print(a * 2) # ['one', 'two', 'one', 'two']
print(2 * a) # ['one', 'two', 'one', 'two']

print(a + b) # ['one', 'two', 'three']
print(b + a) # ['three', 'one', 'two']

List of lists

x = ['abc', 'def']
print(x)     # ['abc', 'def']

y = [x, 'xyz']
print(y)          # [['abc', 'def'], 'xyz']
print(y[0])       #  ['abc', 'def']

print(x[0])       #    abc
print(y[0][0])    #    abc

List assignment

List assignment works in "parallel" in Python.

x, y = 1, 2
print(x)      # 1
print(y)      # 2

x, y = y, x
print(x)      # 2
print(y)      # 1

def stats(num):
    return sum(num), sum(num)/len(num), min(num), max(num)

total, average, minimum, maximum = stats([2, 3, 4])
print(total, average, minimum, maximum) # 9 3.0 2 4
x,y = f()  # works if f returns a list of 2 elements

It will throw a run-time ValueError exception if the number of values in the returned list is not 2. (Both for fewer and for more return values).
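A small sketch of both cases. The helpers get_pair and get_triple are hypothetical, made up only for this demo. When the unpacking fails, none of the names is assigned, so x and y keep their earlier values.

```python
def get_pair():            # hypothetical helper for this demo
    return [3, 4]

def get_triple():          # hypothetical helper for this demo
    return [3, 4, 5]

x, y = get_pair()
print(x, y)                # 3 4

failed = False
try:
    x, y = get_triple()    # 3 values, but only 2 names
except ValueError as err:
    failed = True
    print("ValueError:", err)   # the exact wording depends on the Python version
print(failed)              # True
```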

List documentation

Exercise: color selector menu

  • In a script called color_selector_menu.py have a list of colors. Write a script that will display a menu (a list of numbers and the corresponding color) and prompts the user for a number. The user needs to type in one of the numbers. That's the selected color.
  1. blue
  2. green
  3. yellow
  4. white
  • For extra credit make sure the system is user-proof and won't blow up on incorrect input values (e.g. a floating point number, a number that is out of range, a non-number).
  • For more credit allow the user to supply the number of the color on the command line. python color_selector_menu.py 3. If that is available, don't prompt.
  • For further credit allow the user to provide the name of the color on the command line: python color_selector_menu.py yellow. Can you handle color names that are not in the expected case (e.g. YelloW)?
  • Any more ideas for improvement?

Exercise: count digits

Create a script called count_digits_in_lists.py that, given a list of numbers, counts how many times each digit appears. The output will look like this:

0  1
1  3
2  3
3  2
4  1
5  2
6  2
7  0
8  1
9  1
  • Use this skeleton
numbers = [1203, 1256, 312456, 98]

Exercise: Create list

  • Create a script called create_list.py that, given a list of strings with words separated by spaces, creates a single list of all the words.

  • Skeleton:

lines = [
    'grape banana mango',
    'nut orange peach',
    'apple nut banana apple mango',
]

# ....

print(fruits)

# ....

print(unique_fruits)
  • Expected result:
['grape', 'banana', 'mango', 'nut', 'orange', 'peach', 'apple', 'nut', 'banana', 'apple', 'mango']

Then create a list of unique values sorted in alphabetical order.

Expected result:

['apple', 'banana', 'grape', 'mango', 'nut', 'orange', 'peach']

Exercise: Count words

  • Create a script called count_words_in_lists.py that given a list of words (for now embedded in the program itself) will count how many times each word appears.
celestial_objects = [
    'Moon', 'Gas', 'Asteroid', 'Dwarf', 'Asteroid', 'Moon', 'Asteroid'
]


Expected output:

Moon        2
Gas         1
Asteroid    3
Dwarf       1

Exercise: Check if number is prime

Write a program called is_prime.py that gets a number on the command line and prints "True" if the number is a prime number or "False" if it isn't.

python is_prime.py 42
False
python is_prime.py 19
True

Exercise: DNA sequencing

  • Create a file called dna_sequencing.py

  • A, C, T, G are called bases or nucleotides

  • Accept a sequence on the command line like this: python dna_sequencing.py ACCGXXCXXGTTACTGGGCXTTGTXX

  • Given a sequence such as the one above (some nucleotides mixed up with other elements represented by an X)

  • First return the sequences containing only ACTG. The above string will be changed to ['ACCG', 'C', 'GTTACTGGGC', 'TTGT'].

  • Then sort them by length. Expected result: ['GTTACTGGGC', 'ACCG', 'TTGT', 'C']

  • Create a file called extended_dna_sequencing.py

  • In this case the original string contains more than one type of foreign element: e.g. 'ACCGXXTXXYYGTTQRACQQTGGGCXTTGTXX'.

  • Expected output: ['TGGGC', 'ACCG', 'TTGT', 'GTT', 'AC', 'T']

  • Ask for a sequence on the Standard Input (STDIN) like this:

python extended_dna_sequencing.py
Please type in a sequence:

Solution: menu

colors = ['blue', 'yellow', 'black', 'purple']
for ix in range(len(colors)):
    print("{}) {}".format(ix+1, colors[ix]))

selection = input("Select color: ")
if not selection.isdecimal():
    exit(f"We need a number between 1 and {len(colors)}")

if int(selection) < 1 or int(selection) > len(colors):
    exit(f"The number must be between 1 and {len(colors)}")

col = int(selection) - 1
print(colors[col])
  • We would like to show a menu where each number corresponds to one element of the list so this is one of the places where we need to iterate over the indexes of a list.

  • len(colors) gives us the length of the list (in our case 4)

  • range(len(colors)) is the range of numbers between 0 and 4 (in our case), meaning 0, 1, 2, 3.

  • (Sometimes people explicitly write 4 in this solution, but if we later change the list and include another color we'll have to remember to update this number as well. This is error prone, and it is very easy to deduce this number from the data we already have: the list.)

  • List indexes start from 0, but when we display the menu we would like to show the numbers 1-4 to make it more human friendly. Therefore we show ix+1 and the color at location ix.

  • We ask for input and save it in a variable.

  • We use the isdecimal method to check if the user typed in a decimal number. We give an error and exit if not.

  • Then we check if the user provided a number in the correct range of values. We give an error and exit if not.

  • Then we convert the value to the correct range of numbers (remember, the user sees and selects numbers between 1-4 and we need them between 0-3).
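The validation steps above can be collected into one function, which also makes them easy to try without typing at a prompt. A minimal sketch; parse_selection is a hypothetical name, and the demo feeds it a few canned answers instead of calling input().

```python
def parse_selection(text, count):
    """Return a 0-based index for a menu of `count` items, or None if invalid.

    parse_selection is a hypothetical helper written for this sketch.
    """
    text = text.strip()
    if not text.isdecimal():        # rejects '', '2.5', '-1', 'abc'
        return None
    number = int(text)
    if not 1 <= number <= count:    # out of range
        return None
    return number - 1               # the user counts from 1, the list from 0

colors = ['blue', 'yellow', 'black', 'purple']
for ix, color in enumerate(colors, start=1):
    print(f"{ix}) {color}")

for answer in ['3', ' 4 ', '0', '2.5', 'abc']:
    print(repr(answer), parse_selection(answer, len(colors)))
```

In the real script the canned answers would be replaced by input(), and a None result would lead to the error message and exit.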

Solution: count digits

numbers = [1203, 1256, 312456, 98]

count = [0] * 10 # same as [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for num in numbers:
    for char in str(num):
        count[int(char)] += 1

for d in range(0, 10):
    print("{}  {}".format(d, count[d]))

First we have to decide where we are going to store the counts. A 10-element list fits our requirements: if we have three 0s and two 8s, we would have [3, 0, 0, 0, 0, 0, 0, 0, 2, 0].

  • We have a list of numbers.

  • We need a place to store the counters. For this we create a variable called count, which is a list of 10 zeros. We are going to count the number of times the digit 3 appears in count[3].

  • We iterate over the numbers so num is the current number. (e.g. 1203)

  • We would like to iterate over the digits of the current number now, but if we write for var in num we get the error TypeError: 'int' object is not iterable, because num is an integer and integers are not iterable. So we need to convert it to a string using str.

  • On each iteration char will be one character (which in our case we assume will be a digit, but it is still stored as a string).

  • int(char) will convert the string to a number, so for example "2" will be converted to 2.

  • count[int(char)] is going to be count[2] if char is "2". That's the location in the list where we count how many times the digit 2 appears in our numbers.

  • We increment it by one as we have just encountered a new copy of the given digit.

  • That finishes the data collection.

  • The second for-loop iterates over all the possible digits, from 0 to 9, and prints each digit with its counter.
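
The standard library can also do this counting for us with collections.Counter. A sketch of the same report (Counter has not been introduced at this point of the course):

```python
from collections import Counter

numbers = [1203, 1256, 312456, 98]

# join all the numbers into one string and count each character in it
digit_count = Counter(''.join(str(num) for num in numbers))

for d in range(10):
    # a Counter returns 0 for a key it has never seen (here: '7')
    print(f"{d}  {digit_count[str(d)]}")
```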

Solution: Create list

  • unique
  • sorted
  • set
lines = [
    'grape banana mango',
    'nut orange peach',
    'apple nut banana apple mango',
]

one_line = ' '.join(lines)
print(one_line)
fruits = one_line.split()
print(fruits)

unique_fruits = []
for word in fruits:
    if word not in unique_fruits:
        unique_fruits.append(word)
print(sorted(unique_fruits))


# a simpler way using a set, but we have not learned sets yet.
unique = sorted(set(fruits))
print(unique)

Solution: Count words

celestial_objects = [
    'Moon', 'Gas', 'Asteroid', 'Dwarf', 'Asteroid', 'Moon', 'Asteroid'
]

names   = []
counter = []

for name in celestial_objects:
    if name in names:
        idx = names.index(name)
        counter[idx] += 1
    else:
        names.append(name)
        counter.append(1)

for i in range(len(names)):
    print("{:12}   {}".format(names[i], counter[i]))

celestial_objects = [
    'Moon', 'Gas', 'Asteroid', 'Dwarf', 'Asteroid', 'Moon', 'Asteroid'
]

names   = []
counter = []

for name in celestial_objects:
    for idx in range(len(names)):
        if name == names[idx]:
            counter[idx] += 1
            break
    else:
        names.append(name)
        counter.append(1)

for i in range(len(names)):
    print("{:12}   {}".format(names[i], counter[i]))
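
The second version relies on the else branch of the for loop: it runs only if the loop finished without reaching break. A minimal illustration with a hypothetical find_index function:

```python
def find_index(items, target):
    for idx in range(len(items)):
        if items[idx] == target:
            break
    else:
        # reached only if the loop ran to the end without break
        return -1
    return idx


print(find_index(['Moon', 'Gas'], 'Gas'))    # 1
print(find_index(['Moon', 'Gas'], 'Comet'))  # -1
```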

Solution: Check if number is prime

import sys

n = int(sys.argv[1])

# 0, 1 and negative numbers are not prime
is_prime = n > 1
for i in range(2, int(n ** 0.5) + 1):
    if n % i == 0:
        is_prime = False
        break

print(is_prime)


# math.sqrt(n) might be clearer than n ** 0.5
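
Following up on that comment: since Python 3.8 the math module also has isqrt, which returns the integer square root exactly and avoids floating-point rounding. A sketch of the same check wrapped in a function (the function name is_prime is our choice):

```python
import math


def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False  # 0, 1 and negative numbers are not prime
    # math.isqrt(n) is the largest integer whose square is <= n
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


print([n for n in range(20) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```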

Solution: DNA sequencing

def get_sequences(dna):
    sequences = dna.split('X')
    sequences.sort(key=len, reverse=True)
    print(sequences)

    new_seq = []
    for w in sequences:
        if len(w) > 0:
            new_seq.append(w)

    return new_seq


if __name__ == '__main__':
    dna = 'ACCGXXCXXGTTACTGGGCXTTGT'
    short_sequences = get_sequences(dna)
    print(short_sequences)

Solution: DNA sequencing other

from dna_sequencing import get_sequences


if __name__ == '__main__':
    dna = 'ACCGXXTXXYYGTTQRACQQTGGGCXTTGTXX'

    filtered = []
    for cr in dna:
        if cr in 'ACGT':
            filtered.append(cr)
        else:
            filtered.append('X')
    #print(filtered)

    dna = ''.join(filtered)

    short_sequences = get_sequences(dna)
    print(short_sequences)

Solution: DNA sequencing using replace

from dna_sequencing import get_sequences


if __name__ == '__main__':
    dna = 'ACCGXXTXXYYGTTQRACQQTGGGCXTTGTXX'
    bad_letters = []
    for cr in dna:
        if cr not in 'ACTGX' and cr not in bad_letters:
            bad_letters.append(cr)

    for cr in bad_letters:
        # replace() changes every occurrence at once, so one call is enough
        dna = dna.replace(cr, 'X')

    short_sequences = get_sequences(dna)
    print(short_sequences)

Solution: DNA sequencing using regex

import re
from dna_sequencing import get_sequences


if __name__ == '__main__':
    dna = 'ACCGXXTXXYYGTTQRACQQTGGGCXTTGTXX'

    dna = re.sub(r'[^ACTGX]+', 'X', dna)

    short_sequences = get_sequences(dna)
    print(short_sequences)

Solution: DNA sequencing with filter

dna = 'ACCGXXCXXGTTACTGGGCXTTGT'
sequences = dna.split('X')
sequences.sort(key=len, reverse=True)

def not_empty(x):
    return len(x) > 0

print(sequences)
sequences = list( filter(not_empty, sequences) )
print(sequences)

Solution: DNA sequencing with filter and lambda

dna = 'ACCGXXCXXGTTACTGGGCXTTGT'
sequences = dna.split('X')
sequences.sort(key=len, reverse=True)

print(sequences)
sequences = list( filter(lambda x: len(x) > 0, sequences) )
print(sequences)
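
Two other common ways to drop the empty strings, shown as a sketch: passing None as the function to filter keeps only the truthy elements (the empty string is falsy), and a list comprehension spells out the condition.

```python
dna = 'ACCGXXCXXGTTACTGGGCXTTGT'
sequences = dna.split('X')
sequences.sort(key=len, reverse=True)

# filter(None, ...) keeps the elements that are "truthy"; '' is falsy
print(list(filter(None, sequences)))
# ['GTTACTGGGC', 'ACCG', 'TTGT', 'C']

# the same with a list comprehension
print([seq for seq in sequences if seq])
```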

[].extend

  • extend
names = ['Foo Bar', 'Orgo Morgo']

names.extend(['Joe Doe', 'Jane Doe'])
print(names) # ['Foo Bar', 'Orgo Morgo', 'Joe Doe', 'Jane Doe']

append vs. extend

What is the difference between [].append and [].extend? The append method adds its argument as a single element to the end of the list, while extend takes an iterable (for example a list or a string) and adds each of its elements one by one.

names = ['Foo Bar', 'Orgo Morgo']
more = ['Joe Doe', 'Jane Doe']
names.extend(more)
print(names)  # ['Foo Bar', 'Orgo Morgo', 'Joe Doe', 'Jane Doe']

names = ['Foo Bar', 'Orgo Morgo']
names.append(more)
print(names) # ['Foo Bar', 'Orgo Morgo', ['Joe Doe', 'Jane Doe']]

names = ['Foo', 'Bar']
names.append('Qux')
print(names)   # ['Foo', 'Bar', 'Qux']

names = ['Foo', 'Bar']
names.extend('Qux')
print(names)   # ['Foo', 'Bar', 'Q', 'u', 'x']
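
The += operator on a list behaves like extend: it changes the list in place and accepts any iterable, so a string is again split into characters. The + operator, on the other hand, creates a new list and accepts only another list. A short sketch:

```python
names = ['Foo', 'Bar']
names += ['Qux']
print(names)   # ['Foo', 'Bar', 'Qux']

names += 'Zo'
print(names)   # ['Foo', 'Bar', 'Qux', 'Z', 'o']

other = names + ['Joe']        # a new list; names is not changed
print(len(names), len(other))  # 5 6

try:
    names + 'Zo'               # + needs a list on both sides
except TypeError as err:
    print(err)
```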

split and extend

When collecting data which is received from a string via splitting, we would like to add the new elements to the existing list:

lines = [
    'abc def ghi',
    'hello world',
]

collector = []

for line in lines:
    collector.extend(line.split())
    print(collector)

# ['abc', 'def', 'ghi']
# ['abc', 'def', 'ghi', 'hello', 'world']

Tuples

Create tuple

  • tuple
  • ()

Tuple

  • A tuple is a fixed-length immutable list. It cannot change its size or content.
  • Elements can be accessed by index or using the slice notation.
  • A tuple is denoted with parentheses: (1,2,3)
planets = ('Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn')
print(planets)
print(planets[1])
print(planets[1:3])

planets.append("Death Star")
print(planets)

Output:

('Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn')
Venus
('Venus', 'Earth')
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/lists/tuple.py", line 6, in <module>
    planets.append("Death Star")
AttributeError: 'tuple' object has no attribute 'append'
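
Strictly speaking it is the comma, not the parentheses, that makes a tuple. A few ways to create one, including the special cases of the empty and the one-element tuple:

```python
empty = ()                 # an empty tuple
also_empty = tuple()       # the same, using the tuple() function
single = ('Mercury',)      # one element: the trailing comma is required
not_a_tuple = ('Mercury')  # without the comma this is just a string
packed = 1, 2, 3           # the parentheses are optional here

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # str
print(packed)                      # (1, 2, 3)
```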

List

  • Elements of a list can be changed via their index or via the list slice notation.
  • A list can grow and shrink using append and pop methods or using the slice notation.
  • A list is denoted with square brackets: [1, 2, 3]
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
print(planets)
print(planets[1])
print(planets[1:3])

planets.append("Death Star")
print(planets)

Output:

['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
Venus
['Venus', 'Earth']
['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Death Star']

Tuples are rarely used. There are certain places where Python or some module requires a tuple (instead of a list) or returns a tuple (instead of a list); each of these places will be explained when we get to it. Otherwise you don't need to use tuples.

For example, keys of dictionaries can be tuples (but not lists).
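
A minimal sketch of the dictionary-key point, assuming a made-up board that maps (row, column) positions to marks. Dictionary keys must be hashable: a tuple of immutable values is, a list is not, and neither is a tuple that contains a list.

```python
board = {(0, 0): 'X', (1, 2): 'O'}   # (row, column) -> mark
print(board[(1, 2)])                 # O

try:
    board[[2, 2]] = 'X'              # a list cannot be a key
except TypeError as err:
    print('TypeError:', err)

try:
    board[(2, [2])] = 'X'            # nor can a tuple that contains a list
except TypeError as err:
    print('TypeError:', err)
```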

Convert list to tuple and tuple to list

planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']

print(planets)
print(planets.__class__.__name__)

tpl = tuple(planets)
print(tpl)
print(tpl.__class__.__name__)


lst = list(tpl)
print(lst)
print(lst.__class__.__name__)

Output:

['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
list
('Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn')
tuple
['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
list

Enumerate returns tuples

  • enumerate
  • tuple
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']

enu = enumerate(planets)
print(type(enu).__name__)
print(enu)

print('-----')

# each element produced by enumerate is an (index, value) tuple
element = next(enu)
print(type(element))
print(element)

print('-----')

# a fresh enumerate object, as enu has already given us its first element
for tpl in enumerate(planets):
    print(tpl[0], tpl[1])

# usually we unpack the tuple directly:
# for ix, planet in enumerate(planets):
#     print(ix, planet)

Output:

enumerate
<enumerate object at 0x7f7ede7e37c0>
-----
<class 'tuple'>
(0, 'Mercury')
-----
0 Mercury
1 Venus
2 Earth
3 Mars
4 Jupiter
5 Saturn

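Change a tuple

A tuple cannot be changed: we cannot replace its elements or change its size. However, if an element is a mutable object, such as a list, that object itself can still be modified.
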
z = ([1, 2], [3, 4])
print(z)               # ([1, 2], [3, 4])

z[0].append(5)
print(z)               # ([1, 2, 5], [3, 4])


# z[0] = [7, 8] # TypeError: 'tuple' object does not support item assignment
# z.append(7)   # AttributeError: 'tuple' object has no attribute 'append'

Sort tuples

students = [
    ('John', 'A', 2),
    ('John', 'B', 2),
    ('John', 'A', 3),
    ('Anne', 'B', 1),
    ('Anne', 'A', 2),
    ('Anne', 'A', 1),
]
print(students)

print(sorted(students))

"""
[
    ('Anne', 'A', 1),
    ('Anne', 'A', 2),
    ('Anne', 'B', 1),
    ('John', 'A', 2),
    ('John', 'A', 3),
    ('John', 'B', 2)
]
"""

Sort tuples by specific elements

Sorting tuples, lists, or other complex structures by one of their elements:

students = [
    ('John', 'A', 2),
    ('Zoro', 'C', 1),
    ('Dave', 'B', 3),
]
print(students)
  # [('John', 'A', 2), ('Zoro', 'C', 1), ('Dave', 'B', 3)]

print(sorted(students))
  # [('Dave', 'B', 3), ('John', 'A', 2), ('Zoro', 'C', 1)]
  # sort by the first element of each tuple

print(sorted(students, key=lambda s: s[1]))
  # [('John', 'A', 2), ('Dave', 'B', 3), ('Zoro', 'C', 1)]
  # sort by the 2nd element of the tuples (index 1)

print(sorted(students, key=lambda s: s[2]))
  # [('Zoro', 'C', 1), ('John', 'A', 2), ('Dave', 'B', 3)]
  # sort by the 3rd element of the tuples (index 2)


from operator import itemgetter
print(sorted(students, key=itemgetter(2)))
  # [('Zoro', 'C', 1), ('John', 'A', 2), ('Dave', 'B', 3)]
  # itemgetter(2) does the same as the lambda above;
  # it may be more readable and is often a bit faster
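
itemgetter can also take several indexes; it then returns a tuple, which gives a secondary sort order just like a lambda returning a tuple. A sketch with a slightly modified, made-up student list so that there are ties:

```python
from operator import itemgetter

students = [
    ('John', 'A', 2),
    ('Zoro', 'C', 1),
    ('Dave', 'A', 3),
    ('Anne', 'C', 1),
]

# itemgetter(1, 0) returns (element 1, element 0) for each student:
# sort by the 2nd element first, then by the name
print(sorted(students, key=itemgetter(1, 0)))
# [('Dave', 'A', 3), ('John', 'A', 2), ('Anne', 'C', 1), ('Zoro', 'C', 1)]

# the equivalent lambda
print(sorted(students, key=lambda s: (s[1], s[0])))
```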

Sort and secondary sort

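We have a list of words. It is easy to sort them by length, but what will be the order among the words that have the same length? Python's sort is stable: elements with equal keys keep their original relative order. So with key=len alone, the order of same-length words depends on the order in the input, as the two planet lists below show.

A sort using a lambda-function that returns a tuple can provide the secondary sort order: tuples are compared element by element, so words of the same length are then sorted alphabetically.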


planets1 = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']
planets2 = ['Mercury', 'Earth', 'Venus', 'Mars', 'Jupiter', 'Saturn']

print(sorted(planets1, key=len))
# ['Mars', 'Venus', 'Earth', 'Saturn', 'Mercury', 'Jupiter']
print(sorted(planets2, key=len))
# ['Mars', 'Earth', 'Venus', 'Saturn', 'Mercury', 'Jupiter']

print(sorted(planets1, key=lambda w: (len(w), w)))
# ['Mars', 'Earth', 'Venus', 'Saturn', 'Jupiter', 'Mercury']
print(sorted(planets2, key=lambda w: (len(w), w)))
# ['Mars', 'Earth', 'Venus', 'Saturn', 'Jupiter', 'Mercury']
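
The tuple trick also allows mixing directions. A sketch that puts the longest names first and sorts names of equal length alphabetically, by negating the numeric part of the key:

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn']

# -len(w) sorts by length in descending order, w breaks ties alphabetically
print(sorted(planets, key=lambda w: (-len(w), w)))
# ['Jupiter', 'Mercury', 'Saturn', 'Earth', 'Venus', 'Mars']
```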

Files

File types: Text vs Binary

You probably know many file types such as images (png, jpg, ...), Word, Excel, mp3, mp4, csv, and now also .py files. Internally there are two big categories: text files and binary files. Text files are the ones that look readable if you open them with a plain text editor such as Notepad. Binary files will look like a mess if you open them in Notepad.

For binary files you need a special application to "look" at their content: for example the Excel and Word programs for the appropriate files, an image viewer for images, VLC to watch an mp4, and some audio player to hear the content of mp3 files.

  • Text: Can make sense when opened with Notepad: .txt, .csv, .py, .pl, ..., HTML, XML, YAML, JSON
  • Binary: Need a specialized tool to make sense of them: images, zip files, Word, Excel, .exe, mp3, mp4

In Python there are specialized modules for the well-known binary formats to handle files of that format. Text files, on the other hand, can be handled by low-level file-reading functions; however, even for those we usually have modules that know how to read and interpret the specific formats (e.g. CSV, HTML, XML, YAML, and JSON parsers).

Open vs. Read vs. Load

The expression "open a file" has two distinct meanings for programmers and users of software. For a user of Word, for example, "open the file" would mean to be able to see its content in a formatted way inside the editor.

When a programmer, now acting as a regular user, opens a Python file in an editor such as Notepad++ or PyCharm, the expectation is to see the content of that program with nice colors.

However, in order to provide this, the programmers behind these applications had to do several things:

  • Connect to a file on the disk (aka. "opening the file" in programmer speak).
  • Read the content of the file from the disk to memory.
  • Format the content read from the file as expected by the user of that application.

Binary files: Images

This is just a quick example how to use the Pillow module to handle images. There is a whole chapter on dealing with images.

pip install pillow
from PIL import Image
import sys

if len(sys.argv) != 3 and len(sys.argv) != 4:
    exit(f"Usage: {sys.argv[0]} FILENAME %CHANGE [OUTFILE]")

in_file = sys.argv[1]
change = float(sys.argv[2])
out_file = sys.argv[3] if len(sys.argv) == 4 else None

img = Image.open(in_file) # opening file and reading meta

print(img.size)    # a tuple
print(img.size[0]) # width
print(img.size[1]) # height

width = int(change * img.size[0] / 100)
height = int(change * img.size[1] / 100)


out = img.resize((width, height))
print("image size: ", sys.getsizeof(img.getdata()))
print("image size: ", sys.getsizeof(img.im))

out.show()
print("image size: ", sys.getsizeof(out.im))

if out_file:
    out.save(out_file)

python examples/files/get_image_size.py examples/pil/first.png

Output:

(800, 450)
800
450
48
1080033

Reading an Excel file

There are many ways to deal with Excel files as well.

pip install openpyxl
import openpyxl
import sys

if len(sys.argv) !=2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

in_file = sys.argv[1]

wb = openpyxl.load_workbook(filename = in_file)
for ws in wb.worksheets:
    print(ws.title)

ws = wb.worksheets[0]
print(ws['A1'].value)

Reading a YAML file

YAML files are often used as configuration files.


# A comment

Course:
  Language:
    Name: Ladino
    IETF BCP 47: lad
  For speakers of:
    Name: English
    IETF BCP 47: en
  Special characters: []

Modules:
  - basic/
  - words/
  - verbs/
  - grammar/
  - names/
  - sentences/
pip install pyyaml
import yaml

filename = "data.yaml"

with open(filename) as fh:
    data = yaml.safe_load(fh)   # safe_load builds only plain Python objects; prefer it for untrusted files

print(data)

Read and analyze a text file

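The input file, examples/files/text_report.txt, is shown in the "print lines with Report" exercise later in this chapter. The script adds up the numbers that follow the colon on every line containing the word Report.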

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

filename = sys.argv[1]

total = 0
with open(filename, "r") as fh:
    for row in fh:
        if "Report" not in row:
            continue
        text, value = row.split(":")
        # print(value)
        value = float(value.strip())
        # print(value)
        total += value

print(total)

Open and read file (easy but not recommended)

In some code you will encounter the following way of opening files. This was used before "with" was added to the language. It is not a recommended way of opening a file, as you might easily forget to call "close", and that might cause trouble; for example, you might lose data. Don't do that.

I am showing this as the first example because it is easier to understand.

filename = 'examples/files/numbers.txt'

fh = open(filename, 'r')
for line in fh:
    print(line)
fh.close()

Open and read file using with (recommended)

  • open
  • close
  • with
filename = 'examples/files/numbers.txt'

with open(filename, 'r') as fh:   # open(filename) would be enough
    for line in fh:
        print(line)               # empty lines in between: line already ends with "\n" and print adds another

# close is called when we leave the 'with' context

Read file remove newlines

  • trim
  • rstrip
  • chomp
filename = 'examples/files/numbers.txt'

with open(filename, 'r') as fh:
    for line in fh:
        line = line.rstrip("\n")
        print(line)

Filename on the command line

import sys

def main():
    if len(sys.argv) != 2:
        exit(f"Usage: {sys.argv[0]} FILENAME")
    filename = sys.argv[1]
    with open(filename) as fh:
        print("Working on the file", filename)

main()
$ python single.py
Usage: single.py FILENAME

$ python single.py numbers.txt
Working on the file numbers.txt

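Filehandle with return

Even if we leave the with block early, for example by calling return, the file is closed automatically.
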
import sys

def process_file(filename):
    with open(filename, 'r') as fh:

        for line in fh:
            line = line.rstrip("\n")
            if len(line) > 0 and line[0] == '#':
                return

            if len(line) > 1 and line[0:2] == '//':
                return

            # process the line
            print(line)


process_file(sys.argv[0])   # sys.argv[0] is the name of this script, so it processes its own source code
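
A small check that the with statement closes the file even when we return from inside it. The helper first_line is hypothetical, and the sketch creates its own temporary file so it can run anywhere:

```python
import os
import tempfile


def first_line(filename):
    """Hypothetical helper: return the first line and the file handle."""
    with open(filename) as fh:
        for line in fh:
            return line.rstrip("\n"), fh   # leaves the with block early


# create a small temporary file just for this demonstration
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first\nsecond\n")

line, fh = first_line(tmp.name)
print(line)       # first
print(fh.closed)  # True: the with statement closed the file on return
os.remove(tmp.name)
```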

Read all the lines into a list

  • readlines

There are rare cases when you need the whole content of a file in the memory and you cannot process it line by line. In those rare cases we have several options. readlines will read the whole content into a list converting each line from the file to be an element in the list.

Beware though, if the file is too big, it might not fit in the free memory of the computer.

filename = 'examples/files/numbers.txt'

with open(filename, 'r') as fh:
    lines = fh.readlines()   # reads all the lines into a list

print(f"number of lines: {len(lines)}")

for line in lines:
    print(line, end="")
print('------')

lines.reverse()
for line in lines:
    print(line, end="")

Output:

number of lines: 2
23 345 12345
67 189 23 17
------
67 189 23 17
23 345 12345

Read all the characters into a string (slurp)

  • read

In some other cases, especially if you are looking for some pattern that starts on one line but ends on another line, you'd be better off having the whole file as a single string in a variable. This is where the read method comes in handy.

It can also be used to read in chunks of the file.

filename = 'examples/files/numbers.txt'

with open(filename, 'r') as fh:
    content = fh.read()   # reads all the lines into a string

print(type(content))
print(len(content))   # number of characters in file

print(content)        # the content of the file

Output:

<class 'str'>
26
23 345 12345
67 189 23 17

In text mode read(20) will read at most 20 characters; in binary mode it reads at most 20 bytes.
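
Reading in chunks looks like this; a sketch that uses a temporary file with the same content as numbers.txt, so it runs on its own. In text mode read returns an empty string once the end of the file is reached:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("23 345 12345\n67 189 23 17\n")

chunks = []
with open(tmp.name, 'r') as fh:
    while True:
        chunk = fh.read(10)   # at most 10 characters
        if chunk == '':       # empty string: end of file
            break
        chunks.append(chunk)

print([len(c) for c in chunks])   # [10, 10, 6]
os.remove(tmp.name)
```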

Not existing file

  • IOError
filename = 'examples/files/unicorns.txt'

with open(filename, 'r') as fh:
    lines = fh.read()
print("still running")

# Python 2:
# Traceback (most recent call last):
#   File "examples/files/open_file.py", line 5, in <module>
#     with open(filename, 'r') as fh:
# IOError: [Errno 2] No such file or directory: 'examples/files/unicorns.txt'

# Python 3 (IOError is an alias of OSError; FileNotFoundError is a subclass of it):
# Traceback (most recent call last):
#   File "examples/files/open_file.py", line 3, in <module>
#     with open(filename, 'r') as fh:
# FileNotFoundError: [Errno 2] No such file or directory: 'examples/files/unicorns.txt'

Open file exception handling

  • try
  • except

Exception handling

filename = 'examples/files/unicorns.txt'

try:
    with open(filename, 'r') as fh:
        lines = fh.read()
except Exception as err:
    print('There was some error in the file operations.')
    print(err)
    print(type(err).__name__)

print('Still running.')

Output:

There was some error in the file operations.
[Errno 2] No such file or directory: 'examples/files/unicorns.txt'
FileNotFoundError
Still running.
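
Catching Exception hides every kind of error. A sketch that handles the missing file separately from other I/O problems (FileNotFoundError is a subclass of OSError, so it must come first); read_text is a hypothetical helper name:

```python
def read_text(filename):
    """Return the content of the file, or None if it could not be read."""
    try:
        with open(filename, 'r') as fh:
            return fh.read()
    except FileNotFoundError:
        print(f"The file '{filename}' does not exist.")
    except OSError as err:
        # other I/O problems, e.g. missing permissions
        print(f"Could not read '{filename}': {err}")
    return None


content = read_text('examples/files/unicorns.txt')
print('Still running.')
```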

Open many files - exception handling

import sys


def main():
    for filename in sys.argv[1:]:
        try:
            do_some_stuff(filename)
        except Exception as err:
            print(f"trouble with '{filename}': Error: {err}")

def do_some_stuff(filename):
    with open(filename) as fh:
        total = 0
        count = 0
        for line in fh:
            number = float(line)
            total += number
            count += 1
        print("Average: ", total/count)

main()
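
The content of number_per_line.txt:

23
1
192
17

empty.txt is an empty file. The content of number_per_line2.txt:

1
2
3
4
5
6
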
$ python average_from_files.py number_per_line.txt empty.txt number_per_line2.txt

Average:  58.25
trouble with 'empty.txt': Error: division by zero
Average:  3.5
$ python average_from_files.py numbers.txt

trouble with 'numbers.txt': Error: could not convert string to float: '23 345 12345\n'
$ python average_from_files.py more_numbers.txt

trouble with 'more_numbers.txt': Error: [Errno 2] No such file or directory: 'more_numbers.txt'

Writing to file

  • open
  • write

In order to write to a file we open it passing the "w" (write) mode. If the file does not exist, Python will try to create it. If the file already exists, all its content is removed, so right after such a call to open we have an empty file until we write into it.

Once the file is opened we can use the write method to write to it. This will NOT automatically append a newline at the end so we'll have to include \n if we would like to insert a newline.

Opening the file will fail if we don't have write permissions or if the folder in which we are trying to create the file does not exist.

filename = 'data.txt'

with open(filename, 'w') as out:
    out.write('text\n')
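
A small sketch showing both points: write returns the number of characters written and adds no newline, and opening an existing file with "w" erases its previous content. The file name write_demo.txt in the temporary directory is made up for the demonstration:

```python
import os
import tempfile

# a made-up file name in the system's temporary directory
filename = os.path.join(tempfile.gettempdir(), 'write_demo.txt')

with open(filename, 'w') as out:
    count = out.write('first')   # returns the number of characters written
    out.write(' line\n')         # no newline is added automatically

with open(filename, 'w') as out:  # opening with 'w' again erases the old content
    out.write('second\n')

with open(filename) as fh:
    content = fh.read()

print(count)          # 5
print(repr(content))  # 'second\n'
os.remove(filename)
```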

Print to file

  • open
  • print

We can also use the print function to print (or write) to a file. In this case the same rules apply as printing to standard output (automatically adding a trailing newline, inserting a space between parameters). We do this by passing the file-handle as the value of the file parameter of print.


filename = 'out.txt'
with open(filename, 'w') as fh:
    print("Hello", "World", file=fh)

Append to file

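  • append

Opening a file in "a" (append) mode keeps its existing content and writes the new data at the end. If the file does not exist, it is created.
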
filename = 'data.txt'

with open(filename, 'a') as out:
    out.write('append more text\n')
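
A sketch of the difference between "w" and "a", using a made-up file name in the temporary directory:

```python
import os
import tempfile

# a made-up file name in the system's temporary directory
filename = os.path.join(tempfile.gettempdir(), 'append_demo.txt')

with open(filename, 'w') as out:      # start with a fresh file
    out.write('text\n')

with open(filename, 'a') as out:      # 'a' keeps the existing content
    out.write('append more text\n')

with open(filename) as fh:
    lines = fh.read().splitlines()

print(lines)   # ['text', 'append more text']
os.remove(filename)
```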

Binary mode

  • rb
import sys
if len(sys.argv) != 2:
    exit("Need name of file")

filename = sys.argv[1]

try:
    with open(filename, 'rb') as fh:
        while True:
            binary_str = fh.read(1000)
            print(len(binary_str))
            if len(binary_str) == 0:
                break
            # do something with the content of the binary_str
except OSError as err:
    exit(f"Could not read {filename}: {err}")

python examples/files/read_binary.py examples/pil/first.png

1000
1000
1000
1000
1000
775
0
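
In binary mode read returns a bytes object instead of a str. A small sketch that writes four bytes to a temporary file and reads them back:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(bytes([137, 80, 78, 71]))   # the first 4 bytes of the PNG signature

with open(tmp.name, 'rb') as fh:
    data = fh.read()

print(type(data).__name__)  # bytes
print(data)                 # b'\x89PNG'
print(list(data))           # [137, 80, 78, 71]
os.remove(tmp.name)
```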

Does file exist? Is it a file?
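
The standard library's os.path module can answer both questions: os.path.exists is true for any existing path, os.path.isfile only for regular files and os.path.isdir only for directories (pathlib.Path has the equivalent exists, is_file and is_dir methods). A sketch with a hypothetical describe helper:

```python
import os
import tempfile


def describe(path):
    """Return a short description of what path refers to."""
    if not os.path.exists(path):
        return 'does not exist'
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'directory'
    return 'something else'   # e.g. a device or a socket


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pass                                  # just create an empty file

print(describe(tmp.name))                 # file
print(describe(tempfile.gettempdir()))    # directory
os.remove(tmp.name)
print(describe(tmp.name))                 # does not exist
```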

Direct access of a line in a file

We can iterate over a list, and we can also access any of its elements directly by index:

names = ['Foo', 'Bar', 'Baz']
for name in names:
    print(name)
print(names[1])

Output:

Foo
Bar
Baz
Bar

With a file we can iterate over the lines, but we cannot access a line directly by its index:

import sys
if len(sys.argv) != 2:
    exit(f"Run {sys.argv[0]} FILENAME")

filename = sys.argv[1]

# We can iterate over the lines
#with open(filename, 'r') as fh:
#    for line in fh:
#        print(line)

# We cannot access an element
with open(filename, 'r') as fh:
    print(fh[2])
Traceback (most recent call last):
  File "examples/files/fh_access.py", line 14, in <module>
    print(fh[2])
TypeError: '_io.TextIOWrapper' object is not subscriptable

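This does NOT work because a file can only be read sequentially, one line after the other; a file handle does not support indexing.

Instead, we can read all the lines into a list and then access any of them by index:
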
import sys
if len(sys.argv) != 2:
    exit(f"Run {sys.argv[0]} FILENAME")

filename = sys.argv[1]

with open(filename, 'r') as fh:
    rows = fh.readlines()
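print(rows[2])

If the file is too big to read into memory, we can iterate over it and stop when we reach the line we need:
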
import sys
if len(sys.argv) != 2:
    exit(f"Run {sys.argv[0]} FILENAME")

filename = sys.argv[1]

with open(filename, 'r') as fh:
    count = 0
    for row in fh:
        if count == 2:
            break
        count += 1
print(row)
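
Another option is itertools.islice, which skips lines while iterating over the file, so only the lines up to the requested one are read and nothing is kept in memory. A sketch; nth_line is a hypothetical helper name and the temporary file exists only for the demonstration:

```python
import os
import tempfile
from itertools import islice


def nth_line(filename, n):
    """Return line n (counting from 0) without its newline, or None if the file is shorter."""
    with open(filename) as fh:
        # islice skips the first n lines and stops after one more
        for line in islice(fh, n, n + 1):
            return line.rstrip("\n")
    return None


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

third = nth_line(tmp.name, 2)
missing = nth_line(tmp.name, 10)
print(third)     # third
print(missing)   # None
os.remove(tmp.name)
```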

Exercise: count digits

  1. Given the file examples/files/numbers.txt (or a similar file), create a file called count_digits_in_file.py that will count how many times each digit appears.
  2. Save the results in a file called report.txt.

The content of examples/files/numbers.txt:

23 345 12345
67 189 23 17

The output for this file looks like this (a different file will give different values):

0 0
1 3
2 3
3 4
4 2
5 2
6 1
7 2
8 1
9 1

Exercise: remove newlines

  • Create a file called remove_newlines.py that will be able to read all the lines of a given file into a list and remove trailing newlines.

Exercise: print lines with Report

In many cases you get some text report in some free form of text (and not in a CSV file or an Excel file.) You need to extract the information from such a file after recognizing the patterns. This exercise tries to provide such a case.

  • Create a script called text_report.py

Given a file that looks like this:

This is a text report there are some lines that start with
Report: 23
Other lines have this somewhere in the middle.

Begin report

Report: -3

Like this. Report: 17
More lines starting with
Report: 44

End report

We will have some exercise with this file. Maybe 4 exercises.
Report: 123
  • Print out the first line that starts with Report:.

  • Print out all the lines that have the string Report: in it.

  • Print out all the lines that start with the string Report:.

  • Print out the numbers that are after Report:. (e.g. Report: 42 print out 42)

  • Add up the numbers that appear after the string Report:. In the above example the result is expected to be 204.

  • Do the same, but only take into account the lines between the Begin report and End report lines (the sum is expected to be 58).

Exercise: color selector

  • Create a file similar to the colors.txt file and use it as the list of colors in the earlier example where we prompted for a color.
  • Call the new script color_selector_file.py

The content of colors.txt:

blue
yellow
white
green

Extend the previous example by letting the user provide the name of the file on the command line: python color.py examples/files/color.txt

Exercise: ROT13

  • rot13

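  • Implement ROT13: replace each letter with the letter 13 positions later in the alphabet, wrapping around from z back to a. Applying ROT13 twice gives back the original text.

  • Create a script called rot13_file.py that, given a file on the command line, will replace the content of that file with its ROT13 version.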
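
To make the definition concrete, here is one way to implement ROT13 by hand with str.maketrans and str.translate (a sketch; the solution later in this chapter uses the rot_13 codec from the codecs module instead):

```python
import string


def rot13(text):
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # map each letter to the letter 13 positions later, wrapping around
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)   # characters that are not letters stay as they are


print(rot13('Hello, World!'))          # Uryyb, Jbeyq!
print(rot13(rot13('Hello, World!')))   # Hello, World!
```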

Exercise: Combine lists

There are two files, examples/files/a.txt and examples/files/b.txt, with lines like these:

Tomato=78
Avocado=23
Pumpkin=100
Cucumber=17
Avocado=10
Cucumber=10

Write a script called combine_lists.py that takes the two files and combines them adding the values for each vegetable. The expected result is:

Avocado=33
Cucumber=27
Pumpkin=100
Tomato=78

Exercise: Number guessing game - save to file

Level 7

  • Create a file called number_guessing_game_7.py
  • Based on the previous solutions.
  • When starting a new game ask the user for their name and save the game information in the file.
  • The hidden number and the guesses.
  • Have an option to show the previously played games.

Solution: count numbers

import sys

if len(sys.argv) < 2:
    exit("Need name of file.")

counter = [0] * 10
filename = sys.argv[1]
with open(filename) as fh:
    for line in fh:
        for c in line.rstrip("\n"):
            if c == ' ':
                continue

            c = int(c)
            counter[c] += 1

for i in range(10):
    print("{} {}".format(i, counter[i]))

Solution: remove newlines

import sys
filename = sys.argv[0]   # the name of this script: it reads its own source code
with open(filename) as fh:
    lines = []
    for line in fh:
        lines.append(line.rstrip("\n"))
    print(lines)

Solution: print lines with Report

import sys


def main():
    if len(sys.argv) !=2:
        exit(f"Usage: {sys.argv[0]} FILENAME")
        # text_report.txt

    in_file = sys.argv[1]
    show_rows_with_report(in_file)
    show_rows_start_with_report(in_file)
    show_numbers_after_report(in_file)
    sum_numbers_after_report(in_file)
    sum_numbers_after_report_within_begin_end_section(in_file)


def show_rows_with_report(in_file):
    with open(in_file) as fh:
        for row in fh:
            row = row.rstrip("\n")
            if 'Report:' in row:
                print(row)
    print('-' * 20)

def show_rows_start_with_report(in_file):
    with open(in_file) as fh:
        for row in fh:
            row = row.rstrip("\n")
            if row.startswith('Report:'):
                print(row)
    print('-' * 20)

def show_numbers_after_report(in_file):
    with open(in_file) as fh:
        for row in fh:
            row = row.rstrip("\n")
            if 'Report:' in row:
                parts = row.split(':')
                print(int(parts[1]))
    print('-' * 20)

def sum_numbers_after_report(in_file):
    total = 0
    with open(in_file) as fh:
        for row in fh:
            row = row.rstrip("\n")
            if 'Report:' in row:
                parts = row.split(':')
                total += int(parts[1])
    print(f"Total: {total}")
    print('-' * 20)

def sum_numbers_after_report_within_begin_end_section(in_file):
    in_section = False
    total = 0
    with open(in_file) as fh:
        for row in fh:
            row = row.rstrip("\n")
            if row == 'Begin report':
                in_section = True
                continue
            if row == 'End report':
                in_section = False
                continue
            if in_section:
                if 'Report:' in row:
                    parts = row.split(':')
                    total += int(parts[1])
                    print(int(parts[1]))
    print(f"Total in section: {total}")
    print('-' * 20)


main()

Solution: color selector

def main():
    try:
        with open('colors.txt') as fh:
            colors = []
            for line in fh:
                colors.append(line.rstrip("\n"))
    except IOError:
        print("Could not open colors.txt")
        exit()

    for i in range(len(colors)):
        print("{}) {}".format(i, colors[i]))

    c = int(input("Select color: "))
    print(colors[c])

main()

Solution: ROT13

  • rot13
import sys
import codecs

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

filename = sys.argv[1]
with open(filename, 'r') as fh:
    original = fh.read()

encoded = codecs.encode(original, encoding='rot_13')
#print(encoded)

with open(filename, 'w') as fh:
    fh.write(encoded)

Solution: Combine lists

files = ['examples/files/a.txt', 'examples/files/b.txt']
names = []
values = []

for filename in files:
    with open(filename) as fh:
        for line in fh:
            name, value = line.rstrip("\n").split("=")
            value = int(value)
            if name in names:
                idx = names.index(name)
                values[idx] += value
            else:
                names.append( name )
                values.append( value )

with open('out.txt', 'w') as fh:
    for ix in range(len(names)):
        fh.write("{}={}\n".format(names[ix], values[ix]))

Solution: Combine lists with tuple

  • zip
  • tuple
files = ['examples/files/a.txt', 'examples/files/b.txt']
names = []
values = []

for filename in files:
    with open(filename) as fh:
        for line in fh:
            name, value = line.rstrip("\n").split("=")
            value = int(value)
            if name in names:
                idx = names.index(name)
                values[idx] += value
            else:
                names.append( name )
                values.append( value )

pairs = []
for ix in range(len(names)):
    pairs.append((names[ix], values[ix]))

# for name, value in zip(names, values):
#     pairs.append((name, value))

print(pairs)
print(sorted(pairs))
print(sorted(pairs, key=lambda p: p[1]))

with open('out.txt', 'w') as fh:
    for name, value in pairs:
        fh.write("{}={}\n".format(name, value))

Filehandle using with and not using it

  • open
  • close
  • with
filename = 'examples/files/numbers.txt'

fh = open(filename, 'r')
print(fh)          # <_io.TextIOWrapper name='examples/files/numbers.txt' mode='r' encoding='UTF-8'>
print(fh.closed)   # False
data = fh.read()
# do something with the data
fh.close()
print(fh.closed)   # True

# (the encoding shown by print(fh) depends on your system)

with open(filename, 'r') as fh:
    print(fh.closed)   # False
    data = fh.read()
print(fh.closed)       # True: closed automatically when leaving the with block

Dictionary (hash)

What is a dictionary

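  • Key-value pairs. Since Python 3.7 they keep the order in which the keys were inserted.
  • Keys must be hashable, which in practice means immutable values (numbers, strings, tuples of immutables).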
  • Values can be any object.

When to use dictionaries

  • ID to Name mapping.
  • Object to Count mapping.
  • Name of a feature to value of the feature.
  • Name of an attribute to value of the attribute.

Various dictionary examples

person_1 = {
    'fname': 'Moshe',
    'lname': 'Cohen',
    'email': 'moshe@cohen.com',
    'children': ['Maya', 'Tal'],
}
person_2 = {
    'fname': 'Dana',
    'lname': 'Levy',
    'email': 'dana@levy.com',
    'phone': '123-456',
}

from person import person_1, person_2

people = [person_1, person_2]
print(people[0]['fname'])
for person in people:
    print(person)
print('----------------')

people_by_name = {
    'Moshe Cohen': 'moshe@cohen.com',
    'Dana Levy': 'dana@levy.com',
}
print(people_by_name['Dana Levy'])
for name, email in people_by_name.items():
    print(f"{name}  ->  {email}")
print('----------------')



full_people_by_name = {
    'Moshe': person_1,
    'Dana': person_2,
}

print(full_people_by_name['Moshe']['lname'])
print(full_people_by_name['Dana'])
for fname, data in full_people_by_name.items():
    print(fname)
    print(data)
Moshe
{'fname': 'Moshe', 'lname': 'Cohen', 'email': 'moshe@cohen.com', 'children': ['Maya', 'Tal']}
{'fname': 'Dana', 'lname': 'Levy', 'email': 'dana@levy.com', 'phone': '123-456'}
----------------
dana@levy.com
Moshe Cohen  ->  moshe@cohen.com
Dana Levy  ->  dana@levy.com
----------------
Cohen
{'fname': 'Dana', 'lname': 'Levy', 'email': 'dana@levy.com', 'phone': '123-456'}
Moshe
{'fname': 'Moshe', 'lname': 'Cohen', 'email': 'moshe@cohen.com', 'children': ['Maya', 'Tal']}
Dana
{'fname': 'Dana', 'lname': 'Levy', 'email': 'dana@levy.com', 'phone': '123-456'}

Dictionary

  • dictionary

  • dict

  • {}

  • We can start from an empty dictionary and then fill it with key-value pairs.

user = {}
user['name'] = 'Foobar'
print(user)        # {'name': 'Foobar'}

user['email'] = 'foo@bar.com'
print(user)        # {'name': 'Foobar', 'email': 'foo@bar.com'}

the_name = user['name']
print(the_name)    # Foobar

field = 'name'
the_value = user[field]
print(the_value)   # Foobar

user['name'] = 'Edith Piaf'
print(user)      # {'name': 'Edith Piaf', 'email': 'foo@bar.com'}

Create dictionary

  • We can also start with a dictionary that already has some data in it.
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

print(user)   # {'fname': 'Foo', 'lname': 'Bar'}

user['email'] = 'foo@bar.com'
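Besides the literal syntax, you can build a dictionary in several other ways. Here is a short sketch with made-up data. The names user_a, user_b, user_c and lengths are only for illustration:

```python
# Keyword arguments (the keys must be valid Python identifiers)
user_a = dict(fname='Foo', lname='Bar')

# From a list of (key, value) pairs
user_b = dict([('fname', 'Foo'), ('lname', 'Bar')])

# From two parallel lists, paired up by zip
fields = ['fname', 'lname']
values = ['Foo', 'Bar']
user_c = dict(zip(fields, values))

# Dictionary comprehension: map each word to its length
lengths = {word: len(word) for word in ['apple', 'fig', 'banana']}

print(user_a == user_b == user_c)  # True
print(lengths)                      # {'apple': 5, 'fig': 3, 'banana': 6}
```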

keys

  • keys

  • Sometimes we don't know up front what keys we might have

Jupiter:300
Saturn:500
Earth:0
import sys

if len(sys.argv) != 2:
    sys.exit(f"Usage: {sys.argv[0]} FILENAME")

filename = sys.argv[1]

planets = {}
with open(filename) as fh:
    for row in fh:
        row = row.rstrip("\n")
        parts = row.split(":")
        if len(parts) != 2:
            print(f"Skipping malformed line: {row}")
            continue
        planet, distance = parts
        planets[planet] = distance

print(planets)               # {'Jupiter': '300', 'Saturn': '500', 'Earth': '0'}

print(planets.keys())        # dict_keys(['Jupiter', 'Saturn', 'Earth'])
print(list(planets.keys()))  # ['Jupiter', 'Saturn', 'Earth']

  • Keys are returned in insertion order. This is guaranteed since Python 3.7; in older versions the order was arbitrary.

Loop over keys

  • keys
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for key in user.keys():
    print(key)

# fname
# lname

for key in user.keys():
    print(f"{key} -> {user[key]}")

# fname -> Foo
# lname -> Bar

Loop over dictionary keys

Looping over the dictionary itself is the same as looping over its keys, but personally I prefer to write the somedictionary.keys() expression explicitly.

user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for key in user:
    print(f"{key} -> {user[key]}")

# fname -> Foo
# lname -> Bar

Loop using items

  • items
people = {
    "Tal"  : "123",
    "Maya" : "456",
    "Ruth" : "789",
}

for name, uid in people.items():
    print(f"{name} => {uid}")
Tal => 123
Maya => 456
Ruth => 789
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for tpl in user.items():      # iterates on tuples
    print(f"{tpl[0]} -> {tpl[1]}")
    print("{} -> {}".format(*tpl))

# fname -> Foo
# fname -> Foo
# lname -> Bar
# lname -> Bar

values

  • values

  • Values are returned in the same order as the keys (insertion order).

user = {
   'fname': 'Foo',
   'lname': 'Bar',
   'workplace': 'Bar',
}

print(user)   # {'fname': 'Foo', 'lname': 'Bar', 'workplace': 'Bar'}

print(user.keys())    # dict_keys(['fname', 'lname', 'workplace'])

print(user.values())  # dict_values(['Foo', 'Bar', 'Bar'])
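keys() and values() return view objects rather than copies, so they reflect later changes to the dictionary. Here is a small sketch using the same kind of user dictionary:

```python
user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

keys = user.keys()      # a live view, not a copy
values = user.values()

user['email'] = 'foo@bar.com'

print(keys)             # dict_keys(['fname', 'lname', 'email'])
print(values)           # dict_values(['Foo', 'Bar', 'foo@bar.com'])

# Take a snapshot with list() if you need one that does not change
snapshot = list(user.keys())
del user['email']
print(snapshot)         # ['fname', 'lname', 'email']
print(list(keys))       # ['fname', 'lname']
```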

Non-existent key

If we try to fetch the value of a key that does not exist, we get an exception.


def main():
    user = {
        'fname': 'Foo',
        'lname': 'Bar',
    }

    print(user['fname'])
    print(user['email'])

main()
Foo
Traceback (most recent call last):
  File "examples/dictionary/no_such_key.py", line 11, in <module>
    main()
  File "examples/dictionary/no_such_key.py", line 9, in main
    print(user['email'])
KeyError: 'email'

Get key

  • get

If we use the get method, we get None if the key does not exist.

user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'address': None,
}

print(user.get('fname'))        # Foo     - because 'fname' has the value 'Foo'
print(user.get('email'))        # None    - because 'email' does not exist
print(user.get('address'))      # None    - because 'address' has the value None

# set a default value to return
print(user.get('fname', 'ABC')) # Foo     - because the value of 'fname' is 'Foo'
print(user.get('answer', 42))   # 42      - because 'answer' does not exist
print(user.get('address', 23))  # None    - because None is the value of the 'address' key
Foo
None
None
Foo
42
None

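None is treated as False when checked as a boolean. A condition such as if user.get('address'): therefore cannot tell a missing key apart from a key whose value is None (or any other false value).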

Does the key exist?

  • exists
  • in
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'answer': None,
}

print('fname' in user)  # True
print('email' in user)  # False
print('answer' in user) # True
print('Foo' in user)    # False

for attr in ['fname', 'email', 'lname']:
    if attr in user:
        print(f"{attr} => {user[attr]}")

# fname => Foo
# lname => Bar

True
False
True
False
fname => Foo
lname => Bar

Does the value exist?

  • values
user = {
   'fname': 'Foo',
   'lname': 'Bar',
}

print('fname' in user.values())  # False
print('Foo' in user.values())    # True
False
True

Delete key

  • del
  • pop
user = {
    'fname': 'Foo',
    'lname': 'Bar',
    'email': 'foo@bar.com',
}

print(user) # {'fname': 'Foo', 'lname': 'Bar', 'email': 'foo@bar.com'}

fname = user['fname']
del user['fname']
print(fname) # Foo
print(user) # {'lname': 'Bar', 'email': 'foo@bar.com'}

lname_was = user.pop('lname')
print(lname_was) # Bar
print(user) # {'email': 'foo@bar.com'}

{'fname': 'Foo', 'lname': 'Bar', 'email': 'foo@bar.com'}
Foo
{'lname': 'Bar', 'email': 'foo@bar.com'}
Bar
{'email': 'foo@bar.com'}
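pop also accepts a default value, while del on a missing key raises KeyError. The sketch below shows both. It also shows popitem, which since Python 3.7 removes the most recently inserted pair:

```python
user = {'fname': 'Foo', 'email': 'foo@bar.com'}

# pop with a default does not raise for a missing key
phone = user.pop('phone', None)
print(phone)  # None

# del on a missing key raises KeyError
try:
    del user['phone']
except KeyError as err:
    print(f'KeyError: {err}')  # KeyError: 'phone'

# popitem removes and returns the last inserted (key, value) pair
last = user.popitem()
print(last)   # ('email', 'foo@bar.com')
print(user)   # {'fname': 'Foo'}
```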

List of dictionaries

people = [
    {
        'name'  : 'Foo Bar',
        'email' : 'foo@example.com'
    },
    {
        'name'     : 'Tal Bar',
        'email'    : 'tal@example.com',
        'address'  : 'Borg, Country',
        'children' : [
            'Alpha',
            'Beta'
        ]
    }
]
children = people[1]['children']

# print(people)
print(people[0]['name'])
print(people[1]['children'][0])
people[1]['children'].append('Gamma')
print(children)

print(list(map(lambda p: p['name'], people)))

people[0]['children'] = ['Zorg', 'Buzz']

Foo Bar
Alpha
['Alpha', 'Beta', 'Gamma']
['Foo Bar', 'Tal Bar']

Shared dictionary

people = [
    {
       "name" : "Foo",
       "id"   : "1",
    },
    {
       "name" : "Bar",
       "id"   : "2",
    },
    {
       "name" : "Moo",
       "id"   : "3",
    },
]

by_name = {}
by_id = {}
for person in people:
    by_name[person['name']] = person
    by_id[person['id']] = person
print(by_name)
print(by_id)
print('-------------------')

print(by_name["Foo"])
by_name["Foo"]['email'] = 'foo@weizmann.ac.il'

people[0]["name"] = "Foooooo"
print(by_name)
print(by_id)

print(by_name["Foo"])  # the key remained Foo !!!!
print(by_id["1"])
{'Foo': {'name': 'Foo', 'id': '1'}, 'Bar': {'name': 'Bar', 'id': '2'}, 'Moo': {'name': 'Moo', 'id': '3'}}
{'1': {'name': 'Foo', 'id': '1'}, '2': {'name': 'Bar', 'id': '2'}, '3': {'name': 'Moo', 'id': '3'}}
-------------------
{'name': 'Foo', 'id': '1'}
{'Foo': {'name': 'Foooooo', 'id': '1', 'email': 'foo@weizmann.ac.il'}, 'Bar': {'name': 'Bar', 'id': '2'}, 'Moo': {'name': 'Moo', 'id': '3'}}
{'1': {'name': 'Foooooo', 'id': '1', 'email': 'foo@weizmann.ac.il'}, '2': {'name': 'Bar', 'id': '2'}, '3': {'name': 'Moo', 'id': '3'}}
{'name': 'Foooooo', 'id': '1', 'email': 'foo@weizmann.ac.il'}
{'name': 'Foooooo', 'id': '1', 'email': 'foo@weizmann.ac.il'}
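If the indexes should not share the dictionaries, you need to copy them. Here is a sketch with hypothetical data. The tags field is added only to show the difference between a shallow and a deep copy:

```python
import copy

people = [
    {'name': 'Foo', 'id': '1', 'tags': ['admin']},
    {'name': 'Bar', 'id': '2', 'tags': []},
]

# The index holds references to the very same dictionaries (shared)
by_id = {person['id']: person for person in people}

# Shallow copy: new outer dictionaries, but the inner lists are still shared
shallow = {person['id']: person.copy() for person in people}

# Deep copy: nothing is shared with the original data
deep = copy.deepcopy(by_id)

people[0]['name'] = 'Foooooo'
people[0]['tags'].append('owner')

print(by_id['1']['name'])    # Foooooo
print(shallow['1']['name'])  # Foo
print(shallow['1']['tags'])  # ['admin', 'owner']
print(deep['1']['name'])     # Foo
print(deep['1']['tags'])     # ['admin']
```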

immutable collection: tuple as dictionary key

points = {}
p1 = (2, 3)

points[p1] = 'Joe'
points[(17, 5)] = 'Jane'

print(points)
for k in points.keys():
    print(k)
    print(type(k).__name__)
    print(points[k])
{(2, 3): 'Joe', (17, 5): 'Jane'}
(2, 3)
tuple
Joe
(17, 5)
tuple
Jane
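Mutable values such as lists cannot be dictionary keys because they are not hashable. The sketch below catches the resulting TypeError. The exact error message may vary between Python versions, so it is printed rather than spelled out:

```python
points = {}
errors = []  # collect the exceptions so we can inspect them afterwards

try:
    points[[2, 3]] = 'Joe'          # a list is mutable, so it is not hashable
except TypeError as err:
    errors.append(err)
    print('TypeError:', err)

points[tuple([2, 3])] = 'Joe'       # convert the list to a tuple first
print(points)                       # {(2, 3): 'Joe'}

try:
    points[(1, [2, 3])] = 'Jane'    # a tuple is hashable only if all its items are
except TypeError as err:
    errors.append(err)
    print('TypeError:', err)
```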

immutable numbers: numbers as dictionary key


number = {
    23   : "Twenty three",
    17   : "Seventeen",
    3.14 : "Three dot fourteen",
    42   : "The answer",
}

print(number)
print(number[42])
print(number[3.14])
{23: 'Twenty three', 17: 'Seventeen', 3.14: 'Three dot fourteen', 42: 'The answer'}
The answer
Three dot fourteen
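Numbers that compare equal are treated as the same key, even if their types differ. Here is a small sketch; the values are only for illustration:

```python
number = {}
number[1] = 'int one'
number[1.0] = 'float one'   # 1.0 == 1 and hash(1.0) == hash(1): same key
number[True] = 'bool one'   # True == 1 as well

# The key keeps the first object inserted (the int 1); only the value changes
print(number)       # {1: 'bool one'}
print(len(number))  # 1
```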

Sort a dictionary

When people say "sort a dictionary" they usually mean sorting its keys. But what happens in Python if we call sorted on a dictionary?

scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

print(scores) # {'Foo': 10, 'Bar': 34, 'Miu': 88, 'Abc': 34}

sorted_names = sorted(scores) # "sort dictionary" sorts the keys
print(sorted_names)  # ['Abc', 'Bar', 'Foo', 'Miu']

sorted_keys = sorted(scores.keys())
print(sorted_keys)  # ['Abc', 'Bar', 'Foo', 'Miu']

Sort dictionary values

scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

# sort the values, but we cannot get the keys back!
sorted_values = sorted(scores.values())
print(sorted_values) # [10, 34, 34, 88]

Sort dictionary by value

  • Sort the keys by the values
scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

def by_value(x):
    return scores[x]

sorted_names = sorted(scores.keys(), key=by_value)
print(sorted_names) # ['Foo', 'Bar', 'Abc', 'Miu']

# sort using a lambda expression
sorted_names = sorted(scores.keys(), key=lambda x: scores[x])

print(sorted_names) # ['Foo', 'Bar', 'Abc', 'Miu']

for k in sorted_names:
    print("{} : {}".format(k, scores[k]))

# Foo : 10
# Bar : 34
# Abc : 34
# Miu : 88

scores = {
   'Foo' : 10,
   'Bar' : 34,
   'Miu' : 88,
   'Abc' : 34,
}

# sort the keys according to the values:
sorted_names = sorted(scores, key=scores.__getitem__)

print(sorted_names) # ['Foo', 'Bar', 'Abc', 'Miu']

for k in sorted_names:
    print("{} : {}".format(k, scores[k]))

# Foo : 10
# Bar : 34
# Abc : 34
# Miu : 88
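Often you want the (name, score) pairs themselves, sorted by score. Here is a sketch using items(), reverse=True, and a tuple key to break ties by name:

```python
scores = {
    'Foo': 10,
    'Bar': 34,
    'Miu': 88,
    'Abc': 34,
}

# Sort the (name, score) pairs by score, highest first (the sort is stable)
by_score_desc = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
print(by_score_desc)       # [('Miu', 88), ('Bar', 34), ('Abc', 34), ('Foo', 10)]

# Break ties by name: sort by the (score, name) tuple
by_score_then_name = sorted(scores.items(), key=lambda pair: (pair[1], pair[0]))
print(by_score_then_name)  # [('Foo', 10), ('Abc', 34), ('Bar', 34), ('Miu', 88)]

# Rebuild a dictionary in that order (dictionaries keep insertion order)
ordered = dict(by_score_then_name)
print(list(ordered))       # ['Foo', 'Abc', 'Bar', 'Miu']
```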

Sort dictionary keys by value (another example)

  • sort
  • key
scores = {
    "Jane"    : 30,
    "Joe"     : 20,
    "George"  : 30,
    "Hellena" : 90,
}

for name in scores.keys():
    print(f"{name:8} {scores[name]}")

print('')
for name in sorted(scores.keys()):
    print(f"{name:8} {scores[name]}")

print('')
for val in sorted(scores.values()):
    print(f"{val:8}")

print('')
for name in sorted(scores.keys(), key=lambda x: scores[x]):
    print(f"{name:8} {scores[name]}")
Jane     30
Joe      20
George   30
Hellena  90

George   30
Hellena  90
Jane     30
Joe      20

      20
      30
      30
      90

Joe      20
Jane     30
George   30
Hellena  90

Insertion Order is kept

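Since Python 3.7 dictionaries are guaranteed to keep insertion order. CPython 3.6 already did so, but only as an implementation detail.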

d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
print(d)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
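Two details of insertion order are worth knowing. Updating an existing key keeps its position. Deleting a key and adding it again moves it to the end. Here is a small sketch:

```python
d = {'a': 1, 'b': 2, 'c': 3}

d['a'] = 100              # updating an existing key keeps its position
after_update = list(d)
print(after_update)       # ['a', 'b', 'c']

del d['a']
d['a'] = 1                # a deleted and re-added key goes to the end
after_readd = list(d)
print(after_readd)        # ['b', 'c', 'a']
```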

Change order of keys in dictionary - OrderedDict

  • collections
  • OrderedDict
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4

print(d)
d.move_to_end('a')

print(d)
d.move_to_end('d', last=False)

print(d)

for key in d.keys():
    print(key)
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
OrderedDict([('d', 4), ('b', 2), ('c', 3), ('a', 1)])
d
b
c
a

Set order of keys in dictionary - OrderedDict

  • collections
  • OrderedDict
from collections import OrderedDict

d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
print(d)

planned_order = ('b', 'c', 'd', 'a')
e = OrderedDict(sorted(d.items(), key=lambda x: planned_order.index(x[0])))
print(e)

print('-----')
# Map each key to its position in the planned order
planned_order = ('b', 'c', 'd', 'a')
plan = dict(zip(planned_order, range(len(planned_order))))
print(plan)

f = OrderedDict(sorted(d.items(), key=lambda x: plan[x[0]]))
print(f)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
-----
{'b': 0, 'c': 1, 'd': 2, 'a': 3}
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
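Because plain dictionaries keep insertion order, a plain dict can also set the order of the keys. This works as long as planned_order contains exactly the keys of d. Here is a sketch; note how equality differs between dict and OrderedDict:

```python
from collections import OrderedDict

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
planned_order = ('b', 'c', 'd', 'a')

# Insert the keys into a new plain dict in the planned order
e = {key: d[key] for key in planned_order}
print(e)                                  # {'b': 2, 'c': 3, 'd': 4, 'a': 1}

# Plain dicts compare equal regardless of order...
print(e == d)                             # True
# ...but two OrderedDicts are equal only if the order matches too
print(OrderedDict(e) == OrderedDict(d))   # False
```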

Setdefault

Trying to access a key that does not exist in a dictionary raises a KeyError exception.

The get method lets us avoid this. It returns the value of the key if the key exists. Otherwise it returns None, or the default value if one was supplied. It never changes the dictionary.

The setdefault method is similar to get, but if the key does not exist it also inserts the key with the given default value (None if no default was given). If the key already exists, setdefault returns its current value and leaves the dictionary unchanged.


grades = {}
# print(grades['python'])              # KeyError: 'python'
print(grades.get('python'))           # None
print(grades.get('python', 'snake'))  # snake
print(grades)                         # {}

print(grades.setdefault('perl'))      # None
print(grades)                         # {'perl': None}

print(grades.setdefault('python', 'snake')) # snake
print(grades)                         # {'perl': None, 'python': 'snake'}

print(grades.setdefault('python', 'boa'))   # snake - the existing value is kept
print(grades)                         # {'perl': None, 'python': 'snake'}
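A common use of setdefault is grouping values into lists. Here is a sketch with made-up data that groups words by their first letter:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = {}
for word in words:
    # setdefault returns the existing list, or inserts and returns a new empty one
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

The default value ([] here) is created on every call, even when the key already exists. That is one reason defaultdict from the collections module is often preferred for this pattern.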

Exercise: count characters

  • Write a script called count_characters.py that given a long text will count how many times each character appears.
  • Change the code so it will be able to count characters in a file.
text = """
This is a very long text.
OK, maybe it is not that long after all.
"""

Exercise: count words

  • Create a script called count_words.py
  • Skeleton:
words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

Expected output: (the order is not important)

Wombat:1
Rhino:2
Sloth:3
Tarantula:1

Exercise: count words from a file

Create a script called count_words_from_a_file.py that, given a file containing only words, spaces, and newlines, counts how many times each word appears. Treat words case-insensitively: the expected result counts "Lorem" and "lorem" together.

Lorem ipsum dolor qui ad labor ad labor sint dolor  tempor incididunt ut labor ad dolore lorem ad
Ut labor ad dolor lorem qui ad ut labor   ut ad commodo commodo
Lorem ad dolor in reprehenderit in lorem ut labor ad dolore eu in labor dolor
sint occaecat ad labor proident sint in in qui labor ad dolor ad in ad labor

Expected result for the above file:

ad            13
commodo        2
dolor          6
dolore         2
eu             1
in             6
incididunt     1
ipsum          1
labor         10
lorem          5
occaecat       1
proident       1
qui            3
reprehenderit  1
sint           3
tempor         1
ut             5

Exercise: Apache log

Every web server logs the visitors and their requests in a log file. The Apache web server has a log file similar to the following file. (Though I have trimmed the lines for the exercise.) Each line is a "hit", a request from the browser of a visitor.

Each line starts with the IP address of the visitor. e.g. 217.0.22.3.

Create a script called apache_log_parser.py that, given such a log file from Apache, reports how many hits (lines) came from each IP address.

The sample log file for this exercise is src/examples/dictionary/apache_access.log.

Expected output:

127.0.0.1         12
139.12.0.2         2
217.0.22.3         7

Exercise: Combine lists again

See the same exercise in the previous chapter. Use the filename combine_lists_using_dictionary.py.

Exercise: counting DNA bases

Write a script called count_dna_bases.py that, given a sequence like this: "ACTNGTGCTYGATRGTAGCYXGTN", prints the distribution of the elements. The output should look like this:

A 3 - 12.50 %
C 3 - 12.50 %
G 6 - 25.00 %
N 2 -  8.33 %
R 1 -  4.17 %
T 6 - 25.00 %
X 1 -  4.17 %
Y 2 -  8.33 %

Exercise: Count Amino Acids

  • Each sequence consists of many repetitions of the 4 bases represented by the characters A, C, T, and G.

  • There are 64 codons (sets of 3 consecutive bases).

  • There are 20 amino acids. Each codon encodes one amino acid or a stop signal.

  • Some amino acids can be encoded in multiple ways, as shown in the codon table. For example, Histidine can be encoded by both CAT and CAC (CAU and CAC in RNA).

  • Create a file called count_amino_acids.py that, given a file with a DNA sequence in it, counts the amino acids in the sequence.

  • Read the sequence saved in a txt file.

  • You can generate a sequence with a random number generator and save it to that file, but it would be much better if you used a real sequence.

  • An even better way would be to read the sequence from a FASTA file. You can download one from NCBI.

  • Skeleton:

codon_table = {
    'Phe' : ['TTT', 'TTC'],
    'Leu' : ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
    'Ile' : ['ATT', 'ATC', 'ATA'],
    'Met' : ['ATG'],
    'Val' : ['GTT', 'GTC', 'GTA', 'GTG'],
    'Ser' : ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Pro' : ['CCT', 'CCC', 'CCA', 'CCG'],
    'Thr' : ['ACT', 'ACC', 'ACA', 'ACG'],
    'Ala' : ['GCT', 'GCC', 'GCA', 'GCG'],
    'Tyr' : ['TAT', 'TAC'],
    'His' : ['CAT', 'CAC'],
    'Gln' : ['CAA', 'CAG'],
    'Asn' : ['AAT', 'AAC'],
    'Lys' : ['AAA', 'AAG'],
    'Asp' : ['GAT', 'GAC'],
    'Glu' : ['GAA', 'GAG'],
    'Cys' : ['TGT', 'TGC'],
    'Trp' : ['TGG'],
    'Arg' : ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'Gly' : ['GGT', 'GGC', 'GGA', 'GGG'],
    'STOP' : ['TAA', 'TAG', 'TGA']
}


  • You will want to convert this to a dictionary that maps each codon to an Amino Acid. Do it programmatically!

Exercise: List of dictionaries

Given the following file, build a list of dictionaries where each dictionary represents one person. The keys in each dictionary are the names of the columns (fname, lname, born), and the values are the respective values from each row. Create a file called list_of_dictionaries.py.

fname,lname,born
Graham,Chapman,8 January 1941
Eric,Idle,29 March 1943
Terry,Gilliam,22 November 1940
Terry,Jones,1 February 1942
John,Cleese,27 October 1939
Michael,Palin,5 May 1943
  • Skeleton

# ...

print(people[1]['fname'])

Exercise: Dictionary of dictionaries

Given the following file, build a dictionary of dictionaries where each internal dictionary represents one person. The keys in the internal dictionaries are the names of the columns (fname, lname, born), and the values are the respective values from each row. In the outer dictionary the keys are the (fname, lname) tuples. Create a file called dictionary_of_dictionaries.py.

fname,lname,born
Graham,Chapman,8 January 1941
Eric,Idle,29 March 1943
Terry,Gilliam,22 November 1940
Terry,Jones,1 February 1942
John,Cleese,27 October 1939
Michael,Palin,5 May 1943

Skeleton:


# ...

print(people[('Eric', 'Idle')]['born']) # 29 March 1943

Exercise: Age limit with dictionaries

  • Create a file called age_limit_with_dictionary.py

  • Ask users their age and the country they are located in.

  • Tell them if they can legally drink alcohol.

  • See the Legal drinking age list.

  • Given a file like the following, create a new file with a third column in which you write "yes" or "no" depending on whether the person can legally drink alcohol in that country.

Exercise: Merge files with timestamps

  • Write a script called merge_files_with_timestamps.py
  • Given a few CSV files in which the first column is a timestamp, write a script that can merge the files so the merged result also has timestamps in increasing order.
  • First try to solve it for 2 files.
  • Then solve it for any N files.
1601009973,1
1601009975,3
1601009976,4
1601009978,6
1601009981,9
1601009982,10
1601009983,11
1601009984,12
1601009987,15
1601009989,17
1601009990,18
1601009991,19
1601009992,20

1601009974,2
1601009977,5
1601009980,8
1601009988,16

1601009979,7
1601009985,13
1601009986,14

Solution: count characters

text = """
This is a very long text.
OK, maybe it is not that long after all.
"""

# print(text)
count = {}

for char in text:
    if char == '\n':
        continue
    if char not in count:
        count[char] = 1
    else:
        count[char] += 1

for key in sorted(count.keys()):
    print("'{}' {}".format(key, count[key]))
  • We need to store the counter somewhere. We could use two lists for that, but that would give a complex solution that runs in O(n**2) time.
  • Besides, we are in the chapter about dictionaries so probably we better use a dictionary.
  • In the count dictionary each key is going to be one of the characters and the respective value will be the number of times it appeared.
  • So if our string is "aabx" then we'll end up with
{
    "a": 2,
    "b": 1,
    "x": 1,
}
  • The for in loop on a string iterates over it character by character (even if we don't call our variable char).

  • We check if the current character is a newline \n and, if it is, we call continue to skip the rest of the iteration. We don't want to count newlines.

  • Then we check if we have already seen this character, that is, whether it is already one of the keys in the count dictionary. If not, we add it with 1 as the value; after all, we saw one copy of this character. If we have already seen this character (we get to the else part), we increment its counter.

  • We are done now with the data collection.

  • In the second loop we go over the keys of the dictionary, that is, the characters we have encountered. We sort them by their code point, which for these characters is the ASCII order.

  • Then we print each one of them and the respective value, the number of times the character was found.

Default Dict

  • collections
  • defaultdict
counter = {}

word = 'eggplant'

counter[word] += 1
# counter[word] = counter[word] + 1
Traceback (most recent call last):
  File "counter.py", line 5, in <module>
    counter[word] += 1
KeyError: 'eggplant'
counter = {}

word = 'eggplant'

if word not in counter:
    counter[word] = 0
counter[word] += 1

print(counter)
{'eggplant': 1}
from collections import defaultdict

counter = defaultdict(int)

word = 'eggplant'

counter[word] += 1

print(counter)
defaultdict(<class 'int'>, {'eggplant': 1})
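defaultdict works with any callable that creates a default value, for example list for grouping. Here is a sketch with hypothetical (name, grade) pairs. It also shows that reading a missing key with [] inserts it:

```python
from collections import defaultdict

grades = [('Foo', 10), ('Bar', 34), ('Foo', 20)]

by_name = defaultdict(list)     # a missing key starts as an empty list
for name, grade in grades:
    by_name[name].append(grade)

print(dict(by_name))            # {'Foo': [10, 20], 'Bar': [34]}

# Caveat: merely reading a missing key with [] inserts it
print(by_name['Qux'])           # []
print('Qux' in by_name)         # True

# get() and `in` look up a key without inserting it
print(by_name.get('Zed'))       # None
print('Zed' in by_name)         # False
```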

Solution: count characters with default dict

  • collections
  • defaultdict
from collections import defaultdict

text = """
This is a very long text.
OK, maybe it is not that long after all.
"""

# print(text)
count = defaultdict(int)

for char in text:
    if char == '\n':
        continue
    count[char] += 1

for key in sorted(count.keys()):
    print("'{}' {}".format(key, count[key]))
  • The previous solution can be slightly improved by using defaultdict from the collections module.
  • count = defaultdict(int) creates an empty dictionary with a special feature: when you access a key that does not exist, it first inserts that key with the value int(), which is 0.
  • This allows us to remove the condition checking if the character was already seen and just increment the counter. The first time we encounter a character, the dictionary inserts it with the value 0, so everything works out nicely.

Solution: count words (plain)

words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

counter = {}
for word in words:
    if word not in counter:
        counter[word] = 0
    counter[word] += 1

for word in counter:
    print("{}:{}".format(word, counter[word]))



Solution: count words (defaultdict)

from collections import defaultdict

words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

counter = defaultdict(int)
for word in words:
    counter[word] += 1

print(counter)
for word in counter.keys():
    print("{}:{}".format(word, counter[word]))

Solution: count words (Counter)

from collections import Counter

words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

cnt = Counter()
for word in words:
    cnt[word] += 1

print(cnt)
for word in cnt.keys():
    print("{}:{}".format(word, cnt[word]))
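Counter can also count a whole list in one step and report the most common items. Here is a sketch using the same words list:

```python
from collections import Counter

words = ['Wombat', 'Rhino', 'Sloth', 'Tarantula', 'Sloth', 'Rhino', 'Sloth']

cnt = Counter(words)        # count the whole list in one step
print(cnt['Sloth'])         # 3
print(cnt['Zebra'])         # 0 - a missing key counts as 0 and is NOT inserted
print(cnt.most_common(2))   # [('Sloth', 3), ('Rhino', 2)]
```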



Solution: count words in file

from collections import defaultdict
import sys

filename = 'README'
if len(sys.argv) > 1:
    filename = sys.argv[1]
print(filename)

count = defaultdict(int)

with open(filename) as fh:
    for full_line in fh:
        line = full_line.rstrip('\n')
        line = line.lower()
        # split() without arguments never returns empty strings
        for word in line.split():
            count[word] += 1

for word in sorted(count):
    print("{:13} {:>2}".format(word, count[word]))

Solution: Apache log

from collections import defaultdict
import sys

filename = 'apache_access.log'
if len(sys.argv) > 1:
    filename = sys.argv[1]

count = defaultdict(int)

with open(filename) as fh:
    for line in fh:
        space = line.index(' ')
        ip = line[0:space]
        count[ip] += 1

for ip in count:
    print("{:16} {:>3}".format(ip, count[ip]))

Solution: Apache log using split

from collections import defaultdict
import sys

filename = 'apache_access.log'
if len(sys.argv) > 1:
    filename = sys.argv[1]

count = defaultdict(int)

with open(filename) as fh:
    for line in fh:
        ip, rest = line.split(' ', 1)
        #ip = line.split(' ', 1)[0]
        count[ip] += 1

for ip in count:
    print("{:16} {:>3}".format(ip, count[ip]))

Solution: Combine files

  • This is a working, but very verbose solution. Check out the next one!
c = {}
with open('examples/files/a.txt') as fh:
    for line in fh:
        k, v = line.rstrip("\n").split("=")
        if k in c:
            c[k] += int(v)
        else:
            c[k] = int(v)

with open('examples/files/b.txt') as fh:
    for line in fh:
        k, v = line.rstrip("\n").split("=")
        if k in c:
            c[k] += int(v)
        else:
            c[k] = int(v)


with open('out.txt', 'w') as fh:
    for k in sorted(c.keys()):
        fh.write("{}={}\n".format(k, c[k]))

Solution: Combine files-improved

from collections import defaultdict

combined = defaultdict(int)

for filename in ['examples/files/a.txt', 'examples/files/b.txt']:
    with open(filename) as fh:
        for line in fh:
            key, value = line.rstrip("\n").split("=")
            combined[key] += int(value)


with open('out.txt', 'w') as fh:
    for key, value in sorted(combined.items()):
        print("{}={}".format(key, value))
        fh.write("{}={}\n".format(key, value))

Solution: counting DNA bases

from collections import defaultdict

seq = "ACTNGTGCTYGATRGTAGCYXGTN"

count = defaultdict(int)

for cr in seq:
    count[cr] += 1

for cr in sorted(count.keys()):
    print("{} {} - {:>5.2f} %".format(cr, count[cr], 100 * count[cr]/len(seq)))

# >5 right-aligns the value in a field 5 characters wide
# .2f formats it as a floating point number with 2 digits after the decimal point
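To see what the >5.2f format specification does, here is a sketch that formats a few of the percentages from the expected output:

```python
values = [12.5, 100 / 12, 100 / 24]

# >5.2f: right-align in a field 5 characters wide, 2 digits after the decimal point
formatted = [f"{value:>5.2f}" for value in values]
for text in formatted:
    print(f"[{text}]")
# [12.50]
# [ 8.33]
# [ 4.17]
```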

Solution: Count Amino Acids

Generate random DNA sequence

import sys
import random

if len(sys.argv) != 2:
    sys.exit("Need a number")
count = int(sys.argv[1])

dna = []
for _ in range(count):
    dna.append(random.choice(['A', 'C', 'T', 'G']))
print(''.join(dna))

dna = 'CACCCATGAGATGTCTTAACGCTGCTTTCATTATAGCCG'

aa_by_codon = {
    'ACG' : '?',
    'CAC' : 'Histidine',
    'CAT' : 'Histidine',   # DNA uses T where RNA uses U (CAU)
    'CCA' : 'Proline',
    'CCG' : 'Proline',
    'GAT' : '?',
    'GTC' : '?',
    'TGA' : '?',
    'TTA' : '?',
    'CTG' : '?',
    'CTT' : '?',
    'TCA' : '?',
    'TAG' : '?',
    #...
}

count = {}

for i in range(0, len(dna)-2, 3):
    codon = dna[i:i+3]
    #print(codon)
    aa = aa_by_codon[codon]
    if aa not in count:
        count[aa] = 0
    count[aa] += 1

for aa in sorted(count.keys()):
    print("{}  {}".format(aa, count[aa]))

seq = input('Type your DNA sequence here: ').upper()

codon_table = {
    'Phe' : ['TTT', 'TTC'],
    'Leu' : ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
    'Ile' : ['ATT', 'ATC', 'ATA'],
    'Met' : ['ATG'],
    'Val' : ['GTT', 'GTC', 'GTA', 'GTG'],
    'Ser' : ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Pro' : ['CCT', 'CCC', 'CCA', 'CCG'],
    'Thr' : ['ACT', 'ACC', 'ACA', 'ACG'],
    'Ala' : ['GCT', 'GCC', 'GCA', 'GCG'],
    'Tyr' : ['TAT', 'TAC'],
    'His' : ['CAT', 'CAC'],
    'Gln' : ['CAA', 'CAG'],
    'Asn' : ['AAT', 'AAC'],
    'Lys' : ['AAA', 'AAG'],
    'Asp' : ['GAT', 'GAC'],
    'Glu' : ['GAA', 'GAG'],
    'Cys' : ['TGT', 'TGC'],
    'Trp' : ['TGG'],
    'Arg' : ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'Gly' : ['GGT', 'GGC', 'GGA', 'GGG'],
    'STOP' : ['TAA', 'TAG', 'TGA']
}

codons = []
counter = {}
protein_sequence = []

# Cut the sequence into codons of 3 bases
while seq:
    codons.append(seq[:3])
    seq = seq[3:]

for codon in codons:
    if len(codon) < 3:
        print('The remaining bases: {} are not coding for an amino acid'.format(codon))
        continue
    for aa in codon_table:
        if codon in codon_table[aa]:
            if aa in counter:
                counter[aa] += 1
            else:
                counter[aa] = 1
            protein_sequence.append(aa)
            break

print(''.join(protein_sequence))

ordered = sorted(counter.keys())
for aa in ordered:
    print('{} {} - {:>5.2f} %'.format(aa, counter[aa], counter[aa]/len(protein_sequence)*100))
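The skeleton asked you to convert codon_table into a codon-to-amino-acid dictionary programmatically. Here is a sketch of that inversion using a dictionary comprehension. The table is shortened here. The translate helper and its '?' marker for unknown codons are illustrative choices, not part of the original solution:

```python
codon_table = {
    'Phe': ['TTT', 'TTC'],
    'His': ['CAT', 'CAC'],
    'Pro': ['CCT', 'CCC', 'CCA', 'CCG'],
    'STOP': ['TAA', 'TAG', 'TGA'],
    # ... the remaining amino acids from the full table
}

# Invert: one entry per codon, pointing at its amino acid
aa_by_codon = {
    codon: aa
    for aa, codons in codon_table.items()
    for codon in codons
}


def translate(dna):
    """Return the amino acids of dna; '?' marks codons missing from the table."""
    return [aa_by_codon.get(dna[i:i + 3], '?') for i in range(0, len(dna) - 2, 3)]


print(aa_by_codon['CAC'])         # His
print(translate('CACCCATTTTAG'))  # ['His', 'Pro', 'Phe', 'STOP']
```

With the inverted dictionary, each codon lookup is a single dictionary access instead of a loop over every amino acid.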

Solution: List of dictionaries

import sys

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = []

with open(filename) as fh:
    fh.readline()  # skip first row
    for line in fh:
        line = line.rstrip('\n')
        fname, lname, born = line.split(',')
        people.append({
            'fname': fname,
            'lname': lname,
            'born': born,
        })

print(people[1]['fname'])

import sys
import csv

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = []

with open(filename) as fh:
    reader = csv.DictReader(fh)
    for line in reader:
        people.append(line)

print(people[1]['fname'])

Solution: Dictionary of dictionaries

import sys

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = {}

with open(filename) as fh:
    fh.readline() # skip first row
    for line in fh:
        line = line.rstrip('\n')
        fname, lname, born = line.split(',')
        people[(fname, lname)] = {
            'fname': fname,
            'lname': lname,
            'born': born,
        }

print(people[('Eric', 'Idle')]['born'])

import sys
import csv

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = {}

with open(filename) as fh:
    reader = csv.DictReader(fh)
    for line in reader:
        people[(line['fname'], line['lname'])] = line

print(people[('Eric', 'Idle')]['born'])

Solution: Age limit with dictionaries

legal_drinking_age = {
0 : ['Angola', 'Guinea-Bissau', 'Nigeria', 'Togo', 'Western Sahara', 'Haiti', 'Cambodia', 'Macau'],
15 : ['Central African Republic'],
16 : [
'Gambia',
'Morocco',
'Antigua and Barbuda',
'Barbados',
'British Virgin Islands',
'Cuba',
'Dominica',
'Grenada',
'Saint Lucia',
'Saint Vincent and the Grenadines',
'Palestinian Authority',
'Austria',
'Denmark',
'Germany',
'Gibraltar',
'Liechtenstein',
'Luxembourg',
'San Marino',
'Switzerland'
],
17 : ['Malta'],
19 : ['Canada', 'South Korea'],
20 : ['Benin', 'Paraguay', 'Japan', 'Thailand', 'Uzbekistan', 'Iceland', 'Sweden'],
21 : [
'Cameroon',
'Egypt',
'Equatorial Guinea',
'Bahrain', 'Indonesia',
'Kazakhstan',
'Malaysia',
'Mongolia',
'Oman',
'Qatar',
'Sri Lanka',
'Turkmenistan',
'United Arab Emirates',
'American Samoa',
'Northern Mariana Islands',
'Palau',
'Samoa',
'Solomon Islands'
],
25 : ['USA'],
200 : ['Libya', 'Somalia', 'Sudan', 'Afghanistan', 'Brunei', 'Iran', 'Iraq', 'Kuwait', 'Pakistan', 'Saudi Arabia', 'Yemen'],
}

age = int(input('Please enter your age in number of years: '))
country = input('Please enter the country of your location: ')

for k in legal_drinking_age:
    if country in legal_drinking_age[k]:
        print('The minimum legal drinking age in your location is: {} years'.format(k))
        if age >= k:
            exit('You are allowed to consume alcohol in your location')
        else:
            exit('You are not permitted to consume alcohol currently in your location.')
print('The minimum legal drinking age in your location is: 18 years')
if age >= 18:
    exit('You are allowed to consume alcohol in your location')
else:
    exit('You are not permitted to consume alcohol currently in your location.')
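Looping over every age to find the country works, but the same table can also be used as a lookup table. This is a minimal sketch, assuming the same {age: [countries]} layout as legal_drinking_age. It inverts the table once into a {country: age} dictionary and then looks up the country directly, falling back to 18 as the solution above does. The small `ages` table and the names `invert` and `minimum_age` are hypothetical.

```python
# A tiny, hypothetical excerpt in the same {age: [countries]} layout
ages = {
    16: ['Austria', 'Germany'],
    20: ['Japan', 'Iceland'],
    21: ['Egypt', 'Oman'],
}

DEFAULT_AGE = 18  # used for countries that are not listed, as in the solution above


def invert(age_table):
    """Turn {age: [countries]} into {country: age}."""
    return {country: age
            for age, countries in age_table.items()
            for country in countries}


def minimum_age(country_to_age, country):
    # dict.get returns the default value when the key is missing
    return country_to_age.get(country, DEFAULT_AGE)


country_to_age = invert(ages)
print(minimum_age(country_to_age, 'Japan'))   # 20
print(minimum_age(country_to_age, 'France'))  # 18
```

The inversion is done once. After that, each lookup is a single dictionary access instead of a loop over all the ages.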

Solution: Merge files with timestamps

import sys

file_a = sys.argv[1]
file_b = sys.argv[2]

with open(file_a) as fha:
    with open(file_b) as fhb:
        line_a = None
        line_b = None
        while True:
            if line_a is None:
                line_a = fha.readline()
            if line_b is None:
                line_b = fhb.readline()

            if line_a == '' and line_b == '':
                break

            if line_a == '':
                print(line_b, end='')
                line_b = None
                continue

            if line_b == '':
                print(line_a, end='')
                line_a = None
                continue

            time_a = line_a.split(',')[0]
            time_b = line_b.split(',')[0]
            if int(time_a) < int(time_b):
                print(line_a, end='')
                line_a = fha.readline()
            else:
                print(line_b, end='')
                line_b = fhb.readline()

import sys

files = sys.argv[1:]

fhs = {}
rows = {}
for filename in files:
    try:
        fhs[filename] = open(filename)
        rows[filename] = None
    except Exception:
        print(f"Could not open {filename}")


while True:
    files_with_content = []
    for filename, fh in fhs.items():
        if rows[filename] is None:
            rows[filename] = fh.readline()
        if rows[filename] != '':
            files_with_content.append(filename)

    if not files_with_content:
        break

    # pick the file whose current row has the smallest numeric timestamp
    smallest = min(files_with_content, key=lambda filename: int(rows[filename].split(',')[0]))
    print(rows[smallest], end='')
    rows[smallest] = None


for fh in fhs.values():
    fh.close()
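The standard library also has a function for this job. heapq.merge() lazily merges several inputs, each of which is already sorted. This is a minimal sketch, assuming every line starts with an integer timestamp followed by a comma. Two io.StringIO objects stand in for the real files, and the timestamp() helper is illustrative. With real files you would pass the open file handles instead.

```python
import heapq
import io

# Two in-memory "files" standing in for the real input files;
# each line starts with an integer timestamp followed by a comma.
file_a = io.StringIO("1,a\n5,b\n9,c\n")
file_b = io.StringIO("2,x\n3,y\n10,z\n")


def timestamp(line):
    return int(line.split(',')[0])


# heapq.merge assumes every input is already sorted by the key
# and yields the lines in overall sorted order.
merged = list(heapq.merge(file_a, file_b, key=timestamp))
print(''.join(merged), end='')
```

The key parameter of heapq.merge is available from Python 3.5 onward.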

Do not change dictionary in loop

user = {
    'fname': 'Foo',
    'lname': 'Bar',
}

for k in list(user.keys()):   # loop over a copy of the keys
    user['email'] = 'foo@bar.com'
    print(k)

print('-----')

for k in user:
    user['birthdate'] = '1991'
    print(k)

# fname
# lname
# -----
# fname
# Traceback (most recent call last):
#   File "examples/dictionary/change_in_loop.py", line 13, in <module>
#     for k in user:
# RuntimeError: dictionary changed size during iteration

Named tuple (sort of immutable dictionary)

  • namedtuple

  • A bit like an immutable dictionary

from collections import namedtuple

Person = namedtuple('Person', ['name', 'email'])

one = Person(name='Joe', email='joe@example.com')
two = Person(name='Jane', email='jane@example.com')

print(one.name)
print(two.email)
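The "immutable" part of the heading can be seen in practice. The fields of a named tuple cannot be reassigned. The _replace() method returns a new tuple with a changed field, and _asdict() converts it to a regular, changeable dictionary. This is a small sketch reusing the Person example.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'email'])
one = Person(name='Joe', email='joe@example.com')

# Fields cannot be reassigned: this raises AttributeError
try:
    one.name = 'Jane'
except AttributeError as err:
    print('AttributeError:', err)

# _replace returns a *new* named tuple with the changed field
two = one._replace(name='Jane')
print(one.name, two.name)   # Joe Jane

# _asdict converts it to a (mutable) dictionary
data = two._asdict()
print(data['email'])        # joe@example.com
```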

Create dictionary from List

categories_list = ['animals', 'vegetables', 'fruits']

categories_dict = {cat:[] for cat in categories_list}
print(categories_dict)
categories_dict['animals'].append('cat')
print(categories_dict)

{'animals': [], 'vegetables': [], 'fruits': []}
{'animals': ['cat'], 'vegetables': [], 'fruits': []}
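There is a shortcut, dict.fromkeys(), that looks like it does the same thing. However, it puts the very same list object under every key. The comprehension above creates a separate empty list for each key. This sketch shows the difference.

```python
categories_list = ['animals', 'vegetables', 'fruits']

# dict.fromkeys puts the *same* list object under every key
shared = dict.fromkeys(categories_list, [])
shared['animals'].append('cat')
print(shared)
# {'animals': ['cat'], 'vegetables': ['cat'], 'fruits': ['cat']}

# The comprehension creates a new, separate list for each key
separate = {cat: [] for cat in categories_list}
separate['animals'].append('cat')
print(separate)
# {'animals': ['cat'], 'vegetables': [], 'fruits': []}
```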

Sort Hungarian letters (lookup table)

letters = [
    "a", "á", "b", "c", "cs", "d", "dz", "dzs", "e", "é", "f",
    "g", "gy", "h", "i", "í", "j", "k", "l", "ly", "m", "n",
    "ny", "o", "ó", "ö", "ő", "p", "q", "r", "s", "sz", "t",
    "ty", "u", "ú", "ü", "ű", "v", "w", "x", "y", "z", "zs",
]
print(enumerate(letters))
print('-------')
print(list(enumerate(letters)))
print('-------')
print(dict(enumerate(letters)))
print('-------')
#mapping = {v:k for k, v in dict(enumerate(letters)).items()}
mapping = {letter:ix for ix, letter in enumerate(letters)}
print(mapping)
print('------------------')

text = ["cs", "á", "ő", "ú", "e", "dzs", "zs", "a", "ny"]
print(sorted(text))
print('------------------')
print(sorted(text, key=lambda letter: mapping[letter]))

Sets

sets

  • set

  • Sets in Python are used when we are primarily interested in operations that we know from set theory.

  • See also the Venn diagrams.

  • In day-to-day speech we often use the word "group" instead of "set" even though they are not the same.

  • What are the common elements of two sets (two groups)?

  • Is one group (set) the subset of the other?

  • What are all the elements that exist in at least one of the groups (sets)?

  • What are the elements that exist in exactly one of the groups (sets)?

Venn diagram A intersect B

set operations

  • set

  • issubset

  • intersection

  • symmetric difference

  • union

  • relative complement (difference)

  • stdtypes: set

Creating a set

things = {'table', 'chair', 'door', 'chair'}
print(things)
print(type(things))

if 'table' in things:
    print("has table")


Output:

{'door', 'chair', 'table'}
<class 'set'>
has table
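The order in which a set is printed is not guaranteed. For strings it can even change from one run to the next, because string hashing is randomized. When you need a stable order, for example for printing or for comparing output, sorted() returns the elements as a sorted list.

```python
things = {'table', 'chair', 'door', 'chair'}

# sorted() accepts any iterable and always returns a new list
print(sorted(things))    # ['chair', 'door', 'table']
print(len(things))       # 3 - the duplicate 'chair' was dropped
```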

Creating a set from a list

furniture = ['table', 'chair', 'door', 'chair', 'chair']
things = set(furniture)
print(things)
print(type(things))

if 'table' in things:
    print("has table")

Output:

{'table', 'chair', 'door'}
<class 'set'>
has table

Converting set to list

planets = {'Mars', 'Jupiter', 'Saturn', 'Mercury', 'Venus', 'Earth', 'Mars'}
print(planets)

planets_list = list(planets)
print(planets_list)

Output:

{'Jupiter', 'Mars', 'Earth', 'Saturn', 'Venus', 'Mercury'}
['Jupiter', 'Mars', 'Earth', 'Saturn', 'Venus', 'Mercury']

Creating an empty set

objects = set()
print(objects)
print(type(objects))

other = {}
print(other)
print(type(other)) # This is an empty dict and not a set!!!!

Output:

set()
<class 'set'>
{}
<class 'dict'>

Adding an element to a set (add)

objects = set()
print(objects)

objects.add('Mars')
print(objects)

objects.add('Mars')
print(objects)

objects.add('Neptune')
print(objects)

Output:

set()
{'Mars'}
{'Mars'}
{'Neptune', 'Mars'}
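Elements can be taken out of a set in two ways. discard() silently does nothing if the element is not in the set. remove() raises a KeyError in that case. This small sketch shows both.

```python
objects = {'Mars', 'Neptune'}

objects.discard('Pluto')   # no error if the element is missing
objects.discard('Mars')
print(objects)             # {'Neptune'}

try:
    objects.remove('Pluto')  # remove raises KeyError for a missing element
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'Pluto'
```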

Merging one set into another set (update)

objects  = set(['Mars', 'Jupiter', 'Saturn'])
internal = set(['Mercury', 'Venus', 'Earth', 'Mars'])

objects.update(internal)
print(objects)
print(internal)

Output:

{'Mars', 'Earth', 'Jupiter', 'Saturn', 'Mercury', 'Venus'}
{'Earth', 'Mars', 'Mercury', 'Venus'}

set intersection

  • set
  • intersection
english = set(['door', 'car', 'lunar', 'era'])
spanish = set(['era', 'lunar', 'hola'])

print('english: ', english)
print('spanish: ', spanish)

both = english.intersection(spanish)
print(both)
  • intersection returns the elements that are in both sets.

Output:

english:  {'car', 'lunar', 'era', 'door'}
spanish:  {'lunar', 'era', 'hola'}
{'lunar', 'era'}
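The same operation is also available as the & operator. A related method, isdisjoint(), tells whether two sets have no elements in common. One difference between the two forms: the intersection() method accepts any iterable as its argument, while & requires both sides to be sets.

```python
english = {'door', 'car', 'lunar', 'era'}
spanish = {'era', 'lunar', 'hola'}

# The & operator is equivalent to english.intersection(spanish)
both = english & spanish
print(sorted(both))                    # ['era', 'lunar']

# isdisjoint() answers "do they have nothing in common?"
print(english.isdisjoint(spanish))     # False
print(english.isdisjoint({'hello'}))   # True
```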

set subset

  • set
  • issubset
english = set(['door', 'car', 'lunar', 'era'])
spanish = set(['era', 'lunar', 'hola'])

words = set(['door', 'lunar'])


print('issubset: ', words.issubset( english ))
print('issubset: ', words.issubset( spanish ))

Output:

issubset:  True
issubset:  False
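The comparison operators also work on sets. <= means "is a subset of", >= means "is a superset of", and < checks for a proper subset, that is, a subset that is not equal to the other set.

```python
english = {'door', 'car', 'lunar', 'era'}
words = {'door', 'lunar'}

print(words <= english)      # True, same as words.issubset(english)
print(english >= words)      # True, same as english.issuperset(words)
print(words < english)       # True, a proper subset
print(english < english)     # False, a set is not a proper subset of itself
print(english <= english)    # True
```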

set symmetric difference

  • set
  • symmetric_difference
english = set(['door', 'car', 'lunar', 'era'])
spanish = set(['era', 'lunar', 'hola'])

diff = english.symmetric_difference(spanish)
print('symmetric_difference: ', diff)
  • Symmetric difference contains all the elements in either one of the sets, but not in both. "the ears of the elephant".

Output:

symmetric_difference:  {'door', 'hola', 'car'}

set union

  • set
  • union
english = set(['door', 'car', 'lunar', 'era'])
spanish = set(['era', 'lunar', 'hola'])

all_the_words = english.union(spanish)

print(english)
print(spanish)
print(all_the_words)

# x = english + spanish # TypeError: unsupported operand type(s) for +: 'set' and 'set'

Output:

{'era', 'door', 'lunar', 'car'}
{'era', 'hola', 'lunar'}
{'era', 'door', 'car', 'hola', 'lunar'}
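+ does not work on sets, but | does: it is the operator form of union(). Its in-place variant |= adds the elements of the right-hand set to the left-hand one, like update().

```python
english = {'door', 'car', 'lunar', 'era'}
spanish = {'era', 'lunar', 'hola'}

# | is the operator form of union()
all_the_words = english | spanish
print(sorted(all_the_words))   # ['car', 'door', 'era', 'hola', 'lunar']

# |= updates the left-hand set in place, like update()
words = {'door'}
words |= spanish
print(sorted(words))           # ['door', 'era', 'hola', 'lunar']
```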

set relative complement (difference)

english = set(['door', 'car', 'lunar', 'era'])
spanish = set(['era', 'lunar', 'hola'])

print(spanish.difference(english))
print(english.difference(spanish))
print()


eng = english - spanish
spa = spanish - english
print(spa)
print(eng)
print()

print(english)
print(spanish)

Output:

{'hola'}
{'door', 'car'}

{'hola'}
{'door', 'car'}

{'door', 'car', 'era', 'lunar'}
{'lunar', 'era', 'hola'}

Set of numbers

numbers = {2, 3}
print(numbers)

Output:

{2, 3}

Set of lists

lists = set([ [2, 3], [1, 2] ])

Output:

Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/sets/set_of_lists.py", line 1, in <module>
    lists = set([ [2, 3], [1, 2] ])
TypeError: unhashable type: 'list'
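Elements of a set must be hashable. Lists are mutable, so they are not. One way around this is to convert each inner list to a tuple, as the next slide does directly. Another is to use a frozenset, an immutable set, when the order inside each group does not matter.

```python
lists = [[2, 3], [1, 2], [2, 3]]

# Convert each (unhashable) list into a (hashable) tuple first
tuples = {tuple(pair) for pair in lists}
print(sorted(tuples))    # [(1, 2), (2, 3)]

# frozenset is immutable and hashable, so it can be an element of a set;
# frozenset([2, 3]) and frozenset([3, 2]) are equal
groups = {frozenset([2, 3]), frozenset([3, 2])}
print(len(groups))       # 1
```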

Set of tuples

tuples = set([ (2, 3), (1, 2) ])
print(tuples)
print(type(tuples))

Output:

{(2, 3), (1, 2)}
<class 'set'>

Create dictionary of sets from List

categories_list = ['animals', 'vegetables', 'fruits']

categories_set = {cat:set() for cat in categories_list}
print(categories_set)
categories_set['animals'].add('cat')
print(categories_set)

Output:

{'animals': set(), 'vegetables': set(), 'fruits': set()}
{'animals': {'cat'}, 'vegetables': set(), 'fruits': set()}
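The categories might not be known in advance. In that case collections.defaultdict(set) creates an empty set the first time a new key is used. The (category, item) pairs below are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical (category, item) pairs
pairs = [('animals', 'cat'), ('fruits', 'apple'), ('animals', 'dog'), ('animals', 'cat')]

categories = defaultdict(set)
for category, item in pairs:
    categories[category].add(item)   # a missing key starts as set()

print(dict(categories))
```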

Code Reuse

Permutations

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} n")

'''
  n!
'''

n = int(sys.argv[1])

n_fact = 1
for i in range(1, n+1):
    n_fact *= i
print(n_fact)
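The loop above is a good exercise, but the standard library already provides n! as math.factorial(). This sketch checks that the two agree for a fixed n instead of reading it from the command line.

```python
import math

n = 5
print(math.factorial(n))   # 120

# The loop from the example above gives the same result
n_fact = 1
for i in range(1, n + 1):
    n_fact *= i
print(n_fact == math.factorial(n))   # True
```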

k-Permutations

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n r")

'''
               n!
 P(n, r)  =  -----
             (n-r)!

'''

n = int(sys.argv[1])
r = int(sys.argv[2])

n_fact = 1
for i in range(1, n+1):
    n_fact *= i
#print(n_fact)

n_r_fact = 1
for i in range(1, n-r+1):
    n_r_fact *= i
#print(n_r_fact)

P = n_fact // n_r_fact
print(P)
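Since Python 3.8 the math module also has math.perm(n, r), which computes P(n, r) directly. This short sketch compares it with the formula from the docstring, using fixed values instead of command-line arguments.

```python
import math

n, r = 5, 2
print(math.perm(n, r))                              # 20
print(math.factorial(n) // math.factorial(n - r))   # 20, the formula above
```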

Binomial coefficient

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n k")

'''
   n         n!
   -  =   ---------
   k      k!*(n-k)!

'''

n = int(sys.argv[1])
k = int(sys.argv[2])

n_fact = 1
for i in range(1, n+1):
    n_fact *= i
print(n_fact)

n_k_fact = 1
for i in range(1, n-k+1):
    n_k_fact *= i
print(n_k_fact)

k_fact = 1
for i in range(1, k+1):
    k_fact *= i
print(k_fact)


bc = n_fact // (k_fact * n_k_fact)
print(bc)
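Likewise, math.comb(n, k), available since Python 3.8, computes the binomial coefficient directly. The sketch below compares it with the factorial formula for fixed values of n and k.

```python
import math

n, k = 5, 2
print(math.comb(n, k))   # 10
print(math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))   # 10
```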

Binomial coefficient - factorial function

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n k")


'''
   n         n!
   -  =   ---------
   k      k!*(n-k)!

'''

def fact(x):
    x_fact = 1
    for i in range(1, x+1):
        x_fact *= i
    return x_fact


n = int(sys.argv[1])
k = int(sys.argv[2])

n_fact = fact(n)
print(n_fact)

n_k_fact = fact(n-k)
print(n_k_fact)

k_fact = fact(k)
print(k_fact)


bc = n_fact // (k_fact * n_k_fact)
print(bc)

k-Permutations - factorial function

import sys

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n r")

'''
               n!
 P(n, r)  =  -----
             (n-r)!

'''

def fact(x):
    x_fact = 1
    for i in range(1, x+1):
        x_fact *= i
    return x_fact


n = int(sys.argv[1])
r = int(sys.argv[2])

n_fact = fact(n)
#print(n_fact)

n_r_fact = fact(n-r)
#print(n_r_fact)

P = n_fact // n_r_fact
print(P)

Permutations - factorial function

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} n")

'''
  n!
'''

n = int(sys.argv[1])

def fact(x):
    x_fact = 1
    for i in range(1, x+1):
        x_fact *= i
    return x_fact


n_fact = fact(n)
print(n_fact)

mymath module

def fact(x):
    x_fact = 1
    for i in range(1, x+1):
        x_fact *= i
    return x_fact

Permutations - module

import sys
from mymath import fact

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} n")

'''
  n!
'''

n = int(sys.argv[1])


n_fact = fact(n)
print(n_fact)

k-Permutations - module

import sys
from mymath import fact

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n r")

'''
               n!
 P(n, r)  =  -----
             (n-r)!

'''


n = int(sys.argv[1])
r = int(sys.argv[2])

n_fact = fact(n)
#print(n_fact)

n_r_fact = fact(n-r)
#print(n_r_fact)

P = n_fact // n_r_fact
print(P)

Binomial coefficient - module

import sys
from mymath import fact

if len(sys.argv) != 3:
    exit(f"Usage: {sys.argv[0]} n k")


'''
   n         n!
   -  =   ---------
   k      k!*(n-k)!

'''

n = int(sys.argv[1])
k = int(sys.argv[2])

n_fact = fact(n)
print(n_fact)

n_k_fact = fact(n-k)
print(n_k_fact)

k_fact = fact(k)
print(k_fact)


bc = n_fact // (k_fact * n_k_fact)
print(bc)

Functions (subroutines)

Why use functions?

There are two main reasons to use functions.

One of them is code reuse. Instead of copy-pasting snippets of code that do the same thing in multiple places in the application, we can create a function with a single copy of the code and call it from multiple locations.

The other is that functions make the code easier to understand, easier to test, and easier to maintain.

Functions are supposed to be relatively short, each function dealing with one issue, one concern. They should have well-defined input and output and should not cause side effects.

There are no strict rules, but a common suggestion is that a function should be somewhere between 4 and 30 lines of code.

  • Code reuse - DRY - Don't Repeat Yourself
  • Small units of code. (One thought, single responsibility) Easier to understand, test, and maintain.

Defining simple function

  • def
  • return
def add(x, y):
    z = x + y
    return z

a = add(2, 3)
print(a)    # 5

q = add(23, 19)
print(q)   # 42

The function definition starts with the keyword "def", followed by the name of the function ("add" in our example), followed by the list of parameters in a pair of parentheses, followed by a colon ":". The body of the function is indented to the right. The depth of indentation does not matter, but it must be the same for all the lines of the function. When a line is no longer indented (here, a new statement starting in the first column), Python knows that the function definition has ended.
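A function does not have to have a return statement. If it ends without one, it returns None. The sketch below uses a hypothetical greet() function to show this.

```python
def greet(name):
    print(f"Hello {name}")   # no return statement


result = greet("Foo")        # prints: Hello Foo
print(result)                # None
```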

Passing positional parameters to a function

  • def
def sendmail(From, To, Subject, Content):
    print('From:', From)
    print('To:', To)
    print('Subject:', Subject)
    print('')
    print(Content)

sendmail('gabor@szabgab.com',
    'szabgab@gmail.com',
    'self message',
    'Has some content too')

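Positional parameters: the values are matched to the parameters by their position. The first value goes to From, the second to To, and so on.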

Function parameters can be named

  • named parameter
  • keyword argument
def sendmail(From, To, Subject, Content):
    print('From:', From)
    print('To:', To)
    print('Subject:', Subject)
    print('')
    print(Content)

sendmail(
    Subject = 'self message',
    Content = 'Has some content too',
    From = 'gabor@szabgab.com',
    To = 'szabgab@gmail.com',
)

In general, the parameters of a function defined with def can be passed either as positional arguments or as named (keyword) arguments.

Mixing positional and named parameters

We have already seen several built-in functions where we mixed positional arguments with some key-value arguments.

fname = "Foo"
lname = "Bar"
animals = ["snake", "mouse", "cat", "dog"]

print(fname, lname, sep="-", end="\n\n")

by_length = sorted(animals, key=len, reverse=True)
print(by_length)

Output:

Foo-Bar

['snake', 'mouse', 'cat', 'dog']

Mixing positional and named parameters - order

We can also mix positional and named arguments when calling any user-defined function, but we have to make sure that the positional arguments always come first and the named (key-value) arguments come at the end of the argument list.

def sendmail(From, To, Subject, Content):
    print('From:', From)
    print('To:', To)
    print('Subject:', Subject)
    print('')
    print(Content)

sendmail(
    Subject = 'self message',
    Content = 'Has some content too',
    To = 'szabgab@gmail.com',
    'gabor@szabgab.com',
)

Output:

  File "examples/functions/named_and_positional_params.py", line 14
    'gabor@szabgab.com',
    ^
SyntaxError: positional argument follows keyword argument

Passing the positional argument first, before the named ones, works:

def sendmail(From, To, Subject, Content):
    print('From:', From)
    print('To:', To)
    print('Subject:', Subject)
    print('')
    print(Content)

sendmail(
    'gabor@szabgab.com',
    Subject = 'self message',
    Content = 'Has some content too',
    To = 'szabgab@gmail.com',
)

Default values, optional parameters

def prompt(question, retry=3):
    print(question)
    print(retry)
    #while retry > 0:
    #    inp = input('{} ({}): '.format(question, retry))
    #    if inp == 'my secret':
    #        return True
    #    retry -= 1
    #return False

prompt("Type in your password")

prompt("Type in your secret", 1)

prompt("Hello", retry=7)

# prompt(retry=7, "Hello")  # SyntaxError: positional argument follows keyword argument

prompt(retry=42, question="Is it you?")

Output:

Type in your password
3
Type in your secret
1
Hello
7
Is it you?
42

Function parameters can have default values. In such cases the parameters are optional. In the function declaration, the parameters with default values must come after the ones without defaults. In the call, these arguments can be omitted, and when they are passed by name, their order does not matter.

Default value in first param


def add(x=2, y):
    print("OK")

Output:

  File "default_first.py", line 2
    def add(x=2, y):
            ^
SyntaxError: non-default argument follows default argument

Several defaults, using names

  • non-keyword arg after keyword arg

Parameters with defaults must come at the end of the parameter declaration.

def f(a, b=2, c=3):
    print(a, b , c)

f(1)             # 1 2 3
f(1, b=0)        # 1 0 3
f(1, c=0)        # 1 2 0
f(1, c=0, b=5)   # 1 5 0

# f(b=0, 1)
# would generate:
# SyntaxError: positional argument follows keyword argument

f(b=0, a=1)      # 1 0 3


def f(a=2, b):
    print(a)
    print(b)

Output:

  File "examples/functions/named_and_positional_bad.py", line 2
    def f(a=2, b):
          ^
SyntaxError: non-default argument follows default argument

There can be several parameters with default values. They are all optional, and when passed by name they can be given in any order after the positional arguments.

Default list

# don't use mutable data structures (e.g. list, dict, set) as default values:
# the default value is created only once, when the function is defined
def extend_and_print(names = []):
    names.append("cat")
    print(names)


extend_and_print()
extend_and_print()
print()

def fixed(names = None):
    if names is None:
        names = []
    names.append("dog")
    print(names)


fixed()
fixed()


Output:

['cat']
['cat', 'cat']

['dog']
['dog']

Arbitrary number of arguments *

  • *args
  • tuple

The values arrive as a tuple.

def mysum(*numbers):
    print(numbers)
    print(type(numbers))
    total = 0
    for s in numbers:
        total += s
    return total

from mysum import mysum

print(mysum())
print(mysum(1))
print(mysum(1, 2))
print(mysum(1, 1, 1))

x = 2
y = 7
z = 9
print(mysum(x, y, z))

Output:

()
<class 'tuple'>
0
(1,)
<class 'tuple'>
1
(1, 2)
<class 'tuple'>
3
(1, 1, 1)
<class 'tuple'>
3
(2, 7, 9)
<class 'tuple'>
18

Arbitrary number of arguments passing a list

from mysum import mysum

x = [2, 3, 5, 6]

mysum(x)

Output:

([2, 3, 5, 6],)
<class 'tuple'>
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/functions/sum_of_list.py", line 5, in <module>
    mysum(x)
  File "/home/gabor/work/slides/python/examples/functions/mysum.py", line 6, in mysum
    total += s
TypeError: unsupported operand type(s) for +=: 'int' and 'list'

Putting a * in front of the list unpacks it, so each element is passed as a separate argument:

from mysum import mysum

x = [2, 3, 5, 6]

print(mysum(*x))

Output:

(2, 3, 5, 6)
<class 'tuple'>
16

Arbitrary number of arguments passing a tuple

from mysum import mysum

z = (2, 3, 5, 6)

mysum(z)

Output:

((2, 3, 5, 6),)
<class 'tuple'>
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/functions/sum_of_tuple.py", line 5, in <module>
    mysum(z)
  File "/home/gabor/work/slides/python/examples/functions/mysum.py", line 6, in mysum
    total += s
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'

Putting a * in front of the tuple unpacks it, so each element is passed as a separate argument:

from mysum import mysum

z = (2, 3, 5, 6)

print(mysum(*z))

Output:

(2, 3, 5, 6)
<class 'tuple'>
16

Fixed parameters before the others

The *numbers parameter can be preceded by any number of regular parameters.

def mysum(op, *numbers):
    print(numbers)
    if op == '+':
        total = 0
    elif op == '*':
        total = 1
    else:
        raise Exception('invalid operator {}'.format(op))

    for s in numbers:
        if op == '+':
            total += s
        elif op == '*':
            total *= s

    return total

print(mysum('+', 1))
print(mysum('+', 1, 2))
print(mysum('+', 1, 1, 1))
print(mysum('*', 1, 1, 1))

Output:

(1,)
1
(1, 2)
3
(1, 1, 1)
3
(1, 1, 1)
1

Pass arbitrary number of functions

  • As an advanced example we could even pass an arbitrary number of functions

def run_these(value, *functions):
    print(functions)
    for func in functions:
        print(func(value))

run_these("abc", len, lambda x: x+x,  lambda y: f"text: {y}")

Output:

(<built-in function len>, <function <lambda> at 0x7fcb4e8bedc0>, <function <lambda> at 0x7fcb4e8bee50>)
3
abcabc
text: abc

Arbitrary key-value pairs in parameters **

  • **kwargs
def f(**kw):
    print(kw)

f(a=23, b=12)
f(x=11, y=99, z=1)

Output:

{'a': 23, 'b': 12}
{'x': 11, 'y': 99, 'z': 1}

Pass a real dictionary

def func(**kw):
    print(kw)

func(a = 23,
    b = 19,)

z = {
    'c': 10,
    'd': 20,
}

func(z = z)

func(**z)

Output:

{'a': 23, 'b': 19}
{'z': {'c': 10, 'd': 20}}
{'c': 10, 'd': 20}

The dictionary is a copy

def f(**kw):
    print(kw)
    kw['a'] = 7
    print(kw)

z = 23
f(a=10, b=12)
f(a=z, y=99, z=1)
print(z)

Output:

{'a': 10, 'b': 12}
{'a': 7, 'b': 12}
{'a': 23, 'y': 99, 'z': 1}
{'a': 7, 'y': 99, 'z': 1}
23

The dictionary is a copy, but NOT a deep copy!

def f(**kw):
    print(kw)
    print(hex(id(kw['z'])))
    kw['z']['a'] = 7

z = {'a': 1, 'b': 2}
print(z)
print(hex(id(z)))
f(z = z)

print(z)

Output:

{'a': 1, 'b': 2}
0x7f01fd163180
{'z': {'a': 1, 'b': 2}}
0x7f01fd163180
{'a': 7, 'b': 2}
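If the function must not change the caller's nested data, the caller can pass an independent copy. copy.deepcopy() copies nested structures recursively. For a flat dictionary like this one, a shallow copy such as dict(z) would also be enough. This is a sketch of the caller side, reusing the f() from above without its print calls.

```python
import copy


def f(**kw):
    kw['z']['a'] = 7   # changes whatever dict kw['z'] refers to


z = {'a': 1, 'b': 2}
f(z=copy.deepcopy(z))  # pass an independent copy of the dictionary
print(z)               # {'a': 1, 'b': 2} - unchanged
```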

Extra key-value pairs in parameters

  • **kwargs
def f(name, **kw):
    print(name)
    print(kw)

f(name="Foo", a=23, b=12)

f(a=23, name="Bar", b=12)

Output:

Foo
{'a': 23, 'b': 12}
Bar
{'a': 23, 'b': 12}

Extra key-value pairs in parameters for email

def sendmail(From, To, Subject, Content, **header):
    print('From:', From)
    print('To:', To)
    print('Subject:', Subject)
    for field, value in header.items():
        print(f"X-{field}: {value}")
    print('')
    print(Content)

sendmail(
    Subject = 'self message',
    Content = 'Has some content too',
    From = 'gabor@szabgab.com',
    To = 'szabgab@gmail.com',

    mailer = "Python",
    signature = "My sig",
)

Output:

From: gabor@szabgab.com
To: szabgab@gmail.com
Subject: self message
X-mailer: Python
X-signature: My sig

Has some content too

Every parameter option

def f(op, count=0, *things, **kw):
    print(op)
    print(count)
    print(things)
    print(kw)

f(2, 3, 4, 5, a=23, b=12)

Output:

2
3
(4, 5)
{'a': 23, 'b': 12}

Duplicate declaration of functions (multiple signatures)

def add(x, y):
    return x*y

print(add(2, 3))  # 6

def add(x):
    return x+x

print(add(2))  # 4

add(2, 3)
# TypeError: add() takes 1 positional argument but 2 were given

Output:

6
4
Traceback (most recent call last):
  File "examples/functions/duplicate_add.py", line 9, in <module>
    add(2, 3)
TypeError: add() takes 1 positional argument but 2 were given

The second declaration silently overrides the first declaration.

Pylint duplicate declaration

  • pylint can find such problems, along with a bunch of others.
pylint -E duplicate_add.py

Output:

************* Module duplicate_add
examples/functions/duplicate_add.py:4:0: E0102: function already defined line 1 (function-redefined)
examples/functions/duplicate_add.py:9:0: E1121: Too many positional arguments for function call (too-many-function-args)

Return more than one value


def calc(x, y):
    a = x+y
    b = x*y
    return a, b

t = calc(4, 5)
print(t)
print(type(t))


z, q = calc(2, 3)
print(z)
print(q)



Output:

(9, 20)
<class 'tuple'>
5
6

Recursive factorial

n! = n * (n-1) * ... * 1

0! = 1
n! = n * (n-1)!

f(0) = 1
f(n) = n * f(n-1)
def f(n):
    if int(n) != n or n < 0:
        raise ValueError("Bad parameter")

    if n == 0:
       return 1
    return n * f(n-1)

print(f(1))   # 1
print(f(2))   # 2
print(f(3))   # 6
print(f(4))   # 24

f(-1)

Recursive Fibonacci

fib(1) = 1
fib(2) = 1
fib(n) = fib(n-1) + fib(n-2)
def fib(n):
    if int(n) != n or n <= 0:
        raise ValueError("Bad parameter")

    if n == 1:
        return 1
    if n == 2:
        return 1
    return fib(n-1) + fib(n-2)

print(3, fib(3))    # 2
print(30, fib(30))  # 832040

fib(0.5)

Python also supports recursive functions.
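The recursive fib() computes the same values again and again, which makes it slow for larger n. This is a sketch of one common remedy, memoization with functools.lru_cache: each result is stored the first time it is computed and reused afterwards.

```python
from functools import lru_cache


@lru_cache(maxsize=None)   # remember every result already computed
def fib(n):
    if int(n) != n or n <= 0:
        raise ValueError("Bad parameter")
    if n in (1, 2):
        return 1
    return fib(n-1) + fib(n-2)


print(fib(30))    # 832040
print(fib(100))   # 354224848179261915075
```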

Non-recursive Fibonacci

def fib(n):
    if n == 1:
        return [1]
    if n == 2:
        return [1, 1]
    fibs = [1, 1]
    for _ in range(2, n):
        fibs.append(fibs[-1] + fibs[-2])
    return fibs

print(fib(1))  # [1]
print(fib(2))  # [1, 1]
print(fib(3))  # [1, 1, 2]
print(fib(10)) # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Unbounded recursion

  • In order to protect us from unlimited recursion, Python limits the depth of recursion:


def recursion(n):
    print(f"In recursion {n}")
    recursion(n+1)

recursion(1)

Output:

...
In recursion 995
In recursion 996
Traceback (most recent call last):
  File "recursion.py", line 7, in <module>
    recursion(1)
  File "recursion.py", line 5, in recursion
    recursion(n+1)
  File "recursion.py", line 5, in recursion
    recursion(n+1)
  File "recursion.py", line 5, in recursion
    recursion(n+1)
  [Previous line repeated 992 more times]
  File "recursion.py", line 4, in recursion
    print(f"In recursion {n}")
RecursionError: maximum recursion depth exceeded while calling a Python object

Set recursion limit

import sys

print(sys.getrecursionlimit())
sys.setrecursionlimit(10)

def recursion(n):
    print(f"In recursion {n}")
    recursion(n+1)

recursion(1)

Output:

1000
In recursion 1
In recursion 2
In recursion 3
In recursion 4
In recursion 5
In recursion 6
In recursion 7
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/functions/recursion_set_limit.py", line 10, in <module>
    recursion(1)
  File "/home/gabor/work/slides/python/examples/functions/recursion_set_limit.py", line 8, in recursion
    recursion(n+1)
  File "/home/gabor/work/slides/python/examples/functions/recursion_set_limit.py", line 8, in recursion
    recursion(n+1)
  File "/home/gabor/work/slides/python/examples/functions/recursion_set_limit.py", line 8, in recursion
    recursion(n+1)
  [Previous line repeated 4 more times]
  File "/home/gabor/work/slides/python/examples/functions/recursion_set_limit.py", line 7, in recursion
    print(f"In recursion {n}")
RecursionError: maximum recursion depth exceeded while calling a Python object

Variable assignment and change - Immutable

Details are shown on the next slide

a = 42     # number or string
b = a      # same object, but immutable, so it behaves like a copy
print(a)   # 42
print(b)   # 42
a = 1
print(a)   # 1
print(b)   # 42

a = (1, 2)   # tuple
b = a        # same tuple object; tuples are immutable, so it behaves like a copy
print(a)     # (1, 2)
print(b)     # (1, 2)
# a[0] = 42  TypeError: 'tuple' object does not support item assignment
a = (3, 4, 5)
print(a)     # (3, 4, 5)
print(b)     # (1, 2)

Variable assignment and change - Mutable list

b = [5, 6]
a = b        # this is a copy of the *reference* only
             # if we change the list in a, it will
             # change the list connected to b as well
print(a)     # [5, 6]
print(b)     # [5, 6]

a[0] = 1
print(a)     # [1, 6]
print(b)     # [1, 6]

a = [7, 8]   # replace the whole list
print(a)     # [7, 8]
print(b)     # [1, 6]

Variable assignment and change - Mutable dict

b = {'name' : 'Foo'}
a = b        # this is a copy of the *reference* only
             # if we change the dictionary in a, it will
             # change the dictionary connected to b as well
print(a)     # {'name': 'Foo'}
print(b)     # {'name': 'Foo'}

a['name'] = 'Jar Jar'
print(a)     # {'name': 'Jar Jar'}
print(b)     # {'name': 'Jar Jar'}


             # replace reference
a = {'name': 'Foo Bar'}
print(a)     # {'name': 'Foo Bar'}
print(b)     # {'name': 'Jar Jar'}

Parameter passing of functions

x = 3

def inc(n):
    n += 1
    return n

print(x)        # 3
print(inc(x))   # 4
print(x)        # 3

Passing references

numbers = [1, 2, 3]

def update(x):
    x[0] = 23

def change(y):
    y = [5, 6]
    return y

def replace_content(z):
    z[:] = [7, 8]
    return z


print(numbers)         # [1, 2, 3]

update(numbers)
print(numbers)         # [23, 2, 3]

print(change(numbers)) # [5, 6]
print(numbers)         # [23, 2, 3]


print(replace_content(numbers)) # [7, 8]
print(numbers)                  # [7, 8]

Function documentation

def f(name):
    """
    The documentation
    can span more than one line.
    """
    print(name)


f("hello")
print(f.__doc__)

Immediately after the definition of the function, you can add a string - usually a triple-quoted (""") string that can span multiple lines - which holds the documentation of the function. This string can be accessed via the __doc__ attribute of the function. Also, if you import the file as a module in the interactive Python prompt, you can read this documentation with the help() function: help(mydocs) or help(mydocs.f) in the above case, assuming the file is called mydocs.py.

Sum ARGV

import sys

def mysum(*numbers):
    print(numbers)
    total = 0
    for s in numbers:
        total += s
    return total

v = [int(x) for x in sys.argv[1:] ]
r = mysum( *v )
print(r)

Copy-paste code

a = [2, 3, 93, 18]
b = [27, 81, 11, 35]
c = [32, 105, 1]

total_a  = 0
for v in a:
    total_a += v
print("sum of a: {} average of a: {}".format(total_a, total_a / len(a)))

total_b  = 0
for v in b:
    total_b += v
print("sum of b: {} average of b: {}".format(total_b, total_b / len(b)))

total_c  = 0
for v in c:
    total_c += v
print("sum of c: {} average of c: {}".format(total_c, total_c / len(a)))


sum of a: 116 average of a: 29.0
sum of b: 154 average of b: 38.5
sum of c: 138 average of c: 34.5

Did you notice the bug?

Copy-paste code fixed

a = [2, 3, 93, 18]
b = [27, 81, 11, 35]
c = [32, 105, 1]

def calc(numbers):
    total  = 0
    for v in numbers:
        total += v
    return total, total / len(numbers)

total_a, avg_a = calc(a)
print("sum of a: {} average of a: {}".format(total_a, avg_a))

total_b, avg_b = calc(b)
print("sum of b: {} average of b: {}".format(total_b, avg_b))


total_c, avg_c = calc(c)
print("sum of c: {} average of c: {}".format(total_c, avg_c))

sum of a: 116 average of a: 29.0
sum of b: 154 average of b: 38.5
sum of c: 138 average of c: 46.0

Copy-paste code further improvement

data = {
    'a': [2, 3, 93, 18],
    'b': [27, 81, 11, 35],
    'c': [32, 105, 1],
}

def calc(numbers):
    total  = 0
    for v in numbers:
        total += v
    return total, total / len(numbers)

total = {}
avg   = {}
for name, numbers in data.items():
    total[name], avg[name] = calc(numbers)
    print("sum of {}: {} average of {}: {}".format(name, total[name], name, avg[name]))

Palindrome

An iterative and a recursive solution

def is_palindrome(s):
    if s == '':
        return True
    if s[0] == s[-1]:
        return is_palindrome(s[1:-1])
    return False

def iter_palindrome(s):
    for i in range(len(s) // 2):
        if s[i] != s[-(i+1)]:
            return False
    return True

print(is_palindrome(''))      # True
print(is_palindrome('a'))     # True
print(is_palindrome('ab'))    # False
print(is_palindrome('aa'))    # True
print(is_palindrome('aba'))   # True
print(is_palindrome('abc'))   # False

print()
print(iter_palindrome(''))      # True
print(iter_palindrome('a'))     # True
print(iter_palindrome('ab'))    # False
print(iter_palindrome('aa'))    # True
print(iter_palindrome('aba'))   # True
print(iter_palindrome('abc'))   # False
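For comparison, there is a third, shorter way to check a palindrome: slicing with a negative step returns a reversed copy of the string. This is only a sketch and the function name slice_palindrome is made up for this example. It builds a full reversed copy, so it uses more memory than the iterative version, but it is usually the easiest to read.

```python
def slice_palindrome(s):
    # s[::-1] creates a reversed copy of the string
    return s == s[::-1]

for word in ['', 'a', 'ab', 'aa', 'aba', 'abc']:
    print(repr(word), slice_palindrome(word))
```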

Exit vs return vs break and continue

  • exit will stop your program no matter where you call it (it raises a SystemExit exception).

  • return will return from a function (it stops only the current function).

  • break will stop the current "while" or "for" loop.

  • continue will skip the rest of the current iteration and go on with the next iteration of the current "while" or "for" loop.
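A small sketch showing return, break and continue side by side (the function names are made up for this example). It uses sys.exit(), which works by raising a SystemExit exception. That is why it stops the program from anywhere, unless something catches that exception, as the last few lines do on purpose.

```python
import sys

def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n >= 0:
            continue      # skip the rest of this iteration, go to the next number
        return n          # leave the function (and the loop) immediately
    return None

def sum_until_zero(numbers):
    """Add up the numbers that come before the first 0."""
    total = 0
    for n in numbers:
        if n == 0:
            break         # leave the loop, but keep running the function
        total += n
    return total

print(first_negative([3, 0, -2, -5]))   # -2
print(sum_until_zero([1, 2, 0, 10]))    # 3

try:
    sys.exit("stop everything")         # raises SystemExit
except SystemExit as err:
    print("caught SystemExit:", err)    # caught SystemExit: stop everything
```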

Exercise: statistics

Create a file called statistics.py with a function that accepts any number of numbers and returns the following values:

  • The sum
  • Average
  • Minimum
  • Maximum

Exercise: Pascal's triangle

  • Create a file called pascal_triangle.py that, given a number N on the command line, will print the first N rows of Pascal's triangle.

Exercise: Pascal's triangle functions

  • Create a file called pascal_triangle_functions.py that does exactly what the previous one did, but this time make sure you have these functions:

  • get_next_row: given a list of numbers (a row of the triangle, e.g. 1, 3, 3, 1) return the next row (1, 4, 6, 4, 1).

  • get_triangle: given a depth N return a list of the first N rows.

  • print_triangle: print the triangle.

Exercise: recursive dependency tree

  • Create a file called recursive_dependency_tree.py

Given a set of files, each listing requirements one per line, process them recursively and print the resulting full list of requirements. (In the solution the requirements of NAME are read from a file called NAME.txt.)

b
c
d
e
d
f
g
$ python traversing_dependency_tree.py a

Processing a
Processing b
Processing e
Processing d
Processing c
Processing f
Processing g
Processing d

Exercise: dependency tree

  • Create a file called dependency_tree.py

It should process the files holding the dependency tree, but without using recursive calls.

Exercise: Tower of Hanoi

  • Create a script called tower_of_hanoi.py providing a solution to Tower of Hanoi

There are 3 sticks. On the first stick there are n rings of different sizes; the smaller the ring, the higher it is on the stick. Move all the rings to the 3rd stick, moving only one ring at a time and never placing a larger ring on top of a smaller one.

Exercise: Merge and Bubble sort

Exercise: Refactor previous solutions to use functions

  • Go over all of the previous exercises and their solutions (e.g. the games)
  • Take one (or more if you like this exercise) and change them to use functions.
  • If possible make sure you don't have any variable definitions outside of the functions and that each function has a single job to do.
  • In each case keep the original filename and add _with_functions at the end (before the .py extension).

Exercise: Number guessing - functions

Take the number guessing game from the earlier chapter and move the internal while() loop to a function.

Solution: statistics

def stats(*numbers):
    total = 0

    average = None  # there might be better solutions here!
    minx = None
    maxx = None

    for val in numbers:
        total += val
        if minx is None:
            minx = maxx = val
        if minx > val:
            minx = val
        if maxx < val:
            maxx = val

    if len(numbers):
        average = total / len(numbers)

    return total, average, minx, maxx


ttl, avr, smallest, largest = stats(3, 5, 4)

print(ttl)
print(avr)
print(smallest)
print(largest)
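As the comment in the solution hints, there are shorter ways. Here is a sketch that relies on the built-in sum, min and max functions and keeps the same return values, including None for the average, minimum and maximum when no numbers are passed in.

```python
def stats(*numbers):
    """Return (total, average, minimum, maximum) of the given numbers."""
    if not numbers:
        # same result as the loop-based version for an empty call
        return 0, None, None, None
    total = sum(numbers)
    return total, total / len(numbers), min(numbers), max(numbers)

print(stats(3, 5, 4))   # (12, 4.0, 3, 5)
print(stats())          # (0, None, None, None)
```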

Solution: Pascal triangle

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} N")

rows = int(sys.argv[1])

row = []
for current in range(0, rows):
    if row == []:
        next_row = [1]
    else:
        next_row = []
        temp_row = [0] + row + [0]
        for ix in range(len(temp_row)-1):
            next_row.append(temp_row[ix]+temp_row[ix+1])
    row = next_row
    print(row)

Solution: Pascal triangle functions

import sys

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} N")

def get_next_row(row):
    if row == []:
        next_row = [1]
    else:
        next_row = []
        temp_row = [0] + row + [0]
        for ix in range(len(temp_row)-1):
            next_row.append(temp_row[ix]+temp_row[ix+1])
    return next_row

def get_triangle(rows):
    triangle = []
    row = []
    for current in range(0, rows):
        row = get_next_row(row)
        triangle.append(row)
    return triangle

def print_triangle(triangle):
    for row in triangle:
        print(row)

triangle = get_triangle(int(sys.argv[1]))
print_triangle(triangle)

Solution: recursive dependency tree

import sys
import os

if len(sys.argv) < 2:
    exit("Usage: {} NAME".format(sys.argv[0]))

start = sys.argv[1]

def get_dependencies(name):
    print("Processing {}".format(name))

    deps = {name}   # a set with one element; set(name) would split the name into characters
    filename = name + ".txt"
    if not os.path.exists(filename):
        return deps

    with open(filename) as fh:
        for line in fh:
            row = line.rstrip("\n")
            deps.add(row)
            deps.update(get_dependencies(row))

    return deps

dependencies = get_dependencies(start)
print(dependencies)

Solution: Tower of Hanoi

def check():
    for loc in hanoi.keys():
        if hanoi[loc] != sorted(hanoi[loc], reverse=True):
            raise Exception(f"Incorrect order in {loc}: {hanoi[loc]}")

def move(depth, source, target, helper):
    if depth > 0:
        move(depth-1, source, helper, target)

        val = hanoi[source].pop()
        hanoi[target].append(val)
        print(f"Move {val} from {source} to {target}   Status A:{str(hanoi['A']):10}  B:{str(hanoi['B']):10}  C:{str(hanoi['C']):10}")
        check()

        move(depth-1, helper, target, source)
    check()

hanoi = {
    'A': [4, 3, 2, 1],
    'B': [],
    'C': [],
}

check()
move(len(hanoi['A']), 'A', 'C', 'B')
check()

def check():
    for loc in ['A', 'B', 'C']:
        print(f"{loc} {hanoi[loc]}", end=' ')
        if hanoi[loc] != sorted(hanoi[loc], reverse=True):
            raise Exception(f"Incorrect order in {loc}: {hanoi[loc]}")
    print('')

def move(source, target, helper):
    #if not hanoi[source]:
    #    return
    if len(hanoi[source]) == 1:
        disk = hanoi[source].pop()
        print(f"Move {disk} from {source} to {target}")
        hanoi[target].append(disk)
        return
    big_disk = hanoi[source].pop(0)   # pretend the biggest disk is not there
    move(source, helper, target)
    print(f"Move {big_disk} from {source} to {target}")
    move(helper, target, source)
    hanoi[target].insert(0, big_disk) # stop pretending
    check()


hanoi = {
    'A': [4, 3, 2, 1],
    'B': [],
    'C': [],
}

check()
move('A', 'C', 'B')
check()

Solution: Merge and Bubble sort


def bubble_sort(*values):
    values = list(values)
    for ix in range(len(values)-1):
        for jx in range(len(values)-1-ix):
            if values[jx] > values[jx+1]:
                values[jx], values[jx+1] = values[jx+1], values[jx]
    return values

print(bubble_sort(1, 2, 3))
print(bubble_sort(3, 2, 1))
print(bubble_sort(10, 9, 8, 7, 6, 5, 4, 3, 2, 1))

def iterative_bubble_sort(data):
    data = data[:]
    for end in (range(len(data)-1, 0, -1)):
        for i in range(end):
            if data[i] < data[i+1]:
                data[i], data[i+1] = data[i+1], data[i]
    return data

old = [1, 5, 2, 4, 8]
new = iterative_bubble_sort(old)
print(old)
print(new)
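
def recursive_bubble_sort(data):
    # Like iterative_bubble_sort above, this sorts in descending order.
    # Strictly speaking, inserting each element into an already sorted list
    # is closer to insertion sort than to bubble sort.
    data = data[:]
    if len(data) <= 1:   # also handles an empty list
        return data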

    last = data.pop()
    sorted_data = recursive_bubble_sort(data)
    for i in range(len(sorted_data)):
        if last > sorted_data[i]:
            sorted_data.insert(i, last)
            break
    else:
        sorted_data.append(last)
    return sorted_data


old = [1, 5, 2, 4, 8]
new = recursive_bubble_sort(old)
print(old)
print(new)
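The solutions above only cover bubble sort. Here is a minimal merge sort sketch (not the original author's solution): split the list in half, sort each half recursively, then merge the two sorted halves. Unlike the two functions above, which sort in descending order, this one sorts in ascending order.

```python
def merge(left, right):
    """Merge two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # at most one of the two lists still has elements left
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(data):
    """Return a new list with the values of data in ascending order."""
    if len(data) <= 1:
        return data[:]
    middle = len(data) // 2
    return merge(merge_sort(data[:middle]), merge_sort(data[middle:]))

old = [1, 5, 2, 4, 8]
new = merge_sort(old)
print(old)   # [1, 5, 2, 4, 8]
print(new)   # [1, 2, 4, 5, 8]
```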

Modules

Goal of having modules

  • Code reuse: Allow multiple scripts to reuse the same function without copying the code.
  • Better code design.
  • Separation of concerns: Functions dealing with one subject are grouped together in one module.

Before modules

Let's take a very simple script that has a single, very simple function in it.


def add(a, b):
    return a + b


z = add(2, 3)
print(z)       # 5

Create modules

A module is just a Python file with a set of functions that is usually not used by itself. For example the "my_calculator.py".

def add(a, b):
    return a + b

A user-made module is loaded exactly the same way as a built-in module. The functions defined in the module are accessed with dot notation, as if they were methods.

import my_calculator

z = my_calculator.add(2, 3)

print(z)  # 5

We can import specific functions into the current namespace (symbol table) and then we don't need to prefix them with the name of the module every time we use them. This is shorter to write, but if we import the same function name from two different modules, the later import silently overwrites the earlier one. So I usually prefer importing the module as in the previous example.

from my_calculator import add

print(add(2, 3))  # 5
  • Importing with an alias
import my_calculator as calc

z = calc.add(2, 3)

print(z)  # 5

path to load modules from - The module search path

There are several places Python checks when it searches for the file to be imported, but the most important one is the list of directories in sys.path, which we look at next.

  1. The directory where the main script is located.
  2. The directories listed in PYTHONPATH environment variable.
  3. Directories of standard libraries.
  4. The site-packages home of third-party extensions.
  5. Directories listed in .pth files (these files are found in the site-packages directories).
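To see which place in this search order actually supplies a module, we can ask the import system directly. A small sketch using importlib.util.find_spec (the helper name where_is is made up). It returns None when a module cannot be found, and it reports 'built-in' for modules compiled into the interpreter.

```python
import importlib.util

def where_is(module_name):
    """Return where a module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # origin is usually a file path; for modules compiled into the interpreter it is 'built-in'
    return spec.origin

for name in ['json', 'sys', 'no_such_module_xyz']:
    print(name, where_is(name))
```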

sys.path - the module search path

import sys

print(sys.path)
['/Users/gabor/work/training/python/examples/package',
 '/Users/gabor/python/lib/python2.7/site-packages/crypto-1.1.0-py2.7.egg',
 ...
 '/Library/Python/2.7/site-packages', '/usr/local/lib/python2.7/site-packages']

Project directory layouts

  • Flat project
  • Absolute path
  • Relative path
  • Using submodules

Flat project directory structure

If our executable scripts and our modules are all in the same directory then we don't have to worry, as the directory of the script is included in the list of places where "import" looks for the files to be imported.

project/
     script_a.py
     script_b.py
     my_module.py

Absolute path

If we would like to load a module that is not installed in one of the standard locations, but we know where it is located on our disk, we can add the absolute path of that directory to "sys.path". This works on that specific computer, but if you'd like to distribute the script to other computers you'll have to make sure the module is installed in the same location on each of them, or update the script to point to the location of the module on each computer. This is not an ideal solution.

import sys

# On Linux
sys.path.insert(0, "/home/foobar/python/libs")

# On Windows
# sys.path.insert(0, r"c:\Users\FooBar\python\libs")

# import module_name

Relative path

../project_root/
     bin/relative_path.py
     lib/my_module.py

We can use a directory structure that is more complex than the flat structure we had earlier. Here the location of the modules relative to the scripts is fixed: it is "../lib". We can compute this path in each of our scripts, which ensures we pick up the right module every time we run the script, regardless of where the whole project tree is located.


def run():
    print("Hello from my_module")
import os
import sys

project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(project_root, 'lib'))

import my_module
my_module.run()

Relative path explained

../project_root/
     bin/relative_path_explained.py
     lib/my_module.py
import os
import sys

print(__file__)
print(os.path.abspath(__file__))
print(os.path.dirname(os.path.abspath(__file__)))

project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
print(project_root)

mypath = os.path.join(project_root, 'lib')
print(mypath)
sys.path.insert(0, mypath)

import my_module
my_module.run()

examples/project_root/bin/relative_path_explained.py
/home/gabor/work/slides/python/examples/project_root/bin/relative_path_explained.py
/home/gabor/work/slides/python/examples/project_root/bin
/home/gabor/work/slides/python/examples/project_root
/home/gabor/work/slides/python/examples/project_root/lib
Hello from my_module

Submodules

aproject/
    app.py
    mymodules/math.py
import mymodules.math
z = mymodules.math.add(2, 3)

print(z)
def add(x, y):
    return x + y

Python modules are compiled

When modules are imported they are automatically compiled to .pyc (bytecode) files. This gives a load-time speed-up, but no run-time speed-up, and only weak code hiding, as bytecode can be disassembled. Starting from Python 3.2 the .pyc files are saved in the __pycache__ directory.
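To see where the compiled file of a module ends up, importlib.util.cache_from_source computes the __pycache__ path, and py_compile.compile does the compilation that import would otherwise do automatically. A sketch using a throw-away module in a temporary directory (the module name mymod is made up):

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'mymod.py')   # hypothetical module
    with open(source, 'w') as fh:
        fh.write('def hello():\n    return "hello"\n')

    # where the import system would store the bytecode of this file
    expected = importlib.util.cache_from_source(source)
    print(expected)   # .../__pycache__/mymod.<interpreter tag>.pyc

    # compile it now (import does the same the first time the module is loaded)
    written = py_compile.compile(source)
    print(written == expected)   # True
```

The interpreter tag in the file name (for example cpython followed by the version) lets several Python versions keep their own .pyc files side by side.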

How "import" and "from" work?

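  1. Find the file to load.
  2. Compile to bytecode if necessary and save the bytecode if possible.
  3. Run the code of the file loaded. (Steps 1-3 happen only on the first import; the module is then cached in sys.modules.)
  4. Copy names into the importing namespace: "import mod" adds the name mod, "from mod import x" adds the name x.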

Execute at import time

import lib 

print("Hello")
print("import lib")

def do_something():
    print("do something")
import lib
Hello

Runtime loading of modules

The import statements in Python are executed at the point where they are located in the code. If you have some code before the import statement (print Start running) it will be executed before the importing starts.

During the importing any code that is outside of functions and classes in the imported module is executed. (print Loading mygreet).

Then you can call functions from the module (print Hello World).

Finally, the rest of the importing program runs (print DONE).

def hello():
    print("Hello World")

print("Loading mygreet")
import mygreet
print("Start running")  # Start running

import mygreet          # Loading mygreet

print("import done")    # import done

mygreet.hello()         # Hello World

print("DONE")           # DONE

Conditional loading of modules

print("Start running")
name = input("Your name:")

if name == "Foo":
    import mygreet
    mygreet.hello()
else:
    print('No loading')


print("DONE")

What is in our namespace?

print(dir())
import sys
print(dir())
from sys import argv
print(dir())

['__annotations__', '__builtins__', '__cached__', '__doc__',
      '__file__', '__loader__', '__name__', '__package__', '__spec__']

['__annotations__', '__builtins__', '__cached__', '__doc__',
      '__file__', '__loader__', '__name__', '__package__', '__spec__', 'sys']

['__annotations__', '__builtins__', '__cached__', '__doc__',
      '__file__', '__loader__', '__name__', '__package__', '__spec__', 'argv', 'sys']

Runtime import

  • We can use the name of a module that comes from an expression
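The standard way to do this is importlib.import_module, which takes the module name as a string (the built-in __import__ also exists, but the documentation recommends import_module). A minimal sketch; the helper name load is made up, and in a real program the name might come from user input or a configuration file.

```python
import importlib

def load(module_name):
    """Import a module whose name is only known at run time."""
    return importlib.import_module(module_name)

for name in ['math', 'json']:
    module = load(name)
    print(module.__name__)

math_module = load('math')
print(math_module.sqrt(16))               # 4.0

dumps = getattr(load('json'), 'dumps')    # attributes can also be looked up by name
print(dumps([1, 2]))                      # [1, 2]
```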

Duplicate importing of functions

from mycalc import add
print(add(2, 3))  # 5

from mymath import add
print(add(2, 3))  # 6

from mycalc import add
print(add(2, 3))  # 5

Each import silently overrides the previous one, so the same call gives different results depending on which import came last.

pylint can find such problems, along with a bunch of others.

pylint --disable=C  duplicate_add_from_module.py
************* Module duplicate_add_from_module
duplicate_add_from_module.py:4:0: W0404: Reimport 'add' (imported line 1) (reimported)
duplicate_add_from_module.py:7:0: W0404: Reimport 'add' (imported line 1) (reimported)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 5.00/10, +1.67)

Duplicate importing of functions - solved

import mycalc
print(mycalc.add(2, 3))  # 5

import mymath
print(mymath.add(2, 3))  # 6

import mycalc
print(mycalc.add(2, 3))  # 5

Script or library

We can have a file with all the functions implemented and then launch the run() function only if the file was executed as a stand-alone script.

def run():
    print("run in ", __name__)

print("Name space in mymodule.py ", __name__)

if __name__ == '__main__':
    run()

$ python mymodule.py
Name space in mymodule.py  __main__
run in  __main__

Script or library - import

If it is imported by another module then it won't run automatically. We have to call it manually.

import mymodule

print("Name space in import_mymodule.py ", __name__)
mymodule.run()

$ python import_mymodule.py
Name space in mymodule.py  mymodule
Name space in import_mymodule.py  __main__
run in  mymodule

Script or library - from import

from mymodule import run

print("Name space in import_mymodule.py ", __name__)
run()
$ python import_from_mymodule.py
Name space in mymodule.py  mymodule
Name space in import_mymodule.py  __main__
run in  mymodule

Scope of import

def div(a, b):
    return a/b
from __future__ import print_function
from __future__ import division

import mydiv

print(mydiv.div(3, 2))   # 1

print(3/2)               # 1.5

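The importing of functions and the changes in the behavior of the compiler (such as the __future__ imports) are file-specific. In this Python 2 example the change in the behavior of division is only visible in the division.py script, but not in the mydiv.py module. (In Python 3, / always performs true division, so both lines would print 1.5.)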
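In Python 3 the "from __future__ import division" behavior is the default in every file: / always does true division and // does floor division. A quick illustration:

```python
true_division = 3 / 2     # 1.5
floor_division = 3 // 2   # 1
negative_floor = -3 // 2  # -2 (rounds towards negative infinity, not towards 0)

print(true_division, floor_division, negative_floor)   # 1.5 1 -2
```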

Import multiple times

import one
import two

print("Hello")
import common
print("loading one")

import common
print("loading two")
print("import common")
import common
loading one
loading two
Hello
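This is why common.py printed "import common" only once: after the first import the module object is cached in sys.modules, and every later import statement returns the cached object without running the file again. (importlib.reload can be used to run it again explicitly.) A quick check with a standard module:

```python
import sys

import json                        # first import: json/__init__.py is executed
cached = sys.modules['json']       # the module object is now cached

import json as json_again          # later imports find it in sys.modules
print(json_again is cached)        # True
print('json' in sys.modules)       # True
```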

Do not import *

  • Despite the examples you may see in various places, I recommend never importing "everything" using *.
from one import *
from two import *


run()
  • Where does run() come from?
  • What if both modules have a run() function? Then the order of the imports will be important.
  • What if only one has the run() function, but a new version of two also adds one?

Exercise: Number guessing

Take the number guessing game and move the function out to a separate file and use it as a module.

Exercise: Scripts and modules

Take the number guessing game:

If I run it as a script, like this: python game.py, it should execute the whole game, allowing the user to play several games, each time with a new hidden number.

If I load it as a module, then let me call the function that runs a single game with one hidden number. For example:

import game

game.run_game()   # will generate a new hidden number

We should be able to even pass the hidden number as a parameter. Like this:

import game

game.run_game(42)

Exercise: Module my_sum

  • Create a file called my_simple_math.py with two functions: div(a, b), add(a, b), that will divide and add the two numbers respectively.

  • Add another two functions called test_div and test_add that will test the above two functions using assert.

  • Add code that will run the tests if someone executes python my_simple_math.py, that is, runs the file as a script.

  • Create another file called use_my_simple_math.py that will use the functions from the my_simple_math module to calculate 2 + 5 * 7

  • Make sure when you run python use_my_simple_math.py the tests won't run.

  • Add documentation to the "add" and "div" functions with examples that can be used with doctest.

  • Can you run the tests when the file is loaded as a module?

Exercise: Convert your script to module

  • Take one of your real scripts (from work or from a previous assignment). Create a backup copy.
  • Change the script so it can be imported as a module without automatically executing anything, while it still works when executed as a script.
  • Add a new function to it called self_test and in that function add a few test-cases to your code using 'assert'.
  • Write another script that will load your real file as a module and will run the self_test.
  • Let me know what difficulties you ran into!

Exercise: Add doctests to your own code

  • Pick a module from your own code (e.g. from work) and create a backup copy.
  • Add a function called 'self_test' that uses 'assert' to test some of the real functions of the module.
  • Add code that will run the 'self_test' when the file is executed as a script.
  • Add documentation to one of the functions and convert the 'assert'-based tests to doctests.
  • Convert the mechanism that executed the 'self_test' to run the doctests as well.
  • Let me know what difficulties you ran into!

Solution: Module my_sum

def div(a, b):
    '''
    >>> div(8, 2)
    4.0
    '''
    return a/b

def add(a, b):
    '''
    >>> add(2, 2)
    4
    '''
    return a * b   # bug added on purpose!

def test_div():
    assert div(6, 3) == 2
    assert div(0, 10) == 0
    assert div(-2, 2) == -1
    #assert div(10, 0) == ??

def test_add():
    assert add(2, 2) == 4
    #assert add(1, 1) == 2


if __name__ == "__main__":
    test_div()
    test_add()
import my_simple_math
print(my_simple_math.add(2, 3))

print(dir(my_simple_math))
#my_simple_math.test_add()

Loaded modules and their path

import sys

for mod in sorted(sys.modules.keys()):
    try:
        print(mod, sys.modules[mod].__file__)
    except AttributeError:   # built-in modules have no __file__
        print(mod)

Built-in modules

import sys

for mod in sys.builtin_module_names:
    print(mod)

assert to verify values

def add(x, y):
    return x * y

for x, y, z in [(2, 2, 4), (9, 2, 11), (2, 3, 5)]:
    print(f"add({x}, {y}) == {z}")
    if add(x, y) != z:
        raise Exception(f"add({x}, {y}) != {z}")
        #raise AssertionError
add(2, 2) == 4
add(9, 2) == 11
Traceback (most recent call last):
  File "examples/functions/raise_exception.py", line 7, in <module>
    raise Exception(f"add({x}, {y}) != {z}")
Exception: add(9, 2) != 11
def add(x, y):
    return x * y

for x, y, z in [(2, 2, 4), (9, 2, 11), (2, 3, 5)]:
    print(f"add({x}, {y}) == {z}")
    assert add(x, y) == z
add(2, 2) == 4
add(9, 2) == 11
Traceback (most recent call last):
  File "examples/functions/assert.py", line 6, in <module>
    assert add(x, y) == z
AssertionError

mycalc as a self testing module

import mycalc
print(mycalc.add(19, 23))
$ python use_mycalc.py
42
def test_add():
    print('Testing  {}'.format(__file__))
    assert add(1, 1) == 2
    assert add(-1, 1) == 0
    # assert add(-99, 1) == 0 # AssertionError

def add(a, b):
    return a + b

if __name__ == '__main__':
    test_add()
$ python mycalc.py
Testing  mycalc.py

doctest

def fib(n):
    '''
    Before the tests
    >>> fib(3)
    2
    >>> fib(10)
    55
    >>> [fib(n) for n in range(11)]
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

    >>> fib(11)
    89
    After the tests
    '''
    values = [0, 1]

    if n == 11:
        return 'bug'

    while n > len(values) - 1:
        values.append(values[-1] + values[-2])
    return values[n]

if __name__ == "__main__":
    import doctest
    doctest.testmod()
python -m doctest fibonacci_doctest.py
python examples/modules/fibonacci_doctest.py

**********************************************************************
File ".../examples/modules/fibonacci_doctest.py", line 12, in __main__.fib
Failed example:
    fib(11)
Expected:
    89
Got:
    'bug'
**********************************************************************
1 items had failures:
   1 of   4 in __main__.fib
***Test Failed*** 1 failures.

Export import

  • from mod import a,b,_c - import 'a', 'b', and '_c' from 'mod'

  • from mod import * - import every name listed in __all__ of 'mod' if __all__ is defined.

  • from mod import * - import every name that does NOT start with _ (if __all__ is not defined)

  • import mod - import 'mod' and make every name in 'mod' accessible as 'mod.a', and 'mod._c'

def a():
    return "in a"

b = "value of b"

def _c():
    return "in _c"

def d():
    return "in d"
from my_module import a,b,_c

print(a())     # in a
print(b)       # value of b
print(_c())    # in _c

print(d())
# Traceback (most recent call last):
#   File ".../examples/modules/x.py", line 7, in <module>
#     print(d())
# NameError: name 'd' is not defined
from my_module import *

print(a())     # in a
print(b)       # value of b

print(d())     # in d


print(_c())

# Traceback (most recent call last):
#   File ".../examples/modules/y.py", line 9, in <module>
#     print(_c())    # in _c
# NameError: name '_c' is not defined

Export import with __all__

__all__ = ['a', '_c']

def a():
    return "in a"

b = "value of b"

def _c():
    return "in _c"

def d():
    return "in d"
from my_module2 import *

print(a())     # in a
print(_c())    # in _c

print(b)

# Traceback (most recent call last):
#   File ".../examples/modules/z.py", line 7, in <module>
#     print(b)       # value of b
# NameError: name 'b' is not defined

import module

import my_module

print(my_module.a())    # in a
print(my_module.b)      # value of b
print(my_module._c())   # in _c
print(my_module.d())    # in d

deep copy list

a = [
    {
      'name': 'Joe',
      'email': 'joe@examples.com',
    },
    {
      'name': 'Mary',
      'email': 'mary@examples.com',
    },
]


b = a
a[0]['phone'] = '1234'
a[0]['name'] = 'Jane'
a.append({
    'name': 'George'
})

print(a)
print(b)
[{'name': 'Jane', 'email': 'joe@examples.com', 'phone': '1234'}, {'name': 'Mary', 'email': 'mary@examples.com'}, {'name': 'George'}]
[{'name': 'Jane', 'email': 'joe@examples.com', 'phone': '1234'}, {'name': 'Mary', 'email': 'mary@examples.com'}, {'name': 'George'}]
a = [
    {
      'name': 'Joe',
      'email': 'joe@examples.com',
    },
    {
      'name': 'Mary',
      'email': 'mary@examples.com',
    },
]


b = a[:]
a[0]['phone'] = '1234'
a[0]['name'] = 'Jane'
a.append({
    'name': 'George'
})

print(a)
print(b)
[{'name': 'Jane', 'email': 'joe@examples.com', 'phone': '1234'}, {'name': 'Mary', 'email': 'mary@examples.com'}, {'name': 'George'}]
[{'name': 'Jane', 'email': 'joe@examples.com', 'phone': '1234'}, {'name': 'Mary', 'email': 'mary@examples.com'}]
from copy import deepcopy

a = [
    {
      'name': 'Joe',
      'email': 'joe@examples.com',
    },
    {
      'name': 'Mary',
      'email': 'mary@examples.com',
    },
]


b = deepcopy(a)
a[0]['phone'] = '1234'
a[0]['name'] = 'Jane'
a.append({
    'name': 'George'
})

print(a)
print(b)
[{'name': 'Jane', 'email': 'joe@examples.com', 'phone': '1234'}, {'name': 'Mary', 'email': 'mary@examples.com'}, {'name': 'George'}]
[{'name': 'Joe', 'email': 'joe@examples.com'}, {'name': 'Mary', 'email': 'mary@examples.com'}]
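The slicing copy in the second example is a shallow copy: the outer list is new, but the dictionaries inside it are shared. copy.copy makes the same kind of shallow copy, while copy.deepcopy copies the nested objects too. Checking object identity with the is operator makes the difference visible:

```python
import copy

a = [{'name': 'Joe'}, {'name': 'Mary'}]

shallow = copy.copy(a)      # like a[:] for a list: new list, same inner dicts
deep = copy.deepcopy(a)     # new list and new copies of the inner dicts

print(shallow is a, shallow[0] is a[0])   # False True
print(deep is a, deep[0] is a[0])         # False False
```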

deep copy dictionary

a = {
    'name': 'Foo Bar',
    'grades': {
       'math': 70,
       'art' : 100,
    },
    'friends': ['Mary', 'John', 'Jane', 'George'],
}

b = a
a['grades']['math'] = 90
a['email'] = 'foo@bar.com'
print(a)
print(b)

{'name': 'Foo Bar', 'grades': {'math': 90, 'art': 100}, 'friends': ['Mary', 'John', 'Jane', 'George'], 'email': 'foo@bar.com'}
{'name': 'Foo Bar', 'grades': {'math': 90, 'art': 100}, 'friends': ['Mary', 'John', 'Jane', 'George'], 'email': 'foo@bar.com'}
  • deepcopy: https://docs.python.org/library/copy.html#copy.deepcopy
from copy import deepcopy

a = {
    'name': 'Foo Bar',
    'grades': {
       'math': 70,
       'art' : 100,
    },
    'friends': ['Mary', 'John', 'Jane', 'George'],
}

b = deepcopy(a)
a['grades']['math'] = 90
a['email'] = 'foo@bar.com'
print(a)
print(b)

{'name': 'Foo Bar', 'grades': {'math': 90, 'art': 100}, 'friends': ['Mary', 'John', 'Jane', 'George'], 'email': 'foo@bar.com'}
{'name': 'Foo Bar', 'grades': {'math': 70, 'art': 100}, 'friends': ['Mary', 'John', 'Jane', 'George']}

Python standard modules (standard packages)

Some Standard packages

math

math examples

import math

print(math.pi)            # 3.141592653589793
print(math.e)             # 2.718281828459045
print(math.sin(23))       # -0.8462204041751706

print(math.perm(3))       #  6 permutations (math.perm needs Python 3.8+)
print(math.perm(4))       # 24 permutations
print(math.perm(4, 2))    # 12 permutations


print(math.lcm(120, 42))  # 840 least common multiple (math.lcm needs Python 3.9+)
print(math.gcd(120, 42))  #   6 greatest common divisor
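The permutation counts above are related to combinations, which math.comb (also Python 3.8+) computes: choosing k items out of n without caring about order, and then multiplying by the k! possible orderings of those items, gives math.perm(n, k). A quick check:

```python
import math

n, k = 4, 2
# perm(n, k): ordered selections of k items out of n
# comb(n, k): unordered selections; each of them can be ordered in k! ways
print(math.perm(n, k))                        # 12
print(math.comb(n, k))                        # 6
print(math.comb(n, k) * math.factorial(k))    # 12
print(math.perm(n) == math.factorial(n))      # True
```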

sys

sys module

import sys

print(sys.argv) # the list of the values on the command line;
                # sys.argv[0] is the name of the Python script

print(sys.executable)  # path to the python interpreter

# print(sys.path)
    # list of file-system path strings for searching for modules
    # hard-coded at compile time but can be changed via the PYTHONPATH
    # environment variable or during execution by modifying sys.path

print(sys.version_info)
# sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
# sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)

print(sys.version_info.major)  # 2 or 3

print(sys.platform)    # darwin   or  linux   or  win32

['examples/sys/mysys.py']
/home/gabor/venv3/bin/python
sys.version_info(major=3, minor=9, micro=7, releaselevel='final', serial=0)
3
linux

Later we'll also see the platform module, which gives more details about the operating system.

Writing to standard error (stderr)

import sys

print("on stdout (Standard Output)")
print("on stderr (Standard Error)", file=sys.stderr)
sys.stderr.write("on stderr using write\n")


# x = 0
# print(1/x)

Redirection (Works on Linux/Mac/Windows):

python stderr.py > out.txt  2> err.txt
python stderr.py > all.txt 2>&1

python stderr.py 2> /dev/null            # On Linux and OSX
python stderr.py 2> nul                  # On Windows

exit prints to STDERR

import sys
if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} NUMBER")

print(f"you sent in {sys.argv[1]}")
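The exit() used here is added by the site module and is mainly meant for the interactive shell; in scripts sys.exit() is the usual choice. Both raise SystemExit, and when the argument is a string, Python prints it to standard error and ends with exit code 1. A small sketch that checks this by running a child Python process (the message text is made up):

```python
import subprocess
import sys

# Run a tiny child program that calls sys.exit() with a string
result = subprocess.run(
    [sys.executable, '-c', 'import sys; sys.exit("Usage: prog NUMBER")'],
    capture_output=True,
    text=True,
)
print(repr(result.stdout))   # '' (nothing on standard output)
print(repr(result.stderr))   # 'Usage: prog NUMBER\n'
print(result.returncode)     # 1
```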

os

python which OS are we running on (os, platform)

import os
import platform

print("Name:        ", os.name)
print("System:      ", platform.system())
print("Release:     ", platform.release())
print("Architecture:", platform.architecture())
print("Machine:     ", platform.machine())
print("Processor:   ", platform.processor())
print("Release:     ", platform.release())
print("Version:     ", platform.version())

# On Windows:
# nt
# Windows
# 10


if platform.system() != 'Windows':
    print("Uname:       ", os.uname())

# On Windows uname is not available
  • Linux
Name:         posix
System:       Linux
Release:      5.13.0-37-generic
Architecture: ('64bit', 'ELF')
Machine:      x86_64
Processor:    x86_64
Release:      5.13.0-37-generic
Version:      #42-Ubuntu SMP Tue Mar 15 14:34:06 UTC 2022
Uname:        posix.uname_result(sysname='Linux', nodename='code-maven', release='5.13.0-37-generic', version='#42-Ubuntu SMP Tue Mar 15 14:34:06 UTC 2022', machine='x86_64')
  • MacOSX
Name:         posix
System:       Darwin
Release:      20.6.0
Architecture: ('64bit', '')
Machine:      x86_64
Processor:    i386
Release:      20.6.0
Version:      Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64
Uname:        posix.uname_result(sysname='Darwin', nodename='FooBar', release='20.6.0',
              version='Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021;
              root:xnu-7195.141.6~3/RELEASE_X86_64', machine='x86_64')

Current directory (getcwd, pwd, chdir)

import sys
import os

to_dir = '..'
# to_dir = '/path/to/some/dir'
if len(sys.argv) == 2:
    to_dir = sys.argv[1]

current_dir = os.getcwd()
print(current_dir)

os.chdir(to_dir)

new_dir = os.getcwd()
print(new_dir)

Linux, OSX:

$ pwd

Windows: (cd without parameters prints the current working directory)

> cd

OS path

import sys
import os

path_to_thing = __file__
if len(sys.argv) == 2:
    path_to_thing = sys.argv[1]

print(path_to_thing)
print( os.path.basename(path_to_thing) )
print( os.path.dirname(path_to_thing) )
print( os.path.abspath(path_to_thing) )

print( os.path.exists(path_to_thing) )
print( os.path.isdir(path_to_thing) )
print( os.path.isfile(path_to_thing) )

os.path.join

import os

dirname = 'home'
subdirname = 'foo'
filename = 'work.txt'

path = f"{dirname}\\{subdirname}\\{filename}"
print(path)   # home\foo\work.txt

path = os.path.join(dirname, subdirname, filename)
print(path)

# Linux, OSX: home/foo/work.txt
# Windows:    home\foo\work.txt
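os.path.join uses the separator of the operating system the code runs on, which is also available as os.sep. This small sketch reuses the parts from the example above and makes the separator visible:

```python
import os

parts = ['home', 'foo', 'work.txt']

# os.path.join puts os.sep ('/' on Linux and OSX, '\\' on Windows) between the parts
joined = os.path.join(*parts)
print(os.sep)
print(joined)

# for simple relative parts this is the same as gluing them together with os.sep
manual = os.sep.join(parts)
print(joined == manual)  # True
```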

Directory listing

  • dir
  • listdir
  • path
  • os.listdir
import os
import sys

if len(sys.argv) != 2:
    exit("Usage: {} directory".format(sys.argv[0]))

path = sys.argv[1]
things = os.listdir(path)

for name in things:
    print(name)
    print(os.path.join(path, name))
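os.listdir returns only the names, in arbitrary order, and it lists subdirectories as well as files. Here is a minimal sketch of a helper that keeps only the regular files, tried on a throw-away directory created with tempfile. The name list_files is hypothetical and is not part of the os module:

```python
import os
import tempfile

def list_files(path):
    """Return the sorted names of the regular files directly inside path (no subdirectories)."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )

# Try it on a temporary directory holding two files and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    for name in ['b.txt', 'a.py']:
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write('x')
    os.mkdir(os.path.join(tmp, 'subdir'))
    files = list_files(tmp)
    print(files)  # ['a.py', 'b.txt']
```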

Directory listing using glob

  • glob
  • glob.glob
import sys
import glob

if len(sys.argv) == 2:
    exp = sys.argv[1]
    print(exp)
    items = glob.glob(exp)
    print(items)
else:
    files = glob.glob("?[abcdef]*.py")
    print(files)

    files = glob.glob("/usr/bin/*.sh")
    print(files)
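The patterns above only look inside a single directory. Since Python 3.5, glob also understands ** for recursive matching when it is called with recursive=True. Here is a sketch that runs on a temporary directory; the file names are made up for the demo:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # build a small tree: a.py, sub/b.py, sub/c.txt
    os.mkdir(os.path.join(tmp, 'sub'))
    for relative in ['a.py', os.path.join('sub', 'b.py'), os.path.join('sub', 'c.txt')]:
        open(os.path.join(tmp, relative), 'w').close()

    # '**' with recursive=True matches zero or more directories
    pattern = os.path.join(tmp, '**', '*.py')
    found = sorted(os.path.relpath(path, tmp) for path in glob.glob(pattern, recursive=True))
    print(found)
```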

Traverse directory tree - list directories recursively

  • walk
  • os.walk
import os
import sys

if len(sys.argv) != 2:
    exit("Usage: {} PATH_TO_DIRECTORY".format(sys.argv[0]))

root = sys.argv[1]

for dirname, dirs, files in os.walk(root):
    #print(dirname)     # relative path (from cwd) to the directory being processed
    #print(dirs)       # list of subdirectories in the currently processed directory
    #print(files)       # list of files in the currently processed directory

    for filename in files:
        print(os.path.join(dirname, filename))   # relative path to the "current" file
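The file names that os.walk yields are easy to combine with other os.path functions. For example, the following sketch adds up the size of every file under a directory with os.path.getsize. It assumes that a sum over the regular files is what you want, and total_size is a hypothetical helper:

```python
import os
import tempfile

def total_size(root):
    """Add up the size in bytes of every file under root, using os.walk."""
    total = 0
    for dirname, dirs, files in os.walk(root):
        for filename in files:
            total += os.path.getsize(os.path.join(dirname, filename))
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'a', 'b'))
    with open(os.path.join(tmp, 'one.txt'), 'w') as fh:
        fh.write('12345')   # 5 bytes
    with open(os.path.join(tmp, 'a', 'b', 'two.txt'), 'w') as fh:
        fh.write('123')     # 3 bytes
    size = total_size(tmp)
    print(size)  # 8
```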

OS dir (mkdir, makedirs, remove, rmdir)

  • mkdir

  • makedirs

  • remove

  • unlink

  • rmdir

  • removedirs

  • rmtree

  • shutil

  • mkdir is like mkdir in Linux and Windows

  • makedirs is like mkdir -p in Linux

  • remove and unlink are like rm in Linux or del in Windows

  • rmdir is like rmdir (it only removes empty directories)

  • shutil.rmtree is like rm -rf in Linux

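import os
import shutil

# create a single directory
path_to_new_dir = 'abc'
os.mkdir(path_to_new_dir)

# create also the parent directories, if needed
path_to_new_dir = 'dir/subdir/subdir'
# os.mkdir(path_to_new_dir) # will fail if 'dir' or 'dir/subdir' does not exist
os.makedirs(path_to_new_dir)


# remove a file: os.remove and os.unlink do the same, use either one
path_to_file = 'abc/data.txt'
open(path_to_file, 'w').close()   # create an empty file so there is something to remove
os.remove(path_to_file)
# os.unlink(path_to_file)

# remove a single empty directory
path_to_dir = 'abc'
os.rmdir(path_to_dir)

# remove the leaf directory, then each parent directory as long as it is empty
path_to_dir = 'dir/subdir/subdir'
os.removedirs(path_to_dir)

# Remove a whole directory structure (subdirs and files)
# Like rm -rf
path_to_dir = 'tree'
os.makedirs('tree/subdir')
open('tree/subdir/data.txt', 'w').close()
shutil.rmtree(path_to_dir)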
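If you run the directory-creating code above a second time, it fails because os.mkdir and os.makedirs raise FileExistsError when the directory already exists. The following sketch shows the exist_ok=True parameter of os.makedirs. It works inside a temporary directory so it does not touch real files:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dir', 'subdir', 'subdir')
    os.makedirs(path)

    # A second plain makedirs call fails because the directory is already there
    raised = False
    try:
        os.makedirs(path)
    except FileExistsError:
        raised = True
        print('already exists')

    # With exist_ok=True an existing directory is accepted silently
    os.makedirs(path, exist_ok=True)
    created = os.path.isdir(path)
    print(created)  # True
```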

expanduser - handle tilde (~), the home directory of the user

  • expanduser
  • ~
  • os.path.expanduser
import os

# The home directory of the current user
home_directory = os.path.expanduser("~")

print(home_directory)
# /home/gabor
# 'C:\\Users\\Gabor Szabo'

Get process ID

  • getpid

  • getppid

  • Works on all 3 Operating systems

import os

print(os.getpid())
print(os.getppid())
93518
92859

On Linux/OSX the shell can show its own process ID:

echo $$

External command with system

  • os.system
  • system
import os

command = 'ls -l'

exit_code = os.system(command)

# $? on Linux/OSX
# %ERRORLEVEL% on Windows
print(exit_code)



exit_code = os.system('ls qqrq')
print(exit_code)
print(exit_code // 256)


exit_code = os.system('ls /root')
print(exit_code)
print(exit_code // 256)

If you want to list the content of a directory in an OS-independent way, use os.listdir('.') instead of ls, or glob.glob("*.py") to get only a subset of the files.
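The // 256 trick above works on Linux/OSX because os.system returns a wait status there, with the exit code in the high byte. On Windows, os.system already returns the exit code. Here is a minimal sketch of a helper that handles both cases using os.waitstatus_to_exitcode, which needs Python 3.9 or newer. The name system_exit_code is hypothetical. The command exit 3 is understood by both sh and cmd.exe:

```python
import os

def system_exit_code(command):
    """Run command with os.system and return the exit code of the command."""
    status = os.system(command)
    if os.name == 'nt':
        # On Windows os.system already returns the exit code
        return status
    # On Linux/OSX it is a wait status; decode it (Python 3.9+)
    return os.waitstatus_to_exitcode(status)

code = system_exit_code('exit 3')
print(code)  # 3
```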

Accessing the system environment variables from Python

  • os.environ
import os

print(os.environ['HOME']) # /Users/gabor
print(os.environ.get('HOME')) # /Users/gabor

for k in os.environ.keys():
    print("{:30} {}".format(k, os.environ[k]))

os.environ is a dictionary-like mapping where the keys are the names of the environment variables and the values are their values (always strings).
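Indexing os.environ with a missing key raises KeyError, while .get() returns None or the default you pass. The following small sketch also shows that the values are always strings. MY_APP_MODE and MY_APP_WORKERS are made-up variable names:

```python
import os

# os.environ['MY_APP_MODE'] would raise KeyError if the variable is not set;
# .get() returns the default instead
mode = os.environ.get('MY_APP_MODE', 'production')
print(mode)

# Environment values are always strings, so convert them when you need a number
os.environ['MY_APP_WORKERS'] = '4'            # set here only for the demo
workers = int(os.environ.get('MY_APP_WORKERS', '1'))
print(workers * 2)   # 8
```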

Set environment variables on the fly

import os

print(os.environ.get('MYNAME'))
print(os.getenv('MYNAME'))
  • On Linux and macOS:
MYNAME=Foo python  examples/os/show_env.py

Reading the .env environment file in Python

  • .env file in the same folder where the program is.


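import os
from dotenv import load_dotenv   # pip install python-dotenv

load_dotenv()   # find the .env file and add its values to os.environ (existing variables are not overridden)

print(os.environ.get('MYNAME'))
print(os.getenv('MYNAME'))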
pip install python-dotenv
python examples/os/read_env.py
SOME_THING=other python examples/os/read_env.py

Set env and run command

import os

os.system("echo hello")
os.system("echo $HOME")

os.system("echo Before $MY_TEST")
os.environ['MY_TEST'] = 'qqrq'
os.system("echo After $MY_TEST")

We can change the environment variables from within Python, and the change will be visible in the subprocesses we launch afterwards. It does not affect the shell that started our program, however, so once our Python program exits, the change is gone.
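Changing os.environ affects every subprocess started afterwards. If only one child process should see a different value, subprocess.run accepts an env parameter instead. The following sketch assumes Python 3.7+ (for capture_output and text) and runs another Python interpreter (sys.executable) as the child:

```python
import os
import subprocess
import sys

# Copy the current environment and add/override one variable only for the child
child_env = dict(os.environ, MY_TEST='only-for-the-child')

result = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ.get("MY_TEST"))'],
    env=child_env,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())        # only-for-the-child
print(os.environ.get('MY_TEST'))    # whatever it was before: the parent is unchanged
```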

Pathlib

Pathlib example

from pathlib import Path

file = Path("python.json")
print(file)
print(file.__class__.__name__)  # PosixPath (WindowsPath on Windows)

Pathlib cwd

  • cwd
from pathlib import Path

cwd = Path.cwd()
print(cwd)
print(cwd.__class__.__name__)  # PosixPath (WindowsPath on Windows)

Pathlib get extension (suffix)

  • suffix
from pathlib import Path

file = Path("path/to/code.py")
print(file.suffix)  # .py
print(file.suffix.__class__.__name__)  # str

file = Path("path/to/code.yaml")
print(file.suffix)   # .yaml

file = Path("path/to/.bashrc")
print(file.suffix)   # (empty string)

folder = Path("path/to")
print(folder.suffix) #  (empty string)

Pathlib current file

  • __file__
from pathlib import Path

this = Path(__file__)

print(this)

Pathlib parents (dirname)

  • parent
  • parents
  • dirname
from pathlib import Path

this = Path(__file__)
print(this)
print(this.parent)       # dirname
print(this.parents[0])   # dirname (first parent)
print(this.parents[1])   # grandparent
...
print(this.parents[-1])  # /  (the root; negative indexes need Python 3.10+)


Pathlib parts (basename)

  • basename
  • parts
from pathlib import Path

this = Path(__file__)
print(this)
print(this.parts[-1]) # (basename)
print(this.parts[0])  # /
print(this.parts)     # (each part of the path)
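Besides parts, a path object has attributes that correspond to the os.path functions. The following sketch uses PurePosixPath, which only works on the string and never touches the disk, so the output is the same on every operating system. The path itself is made up:

```python
from pathlib import PurePosixPath

path = PurePosixPath('/home/foo/code/example.py')   # hypothetical path

print(path.name)     # example.py   (like os.path.basename)
print(path.stem)     # example
print(path.suffix)   # .py
print(path.parent)   # /home/foo/code   (like os.path.dirname)
print(path.parts)    # ('/', 'home', 'foo', 'code', 'example.py')
```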

Pathlib exists

  • exists
from pathlib import Path

file = Path(__file__)
print(file.exists())  # True

file = Path("hello.txt")
print(file.exists())  # False


folder = Path(".")
print(folder.exists())  # True

Pathlib iterdir (flat)

  • iterdir

  • Iterate over the things (file names, folder names, etc.) in a folder.

from pathlib import Path

folder = Path(".")

for item in folder.iterdir():
    print(item)
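iterdir only lists a single level. For a recursive search, Path.rglob walks through all the subfolders. Here is a sketch on a temporary directory with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub' / 'deeper').mkdir(parents=True)
    for relative in ['a.py', 'sub/b.py', 'sub/deeper/c.py', 'sub/notes.txt']:
        (root / relative).touch()

    # iterdir only sees the top level; rglob('*.py') descends into every subfolder
    top_level = sorted(item.name for item in root.iterdir())
    python_files = sorted(path.relative_to(root).as_posix() for path in root.rglob('*.py'))
    print(top_level)      # ['a.py', 'sub']
    print(python_files)   # ['a.py', 'sub/b.py', 'sub/deeper/c.py']
```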

Pathlib mkdir (makedir)

  • mkdir
  • makedir
from pathlib import Path

folder = Path("abc")
folder.mkdir()       # Creates "abc", fails if it already exists


Path("something").joinpath("else").mkdir(parents=True, exist_ok=True)
# parents - create intermediate folders as well
# exist_ok - don't fail if folder already exists

Building paths with joinpath and the / operator:

from pathlib import Path


folder = Path("/")
print(folder)                         # /

subfolder = folder.joinpath("etc")
print(subfolder)                      # /etc

file1 = subfolder.joinpath("a.txt")
print(file1)                          # /etc/a.txt

file2 = subfolder / "b.txt"
print(file2)                          # /etc/b.txt

shutil

shutil module

  • shutil

  • cp

  • copy

  • copytree

  • move

  • rmtree

  • shutil - File Operations

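import shutil

# source, dest and path are placeholders for your own file and directory names
shutil.copy(source, dest)       # copy a file (like cp)
shutil.copytree(source, dest)   # copy a whole directory tree (like cp -r)
shutil.move(source, dest)       # move or rename a file or directory (like mv)
shutil.rmtree(path)             # remove a directory tree (like rm -rf)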
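Here is a runnable sketch of the same functions. It works inside a temporary directory, so nothing real gets copied or deleted; the file and folder names are made up:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, 'src')
    os.mkdir(src_dir)
    with open(os.path.join(src_dir, 'data.txt'), 'w') as fh:
        fh.write('hello')

    # copy a single file (like cp)
    shutil.copy(os.path.join(src_dir, 'data.txt'), os.path.join(tmp, 'copy.txt'))

    # copy a whole directory tree (like cp -r); the destination must not exist yet
    shutil.copytree(src_dir, os.path.join(tmp, 'backup'))

    # move / rename (like mv)
    shutil.move(os.path.join(tmp, 'copy.txt'), os.path.join(tmp, 'moved.txt'))

    # remove the original tree (like rm -rf)
    shutil.rmtree(src_dir)

    result = sorted(os.listdir(tmp))
    print(result)   # ['backup', 'moved.txt']
```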

time

time module

  • time
  • timezone
  • daylight
  • gmtime
  • strftime
import time

now = time.time()
print(now)             # 1351178170.85
print(type(now))       # <class 'float'>

print(time.timezone)   # -7200 (seconds west of UTC; negative means east: 2*60*60, GMT + 2)
print(time.daylight)   # 1 (nonzero if the local timezone has DST, Daylight Saving Time)

print(time.gmtime())   # time.struct_time
    # time.struct_time(tm_year=2012, tm_mon=10, tm_mday=25,
    # tm_hour=17, tm_min=25, tm_sec=34, tm_wday=3, tm_yday=299, tm_isdst=0)

ts = time.gmtime()
print(ts.tm_year) # 2012

print(time.strftime('%Y-%m-%d %H:%M:%S')) # 2012-10-25 17:16:10

timestamp = 1051178170
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))) # 2003-04-24 12:56:10 (local time, here UTC+3)
print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(0)))            # 1970-01-01 00:00:00
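time.localtime converts using the timezone of the computer, so the 12:56:10 above will differ on other machines. time.gmtime always gives UTC, and calendar.timegm converts a UTC struct_time back to seconds since the epoch (time.mktime is the inverse for localtime). Here is a small worked example using the same timestamp:

```python
import calendar
import time

timestamp = 1051178170

# gmtime gives the broken-down time in UTC, independent of the local timezone
utc = time.gmtime(timestamp)
print(time.strftime('%Y-%m-%d %H:%M:%S', utc))   # 2003-04-24 09:56:10

# calendar.timegm is the inverse of gmtime: struct_time in UTC -> seconds since the epoch
print(calendar.timegm(utc))                       # 1051178170
```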

sleep in Python

  • sleep
import time

start = time.time()
print("hello " + str(start))

time.sleep(3.5)

end = time.time()
print("world " + str(end))
print("Elapsed time:" + str(end-start))
hello 1475217162.472256
world 1475217165.973437
Elapsed time:3.501181125640869

timer

More time-related examples.

import random
import time

# https://docs.python.org/3/library/time.html#time.struct_time

print(time.time())     # time since the epoch in seconds
print(time.asctime())  # current local time in human-readable format
print(time.strftime("%Y-%m-%d %H:%M:%S"))  # create your own human-readable format

print(time.gmtime(0))  # epoch
print(time.asctime(time.gmtime(0)))  # epoch in human-readable format

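print(time.localtime()) # local time now
print(time.gmtime())    # UTC now (the time in London, outside of summer time)

print(time.process_time())     # CPU time used by this process so far, in seconds (float)
print(time.process_time_ns())  # the same in nanoseconds (int)

s = time.perf_counter()        # high-resolution wall-clock timer
ps = time.process_time()       # CPU-time counter
print(time.monotonic())        # a clock that never goes backwards
time.sleep(0.1)
print(time.monotonic())
e = time.perf_counter()
for _ in range(100000):        # burn some CPU
    random.random()
pe = time.process_time()
print(s)
print(e)
print(e-s)    # wall-clock time of the sleep, about 0.1 seconds
print(pe-ps)  # CPU time, mostly the loop: sleeping uses almost no CPU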

# print(time.get_clock_info('monotonic'))
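You can wrap the measurement above in a small helper. The following sketch reports both the wall-clock time (via perf_counter) and the CPU time (via process_time). The name timed is hypothetical:

```python
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed wall-clock seconds, CPU seconds used)."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    result = func(*args)
    elapsed_wall = time.perf_counter() - start_wall
    elapsed_cpu = time.process_time() - start_cpu
    return result, elapsed_wall, elapsed_cpu

# Sleeping takes wall-clock time but almost no CPU time
_, wall, cpu = timed(time.sleep, 0.2)
print(f"wall: {wall:.3f}  cpu: {cpu:.3f}")
```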

Current date and time datetime now

  • datetime
  • now
  • strftime
import datetime

now  = datetime.datetime.now()
print(now)             # 2015-07-02 16:28:01.762244
print(type(now))       # <class 'datetime.datetime'>

print(now.year)        # 2015
print(now.month)       # 7
print(now.day)         # 2
print(now.hour)        # 16
print(now.minute)      # 28
print(now.second)      # 1
print(now.microsecond) # 762244

print(now.strftime("%Y%m%d-%H%M%S-%f"))  # 20150702-162801-762244
print(now.strftime("%B %b %a %A"))       # July Jul Thu Thursday
print(now.strftime("%c"))                # Thu Jul  2 16:28:01 2015

Converting string to datetime (parsing date and time string)

  • strptime
import datetime

date = "2012-12-19"
some_day = datetime.datetime.strptime(date, '%Y-%m-%d') # YYYY-MM-DD
print(type(some_day))   # <class 'datetime.datetime'>
print(some_day)         # 2012-12-19 00:00:00

timestamp = "2013-11-04 11:23:45"  # YYYY-MM-DD HH:MM:SS
some_time = datetime.datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')
print(type(some_time))  # <class 'datetime.datetime'>
print(some_time)        # 2013-11-04 11:23:45
print(some_time.minute) # 23


# Make sure you know how the date was formatted!

date = "12/3/2012"
dt = datetime.datetime.strptime(date, '%m/%d/%Y') # MM/DD/YYYY date format in USA
print(dt)   # 2012-12-03 00:00:00

dt = datetime.datetime.strptime(date, '%d/%m/%Y') # DD/MM/YYYY date format elsewhere
print(dt)   # 2012-03-12 00:00:00
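strptime and strftime are inverses of each other, so converting a date from one textual format to another means parsing it and then formatting it. Here is a sketch that uses the US-style date from above:

```python
import datetime

# Read a US-style date and write it out in ISO and in a day-first format
us_date = "12/3/2012"
dt = datetime.datetime.strptime(us_date, '%m/%d/%Y')

iso = dt.strftime('%Y-%m-%d')
day_first = dt.strftime('%d/%m/%Y')
print(iso)        # 2012-12-03
print(day_first)  # 03/12/2012
```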

Parse datetime string with and without timezone information

  • strptime
  • tzinfo
import datetime

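dt = datetime.datetime.strptime('Jun 7, 2022', '%b %d, %Y')
print(dt)          # 2022-06-07 00:00:00
print(dt.tzinfo)   # None  (naive: no timezone information)


dt_utc = datetime.datetime.strptime('Jun 7, 2022 +0000', '%b %d, %Y %z')
print(dt_utc)         # 2022-06-07 00:00:00+00:00
print(dt_utc.tzinfo)  # UTC  (aware: %z parsed the +0000 offset)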

datetime fromisoformat

  • fromisoformat
import datetime

dt = datetime.datetime.fromisoformat('2000-01-01')
print(dt)    # 2000-01-01 00:00:00

date fromisoformat (only date, no time)

  • date
  • fromisoformat
import datetime

date = datetime.date.fromisoformat('2000-01-01')
print(date)   # 2000-01-01

datetime arithmetic (subtract)

  • timedelta
  • total_seconds
  • strptime
import datetime

t1 = "2013-12-29T11:23:45"
t2 = "2014-01-02T10:19:49"
dt1 = datetime.datetime.strptime(t1, '%Y-%m-%dT%H:%M:%S')
dt2 = datetime.datetime.strptime(t2, '%Y-%m-%dT%H:%M:%S')
print(dt1)      # 2013-12-29 11:23:45
print(dt2)      # 2014-01-02 10:19:49

diff = dt2-dt1
print(diff)        # 3 days, 22:56:04
print(type(diff))  # <class 'datetime.timedelta'>
print(diff.total_seconds())  # 341764.0

time_travel = dt1-dt2
print(time_travel) #  -4 days, 1:03:56
print(time_travel.total_seconds())  # -341764.0

# d = dt1+dt2
# TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'datetime.datetime'

Timezone aware datetime

  • tzinfo
import datetime

ts = "2022-12-20T11:23:45"

# Naive datetime object:
dt = datetime.datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S')
now = datetime.datetime.now()
print(now)                      # 2022-12-25 22:39:39.093285
print(dt.tzinfo)                # None
print(now.tzinfo)               # None
elapsed  = now-dt
print(elapsed)                  # 5 days, 11:15:54.093285
print(elapsed.total_seconds())  # 472554.093285
print()

# (Timezone) aware datetime object:
dt_utc = datetime.datetime.strptime(f'{ts}+0000', '%Y-%m-%dT%H:%M:%S%z')
now_utc = datetime.datetime.now(datetime.timezone.utc)
print(now_utc)                     # 2022-12-25 21:39:39.093880+00:00
print(dt_utc.tzinfo)               # UTC
print(now_utc.tzinfo)              # UTC
elapsed_utc  = now_utc-dt_utc
print(elapsed_utc)                 # 5 days, 10:15:54.093880
print(elapsed_utc.total_seconds()) # 468954.09388
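You cannot mix naive and aware datetime objects in arithmetic, but you can convert aware objects between timezones with astimezone. Here is a sketch; the +02:00 offset is just an example value:

```python
import datetime

naive = datetime.datetime(2022, 12, 20, 11, 23, 45)
aware = datetime.datetime(2022, 12, 20, 11, 23, 45, tzinfo=datetime.timezone.utc)

# Subtracting an aware datetime from a naive one raises TypeError
raised = False
try:
    naive - aware
except TypeError as err:
    raised = True
    print(err)

# Converting to another timezone changes the wall-clock time, not the moment in time
plus_two = datetime.timezone(datetime.timedelta(hours=2))
local = aware.astimezone(plus_two)
print(local)            # 2022-12-20 13:23:45+02:00
print(local == aware)   # True
```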

datetime arithmetic (compare, sort)

import datetime

t1 = "2013-12-29T11:23:45"
t2 = "2014-01-02T10:19:49"
dt1 = datetime.datetime.strptime(t1, '%Y-%m-%dT%H:%M:%S')
dt2 = datetime.datetime.strptime(t2, '%Y-%m-%dT%H:%M:%S')
dt3 = datetime.datetime.strptime(t2, '%Y-%m-%dT%H:%M:%S')
print(dt1)      # 2013-12-29 11:23:45
print(dt2)      # 2014-01-02 10:19:49

print(dt2 > dt1)   # True
print(dt1 > dt2)   # False
print(dt2 == dt3)  # True
print(dt2 == dt1)  # False

dates = [dt2, dt1, dt3]

print(dates)
# [datetime.datetime(2014, 1, 2, 10, 19, 49), datetime.datetime(2013, 12, 29, 11, 23, 45), datetime.datetime(2014, 1, 2, 10, 19, 49)]
print(sorted(dates))
# [datetime.datetime(2013, 12, 29, 11, 23, 45), datetime.datetime(2014, 1, 2, 10, 19, 49), datetime.datetime(2014, 1, 2, 10, 19, 49)]

datetime arithmetic (addition)

import datetime

timestamp = "2013-12-29T11:23:45"
ts = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S')
print(type(ts))
diff = datetime.timedelta(days = 3)
print(diff)
nts = ts + diff
print(type(nts))
print(ts)
print(nts)       # 2014-01-01 11:23:45
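When only the calendar date matters, subtracting two date objects gives a timedelta whose days attribute is a whole number. timedelta also accepts weeks, hours, minutes and other units. Here is a small sketch that reuses the dates from above:

```python
import datetime

# With date objects (no time part) the difference is a whole number of days
start = datetime.date(2013, 12, 29)
end = datetime.date(2014, 1, 2)
delta = end - start
print(delta.days)   # 4

# timedelta can be built from weeks, days, hours, minutes, ...
deadline = datetime.datetime(2013, 12, 29, 11, 23, 45) + datetime.timedelta(weeks=1, hours=2)
print(deadline)     # 2014-01-05 13:23:45
```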

Rounding datetime object down to the second (removing microseconds)

  • microseconds
  • microsecond
import datetime

# Old solution
now = datetime.datetime.now()
rounded = now - datetime.timedelta(microseconds=now.microsecond)
print(now)     # 2019-11-01 07:11:19.930974
print(rounded) # 2019-11-01 07:11:19

# A simpler solution
ts = datetime.datetime.now().replace(microsecond=0)
print(ts)      # 2019-11-01 07:11:20
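Both solutions above drop the microseconds, which always rounds down (07:11:19.930974 became 07:11:19). If you need the nearest second instead, here is a sketch. round_to_second is a hypothetical helper, and 500000 microseconds and above round up:

```python
import datetime

def round_to_second(dt):
    """Round to the nearest second: 500000 microseconds or more rounds up."""
    truncated = dt.replace(microsecond=0)
    if dt.microsecond >= 500_000:
        truncated += datetime.timedelta(seconds=1)
    return truncated

print(round_to_second(datetime.datetime(2019, 11, 1, 7, 11, 19, 930974)))  # 2019-11-01 07:11:20
print(round_to_second(datetime.datetime(2019, 11, 1, 7, 11, 19, 123456)))  # 2019-11-01 07:11:19
```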

Rounding datetime object to date (removing hours, minutes, seconds)

  • microsecond
  • second
  • minute
  • hour
import datetime

ts = datetime.datetime.now().replace(
    microsecond=0,
    second=0,
    minute=0,
    hour=0,
)
print(ts)      # 2023-01-19 00:00:00


Convert datetime object to date object

  • datetime
  • now
  • date
import datetime

ts = datetime.datetime.now().date()
print(ts)      # 2023-01-19


Convert datetime object to time object

  • time
import datetime

ts = datetime.datetime.now().time()
print(ts)      # 09:07:02.846346

Today (date)

  • date
  • today
import datetime

now = datetime.datetime.now()
print(now.date())
print(type(now.date()))


today = datetime.date.today()
print(today)
print(type(today))
2023-04-17
<class 'datetime.date'>
2023-04-17
<class 'datetime.date'>

subprocess

External CLI tool to demo subprocess

  • subprocess
  • call
  • execute

The process.py is a simple script we are going to use to demonstrate how an external program can be executed from within Python. It is a Python program, but you could do the exact same thing with any command-line application written in any language. We use this Python script as an example because we know you already have Python on your computer.

The external command:

import time
import sys
import os

if len(sys.argv) != 3:
    exit(f"{sys.argv[0]} SECONDS EXIT_CODE")

print(f"process ID: {os.getpid()}  parent ID: {os.getppid()}")

seconds = int(sys.argv[1])
exit_code = int(sys.argv[2])

for sec in range(seconds):
    print("OUT {}".format(sec), flush=True)
    print("ERR {}".format(sec), file=sys.stderr)
    time.sleep(1)

exit(exit_code)

Try it on the command line: python process.py 3 7

Run with os.system

import os

exit_code = os.system("python process.py 5 2")
print(f'exit code: {exit_code // 256}')  # on Linux/OSX the exit code is in the high byte

Output:

OUT 0
ERR 0
OUT 1
ERR 1
OUT 2
ERR 2
OUT 3
ERR 3
OUT 4
ERR 4
exit code: 2

Run external process let STDOUT and STDERR through

import subprocess
import time
import os
import psutil   # third-party module: pip install psutil

def run_process(command):
    print(f"Before Popen {os.getpid()}")
    proc = subprocess.Popen(command)  # This starts running the external process
    print(f"After Popen of {proc.pid}")
    psproc = psutil.Process(proc.pid)
    print(f"name: {psproc.name()}")
    print(f"cmdline: {psproc.cmdline()}")
    time.sleep(1.5)

    print("Before communicate")
    proc.communicate()
    print("After communicate")

    exit_code = proc.returncode
    return exit_code

print("Before run_process", flush=True)
exit_code = run_process(['python', 'process.py', '5', '0'])
print("After run_process", flush=True)

print(f'exit code: {exit_code}', flush=True)

Output:

Before run_process
Before Popen
After Popen
OUT 0
ERR 0
OUT 1
ERR 1
Before communicate
OUT 2
ERR 2
OUT 3
ERR 3
OUT 4
ERR 4
After communicate
After run_process
exit code: 0
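If you only need to start the process and wait for it, subprocess.run (Python 3.5+) combines Popen and waiting in one call and returns a CompletedProcess object. In the following sketch, a one-line Python program passed with -c stands in for process.py. It uses sys.executable instead of 'python' so that the same interpreter runs the child:

```python
import subprocess
import sys

# STDOUT and STDERR of the child go straight to our STDOUT and STDERR
completed = subprocess.run([sys.executable, '-c', 'import sys; print("OUT"); sys.exit(2)'])
print(f'exit code: {completed.returncode}')   # exit code: 2
```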

Run external process and capture STDOUT and STDERR separately

import subprocess
import time

def run_process(command):
    print("Before Popen")
    proc = subprocess.Popen(command,
        stdout = subprocess.PIPE,
        stderr = subprocess.PIPE,
    )  # This starts running the external process
    print("After Popen")
    time.sleep(1.5)

    print("Before communicate")
    out, err = proc.communicate()
    print("After communicate")

    # out and err are bytes objects; call .decode() to get strings
    exit_code = proc.returncode
    return exit_code, out, err

print("Before run_process")
exit_code, out, err = run_process(['python', 'process.py', '5', '0'])
print("After run_process")

print("")
print(f'exit code: {exit_code}')

print("")
print('out:')
for line in out.decode('utf8').split('\n'):
    print(line)

print('err:')
for line in err.decode('utf8').split('\n'):
    print(line)

Output:

Before run_process
Before Popen
After Popen
Before communicate
After communicate
After run_process

exit code: 0

out:
OUT 0
OUT 1
OUT 2
OUT 3
OUT 4

err:
ERR 0
ERR 1
ERR 2
ERR 3
ERR 4
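Since Python 3.7, subprocess.run can also capture the output. capture_output=True is shorthand for setting both stdout and stderr to PIPE, and text=True decodes the bytes to strings. The following sketch uses a one-line stand-in for process.py and shows both separate and merged capture:

```python
import subprocess
import sys

# A one-line stand-in for process.py: one line on STDOUT, one on STDERR
child = 'import sys; print("OUT 0"); print("ERR 0", file=sys.stderr)'

separate = subprocess.run(
    [sys.executable, '-c', child],
    capture_output=True,   # shorthand for stdout=subprocess.PIPE, stderr=subprocess.PIPE
    text=True,             # decode the bytes to str
)
print(separate.returncode)       # 0
print(separate.stdout, end='')   # OUT 0
print(separate.stderr, end='')   # ERR 0

merged = subprocess.run(
    [sys.executable, '-c', child],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # send STDERR into the same pipe as STDOUT
    text=True,
)
print(merged.stdout)   # both lines; their order depends on the child's buffering
print(merged.stderr)   # None
```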

Run external process and capture STDOUT and STDERR merged together

import subprocess
import time

def run_process(command):
    print("Before Popen")
    proc = subprocess.Popen(command,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
    )  # This starts running the external process
    print("After Popen")
    time.sleep(1.5)

    print("Before communicate")
    out, err = proc.communicate()
    print("After communicate")

    # out is a bytes object; err is None, because STDERR was merged into STDOUT
    exit_code = proc.returncode
    return exit_code, out, err

print("Before run_process")
exit_code, out, err = run_process(['python', 'process.py', '5', '0'])
print("After run_process")

print("")
print(f'exit code: {exit_code}')

print("")
print('out:')
for line in out.decode('utf8').split('\n'):
    print(line)

print('err:')
print(err)

Output:

Before run_process
Before Popen
After Popen
Before communicate
After communicate
After run_process

exit code: 0

out:
OUT 0
ERR 0
OUT 1
ERR 1
OUT 2
ERR 2
OUT 3
ERR 3
OUT 4
ERR 4

err:
None

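In this case err will always be None: STDERR is redirected into STDOUT, and communicate() only returns data for streams that were set to PIPE.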

subprocess in the background

In the previous examples we ran the external command and then waited till it finishes before doing anything else.

In some cases you might prefer to do something else while you are waiting - effectively running the process in the background. This also makes it easy to enforce a time-limit on the process. If it does not finish within a given amount of time (timeout) we raise an exception.

In this example we still collect the standard output and the standard error at the end of the process.

import subprocess
import sys
import time

def run_process(command, timeout):
    print("Before Popen")
    proc = subprocess.Popen(command,
       stdout = subprocess.PIPE,
       stderr = subprocess.PIPE,
    )
    print("After Popen")

    while True:
       poll = proc.poll()  # returns the exit code or None if the process is still running
       print(f"poll: {poll}")
       time.sleep(0.5)  # here we could actually do something useful
       timeout -= 0.5
       if timeout <= 0:
           break
       if poll is not None:
           break

    print(f"Final: {poll}")
    if poll is None:
        raise Exception("Timeout")

    exit_code = proc.returncode
    out, err = proc.communicate()
    return exit_code, out, err

exit_code, out, err = run_process([sys.executable, 'process.py', '3', '0'], 6)

print("-----")
print(f"exit_code: {exit_code}")
print("OUT")
print(out.decode())
print("ERR")
print(err.decode())

Output:

Before Popen
After Popen
poll: None
poll: None
poll: None
poll: None
poll: None
poll: None
poll: None
poll: 0
Final: 0
-----
exit_code: 0
OUT
OUT 0
OUT 1
OUT 2

ERR
ERR 0
ERR 1
ERR 2

subprocess collect output while external program is running

For this to work properly, the external program might need to make its output unbuffered. In Python, STDERR sends each printed line right away (since Python 3.9 it is line-buffered even when redirected), but for STDOUT we had to pass flush=True to the print function in process.py.

import subprocess
import sys
import time

def run_process(command, timeout):
    print("Before Popen")
    proc = subprocess.Popen(command,
        stdout = subprocess.PIPE,
        stderr = subprocess.PIPE,
        universal_newlines=True,
        bufsize=0,
    )
    print("After Popen")
    out = ""
    err = ""

    while True:
        exit_code = proc.poll()
        print(f"poll: {exit_code} {time.time()}")
        this_out = proc.stdout.readline()
        this_err = proc.stderr.readline()
        print(f"out: {this_out}", end="")
        print(f"err: {this_err}", end="")
        out += this_out
        err += this_err
        time.sleep(0.5)  # here we could actually do something useful
        timeout -= 0.5
        if timeout <= 0:
            break
        if exit_code is not None:
            break

    print(f"Final: {exit_code}")
    if exit_code is None:
        raise Exception("Timeout")

    return exit_code, out, err

exit_code, out, err = run_process([sys.executable, 'process.py', '4', '3'], 20)
#exit_code, out, err = run_process(['docker-compose', 'up', '-d'], 20)

print("-----")
print(f"exit_code: {exit_code}")
print("OUT")
print(out)
print("ERR")
print(err)

Output:

Before Popen
After Popen
poll: None 1637589106.083494
out: OUT 0
err: ERR 0
poll: None 1637589106.6035957
out: OUT 1
err: ERR 1
poll: None 1637589107.6047328
out: OUT 2
err: ERR 2
poll: None 1637589108.6051855
out: OUT 3
err: ERR 3
poll: None 1637589109.6066446
out: err: poll: 0 1637589110.6227856
out: err: Final: 0
-----
exit_code: 0
OUT

ERR

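Reading STDOUT and STDERR with one readline() each per loop can block on one stream while data is waiting on the other. If you do not need the two streams separately, one way around this is to merge STDERR into STDOUT and iterate over the lines as they arrive. Here is a sketch with a made-up child program that prints three lines:

```python
import subprocess
import sys

# A made-up child program that prints three lines, flushing each one
child = 'import time\nfor i in range(3):\n    print(f"line {i}", flush=True)\n    time.sleep(0.2)'

proc = subprocess.Popen(
    [sys.executable, '-c', child],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # one stream, so a single blocking read loop is enough
    text=True,
)
lines = []
for line in proc.stdout:        # each iteration waits for the next line from the child
    print(f"got: {line}", end="")
    lines.append(line.rstrip('\n'))
exit_code = proc.wait()
print(f"exit code: {exit_code}")
```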
Exercise: Processes

Given the following "external application":

import sys
import random
import time

def add_random(result_filename, count, wait, exception=''):
    total = 0
    for _ in range(int(count)):
        total += random.random()

    time.sleep(float(wait))
    if exception:
        raise Exception(exception)

    with open(result_filename, 'w') as fh:
        fh.write(str(total))


if __name__ == '__main__':
    if len(sys.argv) != 4 and len(sys.argv) != 5:
        exit(f"Usage: {sys.argv[0]} RESULT_FILENAME COUNT WAIT [EXCEPTION]")
    add_random(*sys.argv[1:])

It could be run with a command like this to create the a.txt file:

python examples/process/myrandom.py a.txt 3 1

Or like this, to raise an exception before creating the b.txt file:

python examples/process/myrandom.py b.txt 3 1 "bad thing"

Or it could be used like this:

from myrandom import add_random

add_random('b.txt', 2, 3)
add_random('c.txt', 2, 3, 'some error')

Write a program that will do "some work" that can be run in parallel and collect the data. Make the code work in a single process by default and allow the user to pass a number that will be the number of child processes to be used. When the child process exits it should save the results in a file and the parent process should read them in.

The "some work" can be accessing 10-20 machines using "ssh machine uptime" and creating a report from the results.

It can be fetching 10-20 URLs and reporting the size of each page.

It can be any other network intensive task.

Measure the time in both cases.

Subprocess TBD

Some partially ready examples

import time
import sys
import os

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} SECONDS")

print(f"{int(time.time())} - start will take {sys.argv[1]} seconds (pid: {os.getpid()})", flush=True)
time.sleep(int(sys.argv[1]))
print(f"{int(time.time())} - started", flush=True)

while True:
    time.sleep(1)
    print(f"{int(time.time())} - running", flush=True)

Starting slow_starting_server.py and waiting until it reports that it has started:

#import os
#import time
#import signal
import subprocess
import sys
import time


#def test_hello():
#    run_process([sys.executable, "examples/Hello-World.py"], )
    #pid = os.fork()
    #if pid is None:
    #    raise Exception("Could not fork")

    #if pid:
    #    print(f"parent of {pid}")
    #    time.sleep(5)
    #    os.kill(pid, signal.SIGKILL)
    #else:
    #    print("child")
    #    os.environ['PYTHONPATH'] = '.'
    #    os.exec("python examples/Hello-World.py")


def run_process(command, start_timeout):
    sleep_time = 0.5
    print(command)
    proc = subprocess.Popen(command,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
        universal_newlines=True,
        bufsize=0,
    )

    out = ""
    while True:
        print("Loop")
        exit_code = proc.poll()  # returns the exit code or None if the process is still running
        if exit_code is not None:
            raise Exception("Server died")
        print(exit_code)
        this_out = proc.stdout.readline()
        #this_err = proc.stderr.readline()
        out += this_out
        print(f"Before sleep {sleep_time} for a total of {start_timeout}")
        time.sleep(sleep_time)
        start_timeout -= sleep_time
        if start_timeout <= 0:
            proc.terminate()
            raise Exception("The service has not properly started")

        if "started" in out:
            print(out)
            print("--------")
            print("It is now running")
            print("--------")
            break

    print("Do something interesting here that takes 2 seconds")
    time.sleep(2)
    proc.terminate()

    out, _ = proc.communicate()   # wait for the terminated process, so returncode gets set
    exit_code = proc.returncode
    return exit_code, out

print("Before")
exit_code, out = run_process([sys.executable, 'slow_starting_server.py', '3'], 4)
print("-----")
print(f"exit_code: {exit_code}")
print("OUT")
print(out)

Starting several external processes at once and waiting for all of them:

import subprocess
import time
import os
import psutil

def run_process(*commands):
    print(f"Before Popen {os.getpid()}")
    processes = []
    for command in commands:
        proc = subprocess.Popen(command)  # This starts running the external process
        print(f"After Popen of {proc.pid}")
        psproc = psutil.Process(proc.pid)
        print(f"name: {psproc.name()}")
        print(f"cmdline: {psproc.cmdline()}")
        processes.append(proc)

    time.sleep(1.5)

    print("Before communicate")
    for proc in processes:
        proc.communicate()
    print("After communicate")

    exit_codes = [proc.returncode for proc in processes]
    return exit_codes

print("Before run_process", flush=True)
exit_codes = run_process(
    ['python', 'process.py', '5', '0'],
    ['python', 'process.py', '4', '1'],
    ['python', 'process.py', '3', '2'],
    )
print("After run_process", flush=True)

print(f'exit code: {exit_codes}', flush=True)

Command line arguments with argparse

Command line arguments

myprog.py data1.xls data2.xls
myprog.py --input data1.xls --output data2.xls
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input', required=True)
parser.add_argument('--output',   help="Some description")

args = parser.parse_args()

print(f"input:   {args.input}")
print(f"output: {args.output}")
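By default, parse_args() reads sys.argv[1:], but it also accepts a list of strings, which is handy for experimenting and for tests. Here is a sketch with the same parser as above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input', required=True)
parser.add_argument('--output', help="Some description")

# Pass the arguments explicitly instead of reading them from the command line
args = parser.parse_args(['--input', 'data1.xls', '--output', 'data2.xls'])
print(args.input)    # data1.xls
print(args.output)   # data2.xls
```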


Modules to handle the command line

  • argparse

You would like to allow the user to pass arguments on the command line. For example:

myprog.py server_name name True True

myprog.py --machine server_name --test name --verbose --debug
myprog.py -v -d
myprog.py -vd
myprog.py -dv
myprog.py -v -d -m server_name
myprog.py -vdm server_name
myprog.py file1 file2 file3
myprog.py --machine server_name --debug file1 file2 file3
myprog.py file1 file2 file3 --machine server_name --debug
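The following sketch shows a parser that accepts several of the command lines listed above. The option names come from those examples, but the parser itself is an assumption. Short flags can be combined, and an option that takes a value can come last in the group:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--verbose', '-v', action='store_true')
parser.add_argument('--debug', '-d', action='store_true')
parser.add_argument('--machine', '-m')
parser.add_argument('files', nargs='*')

# Three different spellings of the same flags
results = []
for argv in (['-v', '-d', '-m', 'server_name'], ['-vdm', 'server_name'], ['-dv', '--machine', 'server_name']):
    args = parser.parse_args(argv)
    results.append((args.verbose, args.debug, args.machine))
    print(args.verbose, args.debug, args.machine)   # True True server_name

# Positional arguments can come before the named ones
file_args = parser.parse_args(['file1', 'file2', 'file3', '--machine', 'server_name', '--debug'])
print(file_args.files)   # ['file1', 'file2', 'file3']
```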

argparse

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--fname')     # optional named parameter that requires a value
parser.add_argument('--lname',   help="Some description")

parser.add_argument('--max',     help='max number of something', type=int) # check and convert to integer
parser.add_argument('--verbose', action='store_true') # "flag" no value is expected

parser.add_argument('--color', '-c') # short name also accepted


#parser.add_argument('files',  help="filenames(s)")   # a required positional argument
#parser.add_argument('dirs',   nargs="*")   # 0 or more positional
#parser.add_argument('places', nargs="+")   # 1 or more positional
#parser.add_argument('ords', nargs="?")   # 0 or 1 positional

parser.add_argument('--things', nargs="+")  # --things a.txt b.txt c.txt


args = parser.parse_args()

print(f"fname:   {args.fname}")
print(f"verbose: {args.verbose}")
print(f"things:  {args.things}")
print(f"color:   {args.color}")
print(f"max:     {args.max}")

if args.verbose:
    print("we are making progress....")

Basic usage of argparse

Setting up the argparse already has some (little) added value.

import argparse

parser = argparse.ArgumentParser()
parser.parse_args()

print('the code...')

Running the script without any parameter will not interfere...

$ python argparse_basic.py
the code...

If the user tries to pass some parameters on the command line, argparse will print an error message and stop the execution.

$ python argparse_basic.py foo
usage: argparse_basic.py [-h]
argparse_basic.py: error: unrecognized arguments: foo
$ python argparse_basic.py -h
usage: argparse_basic.py [-h]

optional arguments:
  -h, --help  show this help message and exit

The minimal set up of the argparse class already provides a (minimally) useful help message.

Positional argument

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('name', help='your full name')
args = parser.parse_args()

print(args.name)
$ python argparse_positional.py
usage: argparse_positional.py [-h] name
argparse_positional.py: error: the following arguments are required: name
$ python argparse_positional.py -h
usage: argparse_positional.py [-h] name

positional arguments:
  name        your full name

optional arguments:
  -h, --help  show this help message and exit
$ python argparse_positional.py Foo
Foo
$ python argparse_positional.py Foo Bar
usage: argparse_positional.py [-h] name
argparse_positional.py: error: unrecognized arguments: Bar
$ python argparse_positional.py "Foo Bar"
Foo Bar

Many positional arguments

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('files', help='filename(s)', nargs='+')
args = parser.parse_args()

print(args.files)

$ python argparse_positional_many.py 
usage: argparse_positional_many.py [-h] files [files ...]
argparse_positional_many.py: error: the following arguments are required: files
$ python argparse_positional_many.py a.txt b.txt
['a.txt', 'b.txt']

Convert to integers

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('number', help='the number to take to the square')
args = parser.parse_args()

print(args.number * args.number)
$ python argparse_number.py abc
Traceback (most recent call last):
  File "examples/argparse/argparse_number.py", line 10, in <module>
    print(args.number * args.number)
TypeError: can't multiply sequence by non-int of type 'str'

Trying to use the argument received from the command line as an integer, we get a TypeError, because argparse returns every argument as a string. The same happens even if we pass a number. We could call int() on the value to convert it to an integer, but there is a better solution.

The same error with a number:

$ python argparse_number.py 23
Traceback (most recent call last):
  File "examples/argparse/argparse_number.py", line 10, in <module>
    print(args.number * args.number)
TypeError: can't multiply sequence by non-int of type 'str'

Convert to integer

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('number', help='the number to square', type=int)
args = parser.parse_args()

print(args.number * args.number)
$ python argparse_type.py abc
usage: argparse_type.py [-h] number
argparse_type.py: error: argument number: invalid int value: 'abc'

This time we get a much better error message: argparse checks that the argument can be converted to an int before our code runs, and reports an error if it cannot.

$ python argparse_type.py 3.14
usage: argparse_type.py [-h] number
argparse_type.py: error: argument number: invalid int value: '3.14'
$ python argparse_type.py 23
529

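The type parameter sets both the validation and the conversion of the argument: argparse calls the given callable (here int) on the raw string, stores the result, and reports an error if the call fails.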
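type accepts any callable that takes a string. For example, type=float also accepts values such as 3.14 that type=int rejects. A small sketch, parsing an explicit argument list instead of the real command line:

```python
import argparse

parser = argparse.ArgumentParser()
# float() is applied to the raw string before our code sees the value
parser.add_argument('number', help='the number to square', type=float)

args = parser.parse_args(['2.5'])
print(args.number * args.number)  # 6.25
```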

Named arguments

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color', help='The name of the color')
args = parser.parse_args()

print(args.color)

$ python argparse_named.py --color Blue
Blue
$ python argparse_named.py
None

Named parameters are optional by default. You can pass the required=True parameter to make them required.
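A sketch of a required named argument, parsing explicit lists instead of the real command line. Without --color, argparse prints an error to stderr and exits with status 2:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color', help='The name of the color', required=True)

color = parser.parse_args(['--color', 'Blue']).color
print(color)  # Blue

# Missing --color: argparse reports
# "error: the following arguments are required: --color" and exits
status = None
try:
    parser.parse_args([])
except SystemExit as err:
    status = err.code
print(status)  # 2
```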

Boolean Flags

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color',   help='The name of the color')
parser.add_argument('--verbose', help='Print more data',
    action='store_true')
args = parser.parse_args()

print(args.color)
print(args.verbose)

$ python argparse_boolean.py --color Blue --verbose
Blue
True
$ python argparse_boolean.py
None
False

Short names

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color', '-c', help='The name of the color')
parser.add_argument('--verbose', '-v', help='Print more data',
    action='store_true')
args = parser.parse_args()

print(args.color)
print(args.verbose)

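$ python argparse_shortname.py -c Blue -v
$ python argparse_shortname.py -vc Blue

Both commands print Blue and True. Short flags can be combined: -v takes no value, so -vc Blue is read as -v -c Blue.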
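To check that the long, short, and combined forms are equivalent, one can feed each variant to parse_args() as an explicit list instead of typing it on the command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color', '-c', help='The name of the color')
parser.add_argument('--verbose', '-v', help='Print more data', action='store_true')

variants = [
    ['--color', 'Blue', '--verbose'],
    ['-c', 'Blue', '-v'],
    ['-vc', 'Blue'],
]
results = []
for argv in variants:
    args = parser.parse_args(argv)
    results.append((args.color, args.verbose))
    print(args.color, args.verbose)  # Blue True
```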

argparse print help explicitly

  • print_help
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--age', help='Your age in years', type=float, required=True)
args = parser.parse_args()

if args.age < 0:
    # print the full help message and exit with a non-zero status
    parser.print_help()
    sys.exit(1)

print(args.age)
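Instead of printing the full help and exiting by hand, parser.error() prints the short usage line and a custom message to stderr and exits with status 2, the same way argparse reports its own errors. A sketch; the parse() helper is hypothetical and exists only so the example can use explicit argument lists:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--age', help='Your age in years', type=float, required=True)

def parse(argv):
    """Hypothetical helper: parse argv and reject negative ages."""
    args = parser.parse_args(argv)
    if args.age < 0:
        parser.error('--age must be a non-negative number')  # exits with status 2
    return args

print(parse(['--age', '42']).age)  # 42.0

status = None
try:
    parse(['--age', '-1'])
except SystemExit as err:
    status = err.code
print(status)  # 2
```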

Argparse XOR - mutually exclusive - only one - exactly one

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--name')

action = parser.add_mutually_exclusive_group(required=True)
action.add_argument('--add', action='store_true')
action.add_argument('--remove', action='store_true')

args = parser.parse_args()
$ python argparse_xor.py
usage: argparse_xor.py [-h] [--name NAME] (--add | --remove)
argparse_xor.py: error: one of the arguments --add --remove is required

$ python argparse_xor.py --add
$ python argparse_xor.py --remove

$ python argparse_xor.py --add --remove
usage: argparse_xor.py [-h] [--name NAME] (--add | --remove)
argparse_xor.py: error: argument --remove: not allowed with argument --add


$ python argparse_xor.py --help
usage: argparse_xor.py [-h] [--name NAME] (--add | --remove)

optional arguments:
  -h, --help   show this help message and exit
  --name NAME
  --add
  --remove
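The script above parses the arguments but does not use them. Because the group is required and mutually exclusive, exactly one of args.add and args.remove is True, so a plain if/else is enough. A sketch with an explicit argument list standing in for the command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--name')

action = parser.add_mutually_exclusive_group(required=True)
action.add_argument('--add', action='store_true')
action.add_argument('--remove', action='store_true')

args = parser.parse_args(['--add', '--name', 'Foo'])
if args.add:
    message = f"adding {args.name}"
else:
    message = f"removing {args.name}"
print(message)  # adding Foo
```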

Argparse argument with default and optional value

  • nargs

  • const

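  • Instead of default we use the const parameter here: const is the value used when --level is given without a value, while default (None here) is used when --level is missing

  • nargs='?' tells argparse that the value of the parameter is optional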

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--level',     help='Some level', type=int, const=10, nargs='?')
args = parser.parse_args()

print(args.level)
$ python argument_with_optional_value.py
None
$ python argument_with_optional_value.py --level
10
$ python argument_with_optional_value.py --level 20
20
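const and default can be combined, giving three distinct cases: the flag missing, the flag without a value, and the flag with a value. A sketch, where default=1 is an illustrative value not taken from the example above:

```python
import argparse

parser = argparse.ArgumentParser()
# default: used when --level is missing
# const:   used when --level is given without a value
parser.add_argument('--level', help='Some level', type=int, nargs='?', const=10, default=1)

missing = parser.parse_args([]).level
bare = parser.parse_args(['--level']).level
given = parser.parse_args(['--level', '20']).level
print(missing, bare, given)  # 1 10 20
```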

Conditional required parameter with argparse

import argparse

# Python argparse: conditionally required arguments

# First pass: parse only the flags that decide which other arguments are needed.
# add_help=False, so that -h/--help is handled by the full parser below.
main_parser = argparse.ArgumentParser(add_help=False)
main_parser.add_argument('--commit',    help='Commit the downloaded data to git', action='store_true')
main_parser.add_argument('--html',      help='Generate the HTML report', action='store_true')
main_parser.add_argument('--collect',   help='Get the data from the Forem API', action='store_true')
main_args, _ = main_parser.parse_known_args()

# Second pass: the full parser inherits the flags above and adds
# --username and --host as required arguments only if --collect was given.
parser = argparse.ArgumentParser(parents=[main_parser])
if main_args.collect:
    parser.add_argument('--username',  help='The username on the Forem site', required=True)
    parser.add_argument('--host',      help='The hostname of the Forem site', required=True)
    parser.add_argument('--limit',     help='Max number of pages to fetch', type=int)

args = parser.parse_args()

print(args.collect)
if args.collect:
    print(args.username)
    print(args.host)
    print(args.limit)

print(args.html)
print(args.commit)
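The two-pass approach relies on parse_known_args(): unlike parse_args(), it does not fail on arguments it does not recognize but returns them in a second list. A minimal sketch with an explicit argument list:

```python
import argparse

first = argparse.ArgumentParser(add_help=False)
first.add_argument('--collect', action='store_true')

# Returns a (namespace, leftovers) pair instead of exiting on unknown arguments
known, rest = first.parse_known_args(['--collect', '--username', 'foo'])
print(known.collect)  # True
print(rest)           # ['--username', 'foo']
```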

Exercise: Command line parameters

Take the code from the color selector exercise in the files section and change it so the user can supply the name of the file where the colors are listed using the --file filename option.

If the user supplies an incorrect color name (which is not listed among the accepted colors) give an error message and stop execution.

Allow the user to supply a flag called --force that will override the color-name-validity checking and will allow any color name.

Exercise: argparse positional and named

Create a script that can accept any number of filenames, the named parameter --machine and the flag --verbose. Like this:

python ex.py file1 file2 file3 --machine MACHINE --verbose

Other

import argparse
import datetime

def vali_date(text: str) -> datetime.datetime:
    # argparse calls this with the raw string; raising ArgumentTypeError
    # makes argparse print a clean error message instead of a traceback
    try:
        return datetime.datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"This {text!r} is not a valid date.")

parser = argparse.ArgumentParser()
parser.add_argument(
        "--date",
        help = "Date in format YYYY-MM-DD",
        required = True,
        type = vali_date
)

args = parser.parse_args()
print(args.date)
import argparse

def is_age(age: str) -> float:
    try:
        new_age = float(age)
    except ValueError:
        raise argparse.ArgumentTypeError(f"This: {age!r} is not a valid number.")

    if new_age < 0:
        raise argparse.ArgumentTypeError(f"It must be a non-negative number. We received {age!r} ")

    return new_age

parser = argparse.ArgumentParser()
parser.add_argument("--age", type=is_age, required=True)

args = parser.parse_args()
print(args.age)

To read a secret, such as a password, without echoing it to the screen, use getpass.getpass():

import getpass

secret = getpass.getpass()
print(secret)

JSON

JSON - JavaScript Object Notation

  • json

JSON is a text-based data format derived from the object literal syntax of JavaScript. Because of its universal availability it became the de facto standard for data exchange between many different languages. Most dynamic languages have a fairly good mapping between JSON and their own data structures; in the case of Python these are lists and dictionaries.

Documentation of the Python json library.
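The mapping between Python and JSON types can be seen by converting a small, made-up dictionary in both directions. Note that the tuple comes back as a list:

```python
import json

# True -> true, None -> null, tuple -> array
data = {"name": "Foo", "tags": ("a", "b"), "age": 42, "ok": True, "spouse": None}
text = json.dumps(data)
print(text)
# {"name": "Foo", "tags": ["a", "b"], "age": 42, "ok": true, "spouse": null}

# object -> dict, array -> list, null -> None
back = json.loads(text)
print(back)
# {'name': 'Foo', 'tags': ['a', 'b'], 'age': 42, 'ok': True, 'spouse': None}
```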


JSON dumps

  • dumps

  • Dictionaries and lists are handled

  • Tuples are indistinguishable from lists

  • Always double quotes

  • null instead of None

  • No trailing comma

import json

data = {
  "fname" : 'Foo',
  "lname" : 'Bar',
  "email" : None,
  "children" : [
     "Moo",
     "Koo",
     "Roo",
  ],
  "fixed": ("a", "b"),
}
print(data)

json_str = json.dumps(data)
print(json_str)

with open('data.json', 'w') as fh:
    fh.write(json_str)
{'fname': 'Foo', 'lname': 'Bar', 'email': None, 'children': ['Moo', 'Koo', 'Roo'], 'fixed': ('a', 'b')}
{"fname": "Foo", "lname": "Bar", "email": null, "children": ["Moo", "Koo", "Roo"], "fixed": ["a", "b"]}

dumps can be used to take a Python data structure and generate a string in JSON format. That string can then be saved in a file, inserted in a database, or sent over the wire.
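Not every Python value survives the trip unchanged. Dictionary keys in JSON are always strings, and types without a JSON equivalent, such as set, make dumps raise a TypeError. A short illustration:

```python
import json

# Integer keys are turned into strings
restored = json.loads(json.dumps({1: 'one'}))
print(restored)  # {'1': 'one'}

# A set has no JSON equivalent
error = None
try:
    json.dumps({'colors': {'red', 'blue'}})
except TypeError as err:
    error = err
    print(err)
```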

JSON loads

  • loads
import json

with open('data.json') as fh:
    json_str = fh.read()

print(json_str)
data = json.loads(json_str)
print(data)
{"fname": "Foo", "lname": "Bar", "email": null, "children": ["Moo", "Koo", "Roo"], "fixed": ["a", "b"]}
{'fname': 'Foo', 'lname': 'Bar', 'email': None, 'children': ['Moo', 'Koo', 'Roo'], 'fixed': ['a', 'b']}

dump

  • dump
import json

data = {
    "fname" : 'Foo',
    "lname" : 'Bar',
    "email" : None,
    "children" : [
        "Moo",
        "Koo",
        "Roo",
    ],
}

print(data)

with open('data.json', 'w') as fh:
    json.dump(data, fh)

dump is a special case of dumps: instead of returning a string, it writes the JSON directly to a file or other stream.
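dump accepts any object with a write() method, and load any object with a read() method. An in-memory io.StringIO buffer is a convenient way to see this without touching the disk:

```python
import io
import json

data = {"fname": "Foo", "children": ["Moo", "Koo"]}

buffer = io.StringIO()
json.dump(data, buffer)          # write JSON into the buffer
print(buffer.getvalue())         # {"fname": "Foo", "children": ["Moo", "Koo"]}

buffer.seek(0)                   # rewind before reading it back
loaded = json.load(buffer)
print(loaded == data)            # True
```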

load

  • load
import json

with open('data.json', 'r') as fh:
    data = json.load(fh)
print(data)
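If the input is not valid JSON, load (and loads) raise json.JSONDecodeError, a subclass of ValueError that records where parsing failed. A sketch using a string with Python-style single quotes, which JSON does not allow:

```python
import json

text = "{'fname': 'Foo'}"   # single quotes are not valid JSON

error = None
try:
    data = json.loads(text)
except json.JSONDecodeError as err:
    error = err
    print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
```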

Round trip

  • loads
  • dumps
import json
import os
import time
import sys

if len(sys.argv) != 2:
    sys.exit(f"Usage: {sys.argv[0]} NAME")

data = {
    'name': [],
    'time': [],
}
filename = 'mydata.json'

if os.path.exists(filename):
    with open(filename) as fh:
        json_str = fh.read()
        # print(json_str)
        data = json.loads(json_str)

data['name'].append(sys.argv[1])
data['time'].append(time.time())

with open(filename, 'w') as fh:
    json_str = json.dumps(data, indent=4)
    fh.write(json_str)

Pretty print JSON

import json

data = {
    "name" : "Foo Bar",
    "grades" : [23, 47, 99, 11],
    "children" : {
        "Peti Bar" : {
            "email": "peti@bar.com",
        },
        "Jenny Bar" : {
            "phone": "12345",
        },
    }
}

print(data)
print(json.dumps(data))
print(json.dumps(data, indent=4, separators=(',', ': ')))
{'name': 'Foo Bar', 'grades': [23, 47, 99, 11], 'children': {'Peti Bar': {'email': 'peti@bar.com'}, 'Jenny Bar': {'phone': '12345'}}}
{"name": "Foo Bar", "grades": [23, 47, 99, 11], "children": {"Peti Bar": {"email": "peti@bar.com"}, "Jenny Bar": {"phone": "12345"}}}
{
    "name": "Foo Bar",
    "grades": [
        23,
        47,
        99,
        11
    ],
    "children": {
        "Peti Bar": {
            "email": "peti@bar.com"
        },
        "Jenny Bar": {
            "phone": "12345"
        }
    }
}

Serialize Datetime objects in JSON
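datetime objects have no JSON equivalent, so json.dumps() raises a TypeError for them. One common approach, sketched here as an assumption, is to pass a default callable that json.dumps() calls for any object it cannot serialize itself, and to convert datetimes to ISO 8601 strings. The helper name to_json is hypothetical:

```python
import datetime
import json

def to_json(obj):
    # Hypothetical helper: json.dumps() calls it only for objects it cannot handle
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

data = {"event": "release", "when": datetime.datetime(2020, 1, 1, 12, 30)}
text = json.dumps(data, default=to_json)
print(text)  # {"event": "release", "when": "2020-01-01T12:30:00"}

# Loading gives back a plain string; convert it explicitly
loaded = json.loads(text)
when = datetime.datetime.fromisoformat(loaded["when"])
print(when == data["when"])  # True
```

The conversion back is not automatic: the reader of the JSON has to know which fields hold dates.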

Sort keys in JSON

import json

data = {
    "name" : "Foo Bar",
    "grades" : [23, 47, 99, 11],
    "children" : {
        "Peti Bar" : {
            "email": "peti@bar.com",
        },
        "Jenny Bar" : {
            "phone": "12345",
        },
    }
}

print(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "children": {
        "Jenny Bar": {
            "phone": "12345"
        },
        "Peti Bar": {
            "email": "peti@bar.com"
        }
    },
    "grades": [
        23,
        47,
        99,
        11
    ],
    "name": "Foo Bar"
}

Set order of keys in JSON - OrderedDict

  • collections
  • OrderedDict
from collections import OrderedDict
import json

d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4

planned_order = ('b', 'c', 'd', 'a')
e = OrderedDict(sorted(d.items(), key=lambda x: planned_order.index(x[0])))
print(e)

out = json.dumps(e, sort_keys=False, indent=4, separators=(',', ': '))
print(out)

print('-----')

# Create index to value mapping dictionary from a list of values
planned_order = ('b', 'c', 'd', 'a')
plan = dict(zip(planned_order, range(len(planned_order))))
print(plan)

f = OrderedDict(sorted(d.items(), key=lambda x: plan[x[0]]))
print(f)
out = json.dumps(f, sort_keys=False, indent=4, separators=(',', ': '))
print(out)

OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
{
    "b": 2,
    "c": 3,
    "d": 4,
    "a": 1
}
-----
{'b': 0, 'c': 1, 'd': 2, 'a': 3}
OrderedDict([('b', 2), ('c', 3), ('d', 4), ('a', 1)])
{
    "b": 2,
    "c": 3,
    "d": 4,
    "a": 1
}
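Since Python 3.7, plain dictionaries keep their insertion order, and json.dumps() writes keys in that order when sort_keys is False (the default). So building a regular dict in the planned order is enough; OrderedDict is not required for this. A sketch using the same data, assuming planned_order lists every key (any key missing from it would be dropped):

```python
import json

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
planned_order = ('b', 'c', 'd', 'a')

# Insert the keys in the planned order; the dict remembers that order
ordered = {key: d[key] for key in planned_order}
out = json.dumps(ordered)
print(out)  # {"b": 2, "c": 3, "d": 4, "a": 1}
```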

Exercise: Counter in JSON

Write a script that will provide several counters. The user can provide an argument on the command line and the script will increment and display that counter. Keep the current values of the counters in a single JSON file. The script should behave like this:

$ python counter.py foo
1

$ python counter.py foo
2

$ python counter.py bar
1

$ python counter.py foo
3
  • Extend the exercise so if the user provides the --list flag then all the counters are listed (and no counting is done).
  • Extend the exercise so if the user provides the --delete foo parameter then the counter foo is removed.

Exercise: Phone book in JSON

Write a script that acts as a phonebook. Use a file in JSON format as the "database".

$ python phone.py Foo 123
Foo added

$ python phone.py Bar
Bar is not in the phonebook

$ python phone.py Bar 456
Bar added

$ python phone.py Bar
456

$ python phone.py Foo
123
  • If Bar is not yet in the phonebook and the user provides Bar 123, save 123 for Bar.
  • If Bar already has a phone number and the user provides Bar 456, tell the user that Bar already has a phone number.
  • To update a phone-number the user must provide --update Bar 456
  • To remove a name the user must provide --delete Bar
  • To list all the names the user can provide --list

Solution: Counter in JSON

import json
import sys
import os

filename = 'counter.json'

if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " COUNTER")
    exit()

counter = {}

if os.path.exists(filename):
    with open(filename) as fh:
        json_str = fh.read()
        counter = json.loads(json_str)

name = sys.argv[1]
if name in counter:
    counter[name] += 1
else:
    counter[name] = 1

print(counter[name])


with open(filename, 'w') as fh:
    json_str = json.dumps(counter)
    fh.write(json_str)

Solution: Phone book

import sys
import json
import os

def main():
    filename = 'phonebook.json'
    phonebook = {}
    if os.path.exists(filename):
        with open(filename) as fh:
            json_str = fh.read()
            phonebook = json.loads(json_str)

    if len(sys.argv) == 2:
        name = sys.argv[1]
        if name in phonebook:
            print(phonebook[name])
        else:
            print("{} is not in the phonebook".format(name))
        return

    if len(sys.argv) == 3:
        name = sys.argv[1]
        phone = sys.argv[2]
        phonebook[name] = phone
        with open(filename, 'w') as fh:
            json_str = json.dumps(phonebook)
            fh.write(json_str)
        return

    print("Invalid number of parameters")
    print("Usage: {} username [phone]".format(sys.argv[0]))

if __name__ == '__main__':
    main()

YAML

YAML - YAML Ain't Markup Language

Read YAML

  • load
  • Loader

# A comment

Course:
  Language:
    Name: Ladino
    IETF BCP 47: lad
  For speakers of:
    Name: English
    IETF BCP 47: en
  Special characters: []

Modules:
  - basic/
  - words/
  - verbs/
  - grammar/
  - names/
  - sentences/
import yaml

filename = "data.yaml"

with open(filename) as fh:
    # yaml.Loader can build arbitrary Python objects; for untrusted files use yaml.safe_load(fh)
    data = yaml.load(fh, Loader=yaml.Loader)

print(data)

Write YAML

  • dump
  • Dumper
import yaml

filename = "out.yaml"

data = {
    "name": "Foo Bar",
    "children": ["Kid1", "Kid2", "Kid3"],
    "car": None,
    "code": 42,
}


with open(filename, 'w') as fh:
    yaml.dump(data, fh, Dumper=yaml.Dumper)

car: null
children:
- Kid1
- Kid2
- Kid3
code: 42
name: Foo Bar

Exercise: Counter in YAML

The same as the exercise in the JSON chapter, but use a YAML file as the "database".

Exercise: Phone book in YAML

The same as the exercise in the JSON chapter, but use a YAML file as the "database".

Solution: Counter in YAML

import sys
import os
import yaml

filename = "counter.yaml"

if len(sys.argv) > 2:
    exit(f"Usage: {sys.argv[0]} [NAME]")


counter = {}
if os.path.exists(filename):
    with open(filename) as fh:
        counter = yaml.load(fh, Loader=yaml.Loader)



if len(sys.argv) == 1:
    if counter:
        for key, value in counter.items():
            print(f"{key} {value}")
    else:
        print("No counters were found")
    exit()

name = sys.argv[1]

if name not in counter:
    counter[name] = 0
counter[name] += 1
print(counter[name])

with open(filename, 'w') as fh:
    yaml.dump(counter, fh, Dumper=yaml.Dumper)

Exception handling

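The input files used in this example: zero.txt contains 0, one.txt contains 1 and three.txt contains 3, while two.txt does not exist. The function is in module.py: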
def read_and_divide(filename):
    print("before " + filename)
    with open(filename, 'r') as fh:
        number = int(fh.readline())
        print(100 / number)
    print("after  " + filename)
import sys
import module

files = sys.argv[1:]

for filename in files:
    try:
        module.read_and_divide(filename)
    except Exception as err:
        print(f"  There was a problem in '{filename}'", file=sys.stderr)
        print(f"  Text: {err}", file=sys.stderr)
        print(f"  Name: {type(err).__name__}", file=sys.stderr)
    print('')


before one.txt
100.0
after  one.txt

before zero.txt
  There was a problem in 'zero.txt'
  Text: division by zero
  Name: ZeroDivisionError

before two.txt
  There was a problem in 'two.txt'
  Text: [Errno 2] No such file or directory: 'two.txt'
  Name: FileNotFoundError

before three.txt
33.333333333333336
after  three.txt
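Catching Exception treats every error the same way. To react differently to each kind of failure, list specific exception classes in separate except clauses. A self-contained sketch: this variant of read_and_divide returns the result instead of printing it, and the sample files are created in a temporary directory for illustration:

```python
import os
import tempfile

def read_and_divide(filename):
    with open(filename, 'r') as fh:
        number = int(fh.readline())
    return 100 / number

results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    # Sample files mirroring the example above; two.txt is deliberately missing
    for name, content in (('one.txt', '1'), ('zero.txt', '0')):
        with open(os.path.join(tmpdir, name), 'w') as fh:
            fh.write(content)

    for name in ('one.txt', 'zero.txt', 'two.txt'):
        try:
            results[name] = read_and_divide(os.path.join(tmpdir, name))
        except ZeroDivisionError:
            results[name] = 'cannot divide by zero'
        except FileNotFoundError:
            results[name] = 'file not found'
        print(name, results[name])
```

Python checks the except clauses in order and runs the first one that matches, so more specific classes should come before more general ones such as Exception.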