- multiprocess
- Pool
- map
Multiprocess N files: Pool
In this example we "analyze" files by counting the total number of characters in each file and how many of them are digits, letters, spaces, or something else.
Analyze N files in parallel.
examples/multiprocess/multiprocess_files.py
```python
import multiprocessing as mp
import os
import sys


def analyze(filename):
    print("Process {:>5} analyzing {}".format(os.getpid(), filename))
    digits = 0
    letters = 0
    spaces = 0
    other = 0
    total = 0

    with open(filename) as fh:
        for line in fh:
            for char in line:
                total += 1
                # Put each character in exactly one category, then go on to
                # the next character (continue, not break, so the rest of
                # the line is still counted).
                if char.isdigit():
                    digits += 1
                    continue
                if char.isalpha():
                    letters += 1
                    continue
                if char == ' ':
                    spaces += 1
                    continue
                other += 1

    # The returned dictionary is pickled and sent back to the parent process.
    return {
        'filename': filename,
        'total': total,
        'digits': digits,
        'spaces': spaces,
        'letters': letters,
        'other': other,
    }


def main():
    if len(sys.argv) < 3:
        sys.exit(f"Usage: {sys.argv[0]} POOL_SIZE FILEs")
    size = int(sys.argv[1])
    files = sys.argv[2:]

    # Start `size` worker processes. map() hands the files to the workers
    # and returns the results in the same order as `files`.
    with mp.Pool(size) as pool:
        results = pool.map(analyze, files)

    for res in results:
        print(res)


if __name__ == '__main__':
    main()
```
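The character classification inside analyze() can be tried on a plain string, without files or processes. The classify() function below is a hypothetical helper written for illustration: it uses an if/elif chain so that every character lands in exactly one bucket and the loop keeps running to the end of the text (a break at that point would instead abandon the rest of the line).

```python
def classify(text):
    """Count the characters of `text` by category.

    Hypothetical helper that mirrors the per-character loop of analyze():
    each character is counted in exactly one category.
    """
    counts = {'total': 0, 'digits': 0, 'letters': 0, 'spaces': 0, 'other': 0}
    for char in text:
        counts['total'] += 1
        if char.isdigit():
            counts['digits'] += 1
        elif char.isalpha():
            counts['letters'] += 1
        elif char == ' ':
            counts['spaces'] += 1
        else:
            counts['other'] += 1  # punctuation, newlines, tabs, ...
    return counts


print(classify("abc 123\n"))
# {'total': 8, 'digits': 3, 'letters': 3, 'spaces': 1, 'other': 1}
```

Because the categories do not overlap, digits + letters + spaces + other always adds up to total, which is a cheap sanity check for the real file-based version too.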
```
$ python multiprocess_files.py 3 multiprocess_*.py
Process 12093 analyzing multiprocess_files.py
Process 12093 analyzing multiprocess_pool_async.py
Process 12095 analyzing multiprocess_load.py
Process 12094 analyzing multiprocessing_and_logging.py
Process 12094 analyzing multiprocess_pool.py
{'filename': 'multiprocess_files.py', 'total': 47, 'digits': 0, 'spaces': 37, 'letters': 6, 'other': 4}
{'filename': 'multiprocessing_and_logging.py', 'total': 45, 'digits': 0, 'spaces': 27, 'letters': 11, 'other': 7}
{'filename': 'multiprocess_load.py', 'total': 32, 'digits': 0, 'spaces': 20, 'letters': 7, 'other': 5}
{'filename': 'multiprocess_pool_async.py', 'total': 30, 'digits': 0, 'spaces': 16, 'letters': 6, 'other': 8}
{'filename': 'multiprocess_pool.py', 'total': 21, 'digits': 0, 'spaces': 11, 'letters': 6, 'other': 4}
```
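We asked for a pool of 3 processes, so the 5 files were shared among 3 worker processes. Looking at the process IDs, 12093 and 12094 each analyzed two files, while 12095 analyzed one. Which process gets which file can differ from run to run.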
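To see the distribution of work directly, a worker can return its own process ID along with its result. This is a minimal sketch: whoami(), group_by_pid() and the file names are hypothetical, and the actual split between the workers varies between runs (a fast worker may well pick up more items than the others).

```python
import multiprocessing as mp
import os
from collections import defaultdict


def whoami(item):
    # Return the item together with the PID of the worker that handled it.
    return item, os.getpid()


def group_by_pid(pairs):
    """Group (item, pid) pairs into {pid: [items, ...]}."""
    groups = defaultdict(list)
    for item, pid in pairs:
        groups[pid].append(item)
    return dict(groups)


def main():
    items = [f"file{n}.txt" for n in range(5)]  # hypothetical file names
    with mp.Pool(3) as pool:
        # chunksize=1 hands out one item at a time, so a worker that
        # finishes early immediately picks up the next item.
        pairs = pool.map(whoami, items, chunksize=1)
    for pid, handled in group_by_pid(pairs).items():
        print(pid, handled)


if __name__ == "__main__":
    main()
```

At most 3 distinct PIDs are printed. For long input lists map() splits the list into larger chunks by default; passing chunksize explicitly makes the hand-out policy visible.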
The worker function can return any Python data structure that can be pickled, because the return value is sent back to the parent process. A dictionary is usually a good choice.
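Since map() returns a plain list of dictionaries, combining them in the parent process is ordinary Python. The sketch below assumes results shaped like analyze()'s return value; summarize() and the sample numbers are hypothetical. It also shows the pickle round-trip that a returned value has to survive (objects such as lambdas or open file handles would fail it).

```python
import pickle


def summarize(results):
    """Add up the per-file counts returned by pool.map().

    Hypothetical helper; `results` is a list of dicts shaped like the
    return value of analyze().
    """
    totals = {'total': 0, 'digits': 0, 'spaces': 0, 'letters': 0, 'other': 0}
    for res in results:
        for key in totals:
            totals[key] += res[key]
    return totals


# Results travel between processes via pickle, so a returned value must
# survive a round-trip like this one. The numbers are made up.
sample = {'filename': 'a.py', 'total': 10, 'digits': 1,
          'spaces': 3, 'letters': 5, 'other': 1}
assert pickle.loads(pickle.dumps(sample)) == sample

print(summarize([sample, dict(sample, filename='b.py')]))
# {'total': 20, 'digits': 2, 'spaces': 6, 'letters': 10, 'other': 2}
```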