Cloud City’s Senior Developer Pamela McA’Nulty delivered a talk at PyCon 2019 on Saturday, May 4th, based on her earlier blog post on multiprocessing in Python.
“Some people, when confronted with a problem, think ‘I know, I’ll use multithreading’. Nothhw tpe yawrve o oblems.” (Eiríkr Åsheim, 2012)
If multithreading is so problematic, though, how do we take advantage of systems with 8, 16, 32, or even thousands of separate CPUs? When presented with large Data Science and HPC data sets, how do you use all of that lovely CPU power without getting in your own way? And how do you tightly coordinate the resources and processing power needed by servers, monitors, and Internet of Things applications, where there can be a lot of waiting for I/O, many distinct but interrelated operations, and non-sharable resources, while still cranking through the preprocessing of data before you send it somewhere?
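For readers who want a concrete starting point before diving into the talk, here is a minimal sketch (not taken from the talk itself) of the basic answer: fan CPU-bound work out across worker processes with Python's standard-library multiprocessing.Pool. The preprocess function is a hypothetical stand-in for whatever CPU-heavy step your pipeline actually runs.

```python
from multiprocessing import Pool, cpu_count

def preprocess(n):
    """Hypothetical stand-in for a CPU-bound preprocessing step."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    records = range(1_000, 1_008)
    # One worker process per CPU core; each worker is a separate
    # interpreter, so the GIL never serializes this work.
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(preprocess, records)
    print(results)
```

Because each worker is a full process rather than a thread, the work runs in parallel on separate cores, which is exactly the escape hatch from the multithreading pitfalls joked about above.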