Try parallel computing yourself
Now that we've discussed parallel computing in theory, let's actually see it in action. Normally, when an engineer wants to run a parallel computing solution, they will use dedicated high-performance computers.
However, thanks to modern web technology, we can also do parallel computing in our browser. That's right, you can watch tasks run in parallel from the comfort of your own home or classroom. Ready?
Visit the link below to try a parallelized cat detection program:
Configuring the program
The goal of a parallel computing solution is to improve efficiency. It's helpful to have parameters that we can change so that we can observe their effects.
This program provides two parameters:
- Number of worker threads: To execute tasks in parallel, this program uses a browser technology called web workers. The webpage detects how many threads your computer can run concurrently, based on what your hardware reports, and suggests using that many workers. However, it also lets you try fewer workers so that you can see the effect on the speedup.
- Number of images: Generally, there's a bigger benefit from parallel processing on larger data sets, so the program defaults to processing the max number of images. If you'd like, you can ask it to process fewer images and observe the difference in performance.
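If you wanted to mimic the worker-count parameter outside the browser, here's a small Python sketch. Browsers report the number of logical processors through `navigator.hardwareConcurrency`; we're assuming the webpage reads something like it. Python's closest analog is `os.cpu_count()`. The function name `choose_worker_count` is a hypothetical choice of ours, not part of the program.

```python
import os
from typing import Optional


def choose_worker_count(requested: Optional[int] = None) -> int:
    """Pick how many workers to start.

    Defaults to the number of logical CPUs the OS reports and clamps any
    requested value into the range 1..that number, so you can try fewer
    workers but never more than the hardware reports.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if requested is None:
        return available
    return max(1, min(requested, available))
```

For example, on a machine that reports four logical CPUs, `choose_worker_count(2)` returns 2, while `choose_worker_count(16)` is capped at 4.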
Monitoring the execution
Watching a parallel program execute is like watching a relay race. How long will the program take? Which worker will complete the most work the fastest? It's very exciting.
You can watch the workers' progress in the chart on the webpage. The program starts with a short setup phase, a sequential portion that initializes the images array and queues up the tasks. Then the workers are off to the races!
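Here's a minimal Python sketch of that structure: a sequential setup that queues one task per image, followed by a parallel portion where each worker repeatedly takes the next task from the shared queue until none are left. It's an illustration under stated assumptions, not the webpage's actual code: `detect_cat` is a hypothetical stand-in that pretends every third image contains a cat, and Python threads take the place of web workers.

```python
import queue
import threading
from typing import List, Tuple


def detect_cat(image_id: int) -> bool:
    # Hypothetical stand-in for the real classifier: every third image "has a cat".
    return image_id % 3 == 0


def run_workers(num_workers: int, num_images: int) -> Tuple[int, List[int]]:
    """Return (cats found, number of tasks completed by each worker)."""
    # Sequential setup: build the images list and queue one task per image.
    images = list(range(num_images))
    tasks: "queue.Queue[int]" = queue.Queue()
    for image_id in images:
        tasks.put(image_id)

    cats_found = 0
    completed = [0] * num_workers  # one counter per worker, like the progress chart
    lock = threading.Lock()

    def worker(index: int) -> None:
        nonlocal cats_found
        while True:
            try:
                image_id = tasks.get_nowait()
            except queue.Empty:
                return  # queue is drained, so this worker stops
            found = detect_cat(image_id)
            with lock:  # protect the shared counters from simultaneous updates
                completed[index] += 1
                if found:
                    cats_found += 1

    # Parallel portion: start every worker, then wait for all of them to finish.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cats_found, completed
```

Running `run_workers(4, 12)` finds 4 cats; how the 12 tasks split among the four workers can change from run to run. One caveat: in standard CPython builds, the global interpreter lock lets only one thread execute Python code at a time, so this sketch shows the structure rather than a real speedup. For CPU-heavy work in Python, you'd typically swap in processes, for example with `concurrent.futures.ProcessPoolExecutor`.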
On many computers, you can also monitor your CPU activity at the same time so that you can see how your CPU is being utilized and how the work is spread across the cores of your CPU.
Here's what my laptop reports when the program runs with four workers:
Once the workers start going, the CPU history shows that usage shoots up to 100% across the 4 cores. The activity monitor shows that Chrome processes are using more than 320% of the CPU (each core has its own 100%) and system processes are using the rest.
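Because each core counts as its own 100%, a process's percentage can add up to more than 100%. Converting it into a share of the whole CPU is a quick division, sketched here in Python. It assumes the monitor counts 100% per CPU it graphs (four, in the readings above); the function name is hypothetical.

```python
def share_of_total_cpu(process_percent: float, cpus: int) -> float:
    """Convert an activity-monitor reading (100% per CPU) to a fraction of total capacity."""
    capacity = 100.0 * cpus  # each CPU contributes its own 100%
    return process_percent / capacity


# 320% on a machine with 4 CPUs is 80% of everything the CPU can do.
print(f"{share_of_total_cpu(320, 4):.0%}")  # prints "80%"
```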
When I'm just using my laptop to write an article, the activity monitor typically reports that most of the CPU is not being utilized. This parallelized program is definitely putting it to work.
Calculating speedup
Exactly how much more efficient is this program when it's run in parallel? Let's find out by calculating the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program. Since we have the option to try the program with varying numbers of parallel workers (as many as our hardware allows), we can calculate the speedup for each number of workers.
First, we run the program with the maximum number of images for each number of workers and record the duration each time.
Here are four runs from my laptop:
Workers | Duration (seconds) |
---|---|
1 | 53.91 |
2 | 32.95 |
3 | 28.81 |
4 | 27.66 |
Running the program sequentially is basically the same as running the program with a single worker, so we can calculate the speedup by dividing the first duration by each of the other durations.
Workers | Duration (seconds) | Speedup |
---|---|---|
1 | 53.91 | 1 |
2 | 32.95 | (53.91 / 32.95) = 1.64 |
3 | 28.81 | (53.91 / 28.81) = 1.87 |
4 | 27.66 | (53.91 / 27.66) = 1.95 |
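Here's the same calculation in a few lines of Python, using the durations from the table above. Treating the 1-worker run as the sequential baseline is the same simplification made in the text.

```python
from typing import Dict

# Durations (in seconds) from the table above, keyed by number of workers.
durations = {1: 53.91, 2: 32.95, 3: 28.81, 4: 27.66}


def speedups(durations: Dict[int, float]) -> Dict[int, float]:
    """Speedup = sequential time / parallel time, with the 1-worker run as the baseline."""
    baseline = durations[1]
    return {workers: round(baseline / t, 2) for workers, t in durations.items()}


for workers, s in speedups(durations).items():
    print(f"{workers} worker(s): speedup {s}")
```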
We can also graph the speedup to visualize how it changes as the number of workers increases:
🔍 Try this from the computer you're using now. How do the results compare? If there are big differences, what do you think is responsible for those differences?
Factors that affect performance
My computer got close to a 2x speedup but nowhere near a 4x speedup, which is what we might have expected with 4 workers. Why not?
There are many factors that can affect the amount of time the computer takes to complete the program:
Hyperthreading
Even though my computer reports that it can run four threads concurrently, I discovered that my CPU only has two cores:
Those two cores use a technology called hyperthreading, however. Intel invented hyperthreading to enable a single CPU core to run two threads concurrently. Since Intel is a very popular manufacturer of CPUs, many personal computers now come with hyperthreaded CPUs.
Hyperthreading works well when two threads are doing different kinds of computation. For example, one task could be doing arithmetic operations while the other task is processing input. Those two tasks use different parts of the CPU core, so hyperthreading can speed them up. However, if two tasks are running identical instructions, they compete for the same parts of the core, and hyperthreading can't speed them up much.
The fact that my laptop has only two (hyperthreaded) physical cores is the most likely explanation for why the speedup approaches two but never gets close to four.
🔍 If you see similar behavior on your machine, do a little investigation to find out how many physical cores the CPU has.
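One way to check from code is the Python sketch below. `os.cpu_count()` reports logical CPUs, which include hyperthreads, so it would report 4 on a laptop like mine. Counting physical cores isn't in Python's standard library; this sketch assumes the third-party `psutil` package is installed and falls back to "unknown" if it isn't.

```python
import os
from typing import Optional, Tuple


def core_counts() -> Tuple[int, Optional[int]]:
    """Return (logical CPUs, physical cores), with physical = None if unknown."""
    logical = os.cpu_count() or 1  # counts hardware threads, so hyperthreads are included
    try:
        import psutil  # third-party package: pip install psutil
        physical = psutil.cpu_count(logical=False)  # may itself return None
    except ImportError:
        physical = None
    return logical, physical


logical, physical = core_counts()
if physical:
    print(f"{logical} logical CPUs on {physical} physical cores "
          f"({logical // physical} hardware threads per core)")
else:
    print(f"{logical} logical CPUs; physical core count unavailable")
```

Without writing code, you can also check on macOS with `sysctl -n hw.physicalcpu hw.logicalcpu`, on Linux with `lscpu` (look at "Thread(s) per core"), or on Windows in Task Manager's Performance tab, which lists "Cores" and "Logical processors".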
Other CPU activity
When this program runs from a web browser on a computer, it's competing for CPU time with other processes.
Before I started the program on my laptop, the CPU was already running over 400 processes, a mix of system processes and user applications:
It might be confusing to hear that a computer with 2 cores can run over 400 processes at once. Most of the time, when a computer runs multiple processes "at once", it's actually switching rapidly between them, so quickly that the user doesn't notice. When a computer runs two processes truly in parallel, on separate cores, it doesn't need to switch between them.
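To make the idea of switching concrete, here's a toy Python model of one core taking turns between processes. It assumes a simple round-robin policy with fixed time slices and free switching; real operating-system schedulers are much more sophisticated, so treat this as an illustration only.

```python
from collections import deque
from typing import Dict, List, Tuple


def round_robin(work: Dict[str, int], time_slice: int) -> Tuple[List[str], Dict[str, int]]:
    """Simulate one core switching between processes.

    `work` maps a process name to the time units it needs. Returns the timeline
    (which process ran during each time unit) and each process's finish time.
    """
    if time_slice < 1:
        raise ValueError("time_slice must be at least 1")
    ready = deque(work.items())
    timeline: List[str] = []
    finish_time: Dict[str, int] = {}
    clock = 0
    while ready:
        name, remaining = ready.popleft()
        run_for = min(time_slice, remaining)
        timeline.extend([name] * run_for)  # the core runs this process for one slice
        clock += run_for
        remaining -= run_for
        if remaining > 0:
            ready.append((name, remaining))  # not done yet: back of the line
        else:
            finish_time[name] = clock
    return timeline, finish_time


timeline, finish = round_robin({"A": 3, "B": 2, "C": 1}, time_slice=1)
print(" ".join(timeline))  # A B C A B A
print(finish)              # {'C': 3, 'B': 5, 'A': 6}
```

With one time unit per slice, the core runs A, B, C, A, B, A: all three processes make progress "at once", but each finishes later than it would have with the core to itself.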
The program can't complete as quickly when the CPU is also executing instructions from other processes, but it's hard to know exactly how much the program's duration is affected. That uncertainty affects our speedup measurements, since the run with 4 workers might have been more or less affected by other CPU activity than the run with 1 worker.
🔍 For the most accurate measurements, quit as many other applications as possible and wait until your CPU monitor shows very low levels of activity. Then hit that button and see what happens when more of your computer's CPU resources are freed up to work on the program.
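Even with other applications closed, durations vary a bit from run to run. One common approach, sketched below in Python, is to time the same run several times and keep the median, which a single unusually slow run can't drag up much. The `task` argument stands in for whatever you're timing; it's a hypothetical name, not part of the cat detection program.

```python
import statistics
import time
from typing import Callable


def median_duration(task: Callable[[], object], repeats: int = 5) -> float:
    """Run `task` several times and return the median wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

For example, `median_duration(lambda: sum(range(1_000_000)), repeats=5)` times five runs of that sum. Python's standard `timeit.repeat` function offers similar repeated timing.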
User interface updates
The webpage that runs this program includes many visual elements: the constantly updating chart, the images and their loading indicators, the status text. Whenever a webpage needs to update a visual element, the CPU is doing work to calculate the new pixels and render them to the screen. That additional work slows down the execution time.
As an experiment, I disabled the UI updates in the program and saw the duration drop from 30 seconds to 22 seconds, about 27% less time.
🔍 Try it yourself with this UI-less version of the cat detection program.
Improving the performance
Now that we've thoroughly explored the performance of this parallelized program, we have a better idea of how to improve the performance. If we were running this program in a production environment, like for a company or research project, then we might make these changes:
- Use hardware with as many physical CPU cores as possible. More physical cores means more tasks that can truly run in parallel.
- Run the program on a dedicated machine, a computer that isn't running other user processes. It will still be running a few system processes to keep the operating system running, but nowhere near as many as a typical home computer runs.
- Run the program from the command line, not a webpage. Eliminating the graphical user interface removes the need for any UI updates.
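To make the last suggestion concrete, here's a hypothetical headless runner in Python. The `--workers` and `--images` flags mirror the webpage's two parameters but are names we made up, `detect_cat` is a stand-in that pretends every third image has a cat, and the only output is a single summary line at the end. Threads keep the sketch short; for CPU-heavy Python work you'd likely switch to `concurrent.futures.ProcessPoolExecutor`.

```python
import argparse
import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Sequence


def detect_cat(image_id: int) -> bool:
    # Hypothetical stand-in for the real classifier: every third image "has a cat".
    return image_id % 3 == 0


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Headless cat-detection run (no UI updates)")
    parser.add_argument("--workers", type=int, default=os.cpu_count() or 1,
                        help="number of parallel workers (default: logical CPU count)")
    parser.add_argument("--images", type=int, default=100,
                        help="number of images to process (hypothetical default)")
    args = parser.parse_args(argv)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=args.workers) as pool:
        cats_found = sum(pool.map(detect_cat, range(args.images)))
    elapsed = time.perf_counter() - start

    # One summary line at the end replaces the webpage's constant visual updates.
    print(f"{cats_found} cats in {args.images} images with {args.workers} workers: {elapsed:.2f}s")
    return cats_found


if __name__ == "__main__":
    main()
```

For example, `python cats.py --workers 2 --images 9` (assuming you saved the file as `cats.py`) processes 9 images with 2 workers and reports 3 cats.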
🤔 What other ideas do you have for improving the program?
🙋🏽🙋🏻‍♀️🙋🏿‍♂️ Do you have any questions about this topic? We'd love to answer; just ask in the questions area below!
Want to join the conversation?
- Why do I get different numbers of cats found when I run the program? (With 1, 2, and 4 threads I got 20 cats found, but with 8 threads I got 22.)
- Every time you click the start processing button, it loads a new set of pictures, so it's possible that you simply got more cats in one run than in the other.
Another possibility is a bug. If you overload your system, the program seems to start having trouble counting: I just managed to get 69 cats detected despite there being only 44 pictures.
- Why can you speed up different tasks using hyperthreading, but not identical tasks?
- I guess that's because inside a physical core there are individual processors for different types of jobs; for example, there is a processor for arithmetic and a processor for displaying output. So if there are 2 arithmetic tasks on one core (which has only one arithmetic processor), the core can only execute them sequentially. Please correct me if I'm wrong.
- Is it possible for a computer to have 2 (or more) CPUs in it?
- How would someone set up a cat detector? That sounds very complex, given the wide variation in cat appearances and the similarities between many species (like domestic cats versus tigers).