If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

### Course: AP®︎/College Computer Science Principles>Unit 4

Lesson 4: Parallel and distributed computing

# Parallel computing

One way to speed up some types of algorithms is to use parallel computing to spread the algorithm across multiple processors running simultaneously.

## Sequential computing

The standard programming model is sequential computing: the computer executes each operation of the program in order, one at a time.
Consider this program that process images and reports how many are cats:
images ← [ "pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
numCats ← 0
FOR EACH image IN images
{
foundCat ← detectCat(image)
IF foundCat
{
numCats ← numCats + 1
}
}
The program begins with a list of filenames and a variable to store the number of images of cats, initialized to 0. A loop iterates through the list, repeatedly calling a function that detects whether the image has a cat and updating the variable.
Here's a visualization of the program executing on 4 images:
Now let's analyze the program further, by annotating each operation with how many seconds it takes and how many times it's called:
TimeCallsOperation
31images ← [ "pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
11numCats ← 0
14FOR EACH image IN images
104 foundCat ← detectCat(image)
14 IF foundCat
24 numCats ← numCats + 1
Note: the values in the time column are made up, the actual time would vary based on the implementation of this pseudocode and the hardware running it.
We can calculate the total time taken by multiplying the number of seconds for each operation by the number of calls and summing the results. For this program, that'd be:
$\left(3×1\right)+\left(1×1\right)+\left(1×4\right)+\left(10×4\right)+\left(1×4\right)+\left(2×4\right)=60$
This timeline visualizes the time taken by the computer:

## Parallel computing

The sequential model assumes that only one operation can be executed at a time, and that is true of a single computer with a single processor. However, most modern computers have multi-core processors, where each core can independently execute an operation.
Multi-core processors can take advantage of parallel computing, a computational model that breaks programs into smaller sequential operations and performs those smaller operations in parallel.
Can we modify the cat detection program so that some of its operations can be executed in parallel? Only if there are operations that are not dependent on each other.
🤔 Take a look at the program again and see if you can identify operations that don't need to happen sequentially.
For this program, we could parallelize the operations inside the loop, since the detectCat function does not depend on the results from other calls to detectCat. That means telling the computer that these lines of code can be executed in parallel:
    foundCat ← detectCat(image)
IF foundCat
{
numCats ← numCats + 1
}
The exact mechanisms for parallelizing a program depend on the particular programming language and computing environment, which we won't dive into here.
Here's a visualization of the program executing those operations in parallel:
Now how long will the program take to execute? Remember the analysis from before:
TimeCallsOperation
31images ← [ "pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
11numCats ← 0
14FOR EACH image IN images
104 foundCat ← detectCat(image)
14 IF foundCat
24 numCats ← numCats + 1
The non-parallelized operations take the same amount of time as before:
$\left(3×1\right)+\left(1×1\right)=4$
The time taken by the parallelized operations depends on the number of parallel processors executing operations at once. If I run this program on a computer with 4 cores, then each core can process one of the 4 images, so the parallel portion will only take as long as a single images takes:
$1+10+1+2=14$
This timeline visualizes the time taken by a computer that splits the program across 4 processes:

## Comparing sequential vs. parallel execution

One way to evaluate the benefits of parallelizing a program is to measure the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program.
Our example cat detection program takes $60$ seconds to run sequentially. When run in parallel on four processors, with each image requiring $14$ seconds, the program takes $18$ seconds to run. We calculate the speedup by dividing $60$ by $18$:
$60/18=3.\stackrel{―}{33}$
We do not achieve a speedup of exactly $4$, despite using that many processors. The initial sequential portion still takes the same amount of time even if we have infinitely many processors. So, this initial sequential portion limits the maximum speedup of parallel computing.
Parallelized programs typically run on input sizes in the thousands or millions. If we ran this program on thousands of images, the initial sequential part would take a tiny fraction of time compared to the parallelized part.
When run in parallel on two processors, with each image requiring $14$ seconds, the program takes $32$ seconds. What is the speedup?

### Variable execution times

The operations inside the parallelized program segments may not always take the same amount of time, especially if they process program input.
For example, the detectCat function in our example may be using an algorithm whose time varies based on the image size or complexity. The operation may only take $10$ seconds on one image but $22$ seconds on another.
This timeline visualizes that situation:
The total time taken has now increased to 30 seconds, as the parallel portion takes as long as the longest of the parallelized operations.
The original fully sequential program took 60 seconds. If the parallelized program takes 30 seconds, what is the speedup?

To compensate for the fact that there's often variation in execution times, the program can be clever about how it sends work to processors. Instead of pre-allocating tasks to processes, it can just send the next task to whichever processor is ready and waiting.
For example, imagine we ran the cat detection program on seven images and the detectCat() operation took a long time on the third image. The program doesn't need to wait on that to complete before it analyzes the seventh image; it can send that image to a free processor instead:
When run on seven images, the sequential program takes 102 seconds. If the parallelized program takes 32 seconds, what is the speedup?
(You can enter the speedup as a fraction)

🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!

## Want to join the conversation?

• How is a program parallelized?
• From the author:Great question, it depends on the programming language and environment. What programming languages are you familiar with?
• How do you make the cat detection program?
• Likely with AI image recognition.
I'd recommend this series if you want to learn more :D
https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
(don't click the link, copy the URL into the address bar)
(1 vote)
• Which programing language is best for parallel computing?
• It's hard to point to a single, best language. It really depends on what you want to do. For low-level concurrent programming, you could use C/C++ or Rust. For high-level concurrent programming, you can try Python or Go.
• How come in the "Sequential Computing" section, it says the "numCats ← numCats + 1" operation is called 4 times even though there was only 1 cat in the example? Doesn't the code automatically stop/exit at "IF foundCat" line for images where a cat isn't detected?
• The code will not automatically stop/exit; rather, the code will evaluate the conditional expression as false and will then continue to the next iteration of the loop without executing "numCats ← numCats + 1".

Also, it seems that the two examples are disparate. If the timetable given related to the example above it, then you are right in your assumption that "numCats ← numCats + 1" would only execute once, but in this case the example and table seem to be separate entities.
• On the "check your understanding," how is the time taken to run the program sequentially 60 seconds? My understanding was that the initial sequential part which is unchanged in both programs is 18 seconds (32 total sec parallel - 14 sec on the images parallel), and since there are two images to be processed sequentially, each taking 14 seconds, the total time taken to run the program sequentially would be 18 + 2(14) = 46 seconds. Please correct me if I'm wrong.
(1 vote)
• The check your understanding is based on the cat detection program written above it which was shown to take 60 seconds to run sequentially. When parallelized on two processors, the time is equal to 4 (for the array initialization and variable assignment at the beginning) + 2 (for 4 images split between 2 processors) * 14 (the amount of time to process one image) = 32 seconds.