Try parallel computing yourself

Now that we've discussed parallel computing in theory, let's actually see it in action. Normally, when an engineer wants to run a parallel computing solution, they will use dedicated high-performance computers.

However, thanks to modern web technology, we can also do parallel computing in our browser. That's right, you can watch tasks run in parallel from the comfort of your own home or classroom. Ready?

Visit the link below to try a parallelized cat detection program:

👉🏽 Cat detection

Configuring the program

The goal of a parallel computing solution is to improve efficiency. It's helpful to have parameters that we can change so that we can observe their effects.

This program provides two parameters:

Number of worker threads: In order to execute tasks in parallel, this program uses a browser technology called web workers. The webpage detects how many threads your computer can run concurrently, based on what your hardware reports, and suggests using that many workers. However, it also lets you try fewer workers so that you can see the effect on the speedup.
Number of images: Generally, there's a bigger benefit from parallel processing on larger data sets, so the program defaults to processing the max number of images. If you'd like, you can ask it to process fewer images and observe the difference in performance.
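
The cat detection program does its detection in the browser, and its source is not shown here. As a rough Python analogue (an assumption for illustration, not the program's actual code), `os.cpu_count()` reports the number of logical CPUs, which is the same kind of figure the webpage uses to suggest a worker count. The helper name `worker_options` is hypothetical.

```python
import os

def worker_options(logical_cpus):
    """Return the worker counts a user could try: 1 up to the reported maximum."""
    return list(range(1, logical_cpus + 1))

# os.cpu_count() reports logical CPUs; it can return None, so fall back to 1.
logical_cpus = os.cpu_count() or 1
print("Suggested workers:", logical_cpus)
print("Other options to try:", worker_options(logical_cpus))
```

Note that this figure counts hardware threads, not physical cores, so a hyperthreaded machine may report more than its physical core count.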

Monitoring the execution

Watching a parallel program execute is like watching a relay race. How long will the program take? Which worker will complete the most work the fastest? It's very exciting.

You can watch the workers progress in the chart on the webpage. The program starts off with a short setup, a sequential portion of initializing the images array and queuing up the tasks. Then the workers are off to the races!
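
The setup-then-race structure described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function `run_workers`, the file names, and the stand-in "detector" are all hypothetical, and the real program uses web workers rather than Python threads. In standard CPython builds, threads will not speed up CPU-bound work the way web workers do; they are used here only to show how tasks are queued first and then pulled by workers.

```python
import queue
import threading

def run_workers(tasks, num_workers, process):
    """Queue every task up front, then let workers pull tasks until none remain."""
    # Sequential setup: fill the queue before any worker starts.
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    completed = [0] * num_workers   # number of tasks finished by each worker
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return              # queue drained: this worker is done
            outcome = process(task)
            with lock:              # protect the shared result structures
                results.append(outcome)
                completed[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, completed

# Hypothetical stand-in for cat detection: a file "contains a cat" if its name does.
images = [f"cat_{i}.jpg" if i % 2 == 0 else f"dog_{i}.jpg" for i in range(44)]
results, completed = run_workers(images, 4, lambda name: "cat" in name)
print("Cats found:", sum(results))
print("Tasks per worker:", completed)
```

The per-worker counts will usually differ from run to run, because each worker takes the next task whenever it becomes free.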

On many computers, you can also monitor your CPU activity at the same time so that you can see how your CPU is being utilized and how the work is spread across the cores of your CPU.

Here's what my laptop reports when the program runs with four workers:

Once the workers start going, the CPU history shows that usage shoots up to 100% across the 4 cores. The activity monitor shows that Chrome processes are using more than 320% of the CPU (each core has its own 100%) and system processes are using the rest.

When I'm just using my laptop to write an article, the activity monitor typically reports that most of the CPU is not being utilized. This parallelized program is definitely putting it to work.

Calculating speedup

Exactly how much more efficient is this program when it's run in parallel? Let's find out by calculating the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program. Since we have the option to try the program with varying numbers of parallel workers (as many as our hardware allows), we can calculate the speedup for each number of workers.

First, we run the program with the maximum number of images for each number of workers and record the duration each time.

Here are four runs from my laptop:

Workers	Duration (seconds)
$1$	$53.91$
$2$	$32.95$
$3$	$28.81$
$4$	$27.66$

Running the program sequentially is basically the same as running the program with a single worker, so we can calculate the speedup by dividing the first duration by each of the other durations.

Workers	Duration (seconds)	Speedup
$1$	$53.91$	$1$
$2$	$32.95$	$(53.91 / 32.95) = 1.64$
$3$	$28.81$	$(53.91 / 28.81) = 1.87$
$4$	$27.66$	$(53.91 / 27.66) = 1.95$
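
The same calculation can be scripted, which is handy when you record your own runs. This is a small sketch using the durations from the table above; the names `durations` and `speedup` are illustrative.

```python
# Measured durations in seconds, keyed by number of workers (from the table above).
durations = {1: 53.91, 2: 32.95, 3: 28.81, 4: 27.66}

def speedup(sequential_seconds, parallel_seconds):
    """Ratio of sequential time to parallel time."""
    return sequential_seconds / parallel_seconds

baseline = durations[1]  # a single worker stands in for the sequential run
for workers, seconds in durations.items():
    print(f"{workers} workers: {speedup(baseline, seconds):.2f}x")
```

Swap in your own measurements for `durations` to compare your machine against these numbers.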

We can also graph the speedup to visualize how it changes as the number of workers increases:

🔍 Try this from the computer you're using now. How do the results compare? If there are big differences, what do you think is responsible for those differences?

Factors that affect performance

My computer got close to a 2x speedup but nowhere near a 4x speedup, which is what we might have expected with 4 workers. Why not?

There are many factors that can affect the amount of time the computer takes to complete the program:

Hyperthreading

Even though my computer reports that it can run four threads concurrently, I discovered that my CPU only has two cores:

Screenshot of Apple system information screen with title of "Hardware Overview" and the following table:

Column	Value
Model Name:	MacBook Pro
Model Identifier:	MacBookPro14,2
Processor Name:	Intel Core i5
Processor Speed:	3.1 GHz
Number of Processors:	1
Total Number of Cores:	2

Hardware details from my Apple laptop system overview

Those two cores use a technology called hyperthreading, however. Hyperthreading is Intel's implementation of simultaneous multithreading, which enables a single CPU core to run two threads concurrently. Since Intel is a very popular manufacturer of CPUs, many personal computers now come with hyperthreaded CPUs.

Hyperthreading works well when two threads are doing different kinds of computation. For example, one task could be doing arithmetic operations while the other task is processing input. Those two tasks are utilizing different parts of the CPU and can be sped up by hyperthreading. However, if two tasks are running identical instructions, they compete for the same parts of the core, so hyperthreading offers little or no speedup.

The fact that my laptop has only two (hyperthreaded) physical cores is the most likely explanation for why the speedup approaches two but never gets close to four.
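
To make that reasoning concrete, the sketch below compares the measured speedups with a simplified ceiling of min(workers, physical cores). The ceiling is an assumption made for illustration: it treats hyperthreading as giving no gain at all for identical tasks and ignores every other source of overhead.

```python
PHYSICAL_CORES = 2  # from the hardware overview above
measured = {1: 1.00, 2: 1.64, 3: 1.87, 4: 1.95}  # speedups from the earlier table

def ideal_ceiling(workers, physical_cores):
    """Simplified upper bound: extra workers beyond the core count add nothing."""
    return min(workers, physical_cores)

for workers, observed in measured.items():
    ceiling = ideal_ceiling(workers, PHYSICAL_CORES)
    print(f"{workers} workers: measured {observed:.2f}x, ceiling {ceiling}x")
```

Under this model the ceiling stops at 2x once there are two workers, and the measured values approach it without passing it.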

🔍 If you see similar behavior on your machine, do a little investigation to find out how many physical cores the CPU has.

Other CPU activity

When this program runs from a web browser on a computer, it's competing for CPU time with other processes.

Before I started the program on my laptop, the CPU was already running over 400 processes, a mix of system processes and user applications:

Screenshot from Apple Activity Monitor. The center shows an area chart titled "CPU Load".

The left side displays this table:

Column	Value
System:	7.76%
User:	15.27%
Idle:	76.97%

The right side displays this table:

Column	Value
Threads:	2968
Processes:	484

It might be confusing to hear that a computer with 2 cores can run over 400 processes at once. Most of the time, when a computer runs multiple processes "at once", it's actually switching rapidly between them, so quickly that the user doesn't notice. When a computer runs two processes truly in parallel, then it no longer needs to switch between them.
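
The rapid switching can be illustrated with a toy round-robin simulation of a single core. This is only a sketch: the process names and work amounts are made up, and real operating system schedulers use priorities and far more sophisticated policies.

```python
from collections import deque

def round_robin(work_units, time_slice):
    """Simulate one core switching between processes.

    work_units maps a process name to the units of work it still needs.
    Returns the order in which the core ran the processes.
    """
    ready = deque(work_units.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        schedule.append(name)               # the core runs this process for one slice
        remaining -= time_slice
        if remaining > 0:
            ready.append((name, remaining))  # not finished: back of the line
    return schedule

print(round_robin({"browser": 3, "editor": 2, "music": 1}, time_slice=1))
# → ['browser', 'editor', 'music', 'browser', 'editor', 'browser']
```

Every process makes progress, but only one runs at any instant, which is why adding more runnable processes makes each of them take longer to finish.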

The program can't complete as quickly when the CPU is also executing instructions from other processes, but it's hard to know exactly how much the program's duration is affected. That uncertainty affects our speedup measurements, since the run with 4 workers might have been more or less affected by other CPU activity than the run with 1 worker.

🔍 For the most accurate measurements, quit as many other applications as possible and wait until your CPU monitor shows very low levels of activity. Then hit that button and see what happens when more of your computer's CPU resources are freed up to work on the program.

User interface updates

The webpage that runs this program includes many visual elements: the constantly updating chart, the images and their loading indicators, the status text. Whenever a webpage needs to update a visual element, the CPU is doing work to calculate the new pixels and render them to the screen. That additional work slows down the execution time.

As an experiment, I disabled the UI updates in the program and saw the duration go from 30 seconds to 22 seconds, a significant decrease.
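
For a rough sense of scale, those two durations work out to:

$$\frac{30 - 22}{30} \approx 0.27 \qquad \text{and} \qquad \frac{30}{22} \approx 1.36$$

That is, the run without UI updates took about 27% less time, which is a speedup of about 1.36 from removing the rendering work alone.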

🔍 Try for yourself on this UI-less version of the cat detection program.

Improving the performance

Now that we've thoroughly explored the performance of this parallelized program, we have a better idea of how to improve it. If we were running this program in a production environment, like for a company or research project, then we might make these changes:

Use hardware with as many physical CPU cores as possible. More physical cores means more tasks that can truly run in parallel.
Run the program on a dedicated machine, a computer that isn't running other user processes. It will still be running a few system processes to keep the operating system running, but nowhere near as many as a typical home computer runs.
Run the program from the command line, not a webpage. Eliminating the graphical user interface removes the need for any UI updates.

🤔 What other ideas do you have for improving the program?

🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!

Want to join the conversation?

layaz7717
Posted 4 years ago.
Why can you speed up different tasks using hyperthreading, but not identical tasks?
(3 votes)
- Dzaka H. Athif
  Posted 4 years ago.
  I guess that's because inside a physical core there is an individual processor for each type of job; for example, there is a processor for arithmetic and a processor for displaying output. So, assuming there are 2 arithmetic tasks on one core (which only has one arithmetic processor), the core can only execute the tasks sequentially. Please correct me if I'm wrong.
  (5 votes)
Big Daniel
Posted a year ago.
Is it possible for a computer to have 2 (or more) CPUs in it?
(5 votes)
trungdo3224
Posted 4 years ago.
Why do I get different counts of cats found when I run the program? (With 1, 2, and 4 threads I got 20 cats found, but with 8 threads I got 22.)
(1 vote)
- Martin
  Posted 4 years ago.
  Every time you click on the start processing button it will load a new set of pictures, so it's possible that you simply got more cats in one run-through than the other.
  Another possibility is a bug. If you overload your system, the program seems to start having trouble counting; I just managed to get 69 cats detected despite there only being 44 pictures.
  (8 votes)
Gideon
Posted 6 months ago.
My MacBook Pro (M2 Max) reports that it can run 8 threads now, and the total number of cores is 12.
So what's the main difference between 2 cores (4 threads) and 12 cores (8 threads)?
(2 votes)
layaz7717
Posted 4 years ago.
How would someone set up a cat detector? That sounds very complex, given the wide variation in cat appearances and the similarities between many species (like domestic cats versus tigers).
(1 vote)
- Astro8333
  Posted 7 months ago.
  It's mostly done using AI. It does a lot of comparison. It will take an image and compare it to other images in its database. The way it compares is by taking sections of the image, such as the nose, and comparing them (pixel by pixel) to the other images, as the nose for cats is usually the same. To tell a cat from a lion, it compares the background and also takes into account the size of the animal (in comparison to the background). It also looks at the different lion images in its database and determines whether the image looks more like a cat or a lion.
  (1 vote)
emmett.parker
Posted a month ago.
so why does this happen when gordons are sigmas?
(0 votes)
hani issa
Posted a year ago.
Hyperthreading is cheating
(0 votes)